### 🔀 [#252](https://github.com/ikawrakow/ik_llama.cpp/pull/252) - MLA-2: Allow usage of q8_0 for KV cache on CUDA

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-03-12 |
| **Updated** | 2025-03-12 |

---

#### Description

Performance is slightly lower than with an `f16` KV cache, but the difference is small.
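
For readers unfamiliar with `q8_0`, the following is a minimal Python sketch of the block format as it is commonly defined in ggml: 32 values share one `f16` scale and are stored as 32 signed 8-bit integers, so a block takes 34 bytes instead of the 64 bytes needed for 32 `f16` values. This is not the CUDA code from this PR. The function names `quantize_block_q8_0` and `dequantize_block_q8_0` are hypothetical and chosen for illustration only, and Python's `round` breaks ties differently from C's `roundf`, so results can differ in the last bit for exact half values.

```python
import struct

QK8_0 = 32                      # values per q8_0 block (ggml convention)
Q8_0_BLOCK_BYTES = 2 + QK8_0    # one f16 scale + 32 int8 quants = 34 bytes
F16_BLOCK_BYTES = 2 * QK8_0     # the same 32 values stored as f16 = 64 bytes


def to_f16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_block_q8_0(values):
    """Quantize 32 floats into (scale, quants). Illustrative sketch only."""
    assert len(values) == QK8_0
    amax = max(abs(v) for v in values)
    d = amax / 127.0                        # scale so the largest value maps to +/-127
    inv = 1.0 / d if d != 0.0 else 0.0
    qs = [int(round(v * inv)) for v in values]
    return to_f16(d), qs                    # the scale is stored as f16


def dequantize_block_q8_0(d, qs):
    """Reconstruct approximate floats from a q8_0 block."""
    return [d * q for q in qs]


if __name__ == "__main__":
    block = [(i - 16) / 8.0 for i in range(QK8_0)]
    d, qs = quantize_block_q8_0(block)
    approx = dequantize_block_q8_0(d, qs)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, approx)))
    print("size vs f16:", Q8_0_BLOCK_BYTES / F16_BLOCK_BYTES)  # 0.53125
```

Under these assumptions the cache shrinks to roughly 53% of its `f16` size, at the cost of a dequantization step whenever cached values are read.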