### 🔀 [#252](https://github.com/ikawrakow/ik_llama.cpp/pull/252) - MLA-2: Allow usage of q8_0 for KV cache on CUDA

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-03-12 |
| **Updated** | 2025-03-12 |

---

#### Description

Performance is slightly lower than with an `f16` KV cache, but not too bad.