From ed17aa101584ca0c11af2dc25641b007d2adc525 Mon Sep 17 00:00:00 2001
From: Saood Karim
Date: Wed, 18 Jun 2025 12:35:55 -0500
Subject: [PATCH] move thing

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 718c029e..5a51b156 100644
--- a/README.md
+++ b/README.md
@@ -76,6 +76,7 @@ Cuda implementations: `IQ4_KS_R4` and `IQ5_KS_R4` [PR 493](https://github.com/i
 
 ### Flash-MLA
 
+* May 7 2025: 🚀 FlashMLA-3 for DeepSeek models on CUDA. [PR 386](https://github.com/ikawrakow/ik_llama.cpp/pull/386). Caveat: Ampere or newer Nvidia GPU required
 * March 21 2025: 🚀 FlashMLA-3: fastest CPU-only inference for DeepSeek models [PR 273](https://github.com/ikawrakow/ik_llama.cpp/pull/273)
 * March 17 2025: 🚀 FlashMLA-2 performance improvements [PR 253](https://github.com/ikawrakow/ik_llama.cpp/pull/253)
 * March 12 2025: Allow `Q8_0` KV cache with FlashMLA-2 on CUDA [PR 265](https://github.com/ikawrakow/ik_llama.cpp/pull/265)