From ae1c06df663d1c2ecb6703f6f10f7068ff7d85e4 Mon Sep 17 00:00:00 2001
From: Saood Karim
Date: Thu, 12 Jun 2025 11:52:25 -0500
Subject: [PATCH] Add more old PRs

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 5e8e6436..0ec2c2d6 100644
--- a/README.md
+++ b/README.md
@@ -54,11 +54,11 @@ Information and the original CUDA implementation in [PR 113](https://github.com/
 * May 12 2025: User can now control if/which operations with tensors held in RAM are offloaded to the GPU. See [PR 405](https://github.com/ikawrakow/ik_llama.cpp/pull/405)
 * May 12 2025: Compatibility issues with mainline `llama.cpp` GGUFs for DeepSeek models with MLA enabled were resolved in [PR 394](https://github.com/ikawrakow/ik_llama.cpp/pull/394). The lower prompt processing performance resulting from using `llama.cpp`-style MLA GGUFs was recovered in [PR 409](https://github.com/ikawrakow/ik_llama.cpp/pull/409).
 * April 21 2025: ik_llama.cpp builds and runs successfully on Android (using termux), see [PR 336](https://github.com/ikawrakow/ik_llama.cpp/pull/336)
-* March 1 2025: Smart Expert Reduction for faster DeepSeek inference
-* Feb 25 2025: Tensor overrides for better control where model weights are stored (GPU or CPU)
-* Feb 23 2025: `sweep-bench` - better performance benchmarking
-* Feb 19 2025: `Q8_KV` - new type for 8-bit KV-cache quantization
-* March 7 2025: Custom quantization mixes using regular expressions
+* March 1 2025: Smart Expert Reduction for faster DeepSeek inference [PR 239](https://github.com/ikawrakow/ik_llama.cpp/pull/239)
+* Feb 25 2025: Tensor overrides for better control where model weights are stored (GPU or CPU) [PR 232](https://github.com/ikawrakow/ik_llama.cpp/pull/232)
+* Feb 23 2025: `sweep-bench` - better performance benchmarking [PR 225](https://github.com/ikawrakow/ik_llama.cpp/pull/225)
+* Feb 19 2025: `Q8_KV` - new type for 8-bit KV-cache quantization [PR 208](https://github.com/ikawrakow/ik_llama.cpp/pull/208)
+* March 7 2025: Custom quantization mixes using regular expressions [PR 244](https://github.com/ikawrakow/ik_llama.cpp/pull/244)
 ### Performance improvements