Add more old PRs

2026-02-24 07:04:11 +00:00 · 2025-06-12 11:52:25 -05:00
parent 77ad5bbed1
commit ae1c06df66
1 changed file with 5 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -54,11 +54,11 @@ Information and the original CUDA implementation in [PR 113](https://github.com/
 * May 12 2025: User can now control if/which operations with tensors held in RAM are offloaded to the GPU. See [PR 405](https://github.com/ikawrakow/ik_llama.cpp/pull/405) 
 * May 12 2025: Compatibility issues with mainline `llama.cpp` GGUFs for DeepSeek models with MLA enabled were resolved in [PR 394](https://github.com/ikawrakow/ik_llama.cpp/pull/394). The lower prompt processing performance resulting from using `llama.cpp`-style MLA GGUFs was recovered in [PR 409](https://github.com/ikawrakow/ik_llama.cpp/pull/409).
 * April 21 2025: ik_llama.cpp builds and runs successfully on Android (using termux), see [PR 336](https://github.com/ikawrakow/ik_llama.cpp/pull/336)
-* March 1 2025: Smart Expert Reduction for faster DeepSeek inference 
-* Feb 25 2025: Tensor overrides for better control where model weights are stored (GPU or CPU)
-* Feb 23 2025: `sweep-bench` - better performance benchmarking
-* Feb 19 2025: `Q8_KV` - new type for 8-bit KV-cache quantization
-* March 7 2025: Custom quantization mixes using regular expressions
+* March 1 2025: Smart Expert Reduction for faster DeepSeek inference [PR 239](https://github.com/ikawrakow/ik_llama.cpp/pull/239) 
+* Feb 25 2025: Tensor overrides for better control over where model weights are stored (GPU or CPU) [PR 232](https://github.com/ikawrakow/ik_llama.cpp/pull/232)
+* Feb 23 2025: `sweep-bench` - better performance benchmarking [PR 225](https://github.com/ikawrakow/ik_llama.cpp/pull/225)
+* Feb 19 2025: `Q8_KV` - new type for 8-bit KV-cache quantization [PR 208](https://github.com/ikawrakow/ik_llama.cpp/pull/208)
+* March 7 2025: Custom quantization mixes using regular expressions [PR 244](https://github.com/ikawrakow/ik_llama.cpp/pull/244)

 ### Performance improvements