mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-24 07:04:11 +00:00
Add more old PRs
10
README.md
@@ -54,11 +54,11 @@ Information and the original CUDA implementation in [PR 113](https://github.com/
 * May 12 2025: User can now control if/which operations with tensors held in RAM are offloaded to the GPU. See [PR 405](https://github.com/ikawrakow/ik_llama.cpp/pull/405)
 * May 12 2025: Compatibility issues with mainline `llama.cpp` GGUFs for DeepSeek models with MLA enabled were resolved in [PR 394](https://github.com/ikawrakow/ik_llama.cpp/pull/394). The lower prompt processing performance resulting from using `llama.cpp`-style MLA GGUFs was recovered in [PR 409](https://github.com/ikawrakow/ik_llama.cpp/pull/409).
 * April 21 2025: ik_llama.cpp builds and runs successfully on Android (using termux), see [PR 336](https://github.com/ikawrakow/ik_llama.cpp/pull/336)
-* March 1 2025: Smart Expert Reduction for faster DeepSeek inference
-* Feb 25 2025: Tensor overrides for better control where model weights are stored (GPU or CPU)
-* Feb 23 2025: `sweep-bench` - better performance benchmarking
-* Feb 19 2025: `Q8_KV` - new type for 8-bit KV-cache quantization
-* March 7 2025: Custom quantization mixes using regular expressions
+* March 1 2025: Smart Expert Reduction for faster DeepSeek inference [PR 239](https://github.com/ikawrakow/ik_llama.cpp/pull/239)
+* Feb 25 2025: Tensor overrides for better control where model weights are stored (GPU or CPU) [PR 232](https://github.com/ikawrakow/ik_llama.cpp/pull/232)
+* Feb 23 2025: `sweep-bench` - better performance benchmarking [PR 225](https://github.com/ikawrakow/ik_llama.cpp/pull/225)
+* Feb 19 2025: `Q8_KV` - new type for 8-bit KV-cache quantization [PR 208](https://github.com/ikawrakow/ik_llama.cpp/pull/208)
+* March 7 2025: Custom quantization mixes using regular expressions [PR 244](https://github.com/ikawrakow/ik_llama.cpp/pull/244)
 
 ### Performance improvements
 