mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-05-03 12:51:53 +00:00
Add GitHub data (#637)
17 github-data/pull_requests/391-Fix DeepSeek q8_0 cache.md Normal file
@@ -0,0 +1,17 @@
### 🐛 [#391](https://github.com/ikawrakow/ik_llama.cpp/pull/391) - Fix DeepSeek q8_0 cache
| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-05-07 |
| **Updated** | 2025-05-07 |
---
#### Description
Nobody has used `ik_llama.cpp` with a DeepSeek model and `Q8_0` KV cache since PR #351?
This PR fixes the assert triggered when using a DeepSeek model on the CPU with a `Q8_0` KV cache.
Also, the optimization I added in #351 to repack the `K` cache to `Q8_0_R8` seems to lower TG performance for DeepSeek models, so this PR disables it.