mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 09:09:50 +00:00
912 B
912 B
🐛 #213 - Fix NEON gemm/gemv for legacy quants when row size is not divisible by 128
| Author | ikawrakow |
|---|---|
| State | ❌ Closed |
| Created | 2025-02-20 |
| Updated | 2025-02-20 |
Description
I have broken it quite a while ago when I changed the NEON implementation to do two rows at a time. I haven't noticed as all models I typically use have row sizes that are multiple of 128. But as I was working on the IQ1_S NEON implementation for PR #212, I was testing with DeepSeek-Lite (where K cache row size is 576, so not divisible by 128), using Q8_0 for K cache (but no FA, where it works), and was getting NaNs or gibberish. I lost so much time until I finally realized that the issue is with the K cache Q8_0 matrix multiplication rather than my IQ1_S implementation.
This PR fixes this.