Files
ik_llama.cpp/github-data/pull_requests/81 - Cleanup scale fudge factors.md
2025-07-23 13:31:53 +02:00

13 lines
736 B
Markdown

### 🔀 [#81](https://github.com/ikawrakow/ik_llama.cpp/pull/81) - Cleanup scale fudge factors
| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2024-10-04 |
| **Updated** | 2024-10-04 |
---
#### Description
Low-bit quants often benefit from a fudge factor applied to the (super-)block scale. When I was developing `IQ2_K` and `IQ3_K` it was faster to change the fudge factor in `ggml-cuda/convert.cu` and recompile than to change it in the quantization function and re-quantize. But when I was ready, I forgot to move the `IQ2_K` and `IQ3_K` fudge factors to quantization, so they remained in the CUDA dequantization function (and hence weren't applied anywhere else). This PR fixes this.