### 🔀 [#517](https://github.com/ikawrakow/ik_llama.cpp/pull/517) - IQ1_S: much faster CPU prompt processing

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-06-11 |
| **Updated** | 2025-06-11 |

---

#### Description

This PR is a follow-up to #515 and #516, and applies the same technique to `IQ1_S`. We see a nearly 2X increase in prompt processing speed (roughly 1.5X to 1.8X in the sweep-bench results below) compared to `IQ1_S` and `IQ1_S_R4` on the main branch.

Sweep-bench for `IQ1_S` quantization of LLaMA-3.1-8B on a Ryzen-7950X CPU:

### IQ1_S, main branch

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.272 | 156.47 | 4.605 | 27.79 |
| 512 | 128 | 512 | 3.351 | 152.77 | 5.092 | 25.14 |
| 512 | 128 | 1024 | 3.402 | 150.52 | 5.084 | 25.18 |
| 512 | 128 | 1536 | 3.677 | 139.25 | 5.201 | 24.61 |
| 512 | 128 | 2048 | 3.586 | 142.79 | 5.515 | 23.21 |

### IQ1_S_R4, main branch

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.101 | 165.10 | 4.543 | 28.18 |
| 512 | 128 | 512 | 3.166 | 161.74 | 4.836 | 26.47 |
| 512 | 128 | 1024 | 3.309 | 154.75 | 5.282 | 24.23 |
| 512 | 128 | 1536 | 3.348 | 152.92 | 5.093 | 25.13 |
| 512 | 128 | 2048 | 3.447 | 148.55 | 5.265 | 24.31 |

### IQ1_S, PR

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 1.855 | 275.94 | 4.643 | 27.57 |
| 512 | 128 | 512 | 1.940 | 263.87 | 5.056 | 25.32 |
| 512 | 128 | 1024 | 2.188 | 234.05 | 5.099 | 25.10 |
| 512 | 128 | 1536 | 2.097 | 244.20 | 5.112 | 25.04 |
| 512 | 128 | 2048 | 2.184 | 234.42 | 5.368 | 23.85 |
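
For readers who want to check the prompt-processing gain per context depth, the following is a minimal sketch that computes the PR-to-main speedup from the `S_PP t/s` columns above. It is not part of the PR; the `s_pp` dictionary and the `speedups` helper are hypothetical names introduced only for this illustration, and the numbers are copied by hand from the three tables.

```python
# S_PP (tokens/s) values copied from the sweep-bench tables above,
# one entry per N_KV row. Names here are illustrative, not from the PR.
n_kv = [0, 512, 1024, 1536, 2048]
s_pp = {
    "IQ1_S, main branch": [156.47, 152.77, 150.52, 139.25, 142.79],
    "IQ1_S_R4, main branch": [165.10, 161.74, 154.75, 152.92, 148.55],
    "IQ1_S, PR": [275.94, 263.87, 234.05, 244.20, 234.42],
}


def speedups(baseline, candidate):
    """Return the per-row ratio candidate / baseline."""
    return [c / b for b, c in zip(baseline, candidate)]


if __name__ == "__main__":
    for name in ("IQ1_S, main branch", "IQ1_S_R4, main branch"):
        ratios = speedups(s_pp[name], s_pp["IQ1_S, PR"])
        print(f"PR vs {name}")
        for kv, ratio in zip(n_kv, ratios):
            print(f"  N_KV={kv:5d}  speedup={ratio:.2f}x")
```

Token generation (`S_TG t/s`) is essentially unchanged between the tables, so the sketch only looks at prompt processing.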