### 🔀 [#517](https://github.com/ikawrakow/ik_llama.cpp/pull/517) - IQ1_S: much faster CPU prompt processing

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-06-11 |
| **Updated** | 2025-06-11 |

---
#### Description
This PR is a follow-up to #515 and #516 and applies the same technique to `IQ1_S`. We see a nearly 2X increase in prompt processing speed compared to `IQ1_S` and `IQ1_S_R4` on the main branch (about 1.5X to 1.8X in the sweep-bench results below).

Sweep-bench results for the `IQ1_S` quantization of LLaMA-3.1-8B on a Ryzen-7950X CPU:
### IQ1_S, main branch

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.272 | 156.47 | 4.605 | 27.79 |
| 512 | 128 | 512 | 3.351 | 152.77 | 5.092 | 25.14 |
| 512 | 128 | 1024 | 3.402 | 150.52 | 5.084 | 25.18 |
| 512 | 128 | 1536 | 3.677 | 139.25 | 5.201 | 24.61 |
| 512 | 128 | 2048 | 3.586 | 142.79 | 5.515 | 23.21 |

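For readers unfamiliar with the sweep-bench columns, here is a small sketch of how they appear to relate. It assumes `PP` and `TG` are token counts per step, `T_PP` and `T_TG` are times in seconds, and `S_PP` and `S_TG` are the resulting speeds in tokens per second. That reading is an assumption, but the reported numbers are consistent with it up to rounding. The helper name `recompute_speeds` is made up for this illustration.

```python
# Rows copied from the "IQ1_S, main branch" table:
# (PP, TG, N_KV, T_PP s, S_PP t/s, T_TG s, S_TG t/s)
rows = [
    (512, 128, 0, 3.272, 156.47, 4.605, 27.79),
    (512, 128, 512, 3.351, 152.77, 5.092, 25.14),
    (512, 128, 1024, 3.402, 150.52, 5.084, 25.18),
    (512, 128, 1536, 3.677, 139.25, 5.201, 24.61),
    (512, 128, 2048, 3.586, 142.79, 5.515, 23.21),
]


def recompute_speeds(row):
    """Return (S_PP, S_TG) recomputed as tokens / seconds (assumed definition)."""
    pp, tg, _n_kv, t_pp, _s_pp, t_tg, _s_tg = row
    return pp / t_pp, tg / t_tg


for row in rows:
    s_pp, s_tg = recompute_speeds(row)
    print(
        f"N_KV={row[2]:5d}  "
        f"S_PP={s_pp:7.2f} (reported {row[4]:.2f})  "
        f"S_TG={s_tg:6.2f} (reported {row[6]:.2f})"
    )
```

The recomputed values differ from the reported ones only in the second decimal place, which is expected because the times in the table are themselves rounded to milliseconds.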
### IQ1_S_R4, main branch

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.101 | 165.10 | 4.543 | 28.18 |
| 512 | 128 | 512 | 3.166 | 161.74 | 4.836 | 26.47 |
| 512 | 128 | 1024 | 3.309 | 154.75 | 5.282 | 24.23 |
| 512 | 128 | 1536 | 3.348 | 152.92 | 5.093 | 25.13 |
| 512 | 128 | 2048 | 3.447 | 148.55 | 5.265 | 24.31 |

### IQ1_S, PR

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 1.855 | 275.94 | 4.643 | 27.57 |
| 512 | 128 | 512 | 1.940 | 263.87 | 5.056 | 25.32 |
| 512 | 128 | 1024 | 2.188 | 234.05 | 5.099 | 25.10 |
| 512 | 128 | 1536 | 2.097 | 244.20 | 5.112 | 25.04 |
| 512 | 128 | 2048 | 2.184 | 234.42 | 5.368 | 23.85 |
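
To make the comparison across the three tables concrete, the following sketch computes the prompt processing speedup of the PR over both main-branch variants at each `N_KV`. The `S_PP` values are copied from the tables. The dictionary layout and the `speedups` helper are illustrative only and are not part of sweep-bench.

```python
# S_PP (tokens/s) per N_KV, copied from the three sweep-bench tables.
n_kv = [0, 512, 1024, 1536, 2048]
s_pp = {
    "IQ1_S (main)": [156.47, 152.77, 150.52, 139.25, 142.79],
    "IQ1_S_R4 (main)": [165.10, 161.74, 154.75, 152.92, 148.55],
    "IQ1_S (PR)": [275.94, 263.87, 234.05, 244.20, 234.42],
}


def speedups(new, old):
    """Element-wise ratio new/old; values above 1.0 mean `new` is faster."""
    return [n / o for n, o in zip(new, old)]


vs_iq1_s = speedups(s_pp["IQ1_S (PR)"], s_pp["IQ1_S (main)"])
vs_iq1_s_r4 = speedups(s_pp["IQ1_S (PR)"], s_pp["IQ1_S_R4 (main)"])

print(f"{'N_KV':>6}  {'vs IQ1_S':>9}  {'vs IQ1_S_R4':>12}")
for kv, a, b in zip(n_kv, vs_iq1_s, vs_iq1_s_r4):
    print(f"{kv:>6}  {a:>8.2f}x  {b:>11.2f}x")
```

On these numbers every ratio falls between roughly 1.5 and 1.8. Token generation speed (`S_TG`) is essentially the same in all three tables, so the gain is specific to prompt processing.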