### 🔀 [#517](https://github.com/ikawrakow/ik_llama.cpp/pull/517) - IQ1_S: much faster CPU prompt processing

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-06-11 |
| **Updated** | 2025-06-11 |

---

#### Description

This PR is a follow-up to #515 and #516 and applies the same technique to `IQ1_S`. We see a nearly 2X increase in prompt processing speed compared to `IQ1_S` and `IQ1_S_R4` on the main branch.

Sweep-bench for `IQ1_S` quantization of LLaMA-3.1-8B on a Ryzen-7950X CPU:

### IQ1_S, main branch

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.272 | 156.47 | 4.605 | 27.79 |
| 512 | 128 | 512 | 3.351 | 152.77 | 5.092 | 25.14 |
| 512 | 128 | 1024 | 3.402 | 150.52 | 5.084 | 25.18 |
| 512 | 128 | 1536 | 3.677 | 139.25 | 5.201 | 24.61 |
| 512 | 128 | 2048 | 3.586 | 142.79 | 5.515 | 23.21 |

### IQ1_S_R4, main branch

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 3.101 | 165.10 | 4.543 | 28.18 |
| 512 | 128 | 512 | 3.166 | 161.74 | 4.836 | 26.47 |
| 512 | 128 | 1024 | 3.309 | 154.75 | 5.282 | 24.23 |
| 512 | 128 | 1536 | 3.348 | 152.92 | 5.093 | 25.13 |
| 512 | 128 | 2048 | 3.447 | 148.55 | 5.265 | 24.31 |

### IQ1_S, PR

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|-------|--------|--------|----------|----------|----------|----------|
| 512 | 128 | 0 | 1.855 | 275.94 | 4.643 | 27.57 |
| 512 | 128 | 512 | 1.940 | 263.87 | 5.056 | 25.32 |
| 512 | 128 | 1024 | 2.188 | 234.05 | 5.099 | 25.10 |
| 512 | 128 | 1536 | 2.097 | 244.20 | 5.112 | 25.04 |
| 512 | 128 | 2048 | 2.184 | 234.42 | 5.368 | 23.85 |

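
The prompt-processing speedup can be read directly off the `S_PP t/s` columns of the three tables. As a minimal sketch (the variable and function names here are illustrative and not part of the PR), the per-`N_KV` ratios can be computed like this; with the numbers above they fall between roughly 1.5x and 1.8x.

```python
# S_PP (tokens/s) at each N_KV, copied from the three sweep-bench tables.
n_kv = [0, 512, 1024, 1536, 2048]
main_iq1_s = [156.47, 152.77, 150.52, 139.25, 142.79]
main_iq1_s_r4 = [165.10, 161.74, 154.75, 152.92, 148.55]
pr_iq1_s = [275.94, 263.87, 234.05, 244.20, 234.42]


def speedups(new, old):
    """Element-wise ratio of new throughput to old throughput."""
    return [n / o for n, o in zip(new, old)]


# PR build of IQ1_S relative to each main-branch baseline.
vs_iq1_s = speedups(pr_iq1_s, main_iq1_s)
vs_iq1_s_r4 = speedups(pr_iq1_s, main_iq1_s_r4)

for kv, a, b in zip(n_kv, vs_iq1_s, vs_iq1_s_r4):
    print(f"N_KV={kv:5d}  vs IQ1_S: {a:.2f}x  vs IQ1_S_R4: {b:.2f}x")
```

The same helper applied to the `S_TG t/s` columns would compare token-generation speed between the builds.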