mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-04-22 23:49:23 +00:00
Add GitHub data (#637)
39
github-data/pull_requests/162-IQ3_S_R4.md
Normal file
@@ -0,0 +1,39 @@
### 🔀 [#162](https://github.com/ikawrakow/ik_llama.cpp/pull/162) - IQ3_S_R4

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2024-12-23 |
| **Updated** | 2024-12-23 |

---
#### Description

Sub-4 bpw i-quants have terrible CPU performance, so I was curious to see whether interleaving rows could improve it.

This PR adds `IQ3_S_R4`, a 4-row interleaved version of `IQ3_S`.
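
To illustrate what row interleaving means in general, here is a minimal Python sketch. It is not the actual `IQ3_S_R4` memory layout, which this description does not spell out. It assumes that each row is a sequence of opaque quantized blocks and that block `k` of four consecutive rows is stored contiguously, presumably so that a kernel can handle four rows per pass over the activations. The names `interleave_rows` and `deinterleave_rows` are hypothetical.

```python
def interleave_rows(rows, group=4):
    """Pack each group of `group` consecutive rows into one interleaved stream.

    `rows` is a list of rows; each row is a list of opaque blocks.
    Assumed (illustrative) layout per group:
    block 0 of rows 0..group-1, then block 1 of rows 0..group-1, and so on.
    """
    if len(rows) % group != 0:
        raise ValueError("number of rows must be a multiple of the group size")
    packed = []
    for start in range(0, len(rows), group):
        chunk = rows[start:start + group]
        n_blocks = len(chunk[0])
        if any(len(row) != n_blocks for row in chunk):
            raise ValueError("rows in a group must have the same number of blocks")
        packed.append([chunk[r][k] for k in range(n_blocks) for r in range(group)])
    return packed


def deinterleave_rows(packed, group=4):
    """Invert interleave_rows: recover the original row-major list of rows."""
    rows = []
    for stream in packed:
        for r in range(group):
            # Every `group`-th block, starting at offset r, belongs to row r.
            rows.append(stream[r::group])
    return rows
```
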

We get significant performance gains. Here is `PP-512` for LLaMA-3.1-8B on `Zen4` (Ryzen-7950X), `ARM_NEON` (M2-Max) and `AVX2` (Ryzen-5975WX):

| Platform | Threads | IQ3_S | IQ3_S_R4 | Speedup |
| ---: | ---: | ---: | ---: | ---: |
| ARM_NEON | 8 | 42.97 ± 1.28 | 80.61 ± 0.41 | 1.876 |
| Zen4 | 16 | 104.66 ± 0.68 | 159.08 ± 0.57 | 1.520 |
| AVX2 | 32 | 132.50 ± 0.37 | 231.41 ± 0.45 | 1.746 |

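
The Speedup column is the ratio of the `IQ3_S_R4` mean to the `IQ3_S` mean. The short Python sketch below recomputes it from the `PP-512` means reported above, ignoring the ± spread; the helper name `speedup` is ours and not part of the PR.

```python
def speedup(baseline, interleaved):
    """Ratio of the interleaved result to the baseline result."""
    return interleaved / baseline


# (IQ3_S mean, IQ3_S_R4 mean) for PP-512, copied from the table above.
pp512 = {
    "ARM_NEON": (42.97, 80.61),
    "Zen4": (104.66, 159.08),
    "AVX2": (132.50, 231.41),
}

for platform, (iq3_s, iq3_s_r4) in pp512.items():
    print(f"{platform}: {speedup(iq3_s, iq3_s_r4):.3f}")
```
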

We get decent performance gains for TG as well, especially on `AVX2`.
Here are results for `TG-128` on LLaMA-3.1-8B with different numbers of threads:

| Platform | Threads | IQ3_S | IQ3_S_R4 | Speedup |
| ---: | ---: | ---: | ---: | ---: |
| ARM_NEON | 2 | 3.00 ± 0.00 | 3.40 ± 0.00 | 1.133 |
| | 4 | 5.74 ± 0.02 | 6.60 ± 0.01 | 1.150 |
| | 8 | 9.25 ± 0.83 | 12.27 ± 0.33 | 1.326 |
| Zen4 | 2 | 4.17 ± 0.00 | 4.38 ± 0.01 | 1.050 |
| | 4 | 7.82 ± 0.05 | 8.14 ± 0.01 | 1.041 |
| | 8 | 14.29 ± 0.02 | 14.41 ± 0.02 | 1.008 |
| AVX2 | 2 | 1.98 ± 0.00 | 3.31 ± 0.00 | 1.672 |
| | 4 | 3.87 ± 0.00 | 6.49 ± 0.00 | 1.677 |
| | 8 | 7.13 ± 0.01 | 11.63 ± 0.02 | 1.631 |
| | 16 | 12.97 ± 0.00 | 15.81 ± 0.00 | 1.219 |