mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 17:20:01 +00:00
693 B
693 B
🔀 #120 - Q8_0_R4
| Author | ikawrakow |
|---|---|
| State | ❌ Closed |
| Created | 2024-12-03 |
| Updated | 2024-12-03 |
Description
Following PR #118, #119: Q8_0 repacked with 4 interleaved rows.
PP-512 for LLaMA-3.1-8B for ARM_NEON (M2-Max), Zen4 (Ryzen-7950X) and AVX2 (Risen-5975WX):
| Platform | Threads | Q8_0 | Q8_0_R4 | Speedup |
|---|---|---|---|---|
| ARM_NEON | 8 | 83.69 ± 1.53 | 112.95 ± 0.17 | 1.350 |
| Zen4 | 16 | 175.61 ± 0.71 | 268.98 ± 0.31 | 1.532 |
| AVX2 | 32 | 213.95 ± 0.44 | 234.40 ± 0.60 | 1.096 |