Files
ik_llama.cpp/github-data/pull_requests/120 - Q8_0_R4.md
2025-07-23 13:31:53 +02:00

693 B

🔀 #120 - Q8_0_R4

Author ikawrakow
State Closed
Created 2024-12-03
Updated 2024-12-03

Description

Following PR #118, #119: Q8_0 repacked with 4 interleaved rows.

PP-512 for LLaMA-3.1-8B for ARM_NEON (M2-Max), Zen4 (Ryzen-7950X) and AVX2 (Risen-5975WX):

Platform Threads Q8_0 Q8_0_R4 Speedup
ARM_NEON 8 83.69 ± 1.53 112.95 ± 0.17 1.350
Zen4 16 175.61 ± 0.71 268.98 ± 0.31 1.532
AVX2 32 213.95 ± 0.44 234.40 ± 0.60 1.096