🔀 #122 - Q6_0_R4

Author ikawrakow
State Closed
Created 2024-12-03
Updated 2024-12-03

Description

Follow-up to #118, #119, #120, and #121, this time for Q6_0.

Here are PP-512 results for LLaMA-3.1-8B on Zen4 (Ryzen-7950X), ARM_NEON (M2-Max), and AVX2 (Ryzen-5975WX):

| Platform | Threads | Q6_0 | Q6_0_R4 | Speedup |
| ---: | ---: | ---: | ---: | ---: |
| ARM_NEON | 8 | 73.21 ± 1.10 | 94.96 ± 0.90 | 1.297 |
| Zen4 | 16 | 159.04 ± 0.58 | 257.25 ± 0.26 | 1.638 |
| AVX2 | 32 | 174.19 ± 0.58 | 231.53 ± 0.60 | 1.329 |
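
The Speedup column appears to be the ratio of the Q6_0_R4 figure to the Q6_0 figure. A minimal Python sketch for recomputing it from the mean values above follows; the `results` dict and the `speedup` helper are illustrative names, not part of the PR, and the ± uncertainties are ignored.

```python
# PP-512 means copied from the table above:
# platform -> (Q6_0, Q6_0_R4, speedup as listed in the table)
results = {
    "ARM_NEON": (73.21, 94.96, 1.297),
    "Zen4": (159.04, 257.25, 1.638),
    "AVX2": (174.19, 231.53, 1.329),
}


def speedup(baseline: float, interleaved: float) -> float:
    """Ratio of the row-interleaved result to the baseline result."""
    return interleaved / baseline


for platform, (q6_0, q6_0_r4, listed) in results.items():
    computed = speedup(q6_0, q6_0_r4)
    print(f"{platform:9s} computed {computed:.3f}  listed {listed:.3f}")
```

Recomputed this way, the ARM_NEON and AVX2 rows match the listed values to three decimals. The Zen4 row comes out near 1.618 rather than the listed 1.638, so one of the Zen4 entries may contain a transcription slip.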