🔀 #186 - iq1_s_r4: slightly faster NEON gemm/gemv

Author ikawrakow
State Closed
Created 2025-02-05
Updated 2025-02-05

Description

DeepSeek-Lite on an M2-Max CPU (tg128 = generating 128 tokens, pp512 = processing a 512-token prompt):

| model | threads | test | t/s (main) | t/s (PR) | Speedup |
| :--- | ---: | ---: | ---: | ---: | ---: |
| deepseek2 16B IQ1_S_R4 | 2 | tg128 | 22.76 ± 0.15 | 24.07 ± 0.19 | 1.058 |
| deepseek2 16B IQ1_S_R4 | 4 | tg128 | 37.83 ± 0.00 | 39.58 ± 0.02 | 1.046 |
| deepseek2 16B IQ1_S_R4 | 8 | tg128 | 62.01 ± 0.02 | 65.26 ± 0.82 | 1.052 |
| deepseek2 16B IQ1_S_R4 | 8 | pp512 | 251.97 ± 0.09 | 283.20 ± 0.54 | 1.124 |
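
For reference, the Speedup column appears to be the ratio of the PR throughput to the main-branch throughput, using the mean t/s values and ignoring the ± spread. A minimal sketch that recomputes it from the numbers in the table (the `rows` list and the `speedup` helper are illustrative names, not part of the PR):

```python
# (threads, test, t/s on main, t/s on PR), mean values copied from the table.
rows = [
    (2, "tg128", 22.76, 24.07),
    (4, "tg128", 37.83, 39.58),
    (8, "tg128", 62.01, 65.26),
    (8, "pp512", 251.97, 283.20),
]


def speedup(main_tps: float, pr_tps: float) -> float:
    """Ratio of PR throughput to main-branch throughput (assumed definition)."""
    return pr_tps / main_tps


# Round to three decimals, matching the precision shown in the table.
speedups = [round(speedup(main, pr), 3) for _, _, main, pr in rows]

for (threads, test, _, _), s in zip(rows, speedups):
    print(f"{threads} threads, {test}: {s:.3f}")
```

Under that assumption the recomputed ratios match the reported column: roughly 5% faster token generation and about 12% faster prompt processing.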