ik_llama.cpp/150 - IQ4_KS_R4.md at main - ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-01-26 17:20:01 +00:00

Files

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00

Adding IQ4_KS with 4 interleaved rows.

We get very signifiant performance gains on ARM_NEON and good gains on AVX2/Zen4.

Here is PP-512 for LLaMA-3.1-8B on Zen4 (Ryzen-7950X), ARM_NEON (M2-Max) and AVX2 (Ryzen-5975WX)

Platform	Threads	IQ4_KS	IQ4_KS_R4	Speedup
ARM_NEON	8	67.29 ± 1.02	124.91 ± 0.62	1.856
Zen4	16	180.42 ± 0.68	266.05 ± 0.45	1.475
AVX2	32	201.79 ± 0.48	245.37 ± 0.52	1.216

We get decent performance gains for TG as well. Here results for TG-128 on LLaMA-3.1-8B with different numbers of threads:

Platform	Threads	IQ4_KS	IQ4_KS_R4	Speedup
ARM_NEON	2	10.84 ± 0.01	12.55 ± 0.00	1.158
	4	19.81 ± 0.12	22.06 ± 0.06	1.114
	8	25.74 ± 0.47	26.47 ± 0.21	1.039
Zen4	1	6.18 ± 0.00	7.97 ± 0.11	1.290
	2	11.73 ± 0.02	13.43 ± 0.00	1.145
	4	13.09 ± 1.13	14.46 ± 0.00	1.105
AVX2	2	4.74 ± 0.00	7.30 ± 0.00	1.540
	4	8.75 ± 0.00	11.39 ± 0.00	1.302
	8	12.38 ± 0.01	12.73 ± 0.00	1.028