ik_llama.cpp/134 - Q3_K_R4.md at 993cb00a347fc77632b73126f614092d659727de - ik_llama.cpp

ikawrakow/ik_llama.cpp

Fork 0

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-02 20:48:03 +00:00

Files

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640 )

2025-07-23 13:31:53 +02:00

1.4 KiB

Raw Blame History

🔀 #134 - Q3_K_R4

Author	`ikawrakow`
State	❌ Closed
Created	2024-12-11
Updated	2024-12-11

Description

Follow up of #118, #119, #120, #121, #122, #123, #129, #130, #132 for Q3_K.

We get a massive speedup on ARM_NEON and non-negligible gains on AVX2/Zen4. Here is PP-512 for LLaMA-3.1-8B on Zen4 (Ryzen-7950X), ARM_NEON (M2-Max) and AVX2 (Ryzen-5975WX)

Platform	Threads	Q3_K	Q3_K_R4	Speedup
ARM_NEON	8	55.42 ± 1.00	106.89 ± 1.14	1.929
Zen4	16	193.89 ± 0.43	236.77 ± 0.35	1.221
AVX2	32	199.22 ± 0.41	262.34 ± 0.50	1.317

On AVX2/Zen4 we gain even for TG. Here results for TG-128 on LLaMA-3.1-8B with different numbers of threads:

Platform	Threads	Q3_K	Q3_K_R4	Speedup
Zen4	1	5.47 ± 0.01	6.78 ± 0.00	1.239
	2	10.25 ± 0.00	12.46 ± 0.00	1.216
	4	15.21 ± 0.59	17.02 ± 0.09	1.119
AVX2	2	5.02 ± 0.01	8.21 ± 0.00	1.635
	4	9.33 ± 0.00	13.67 ± 0.00	1.465
	8	14.85 ± 0.02	16.67 ± 0.00	1.123

1.4 KiB Raw Blame History

🔀 #134 - Q3_K_R4

Description

1.4 KiB

Raw Blame History