🔀 #54 - Improve Q4_0 and Q8_0 performance on AVX2/Zen4

| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ Closed |
| **Created** | 2024-09-14 |
| **Updated** | 2024-09-14 |

Description

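This PR improves Q4_0 and Q8_0 performance on AVX2 and Zen4. The table below compares this PR against llama.cpp for LLaMA-3.1-8B on a Ryzen-7950X (Zen4) and a Ryzen-5975WX (AVX2) CPU. Throughput is in tokens per second (t/s); pp512 is prompt processing of 512 tokens and tg128 is generation of 128 tokens.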

| model | backend | threads | test | t/s (llama.cpp) | t/s (PR) | Speedup |
| :--- | :--- | ---: | :--- | ---: | ---: | ---: |
| llama 8B Q4_0 | Zen4 | 16 | pp512 | 123.46 ± 0.09 | 165.26 ± 0.54 | 1.339 |
| llama 8B Q8_0 | Zen4 | 16 | pp512 | 141.30 ± 0.86 | 169.26 ± 0.57 | 1.200 |
| llama 8B Q4_0 | Zen4 | 4 | tg128 | 11.25 ± 0.02 | 13.88 ± 0.01 | 1.234 |
| llama 8B Q8_0 | Zen4 | 4 | tg128 | 7.56 ± 0.01 | 7.79 ± 0.02 | 1.030 |
| llama 8B Q4_0 | AVX2 | 32 | pp512 | 139.09 ± 0.62 | 212.70 ± 0.82 | 1.529 |
| llama 8B Q8_0 | AVX2 | 32 | pp512 | 162.21 ± 0.42 | 217.14 ± 0.65 | 1.339 |
| llama 8B Q4_0 | AVX2 | 8 | tg128 | 11.90 ± 0.00 | 11.99 ± 0.00 | 1.008 |
| llama 8B Q8_0 | AVX2 | 8 | tg128 | 8.13 ± 0.00 | 8.21 ± 0.00 | 1.010 |
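
The Speedup column appears to be the ratio of the PR throughput to the llama.cpp throughput. A minimal sketch that recomputes it from the mean t/s values in the table, assuming that definition; the function name `speedup` is hypothetical and the ± uncertainties are ignored:

```python
def speedup(baseline_tps: float, pr_tps: float) -> float:
    """Return the ratio of PR throughput to baseline throughput (tokens/s)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return pr_tps / baseline_tps


# Mean t/s values copied from the table: (label, llama.cpp, PR).
rows = [
    ("Q4_0 Zen4 pp512", 123.46, 165.26),
    ("Q8_0 Zen4 pp512", 141.30, 169.26),
    ("Q4_0 Zen4 tg128", 11.25, 13.88),
    ("Q8_0 Zen4 tg128", 7.56, 7.79),
    ("Q4_0 AVX2 pp512", 139.09, 212.70),
    ("Q8_0 AVX2 pp512", 162.21, 217.14),
    ("Q4_0 AVX2 tg128", 11.90, 11.99),
    ("Q8_0 AVX2 tg128", 8.13, 8.21),
]

for label, base, pr in rows:
    # Three decimals, matching the precision of the Speedup column.
    print(f"{label}: {speedup(base, pr):.3f}")
```

Because the inputs are rounded means, a recomputed ratio can differ from the table in the last digit or two.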
