
🐛 #171 - Fix lower FA performance for even batch sizes

Author ikawrakow
State Closed
Created 2025-01-12
Updated 2025-01-12

Description

This PR fixes the lower performance for even batch sizes reported in #164. The graph shows a t/s comparison between the main branch and this PR using

./bin/llama-batched-bench -m some_model.gguf -pps -t 16 -npp 256 -ntg 128 -npl 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 -c 4096 -rtr -fa 

for LLaMA-3.1-8B-Instruct quantized with IQ4_XS on a Ryzen-7950X CPU. We see that the strange zig-zag behavior with FA enabled is no longer there. For fun I have also added the latest llama.cpp performance for this model on this CPU (llama.cpp build: 4465 (9a483999)). The performance difference at a batch size of 16 is a factor of 2.7x.

[Graph: t/s vs. batch size for the main branch, this PR, and mainline llama.cpp]
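
For reference, the mainline llama.cpp numbers in the graph were presumably obtained with an equivalent llama-batched-bench invocation, minus the ik_llama.cpp-specific -rtr (run-time repack) flag; the exact command is not given in the PR, but a plausible sketch would be:

./bin/llama-batched-bench -m some_model.gguf -pps -t 16 -npp 256 -ntg 128 -npl 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 -c 4096 -fa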