🐛 #171 - Fix lower FA performance for even batch sizes
| Field | Value |
|---|---|
| Author | ikawrakow |
| State | ❌ Closed |
| Created | 2025-01-12 |
| Updated | 2025-01-12 |
Description
This PR fixes the lower performance for even batch sizes reported in #164. The graph shows a t/s comparison between the main branch and this PR using

```
./bin/llama-batched-bench -m some_model.gguf -pps -t 16 -npp 256 -ntg 128 -npl 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 -c 4096 -rtr -fa
```

for LLaMA-3.1-8B-Instruct quantized with IQ4_XS on a Ryzen-7950X CPU. The strange zig-zag behavior with FA enabled is no longer there. For fun I have also added the latest llama.cpp performance for this model on this CPU (llama.cpp build 4465, commit 9a483999). At a batch size of 16 the performance difference is a factor of 2.7X.
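For reference, a minimal reproduction sketch is below. The build directory names (`build-main`, `build-pr`) and the model path are placeholders, not part of this PR; the benchmark invocation itself is the one quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical layout: the main branch built in build-main/, this PR in build-pr/.
# The model path is a placeholder; substitute your own IQ4_XS GGUF.
MODEL=./models/Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf

for build in build-main build-pr; do
    echo "== $build =="
    # Same invocation as in the description: shared prompt across the batch (-pps),
    # 16 threads, 256 prompt tokens, 128 generated tokens, batch sizes 1..16,
    # run-time repacking (-rtr) and flash attention (-fa) enabled.
    ./$build/bin/llama-batched-bench -m "$MODEL" -pps -t 16 -npp 256 -ntg 128 \
        -npl 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 -c 4096 -rtr -fa
done
```

Comparing the t/s columns of the two runs for each batch size should show the even-batch-size dips disappearing with this PR.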