mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-02-10 16:30:12 +00:00
* Adding BF16 support for AVX2

PP performance is the same as fp16 (~153 t/s on Ryzen-5975WX), but TG is quite a bit lower (3.65 t/s vs 4.72 t/s at 8 threads). Why?

* Slightly faster fp16/bf16 gemv on AVX2

It still saturates at the same lower performance for bf16

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
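
For readers unfamiliar with what a bf16 gemv on AVX2 involves, here is a minimal sketch of its inner loop, a bf16 x fp32 dot product. It is not the code from this commit: the helper names (`bf16_to_f32`, `f32_to_bf16_trunc`, `dot_bf16_f32`) are hypothetical, and the sketch assumes only that bf16 is the upper 16 bits of an IEEE fp32 value, so widening to fp32 is a zero-extend followed by a 16-bit left shift. AVX2 has no native bf16 conversion instruction, so the shift is done on integer lanes. The sketch uses separate multiply and add rather than FMA so that it builds with `-mavx2` alone, and it falls back to scalar code when AVX2 is not enabled. It does not attempt to explain the TG gap noted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helpers for illustration; not taken from ik_llama.cpp.

// bf16 -> fp32: place the 16 stored bits in the upper half of a 32-bit word.
static inline float bf16_to_f32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// fp32 -> bf16 by truncation (no rounding); sufficient for this sketch.
static inline uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<uint16_t>(bits >> 16);
}

// Dot product of a bf16 row x with an fp32 vector y, both of length n.
static float dot_bf16_f32(const uint16_t* x, const float* y, size_t n) {
    size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX2__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        // Load 8 bf16 values, zero-extend to 32 bits, shift into fp32 position.
        __m128i h = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x + i));
        __m256i w = _mm256_slli_epi32(_mm256_cvtepu16_epi32(h), 16);
        __m256 xv = _mm256_castsi256_ps(w);
        __m256 yv = _mm256_loadu_ps(y + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(xv, yv));
    }
    // Horizontal sum of the 8 accumulator lanes.
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));
    sum = _mm_cvtss_f32(s);
#endif
    // Scalar tail (or the whole loop when AVX2 is unavailable).
    for (; i < n; ++i) sum += bf16_to_f32(x[i]) * y[i];
    return sum;
}
```

A gemv would call `dot_bf16_f32` once per matrix row. Note that the extra work relative to a plain fp32 dot product is one zero-extend and one shift per 8 elements.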