ik_llama.cpp/ggml
Kawrakow dbf5d31d01 Better BF16 support on AVX2 (#175)
* Adding BF16 support for AVX2

PP (prompt processing) performance is the same as fp16 (~153 t/s on Ryzen-5975WX),
but TG (token generation) is noticeably lower (3.65 t/s vs 4.72 t/s at 8 threads).
Why?

* Slightly faster fp16/bf16 gemv on AVX2

It still saturates at the same lower performance for bf16.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-01-22 12:13:55 +02:00