
🔀 #39 - Add support for bf16 to iqk_mul_mat

Author ikawrakow
State Closed
Created 2024-09-04
Updated 2024-09-05

Description

bf16 is used only when it is natively supported by the CPU (e.g., Zen4); otherwise the matrix multiplication is left to ggml to handle.
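To illustrate what "natively supported" means here: Zen4 implements AVX512_BF16, whose `vdpbf16ps` instruction multiplies pairs of bf16 values and accumulates into fp32. Below is a minimal sketch of such a kernel, not the PR's actual code; `bf16_dot` is a hypothetical name, and the sketch assumes bf16 elements stored as `uint16_t`, `n` a multiple of 32, and a GCC/Clang build with `-mavx512bf16`.

```cpp
#include <immintrin.h>
#include <cstdint>

// Dot product of two bf16 vectors of length n (n a multiple of 32),
// accumulating in fp32. On Zen4, _mm512_dpbf16_ps maps to a single
// vdpbf16ps instruction per 32 bf16 pairs.
static float bf16_dot(const uint16_t * x, const uint16_t * y, int n) {
#if defined(__AVX512BF16__)
    __m512 acc = _mm512_setzero_ps();
    for (int i = 0; i < n; i += 32) {
        // Reinterpret 32 raw bf16 values as a __m512bh register
        // (GCC/Clang allow this cast between same-size vector types).
        __m512bh vx = (__m512bh)_mm512_loadu_si512((const void *)(x + i));
        __m512bh vy = (__m512bh)_mm512_loadu_si512((const void *)(y + i));
        // 32 bf16*bf16 products, pairwise-added into 16 fp32 accumulators
        acc = _mm512_dpbf16_ps(acc, vx, vy);
    }
    return _mm512_reduce_add_ps(acc);
#else
    // No native bf16: widen each element to fp32 (bf16 is the upper
    // 16 bits of an fp32), mirroring the "left to ggml" fallback.
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        union { uint32_t u; float f; } a = { (uint32_t)x[i] << 16 };
        union { uint32_t u; float f; } b = { (uint32_t)y[i] << 16 };
        sum += a.f * b.f;
    }
    return sum;
#endif
}
```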

For LLaMA-3.1-8B we get PP512 (prompt processing, 512 tokens) = 205 t/s vs 74 t/s in llama.cpp on my Ryzen-7950X CPU.

I get 204 t/s with llamafile, so I guess Justine Tunney has not yet contributed her more recent tinyBLAS improvements to llama.cpp.