Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-03-09 21:40:22 +00:00)
I was worried that we don't have enough vector registers on AVX2, but it looks like it handles it just fine. We get PP-512(LLaMA-3.1-8B) = 354 t/s on a Ryzen-5975WX. This is slightly slower than the Zen4 version with double the threads, but still a huge upgrade compared to Q8_0_R4.