ik_llama.cpp/ggml
Iwan Kawrakow 6d6d12fc86 q8_k_r8: AVX2
I was worried that we don't have enough vector registers on
AVX2, but it looks like it handles it just fine. We get
PP-512(LLaMA-3.1-8B) = 354 t/s on a Ryzen-5975WX.
Slightly slower than the Zen4 version with double the threads,
but still a huge upgrade compared to Q8_0_R4.
2024-12-13 18:55:14 +02:00