ik_llama.cpp/ggml
Iwan Kawrakow bfe625f6ea iq3_k: AVX512 iqk_mul_mat
We get PP-512 = 180 t/s, TG-128 (4 threads) = 16.35 t/s on the Ryzen-7950X
for LLaMA-3.1-8B.
In comparison, iq3_s has PP-512 = 96 t/s, TG-128 = 7.6 t/s with
iqk_mul_mat, and PP-512 = 28 t/s, TG-128 = 6.8 t/s in mainline llama.cpp.
2024-07-30 18:40:10 +03:00