ik_llama.cpp/ggml
Iwan Kawrakow db87f766e8 iq4_k: AVX2 implementation
For LLaMA-3.1-8B we get PP-512 = 203.1 t/s, TG-128 = 12.9 t/s
on the Ryzen-5975X.
2024-07-27 21:10:22 +03:00