ik_llama.cpp/ggml
Iwan Kawrakow d89c88e8df iq4_k: NEON implementation
For LLaMA-3.1-8B we get PP-512 = 60.7 t/s, TG-128 = 25.0 t/s
on the M2-Max. TG is on par with q4_K_S, PP is ~10% slower.
2024-07-28 08:36:20 +02:00