ik_llama.cpp/ggml
Iwan Kawrakow 780929a6d0 iq4_knn: ARM_NEON
Pretty good performance - on M2-Max we get
PP-512(LLaMA-3.1-8B) = 89.5 t/s
TG-128(LLaMA-3.1-8B) = 27.65 t/s
2024-10-18 10:47:59 +02:00