ik_llama.cpp/ggml
Iwan Kawrakow d1b4b34a79 q4_k
58.2 t/s -> 114.8 t/s. iq4_k_r4 is at 130.9 t/s.

As I had to add a new implementation for q8_1-quantized
activations, token generation (TG) became slightly faster too
(25.1 -> 25.9 t/s).
2025-06-24 10:16:04 +02:00