Mirror of https://github.com/ikawrakow/ik_llama.cpp.git (synced 2026-04-25 17:09:22 +00:00)
58.2 t/s -> 114.8 t/s. iq4_k_r4 is at 130.9 t/s. Because I had to add a new implementation for q8_1-quantized activations, token generation (TG) also became slightly faster (25.1 -> 25.9 t/s).
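For readers unfamiliar with q8_1, here is a minimal sketch of what quantizing a block of activations to a q8_1-style layout could look like. It is not the implementation from this change. It assumes the ggml-style layout of 32 values per block with a scale `d` and a precomputed `s = d * sum(qs)`, and it stores both as `float` for simplicity (the real format uses half precision). The names `BlockQ8_1Sketch` and `quantize_block_q8_1_sketch` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Assumption: 32 values per block, as in ggml-style Q8_1.
constexpr int kBlockSize = 32;

// Hypothetical, simplified block: float is used here instead of fp16.
struct BlockQ8_1Sketch {
    float d;                             // per-block scale
    float s;                             // d * sum(qs), lets a kernel fold in a weight offset cheaply
    std::array<int8_t, kBlockSize> qs;   // quantized activations
};

// Quantize exactly kBlockSize floats starting at x.
inline BlockQ8_1Sketch quantize_block_q8_1_sketch(const float* x) {
    // The largest magnitude in the block determines the scale.
    float amax = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }

    BlockQ8_1Sketch b{};
    b.d = amax / 127.0f;
    const float inv_d = b.d > 0.0f ? 1.0f / b.d : 0.0f;

    int sum = 0;
    for (int i = 0; i < kBlockSize; ++i) {
        // Round to nearest; the result lies in [-127, 127] by construction.
        const int q = static_cast<int>(std::lround(x[i] * inv_d));
        b.qs[i] = static_cast<int8_t>(q);
        sum += q;
    }
    b.s = b.d * static_cast<float>(sum);
    return b;
}
```

Calling `quantize_block_q8_1_sketch` on each consecutive group of 32 activations yields one block per group. The stored sum is the reason a dot-product kernel against weights with a per-block offset does not need to re-sum the activations.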