Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-01-26 17:20:01 +00:00
* iq4_ks: 203 t/s -> 357 t/s. iq4_ks_r4 is 242 t/s.
* iq4_k: 175 t/s -> 353 t/s. iq4_k_r4 is 208 t/s. PPL is actually lower!
* iq5_ks: 180 t/s -> 359 t/s. iq5_ks_r4 is 210 t/s. PPL is actually lower: 7.4160 vs 7.4494 for Llama-3.1-8B-Instruct.
* iq5_k: accuracy loss was too big.
* iq5_k: there was a bug with the shifts, and that is why PPL was so high. It is also high on main; this fixes it.
* iq6_k: 148 t/s -> 350 t/s. There is no iq6_k_r4. PPL is actually lower because there is a bug in the existing implementation!
* iq3_k: 169 t/s -> 363 t/s. iq3_k_r4 is at 200 t/s.
* iq2_k: 190 t/s -> 364 t/s. iq2_k_r4 is at 232 t/s.
* iq2_ks: 200 t/s -> 367 t/s. There is no iq2_ks_r4.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
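For a quick read on how large these gains are, the speedup factors implied by the before/after t/s figures above can be computed with a small script. This is just a convenience sketch; the dictionary restates the numbers from this list (types with only a "before" or only an "after" figure are omitted):

```python
# Speedup factors implied by the t/s numbers in this PR description.
# Values are (before, after) tokens-per-second pairs copied from the list above.
rates = {
    "iq4_ks": (203, 357),
    "iq4_k":  (175, 353),
    "iq5_ks": (180, 359),
    "iq6_k":  (148, 350),
    "iq3_k":  (169, 363),
    "iq2_k":  (190, 364),
    "iq2_ks": (200, 367),
}

for qtype, (before, after) in rates.items():
    print(f"{qtype}: {before} t/s -> {after} t/s ({after / before:.2f}x)")
```

The largest relative gain is for iq6_k (about 2.36x), the smallest for iq2_ks (about 1.84x).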