Files
ik_llama.cpp/ggml
Kawrakow 57d283de02 Much faster CPU prompt processing (part 2) (#533)
* iq4_ks

203 t/s -> 357 t/s. iq4_ks_r4 is 242 t/s.

* iq4_k

175 t/s -> 353 t/s. iq4_k_r4 is 208 t/s.

PPL is actually lower!

* iq5_ks

180 t/s -> 359 t/s. iq5_ks_r4 is 210 t/s.

PPL is actually lower - 7.4160 vs 7.4494 for LlaMA-3.1-8B-Instruct

* iq5_k - accuracy loss is too big

* iq5_k - there was a bug with the shifts

...and that's why PPL was so high. It is also high on main.
This fixes it.

* iq6_k

148 t/s -> 350 t/s. There is no iq6_k_r4

PPL is actually lower because we have a bug in the existing
implementation!

* iq3_k

169 t/s -> 363 t/s. iq3_k_r4 is at 200 t/s.

* iq2_k

190 t/s -> 364 t/s. iq2_k_r4 is at 232 t/s.

* iq2_ks

200 t/s -> 367 t/s. There is no iq2_ks_r4.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-06-18 07:29:33 +03:00
..
2024-07-27 07:55:01 +02:00
2025-06-08 17:27:00 +03:00
2024-07-27 07:55:01 +02:00