ik_llama.cpp/ggml
Kawrakow c5f58e0270 CUDA: faster IQ3_K, IQ3_KS, IQ3_K_R4 (#714)
* Use bperm trick for iq3_ks -> 5% PP performance gain

* Use bperm trick for iq3_k -> 5% PP performance gain

* Use bperm trick for iq3_k -> 8% PP performance gain

* Use bperm trick for iq3_k_r4 gemv -> ~5% faster

* Use bperm trick for iq3_k gemv -> ~3% faster

* Use bperm trick for iq3_k gemv -> 4.5% gain

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-08-21 19:08:57 +03:00