ik_llama.cpp/ggml
Kawrakow dfa6e2b5fa CUDA: faster IQ2_K, IQ2_KS, IQ2_K_R4 (#716)
* Use bperm trick for iq2_ks gemm -> 7% gain

* Use bperm trick for iq2_k gemm -> ~5% gain

* Use bperm trick for iq2_k_r4 gemm -> ~3% gain

* Use bperm trick for iq2_ks gemv -> ~7% gain

* Use bperm trick for iq2_k gemv -> ~3% gain

* Use bperm trick for iq2_k_r4 gemv -> ~7% gain

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-08-22 07:25:35 +03:00