ik_llama.cpp/ggml
Kawrakow 89728ab03c CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461)
* CUDA: iq4_k_r4 dequantize

* CUDA: iq4_k_r4 GEMV

~10% slower than iq4_k.

* CUDA: slightly faster iq4_k_r4 GEMV

* CUDA: slightly faster iq4_k_r4 GEMV

We are now within 3% of iq4_k.

* CUDA: iq5_k_r4 dequantize

* CUDA: iq5_k_r4 GEMV

~3% slower than iq5_k.

* CUDA: iq3_k_r4 dequantize

* CUDA: iq3_k_r4 GEMV

* CUDA: slightly faster iq3_k_r4 GEMV

* CUDA: iq2_k_r4 GEMV

* CUDA: faster iq2_k_r4 GEMV

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-26 19:34:54 +03:00