ik_llama.cpp/ggml
Kawrakow 4e2afbcd90 CUDA: Faster prompt processing for several quantization types (#595)
* cuda: slightly faster MMQ for iq3_k, iq3_k_r4

* cuda: slightly faster MMQ for iq4_k, iq4_k_r4

* cuda: slightly faster MMQ for iq4_ks_r4

* cuda: slightly faster MMQ for iq4_ks

* cuda: slightly faster MMQ for iq4_xs

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-07-10 09:27:28 +02:00