ik_llama.cpp/ggml
Kawrakow 45cd1bcd59 CUDA: MMQ for IQ4_KS (#374)
* WIP

* WIP: still getting illegal memory access

* CUDA: MMQ for iq4_ks now works

The new kernel is ~25% faster than dequantize+cuBLAS and ~10% slower than Q4_0 MMQ.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-05-04 12:45:00 +03:00