mirror of https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-26 17:20:01 +00:00
* CUDA: iq4_k_r4 dequantize

* CUDA: iq4_k_r4 GEMV

~10% slower than iq4_k.

* CUDA: slightly faster iq4_k_r4 GEMV

* CUDA: slightly faster iq4_k_r4 GEMV

We are now within 3% of iq4_k

* CUDA: iq5_k_r4 dequantize

* CUDA: iq5_k_r4 GEMV

~3% slower than iq5_k.

* CUDA: iq3_k_r4 dequantize

* CUDA: iq3_k_r4 GEMV

* CUDA: slightly faster iq3_k_r4 GEMV

* CUDA: iq2_k_r4 GEMV

* CUDA: faster iq2_k_r4 GEMV

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>