ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-13 23:40:09 +00:00

Files

Kawrakow 89728ab03c CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461 )

* CUDA: iq4_k_r4 dequantize

* CUDA: iq4_k_r4 GEMV

~10% slower than iq4_k.

* CUDA: slightly faster iq4_k_r4 GEMV

* CUDA: slightly faster iq4_k_r4 GEMV

We are now within 3% of iq4_k

* CUDA: iq5_k_r4 dequantize

* CUDA: iq5_k_r4 GEMV

~3% slower than iq5_k.

* CUDA: iq3_k_r4 dequantize

* CUDA: iq3_k_r4 GEMV

* CUDA: slightly faster iq3_k_r4 GEMV

* CUDA: iq2_k_r4 GEMV

* CUDA: faster iq2_k_r4 GEMV

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-05-26 19:34:54 +03:00

cmake

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

include

Trellis quants with CPU inference (#441 )

2025-05-23 09:17:52 +03:00

src

CUDA implementation for IQ2_K_R4, IQ3_K_R4, IQ4_K_R4, IQ5_K_R4 (#461 )

2025-05-26 19:34:54 +03:00

.gitignore

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Option to enable disable the IQK CPU FA kernels (#429 )

2025-05-17 11:21:58 +03:00