ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-09 05:20:01 +00:00

Files

Kawrakow 7e79665a31 CUDA implementation for IQ1_S_R4 (#492)

* iq1_s_r4: CUDA dequantize

* iq1_s_r4: CUDA GEMV

* iq1_s_r4: MMQ on CUDA

Requires Turing or better (will fall back to dequantize+cuBLAS on older cards).

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-06-05 07:24:31 +03:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Trellis quants with CPU inference (#441)

2025-05-23 09:17:52 +03:00

src

CUDA implementation for IQ1_S_R4 (#492)

2025-06-05 07:24:31 +03:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Option to enable disable the IQK CPU FA kernels (#429)

2025-05-17 11:21:58 +03:00