ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-05-11 16:40:16 +00:00

Files

Kawrakow 45cd1bcd59 CUDA: MMQ for IQ4_KS (#374)

* WIP

* WIP: still getting illegal memory access

* CUDA: MMQ for iq4_ks now works

~25% faster than dequantize+cuBLAS, ~10% slower than Q4_0 MMQ.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-05-04 12:45:00 +03:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Add copyright notices (#317)

2025-04-07 10:43:26 +02:00

src

CUDA: MMQ for IQ4_KS (#374)

2025-05-04 12:45:00 +03:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Compile time option to use bf16 for quants without MMQ kernels (#261)

2025-03-18 07:37:10 +01:00