Files
ik_llama.cpp/src
Iwan Kawrakow 3da565c9c9 iq1_kt: CUDA dequantize
Testing with LLaMA-3.1-8B-Instruct, we get almost the same PPL
as iq2_xxs while using about 0.2 bpw fewer, i.e. the same quality at a lower bit budget.
2025-07-16 16:55:20 +03:00