ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-24 15:14:10 +00:00

Files

Iwan Kawrakow e558992f0c New iq4_kt trellis

The new trellis generates int8_t values via
sum_as_uint8_t[(ka * idx + kb) & 0x3f33f3f3f] - 126.
CUDA dequantize works.
AVX2 case Ny > 32 works, and we get 273 t/s for L3-8B.
PPL is on par or even slightly lower than original QTIP trellis.

2025-06-18 15:34:25 +03:00

cmake

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

include

Fix non rpc build error (#506 )

2025-06-08 17:27:00 +03:00

src

New iq4_kt trellis

2025-06-18 15:34:25 +03:00

.gitignore

Merge mainline llama.cpp (#3 )

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Better strategy for GPU offload (#520 )

2025-06-12 19:25:11 +03:00