ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-25 17:09:22 +00:00

Files

Iwan Kawrakow d1b4b34a79 q4_k

58.2 t/s -> 114.8 t/s. iq4_k_r4 is at 130.9 t/s.

As I had to add a new implementation for q8_1-quantized
activations, TG became slightly faster too
(25.1 -> 25.9 t/s).

2025-06-24 10:16:04 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Fix non rpc build error (#506)

2025-06-08 17:27:00 +03:00

src

q4_k

2025-06-24 10:16:04 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Better strategy for GPU offload (#520)

2025-06-12 19:25:11 +03:00