ik_llama.cpp

Mirror of https://github.com/ikawrakow/ik_llama.cpp.git, synced 2026-03-02 10:00:07 +00:00

Files

Iwan Kawrakow e10f7d1f10 q3_K: repack to q8_k_r8 instead of q8_0_r8

With that we reach 360 t/s for LLaMA-3.1-8B on a Ryzen-7950X.
For comparison, q8_k_r8 itself runs at 386 t/s, so at a batch size of 512
repacking costs ~7% of the time taken by the actual GEMM.

2025-06-15 10:37:12 +03:00
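The ~7% figure in the commit message follows from comparing time per token (the inverse of throughput) rather than the throughputs themselves. A worked version, assuming that the q8_k_r8 run approximates pure GEMM time, so that the whole gap to the q3_K run is attributed to repacking:

```latex
% t = time per token = 1 / throughput (t/s), both runs at batch size 512
\[
\frac{t_{\text{q3\_K}} - t_{\text{q8\_k\_r8}}}{t_{\text{q8\_k\_r8}}}
  = \frac{1/360 - 1/386}{1/386}
  = \frac{386}{360} - 1
  = \frac{26}{360}
  \approx 0.072
\]
```

Because both throughputs are taken at the same batch size, the batch size cancels out of the ratio; the result is a relative overhead of about 7.2% on top of the GEMM time.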

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Fix non rpc build error (#506)

2025-06-08 17:27:00 +03:00

src

q3_K: repack to q8_k_r8 instead of q8_0_r8

2025-06-15 10:37:12 +03:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

Better strategy for GPU offload (#520)

2025-06-12 19:25:11 +03:00