ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-02-24 23:24:13 +00:00

Files

Kawrakow 0551e7630b Moving 4D gemm logic from ggml.c to iqk_mul_mat.cpp (#207)

This allows us to optimize TG performance for GQA models.
E.g., for IQ4_XS L3-8B with 8k context, TG-64 goes from 8.6 to 10.26 t/s.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-02-15 08:45:45 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

Use Q8_K_128 for IQ1_S_R4 and IQ1_M_R4 matrix multiplications (#194)

2025-02-09 09:14:52 +02:00

src

Moving 4D gemm logic from ggml.c to iqk_mul_mat.cpp (#207)

2025-02-15 08:45:45 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

FA: Add option to build all FA kernels (#197)

2025-02-09 18:59:33 +02:00