ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-04-22 23:49:23 +00:00

Files

Kawrakow 699c9cb7f6 Faster MoE token generation on CUDA (#248)

* This gives us ~20% token generation (TG) speedup for DeepSeek on CUDA

* Slightly better

* Also do it for plain (not fused) mul_mat_id

* Guard against numerical precision issues for MLA on CUDA

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2025-03-10 16:16:51 +02:00

cmake

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

include

SER - Smart Expert Reduction (#239)

2025-03-02 13:47:38 +02:00

src

Faster MoE token generation on CUDA (#248)

2025-03-10 16:16:51 +02:00

.gitignore

Merge mainline llama.cpp (#3)

2024-07-27 07:55:01 +02:00

CMakeLists.txt

FA: Add option to build all FA kernels (#197)

2025-02-09 18:59:33 +02:00