ik_llama.cpp

mirror of https://github.com/ikawrakow/ik_llama.cpp.git synced 2026-03-06 12:00:29 +00:00

Files

Iwan Kawrakow 001abccf73 Fusing MoE up * unary(gate): CUDA

We get ~13% speedup for PP-512 and ~2% for TG-128
for DeepSeek-Lite

2025-02-23 11:37:31 +02:00

.gitignore      2024-07-27 07:55:01 +02:00
CMakeLists.txt  2025-02-09 18:59:33 +02:00