ik_llama.cpp/src
Kawrakow fcd1e124e0 Faster MoE token generation on CUDA (#248)
* This gives us a ~20% TG (token generation) speedup for DeepSeek on CUDA

* Slightly better

* Also do it for plain (not fused) mul_mat_id

* Guard against numerical precision issues for MLA on CUDA

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-03-10 16:16:51 +02:00
2024-07-27 07:55:01 +02:00
2024-09-28 17:59:47 +03:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2024-07-27 07:55:01 +02:00
2025-01-23 18:24:10 +02:00
2024-07-27 07:55:01 +02:00