ik_llama.cpp/ggml
Kawrakow 98a264a2ea CUDA: better MoE implementation (#283)
* Make fused MoE reproducible

As a bonus, peak performance at pp2048 with u_batch = 2048 is
~8% better.

* Slightly better

* Also do it for non-fused mul_mat_id

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-03-25 07:47:10 +01:00