ik_llama.cpp/ggml/src
Kawrakow f22a9ef95a CUDA: prompt processing optimizations for MoE models (#739)
* Skip the row id computation for the ffn_down op

Sadly, almost negligible performance gain.

* Also this doesn't do much

* Also this barely moves the needle

* This is slightly better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-08-30 12:09:41 +03:00