ik_llama.cpp/ggml
Kawrakow f22a9ef95a CUDA: prompt processing optimizations for MoE models (#739)
* Skip the row id computation for the ffn_down op

Sadly, almost negligible performance gain.

* Also this doesn't do much

* Also this barely moves the needle

* This is slightly better

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-08-30 12:09:41 +03:00