ik_llama.cpp/ggml
Kawrakow 33646fc409 Fuse MoE up and gate matrix multiplications (#219)
* This seems to be a better way to do the attention matrix multiplications in the TG case.

* Cleanup

* Fuse up and gate gemms in MoE models

Small (~1-2%) but measurable performance gain.
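The idea behind the fusion can be sketched as follows. In a SiLU-gated MoE FFN, the up and gate projections both read the same input activation, so their weight matrices can be stacked and evaluated as a single matrix multiplication instead of two. This is a minimal illustrative sketch, not ggml's actual implementation; all names (`matvec`, `W_up`, `W_gate`) are hypothetical.

```python
import math

def matvec(W, x):
    # Plain row-by-row matrix-vector product (stand-in for a GEMM).
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def silu(v):
    return v / (1.0 + math.exp(-v))

x = [1.0, -2.0, 0.5]                       # shared input activation
W_up   = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
W_gate = [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]

# Unfused: two separate matvecs over the same input x.
up   = matvec(W_up, x)
gate = matvec(W_gate, x)
unfused = [silu(g) * u for g, u in zip(gate, up)]

# Fused: one matvec over the stacked [W_up; W_gate] matrix,
# then split the output back into the up and gate halves.
fused_out = matvec(W_up + W_gate, x)
f_up, f_gate = fused_out[:len(W_up)], fused_out[len(W_up):]
fused = [silu(g) * u for g, u in zip(f_gate, f_up)]

# The gated FFN activation is identical either way.
assert all(abs(a - b) < 1e-9 for a, b in zip(fused, unfused))
```

The win comes from launching one larger GEMM instead of two smaller ones, which reduces kernel-launch and memory-traffic overhead; the arithmetic itself is unchanged.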

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-02-22 09:41:40 +02:00