* This seems to be a better way to do the attention matrix multiplications in the TG case.

* Cleanup

* Fuse up and gate gemms in MoE models

  Small (~1-2%) but measurable performance gain

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
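A minimal sketch of the fusion idea named in the last bullet, not the actual ik_llama.cpp implementation: the gate and up projection weights of an FFN block are laid out as one concatenated matrix so a single GEMM/GEMV produces both activations in one pass, saving a kernel launch and a second sweep over the input. All names (`gemv`, `ffn_up_gate_fused`, `d_model`, `d_ff`) are hypothetical, and the activation is assumed to be SiLU.

```cpp
#include <cstddef>
#include <cmath>
#include <vector>

// Naive reference GEMV: y[r] = sum_c W[r*cols + c] * x[c]
// (hypothetical helper; stands in for whatever GEMM backend is used)
static void gemv(const float* W, const float* x, float* y,
                 std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c)
            acc += W[r * cols + c] * x[c];
        y[r] = acc;
    }
}

// Fused up+gate projection followed by SiLU(gate) * up.
// W_fused is a [2*d_ff, d_model] matrix: rows [0, d_ff) hold the gate
// weights, rows [d_ff, 2*d_ff) the up weights, so one GEMV covers both
// projections instead of two separate calls.
static void ffn_up_gate_fused(const float* W_fused, const float* x,
                              float* out, std::size_t d_model,
                              std::size_t d_ff) {
    std::vector<float> tmp(2 * d_ff);
    gemv(W_fused, x, tmp.data(), 2 * d_ff, d_model);
    for (std::size_t i = 0; i < d_ff; ++i) {
        const float g = tmp[i];         // gate activation
        const float u = tmp[d_ff + i];  // up activation
        out[i] = u * (g / (1.0f + std::exp(-g)));  // SiLU(g) * u
    }
}
```

In a MoE model this fusion applies per expert, which is where the small but measurable gain quoted above would come from: half as many expert GEMM dispatches on the up/gate path.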