mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-01-31 11:39:52 +00:00
* Fused up+gate+unary for regular (not MoE) FFN - CPU * WIP CUDA * Seems to be working on CUDA For a dense model we get 2-3% speedup for PP and ~0.6% for TG. * Add command line option This time the option is ON by default, and one needs to turn it off via -no-fug or --no-fused-up-gate --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>