mirror of
https://github.com/ikawrakow/ik_llama.cpp.git
synced 2026-03-05 11:30:09 +00:00
Fuse add + fused_rms_norm (CUDA) (#852)
* Combine all calls to llm_build_norm to a single line, so that it is easier to check what kind of arguments are being passed by simply using grep.

* Combine add + fused_rms_norm

  For many models this happens at each layer: the result of the layer is added to the layer input, which then becomes the input to the next layer, which then is typically normalized via fused_rms_norm.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
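
For illustration only, the add + fused_rms_norm pattern can be sketched as a CPU reference in C++. This is not the CUDA kernel from this commit. The function name `add_rms_norm_row`, its signature, and the choice to return both the sum and the normalized row are assumptions made for this example. It also assumes that the normalization multiplies by a per-element weight. The point is that the sum and the sum of squares can be produced in one pass over the row, rather than writing the sum out and reading it back for the norm.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not the ik_llama.cpp CUDA kernel).
// For a single row:
//   sum[i] = a[i] + b[i]                          (residual add, input to the next layer)
//   out[i] = sum[i] / sqrt(mean(sum^2) + eps) * weight[i]
// Assumes a, b and weight all have the same length (at least 1).
void add_rms_norm_row(const std::vector<float>& a,
                      const std::vector<float>& b,
                      const std::vector<float>& weight,
                      float eps,
                      std::vector<float>& sum,
                      std::vector<float>& out) {
    const std::size_t n = a.size();
    sum.resize(n);
    out.resize(n);

    // Pass 1: compute the add and accumulate the sum of squares together.
    float sumsq = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float s = a[i] + b[i];
        sum[i] = s;
        sumsq += s * s;
    }

    // Pass 2: scale by the inverse RMS and the norm weight.
    const float scale = 1.0f / std::sqrt(sumsq / static_cast<float>(n) + eps);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = sum[i] * scale * weight[i];
    }
}
```

In an unfused version, the add would write its result to memory and the norm would read it back as a separate operation; fusing the two saves that round trip and one kernel launch per layer.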