Fuse add + fused_rms_norm (CUDA) (#852)

* Combine all calls to llm_build_norm to a single line

so that it is easier to check what kind of arguments are being
passed by simply using grep.

* Combine add + fused_rms_norm

For many models this pattern occurs at each layer: the result of
the layer is added to the layer input, and the sum becomes the
input to the next layer, where it is typically first normalized
via fused_rms_norm.
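
The pattern described above can be sketched on the CPU as follows. This is
only an illustrative reference, not the CUDA kernel from this commit: the
names add_rms_norm_fused and AddNormResult are hypothetical, and it assumes
that fused_rms_norm means RMS normalization followed by an element-wise
multiplication with the norm weights. The point is that the sum is written
once and its sum of squares is accumulated in the same pass, instead of
running a separate add and then re-reading the result for the norm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: the sum is kept because it is the residual
// input for the next layer, while the normalized copy feeds that layer.
struct AddNormResult {
    std::vector<float> sum;     // a + b
    std::vector<float> normed;  // rms_norm(a + b) * weight
};

// Reference sketch for one row (assumed semantics, see the text above).
AddNormResult add_rms_norm_fused(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& weight,
                                 float eps) {
    assert(!a.empty() && a.size() == b.size() && a.size() == weight.size());
    const std::size_t n = a.size();
    AddNormResult r{std::vector<float>(n), std::vector<float>(n)};

    // Pass 1: compute the sum and accumulate its squares together.
    float sumsq = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float s = a[i] + b[i];
        r.sum[i] = s;
        sumsq += s * s;
    }

    // Pass 2: scale by 1/sqrt(mean(x^2) + eps) and apply the weights.
    const float scale = 1.0f / std::sqrt(sumsq / static_cast<float>(n) + eps);
    for (std::size_t i = 0; i < n; ++i) {
        r.normed[i] = r.sum[i] * scale * weight[i];
    }
    return r;
}
```

An unfused version would write a + b to memory in one operation and read it
back in the norm; on a GPU that is an extra kernel launch and an extra
round trip through global memory per layer.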

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2025-10-21 14:29:50 +03:00
committed by GitHub
parent 92231460cf
commit caf9759c97
4 changed files with 258 additions and 531 deletions