Fuse add+add+fused_rms (#853)

* Fuse add+add+fused_rms

* Try this

* Macro to easily enable/disable fusion

* Various:

* Check that all tensors involved are on the same device before applying fusion
* Fuse sigmoid+scale+sum_rows+div
* Fix the fused bailingmoe2 experts selection

The issue there was that the bias was not per row, but per
expert group, so only the first n_per_group biases were used
for all experts.
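
A minimal C++ sketch of the indexing problem described above, not the code from this commit. It assumes a single row of router scores laid out as consecutive expert groups of n_per_group experts each, with one bias value per expert. The function names and the flat std::vector layout are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: add the selection bias using the global expert index.
std::vector<float> add_bias_per_expert(const std::vector<float>& scores,
                                       const std::vector<float>& bias,
                                       std::size_t n_per_group) {
    assert(n_per_group > 0 && scores.size() % n_per_group == 0);
    assert(bias.size() == scores.size());
    std::vector<float> out(scores.size());
    const std::size_t n_groups = scores.size() / n_per_group;
    for (std::size_t g = 0; g < n_groups; ++g) {
        for (std::size_t j = 0; j < n_per_group; ++j) {
            const std::size_t e = g * n_per_group + j; // global expert index
            out[e] = scores[e] + bias[e];
        }
    }
    return out;
}

// Hypothetical buggy variant: the bias is indexed by the position inside the
// group, so every group reuses the first n_per_group bias values.
std::vector<float> add_bias_first_group_only(const std::vector<float>& scores,
                                             const std::vector<float>& bias,
                                             std::size_t n_per_group) {
    assert(n_per_group > 0 && scores.size() % n_per_group == 0);
    assert(bias.size() >= n_per_group);
    std::vector<float> out(scores.size());
    const std::size_t n_groups = scores.size() / n_per_group;
    for (std::size_t g = 0; g < n_groups; ++g) {
        for (std::size_t j = 0; j < n_per_group; ++j) {
            const std::size_t e = g * n_per_group + j;
            out[e] = scores[e] + bias[j]; // wrong: ignores the group offset
        }
    }
    return out;
}
```

With two groups of two experts, zero scores, and biases 1, 2, 3, 4, the first function yields 1, 2, 3, 4, while the second yields 1, 2, 1, 2.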

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This commit is contained in:
Kawrakow
2025-10-22 16:18:11 +03:00
committed by GitHub
parent af5bf60cc8
commit ed4e1a6588
8 changed files with 281 additions and 54 deletions

@@ -7793,6 +7793,7 @@ ggml_cgraph * llm_build_context::build_openai_moe() {
cur = ffn_inp;
cur = llm_build_norm(ctx0, cur, hparams, model.layers[il].attn_post_norm, nullptr, LLM_NORM_RMS, cb, il);
ggml_build_forward_expand(gf, cur);
cb(cur, "attn_post_norm", il);
bool use_dup_bias = cur->ne[1] < 32 && model.layers[il].ffn_up_exps_b_dup &&