Bitnet: use the fused mul-silu in the FFN network (#110)

I had forgotten that build_bitnet() does not use the standard
llm_build_ffn function, so the fused mul-silu didn't get used
for Bitnet when I added it to llm_build_ffn.

This gives us another ~1% speedup for TG-128 (generation of 128 tokens).

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2024-10-26 17:40:32 +02:00
committed by GitHub
parent 8ccd9bc7e5
commit cd96f6c4e5

@@ -13399,12 +13399,7 @@ struct llm_build_context {
 cb(cur, "ffn_gate", il);
-// combine this with the above scale into ggml_scaled_silu
-cur = ggml_silu(ctx0, cur);
-cb(cur, "ffn_silu", il);
-cur = ggml_mul(ctx0, cur, tmp);
+cur = ggml_fused_mul_unary(ctx0, cur, tmp, GGML_UNARY_OP_SILU);
 cb(cur, "ffn_gate_par", il);
 cur = llm_build_norm(ctx0, cur, hparams,
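
For illustration only, the effect of the change above can be sketched in plain C++. This is not the ggml implementation: the helper names silu_then_mul and fused_mul_silu are hypothetical, and the sketch assumes that ggml_fused_mul_unary with GGML_UNARY_OP_SILU applies SiLU to its first tensor argument (the gate) and multiplies the result elementwise by the second (the up projection), which is what the removed ggml_silu + ggml_mul pair computed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// SiLU activation: x * sigmoid(x) = x / (1 + exp(-x)).
static float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Unfused path (hypothetical helper): one pass writes silu(gate) to an
// intermediate buffer, a second pass multiplies it by `up`. This mirrors
// the removed ggml_silu followed by ggml_mul.
std::vector<float> silu_then_mul(const std::vector<float>& gate,
                                 const std::vector<float>& up) {
    assert(gate.size() == up.size());
    std::vector<float> act(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) act[i] = silu(gate[i]);
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) out[i] = act[i] * up[i];
    return out;
}

// Fused path (hypothetical helper): a single pass computes silu(gate) * up
// with no intermediate buffer. This is the assumed semantics of the fused op.
std::vector<float> fused_mul_silu(const std::vector<float>& gate,
                                  const std::vector<float>& up) {
    assert(gate.size() == up.size());
    std::vector<float> out(gate.size());
    for (std::size_t i = 0; i < gate.size(); ++i) out[i] = silu(gate[i]) * up[i];
    return out;
}
```

Both helpers should return the same values; the fused form saves one traversal of the data and one temporary tensor, which is a plausible source of a small gain like the one reported in the commit message.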