Fused mul + multi_add op (#858)

* Adding fused mul+multi_add + CPU implementation

* fused mul+multi_add: CUDA

* fused mul+multi_add: command line argument to disable it

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2025-10-24 07:40:35 +03:00
committed by GitHub
parent 856c6da9c1
commit 0549be76e5
15 changed files with 211 additions and 38 deletions

@@ -422,6 +422,7 @@ extern "C" {
     bool fused_moe_up_gate; // whether to use fused MoE up/gate op
     bool grouped_expert_routing; // whether to use grouped expert routing (BailingMoeV2 arch)
     bool fused_up_gate; // whether to use fused up/gate op [EXPERIMENTAL]
+    bool fused_mmad; // whether to use fused mul+multi_add op [EXPERIMENTAL]
     int min_experts;
     float thresh_experts;
     bool only_active_experts;