Grouped expert routing (CPU only) (#836)

* Better argsort (CPU)

* Attempt at grouped topk

* This seems to do the trick for grouped experts routing

* Cleanup

* Trying to merge, something is not right

* Working merged grouped top_k (CPU)

* Add command line option to enable grouped expert routing

* Add grouped expert routing option to llama-bench

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow
2025-10-16 14:57:02 +03:00
committed by GitHub
parent e66d307e13
commit cde642e591
11 changed files with 221 additions and 44 deletions
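
For readers unfamiliar with the "grouped top_k" named in the commit message, here is a minimal sketch of grouped expert routing. It assumes the scheme used by DeepSeek-V3-style routers: experts are split into equal groups, each group is scored by the sum of its two best experts, only the best groups are kept, and the final top-k is taken among the experts of those groups. The function name `grouped_topk` and all parameter names are hypothetical and are not taken from this commit, whose CPU implementation may differ in its details.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration only; not the code from this commit.
// scores: one routing score per expert, laid out group by group.
// Returns the indices of the selected experts, best first.
std::vector<int> grouped_topk(const std::vector<float>& scores,
                              int n_groups, int n_top_groups, int top_k) {
    const int n_experts = static_cast<int>(scores.size());
    const int per_group = n_experts / n_groups;  // assumes n_groups divides n_experts

    // 1. Score each group by the sum of its two best experts (assumed rule).
    std::vector<float> group_score(n_groups);
    for (int g = 0; g < n_groups; ++g) {
        std::vector<float> s(scores.begin() + g * per_group,
                             scores.begin() + (g + 1) * per_group);
        const int m = std::min(2, per_group);
        std::partial_sort(s.begin(), s.begin() + m, s.end(), std::greater<float>());
        group_score[g] = std::accumulate(s.begin(), s.begin() + m, 0.0f);
    }

    // 2. Keep only the n_top_groups best-scoring groups.
    std::vector<int> groups(n_groups);
    std::iota(groups.begin(), groups.end(), 0);
    std::stable_sort(groups.begin(), groups.end(),
                     [&](int a, int b) { return group_score[a] > group_score[b]; });
    groups.resize(std::min(n_top_groups, n_groups));

    // 3. Experts in the surviving groups are the only candidates.
    std::vector<int> candidates;
    for (int g : groups) {
        for (int i = 0; i < per_group; ++i) {
            candidates.push_back(g * per_group + i);
        }
    }

    // 4. Ordinary top-k over the candidates.
    std::stable_sort(candidates.begin(), candidates.end(),
                     [&](int a, int b) { return scores[a] > scores[b]; });
    candidates.resize(std::min<std::size_t>(top_k, candidates.size()));
    return candidates;
}
```

With scores `{0.9, 0.1, 0.5, 0.45, 0.8, 0.0, 0.2, 0.3}`, 4 groups, 2 kept groups and `top_k = 3`, the sketch selects experts 0, 2 and 3. Expert 4 has the second-highest score overall but is dropped because its group is not among the two best, which is the difference from plain top-k routing.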


@@ -420,6 +420,7 @@ extern "C" {
     int   mla_attn;               // whether to use MLA attention [EXPERIMENTAL]
     int   attn_max_batch;         // maximum batch size for attention computations [EXPERIMENTAL]
     bool  fused_moe_up_gate;      // whether to use fused MoE up/gate op
+    bool  grouped_expert_routing; // whether to use grouped expert routing (BailingMoeV2 arch)
     bool  fused_up_gate;          // whether to use fused up/gate op [EXPERIMENTAL]
     int   min_experts;
     float thresh_experts;