mirror of
https://github.com/microsoft/mscclpp.git
synced 2026-05-12 01:10:22 +00:00
Each local expert sends one copy per dispatched token back to its owner, so the bytes actually on the wire during combine match dispatch. The previous num_tokens×hidden under-counted by ~num_topk×, making combine BW look artificially low next to dispatch.