Files
mscclpp/examples
Binyang Li 865c2bc795 Optimize MNNVL allreduce without symmetric memory
Run the tuning example with symmetric memory disabled, make allreduce tuning use the same symmetric-memory mode as execution, and narrow the MNNVL small-message candidate set to avoid slower packet/NVLS choices. Increase packet and RSAG channel parallelism so non-symmetric CUDA-IPC paths can use 112-block packet and 128-block RSAG configs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-28 07:55:52 +00:00
..