mirror of
https://github.com/microsoft/mscclpp.git
synced 2026-05-13 09:46:00 +00:00
Allow default_allreduce_nvls_zero_copy to run when the public symmetric_memory flag is false; the algorithm already binds the concrete input and output allocations in its context. Include that fast path in MNNVL tuning and bound allpair/NVLS packet candidates to small sizes so large-message no-symmetric tuning avoids slow or unsafe packet variants. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>