mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-25 15:24:43 +00:00

Author	SHA1	Message	Date
Qinghua Zhou	594dc79657	Address NVLS review feedback Handle unsupported FP8 NVLS paths safely, tighten IPC-domain guards, align IPC-domain naming, and add IPC-domain fabric hash logging. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-16 23:19:25 +00:00
Binyang Li	0744e806fc	detect ipc domain automatically	2026-05-16 00:39:49 +00:00
Binyang Li	45a651b2c8	Decouple IPC-domain hint from bootstrap nRanksPerNode Replace MSCCLPP_MNNVL_NRANKS_PER_NODE (which overrode TcpBootstrap and silently changed getNranksPerNode() for every consumer) with a single algorithm-level helper getIpcDomainNranks(comm) backed by a new MSCCLPP_IPC_DOMAIN_NRANKS env. The neutral IPC name covers both NVLink/MNNVL on NV and XGMI on AMD. Bootstrap is unchanged and continues to report physical-host detection. Collapse the two getCollectiveDomainNranksPerNode overloads into one canonical helper and route all six allreduce algos (packet, allpair_packet, nvls_packet, nvls_zero_copy, rsag, rsag_zero_copy) through it. Update the standalone tuning example to use the new env name; drop the undeclared MSCCLPP_ENABLE_MNNVL gate; fix multi_host_mnnvl detection now that nranks_per_node is no longer overridden by the bootstrap. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-01 18:27:17 +00:00
Binyang Li	893a08e69c	Enable MNNVL allreduce tuning Add an MNNVL rank-domain override so MSCCL++ collectives can treat multi-host NVLink fabrics as a single CUDA IPC/NVLS peer group. Update packet, RSAG, and NVLS allreduce paths to use the collective domain size and teach the torch integration tuning example to select MNNVL-capable allreduce algorithms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-28 05:38:59 +00:00
Binyang Li	a707273701	Torch integration (#692) Reorganize current native algorithm implementation and DSL algorithm implementation. Provide unified API for DSL algo and native algo and provide interface to tune the algo. Provide interface for pytorch integration with native API and DSL. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-01-21 20:32:24 -08:00

5 Commits