Commit Graph

16 Commits

Author SHA1 Message Date
Qinghua Zhou 520c890df5 Add debug variable MSCCLPP_DEBUG_ALLTOALLV to print 2026-04-02 04:39:48 +00:00
Qinghua Zhou 36940dbacf Match the message size for EP bench HT of 16 GPUs in test 6 2026-03-30 03:40:05 +00:00
Qinghua Zhou 62ab8883a6 Update multinode mode selection logic for IB and NVSwitch; add tests of EP-equivalent workloads 2026-03-30 01:34:53 +00:00
Qinghua Zhou ec011f14ea Add detection of torch baseline and debug info 2026-03-25 01:52:24 +00:00
Qinghua Zhou 7e1cb7b8cf Support cross-node CudaIPC 2026-03-21 10:41:32 +00:00
Qinghua Zhou 9ef1fb7cee Pass the multinode test 2026-03-18 17:08:22 +00:00
Qinghua Zhou bdb30b56a5 Broadcast UniqueId via TCP; detect whether torch comparison is possible 2026-03-16 10:01:35 +00:00
Qinghua Zhou f47e97659d Update the benchmark to improve rank mapping, communicator creation, and backend selection 2026-03-16 09:25:34 +00:00
Qinghua Zhou d00713d3c2 Add more real MoE workloads for alltoallv 2026-03-02 12:51:21 +00:00
Qinghua Zhou ee843d445f Add test of real MoE workloads 2026-02-25 12:39:48 +00:00
Qinghua Zhou ae59eab6a2 Add unified benchmarking function to test all_to_all_single of mscclpp and torch 2026-02-24 07:17:17 +00:00
Qinghua Zhou 715ecd91cf Add baseline test of torch.distributed.all_to_all_single 2026-02-24 06:51:10 +00:00
Qinghua Zhou 98be0def08 Use variable sizes in the performance test 2026-02-24 06:29:46 +00:00
Qinghua Zhou 6292b6ab33 Report unidirectional bandwidth 2026-02-24 06:02:33 +00:00
Qinghua Zhou 21e3f1ebb3 Get correct remote receive displacements for peers 2026-02-23 14:22:30 +00:00
Qinghua Zhou 7ba83e20dd PyTorch-compatible all_to_all_single API using mscclpp kernels 2026-02-23 09:51:51 +00:00