mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-05 06:01:26 +00:00

Author	SHA1	Message	Date
Binyang Li	fcb2e46cb1	NVLS support for NCCL API (#410) Co-authored-by: Qinghua Zhou <qinghuazhou@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-12-18 09:55:35 +00:00
Binyang Li	ee75caf365	Reduce memory usage for scratch buffer (#403) In the executor, we allocate the scratch buffer based on `sendMemRange`. However, for certain execution plans, this allocation may be unsuitable, as the plan does not support messages of this size. To avoid allocating too much memory and causing an OOM error, set the scratch buffer size to `min(scratchBufferSize(maxMessageSizeSupportedForPlan), scratchBufferSize(sendMemRange))`	2024-12-13 13:00:04 -08:00
Caio Rocha	01fd813f1b	Exception Max Number Operation per Tb (#405)	2024-12-11 16:06:15 -08:00
Changho Hwang	756f24c697	Revised ProxyChannel interfaces (#400) * Renamed `ProxyChannel` -> `BaseProxyChannel` and `SimpleProxyChannel` -> `ProxyChannel`. It makes the interface more consistent by defining channels to be associated with a certain src/dst memory region: `ProxyChannel` as "sema + src/dst + fifo" and `SmChannel` as "sema + src/dst". `BaseProxyChannel` is not associated with any memory regions, as "sema + fifo". * `ProxyChannelDeviceHandle` now inherits from `BaseProxyChannelDeviceHandle`, instead of having one as a member.	2024-12-06 10:53:34 -08:00
Binyang Li	28a57b0610	NVLS support for msccl++ executor (#375) - Support more datatypes for multicast operations - Add new OP MULTI_LOAD_REDUCE_STORE to support NVLS - Modify allocSharedPhysicalCuda, which now returns std::shared_ptr<T> instead of std::shared_ptr<PhysicalCudaMemory> - Add Python support for allocSharedPhysicalCuda Test passed for `allreduce_nvls.json`	2024-11-20 06:43:28 +00:00
Binyang Li	1baea89fa0	Fix light load bug (#379) Fix lightLoadExecutionPlan issue. An execution context may have multiple device execution plans. These plans share the channel connections that were constructed earlier. A deviceExecutionPlanKey is introduced to identify these plans. We can get the current device execution plan key via: `contexts.currentDevicePlan`	2024-11-13 07:58:43 +00:00
Caio Rocha	c6e06cfad7	Executor AllGather In-Place Support (#365)	2024-10-21 05:45:56 -07:00
Caio Rocha	08a0cec2eb	Fixing RegisterMemory Allocation for ProxyChannels (#353) Co-authored-by: Binyang Li <binyli@microsoft.com> Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-09-24 23:01:41 -07:00
Binyang Li	b30bb260e3	Tune threads per block for mscclpp executor (#345)	2024-09-18 17:21:47 -07:00
Binyang Li	26a87535f9	Fix bug in semaphore construction (#341) Current semaphore construction requires two-way communication, e.g., to construct a semaphore signaling from rank 0 to rank 1, both rank 0 and rank 1 need to send a message to each other. This PR fixes an executor bug that fails to conduct two-way communication for constructing such one-way semaphores, and instead hangs during the semaphore construction. In the future, we may need to change the implementation to construct semaphores via one-way communication. --------- Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-09-04 19:42:03 +08:00
Caio Rocha	1af62ea43d	ProxyChannel Support in Executor (#342) Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-08-27 10:09:44 -07:00
caiomcbr	67eb9b04cc	NCCL API Executor Integration (#331) Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-07-25 15:05:02 -07:00
Ziyue Yang	b5a48f836c	Separate NPKit CPU timestamp access from different blocks for AMD platform (#321) Reference: https://github.com/ROCm/rccl/pull/1229	2024-07-02 19:36:48 +08:00
Ziyue Yang	76328fe623	Add NPKit GPU event support (#310)	2024-06-13 13:59:50 +08:00
Binyang Li	80aefe55bc	Cumulative Updates (#309) Bug fix: Unable to execute communication primitives with the same execution plan but varying message sizes. Add reduce_packets OP	2024-06-12 19:17:57 +08:00
Binyang Li	6226556ce2	Optimized the execution kernel (#294)	2024-05-03 11:54:50 -07:00
Binyang Li	64d837f9ab	Add executor to execute schedule-plan file (#283) Add executor to execute the JSON schedule file generated by msccl-tools --------- Co-authored-by: Changho Hwang <changhohwang@microsoft.com>	2024-04-18 19:10:41 +00:00

17 Commits
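
The scratch-buffer sizing rule described in commit ee75caf365 (#403) can be illustrated with a small sketch. This is not the executor's code: `scratch_buffer_size` below is a hypothetical stand-in (the real mapping from message size to scratch size is not given in the log), and the fixed `scale` factor is an assumption chosen only to make the example concrete. The only part taken from the commit message is the `min(...)` of the two candidate sizes.

```python
def scratch_buffer_size(message_size: int, scale: int = 2) -> int:
    """Hypothetical stand-in: scratch bytes needed for a given message size.

    Assumption: scratch size grows linearly with message size. The real
    executor derives this from the execution plan.
    """
    return message_size * scale


def capped_scratch_size(send_mem_range: int, max_message_size_for_plan: int) -> int:
    """Apply the rule from the commit message: take the smaller of the two sizes."""
    return min(
        scratch_buffer_size(max_message_size_for_plan),
        scratch_buffer_size(send_mem_range),
    )


if __name__ == "__main__":
    one_gib = 1 << 30
    one_mib = 1 << 20
    # A plan that only supports messages up to 1 MiB no longer gets a
    # scratch buffer sized for the full 1 GiB send memory range.
    print(capped_scratch_size(send_mem_range=one_gib, max_message_size_for_plan=one_mib))
```

Under these assumptions, a large `sendMemRange` paired with a plan limited to small messages yields a scratch buffer bounded by the plan's limit, which is the OOM-avoidance effect the commit describes.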