Caio Rocha
ff18bb8d0b
Providing reduce-scatter test support (#390)
2024-11-28 09:19:30 -08:00
Binyang Li
28a57b0610
NVLS support for msccl++ executor (#375)
...
- Support more datatypes for multicast operation
- Add new OP MULTI_LOAD_REDUCE_STORE to support NVLS
- Modify allocSharedPhysicalCuda to return std::shared_ptr<T>
instead of std::shared_ptr<PhysicalCudaMemory>
- Add Python support for allocSharedPhysicalCuda
Tests passed for `allreduce_nvls.json`
2024-11-20 06:43:28 +00:00
Ziyue Yang
9526d76fc7
Add kernel-based verification for executor_test (#378)
...
Add kernels to fill and verify data for the correctness test in
executor_test.py.
2024-11-07 14:14:20 +08:00
Ziyue Yang
95ab1088ef
Fix in-place all-gather input buffer in executor_test (#372)
2024-10-24 23:04:11 +08:00
Caio Rocha
c6e06cfad7
Executor AllGather In-Place Support (#365)
2024-10-21 05:45:56 -07:00
Caio Rocha
08a0cec2eb
Fixing RegisterMemory Allocation for ProxyChannels (#353)
...
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-09-24 23:01:41 -07:00
Binyang Li
b30bb260e3
Tune threads per block for mscclpp executor (#345)
2024-09-18 17:21:47 -07:00
Ziyue Yang
faadc75649
Fix missing import in executor test (#334)
2024-08-06 14:24:50 -07:00
Ziyue Yang
76328fe623
Add NPKit GPU event support (#310)
2024-06-13 13:59:50 +08:00
Changho Hwang
1f62dfd7cd
Add C++ executor test (#304)
...
- Add C++ executor test
- Fix executor bugs for packet operations
- Enhance executor_test.py
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-05-29 10:54:36 +00:00
Binyang Li
64d837f9ab
Add executor to execute schedule-plan file (#283)
...
Add executor to execute the JSON schedule file generated by msccl-tools
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-04-18 19:10:41 +00:00