Changho Hwang
2127a3ba29
Improve CMake options ( #376 )
...
* Let all CMake option names start with `MSCCLPP_`
* Explain the `MSCCLPP_BUILD_PYTHON_BINDINGS` option in readme
---------
Co-authored-by: Binyang Li <binyli@microsoft.com >
2024-11-22 01:54:11 +00:00
Binyang Li
28a57b0610
NVLS support for msccl++ executor ( #375 )
...
- Support more datatypes for multicast operations
- Add new OP MULTI_LOAD_REDUCE_STORE to support NVLS
- Modify allocSharedPhysicalCuda to return std::shared_ptr<T>
instead of std::shared_ptr<PhysicalCudaMemory>
- Add Python support for allocSharedPhysicalCuda
Test passed for `allreduce_nvls.json`
2024-11-20 06:43:28 +00:00
Ziyue Yang
3e51e9b359
Fix missing packet parameter for executor ( #385 )
2024-11-19 08:36:37 +08:00
Binyang Li
1baea89fa0
Fix light load bug ( #379 )
...
Fix lightLoadExecutionPlan issue.
An execution context may have multiple device execution plans. These plans
share the channel connections that were constructed earlier.
A deviceExecutionPlanKey is introduced to identify these plans. We can
get the current device execution plan key via:
`contexts.currentDevicePlan`
2024-11-13 07:58:43 +00:00
Caio Rocha
d5d608abdc
Fixing Bug Const Offset in Execution Plan ( #380 )
...
The offset was not differentiating between the buffer types, causing the
offset to be incorrect when the buffer type was not `SCRATCH`.
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2024-11-11 20:02:02 -08:00
Changho Hwang
85fdde7a73
Lazily create the context stream ( #381 )
...
Create the context stream only when needed.
2024-11-11 10:39:32 +08:00
Caio Rocha
c6e06cfad7
Executor AllGather In-Place Support ( #365 )
2024-10-21 05:45:56 -07:00
Changho Hwang
0c150e5166
Fix copyright messages ( #367 )
2024-10-17 21:25:46 -07:00
Caio Rocha
08a0cec2eb
Fixing RegisterMemory Allocation for ProxyChannels ( #353 )
...
Co-authored-by: Binyang Li <binyli@microsoft.com >
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2024-09-24 23:01:41 -07:00
Ziyue Yang
5c4e105814
Fix NPKit exit event offset ( #356 )
2024-09-19 13:35:44 +08:00
Binyang Li
b30bb260e3
Tune threads per block for mscclpp executor ( #345 )
2024-09-18 17:21:47 -07:00
Binyang Li
7bedb25054
Add proxy channel related operations ( #351 )
...
Add Flush, PutWithSignal, and PutWithFlushAndSignal operations
2024-09-15 13:24:57 -07:00
Binyang Li
26a87535f9
Fix bug for constructing semaphore ( #341 )
...
Current semaphore construction requires two-way communication, e.g., to
construct a semaphore signaling from rank 0 to rank 1, both rank 0 and
rank 1 need to send a message to each other. This PR fixes an executor
bug that fails to conduct two-way communication for constructing such
one-way semaphores, and instead hangs during the semaphore construction.
In the future, we may need to change the implementation to construct
semaphore via one-way communication.
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2024-09-04 19:42:03 +08:00
Changho Hwang
72b99a4229
Fix for ROCm 6.0 ( #347 )
2024-09-01 20:22:33 -07:00
Caio Rocha
4eca6f1e95
Support executors to send packets over ProxyChannel ( #344 )
...
Co-authored-by: Binyang Li <binyli@microsoft.com >
2024-08-30 22:10:33 +00:00
Caio Rocha
1af62ea43d
ProxyChannel Support in Executor ( #342 )
...
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2024-08-27 10:09:44 -07:00
Changho Hwang
1e82dd444f
Make ibverbs optional at compile time ( #340 )
...
Co-authored-by: Caio Rocha <caiorocha@microsoft.com >
2024-08-21 12:47:05 -07:00
Caio Rocha
ead4efc315
Dynamically load libibverbs ( #337 )
2024-08-13 23:48:39 -07:00
Changho Hwang
8c6fb429e9
bfloat16 support ( #336 )
...
* Add bfloat16 support for executor and NCCL interface
* Changed `gpu_data_types.hpp` into an internal header file
2024-08-12 15:41:58 -07:00
caiomcbr
67eb9b04cc
NCCL API Executor Integration ( #331 )
...
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2024-07-25 15:05:02 -07:00
Roshan Dathathri
f131fae3ec
Add support for different vector sizes in multimem instructions ( #332 )
2024-07-25 10:14:02 -07:00
Ziyue Yang
b5a48f836c
Separate NPKit CPU timestamp access from different blocks for AMD platform ( #321 )
...
Reference: https://github.com/ROCm/rccl/pull/1229
2024-07-02 19:36:48 +08:00
Ziyue Yang
f29095b3b1
Fix NPKit support for AMD ( #312 )
2024-06-14 16:22:14 +08:00
Ziyue Yang
76328fe623
Add NPKit GPU event support ( #310 )
2024-06-13 13:59:50 +08:00
Binyang Li
80aefe55bc
Cumulative Updates ( #309 )
...
Bug fix: Unable to execute communication primitives with the same
execution plan but varying message sizes.
Add reduce_packets OP
2024-06-12 19:17:57 +08:00
Changho Hwang
1f62dfd7cd
Add C++ executor test ( #304 )
...
- Add C++ executor test
- Fix executor bugs for packet operation
- Enhance executor_test.py
---------
Co-authored-by: Binyang Li <binyli@microsoft.com >
2024-05-29 10:54:36 +00:00
Binyang Li
3a18068cd4
Fix security issue ( #305 )
...
Change sprintf to snprintf to avoid potential security issues
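
For illustration only, a minimal sketch of why this swap matters: `snprintf` takes the destination capacity and never writes past it, while `sprintf` has no bound. The helper `formatLabel`, the `FormatResult` struct, and the 64-byte buffer are hypothetical and not taken from the mscclpp source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical result type for this sketch: the formatted text and
// whether the output had to be cut short to fit the buffer.
struct FormatResult {
  std::string text;
  bool truncated;
};

// Hypothetical helper (not from the mscclpp source). Formats "rank<N>:<name>"
// into a stack buffer, writing at most `cap` bytes including the terminator.
FormatResult formatLabel(int rank, const char* name, std::size_t cap) {
  char buf[64];
  assert(cap > 0 && cap <= sizeof(buf));
  // snprintf returns the length the full output would have had,
  // so a return value >= cap means the output was truncated.
  int n = std::snprintf(buf, cap, "rank%d:%s", rank, name);
  bool truncated = n < 0 || static_cast<std::size_t>(n) >= cap;
  if (n < 0) buf[0] = '\0';  // encoding error: return an empty string
  return {std::string(buf), truncated};
}
```

With `sprintf`, an over-long `name` would overflow `buf`; here the caller can check `truncated` instead.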
2024-05-25 23:12:57 -07:00
Binyang Li
6226556ce2
Optimized the execution kernel ( #294 )
2024-05-03 11:54:50 -07:00
Binyang Li
5628362715
Resolve multi-nodes test failure issue ( #295 )
...
Fix bug, resolve multi-nodes test failure issue.
2024-04-26 13:06:57 +08:00
Changho Hwang
d4ede480f4
Ethernet support ( #284 )
...
Co-authored-by: Binyang Li <binyli@microsoft.com >
Co-authored-by: Caio Rocha <caiorocha@microsoft.com >
2024-04-25 11:06:43 -07:00
Changho Hwang
89896ff94f
Include GPU data types only for kernel code ( #292 )
2024-04-24 20:55:02 -07:00
Changho Hwang
6c1fa5307c
Refactoring NVLS interfaces ( #293 )
...
Move NVLS details from the core to a separate interface
2024-04-24 10:05:41 -07:00
Changho Hwang
9934c982a8
Separate headers for GPU data types ( #291 )
...
Avoid unnecessarily including data type headers everywhere.
2024-04-19 05:52:43 +00:00
Roshan Dathathri
41e0964d93
Allow binding allocated memory to NVLS multicast pointer ( #290 )
...
And change NVLS multimem instructions to static functions
2024-04-18 17:11:31 -07:00
Binyang Li
64d837f9ab
Add executor to execute schedule-plan file ( #283 )
...
Add executor to execute the JSON schedule file generated by msccl-tools
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2024-04-18 19:10:41 +00:00
Changho Hwang
9406123711
Fix a typo name ( #286 )
2024-04-17 23:45:46 +00:00
Changho Hwang
5ba6ce00c7
Fix bootstrapping mechanism ( #278 )
...
Co-authored-by: Binyang Li <binyli@microsoft.com >
Co-authored-by: Pashupati Kumar <74680231+pash-msft@users.noreply.github.com >
2024-03-27 10:24:24 +08:00
Changho Hwang
d34e097b40
Fix wrong offset calculation ( #257 )
2024-02-06 08:55:43 +08:00
Saeed Maleki
91d592dcc0
NVLS support. ( #250 )
...
Co-authored-by: Saeed Maleki <saemal@microsoft.com >
Co-authored-by: Binyang Li <binyli@microsoft.com >
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2024-02-04 20:46:10 -08:00
Binyang Li
163cba08c8
Update interface to let user change fifo size ( #243 )
...
Related to this issue:
https://github.com/microsoft/mscclpp/issues/242. The user may use more
threads than the number specified in `fifo_size` to interact with the
FIFO. In this case, there will be unexpected behavior.
Update the interface to let users change the FIFO size as needed.
2024-01-09 22:14:36 -08:00
Binyang Li
e7d3e2d44b
Fix crash in static variable destructor ( #238 )
...
According to https://en.cppreference.com/w/cpp/utility/program/exit, `The last destructor for thread-local objects is [sequenced-before](https://en.cppreference.com/w/cpp/language/eval_order) the first destructor for a static object.`
Change the code to avoid this case.
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2023-12-25 14:01:28 +00:00
Changho Hwang
5fa5bd2706
Check nvidia_peermem during runtime ( #234 )
2023-12-25 12:02:10 +08:00
Changho Hwang
544ff0c21d
ROCm support ( #213 )
...
Co-authored-by: Binyang Li <binyli@microsoft.com >
2023-11-24 16:41:56 +08:00
Changho Hwang
e710701728
Warning ahead of CQ being full ( #202 )
2023-11-15 08:03:29 +00:00
Changho Hwang
7686e15fbd
Allow infinite waiting ( #200 )
2023-10-23 12:28:05 +08:00
Saeed Maleki
85e8017535
Atomic for semaphores instead of fences ( #188 )
...
Co-authored-by: Pratyush Patel <pratyushpatel.1995@gmail.com >
Co-authored-by: Esha Choukse <eschouks@microsoft.com >
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2023-10-13 18:57:08 +08:00
Saeed Maleki
c4785c9591
Improve debugging messages ( #195 )
...
Debugging information to understand what connections are being made.
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com >
2023-10-13 16:55:52 +08:00
Saeed Maleki
148681b4bc
Fix a pytest bug ( #196 )
2023-10-13 16:39:43 +08:00
Changho Hwang
8c0f9e84d0
v0.3.0 ( #171 )
2023-10-11 22:35:54 +08:00
Changho Hwang
6c0ee72916
Construct ProxyChannel with shared pointers ( #184 )
2023-09-18 05:46:23 +00:00