aashaka
0650371b54
Allow obtaining cuda stream handle from PyTorch stream when launching kernel (#297)
Use the `cuda_stream` attribute of a torch stream if the stream is not a
CuPy stream instance.
2024-05-04 04:57:07 +00:00
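The sketch below shows the stream-handle fallback described above. It assumes the usual attribute names: CuPy streams expose the raw handle as `ptr`, and `torch.cuda.Stream` exposes it as `cuda_stream`. The function `resolve_stream_handle` and its `cupy_stream_type` parameter are hypothetical, and this is not the MSCCL++ code itself. The parameter only lets the check run without importing CuPy.

```python
def resolve_stream_handle(stream, cupy_stream_type):
    """Return the raw cudaStream_t handle of `stream` as an int.

    Hypothetical helper: `cupy_stream_type` would normally be
    cupy.cuda.Stream; it is a parameter here so CuPy is not required.
    """
    if isinstance(stream, cupy_stream_type):
        # CuPy streams store the raw handle in `.ptr`.
        return stream.ptr
    # Otherwise assume a PyTorch stream, which exposes `.cuda_stream`.
    return stream.cuda_stream
```

In a launch path, one would call `resolve_stream_handle(stream, cupy.cuda.Stream)` and pass the resulting integer to the kernel launch.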
Changho Hwang
6c1fa5307c
Refactoring NVLS interfaces (#293)
Move NVLS details from the core to a separate interface
2024-04-24 10:05:41 -07:00
Roshan Dathathri
41e0964d93
Allow binding allocated memory to NVLS multicast pointer (#290)
And change NVLS multimem instructions to static functions
2024-04-18 17:11:31 -07:00
Binyang Li
64d837f9ab
Add executor to execute schedule-plan file (#283)
Add executor to execute the JSON schedule file generated by msccl-tools
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-04-18 19:10:41 +00:00
Changho Hwang
9406123711
Fix a typo name (#286)
2024-04-17 23:45:46 +00:00
Changho Hwang
1a7cb98e3a
v0.4.3 (#279)
2024-03-27 11:53:09 -07:00
Changho Hwang
5ba6ce00c7
Fix bootstrapping mechanism (#278)
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Pashupati Kumar <74680231+pash-msft@users.noreply.github.com>
2024-03-27 10:24:24 +08:00
Saeed Maleki
a3d0799963
Fix the comm.py for nvls (#267)
2024-02-19 10:39:21 +08:00
Binyang Li
5971508eed
Remove cuda-python from project (#245)
Remove cuda-python and use CuPy APIs instead
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-02-13 21:44:11 +08:00
aashaka
d97fef4395
Allow semaphores and memory to be registered separately in ProxyService (#264)
This is needed in use cases where SimpleProxyChannel does not suffice.
For example, when a single semaphore is to be used for multiple tensors
or when multiple semaphores should be associated with a tensor.
2024-02-08 09:55:29 -08:00
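The toy model below illustrates why separate registration helps. It is not the MSCCL++ `ProxyService` API. Semaphores and memory regions live in separate tables, and a channel is just a triple of ids. The class `ProxyRegistry` and its methods are hypothetical. The point is only that one semaphore id can be paired with several memory ids, and one memory id with several semaphores.

```python
class ProxyRegistry:
    """Hypothetical model: semaphores and memory regions are registered
    independently and referred to by integer ids."""

    def __init__(self):
        self.semaphores = []
        self.memories = []

    def add_semaphore(self, semaphore):
        self.semaphores.append(semaphore)
        return len(self.semaphores) - 1

    def add_memory(self, memory):
        self.memories.append(memory)
        return len(self.memories) - 1

    def channel(self, semaphore_id, dst_memory_id, src_memory_id):
        # A channel is only a triple of ids, so nothing binds a semaphore
        # to a single buffer.
        return (semaphore_id, dst_memory_id, src_memory_id)


reg = ProxyRegistry()
sem = reg.add_semaphore("sem0")
tensor_a = reg.add_memory("tensor_a")
tensor_b = reg.add_memory("tensor_b")

# One semaphore shared by two tensors.
ch_a = reg.channel(sem, tensor_a, tensor_a)
ch_b = reg.channel(sem, tensor_b, tensor_b)

# A second semaphore associated with the same tensor.
sem2 = reg.add_semaphore("sem1")
ch_a2 = reg.channel(sem2, tensor_a, tensor_a)
```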
Binyang Li
7c229fbdd8
Fix multi-node test failure (#262)
Fix multi-node CI pipeline
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-02-07 18:21:05 -08:00
aashaka
2101f5251e
Allow MSCCL++ CommGroup to take PyTorch tensors in args (#255)
Obtain data_ptr and tensor_size accordingly for torch.Tensor
Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-02-06 19:47:25 -08:00
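The sketch below shows how a pointer and byte size might be obtained from either array type. It assumes PyTorch's `Tensor.data_ptr()`, `numel()` and `element_size()`, and CuPy's `ndarray.data.ptr` and `nbytes`. The helper name `buffer_ptr_and_size` is hypothetical. It uses duck typing instead of `isinstance(..., torch.Tensor)` so that neither library has to be imported.

```python
def buffer_ptr_and_size(array):
    """Return (device pointer, size in bytes) for a torch.Tensor or CuPy ndarray.

    Duck-typed sketch: objects with a callable data_ptr() are treated as
    PyTorch tensors, everything else as CuPy ndarrays.
    """
    if callable(getattr(array, "data_ptr", None)):
        # PyTorch: data_ptr() is the address; bytes = elements * bytes per element.
        return array.data_ptr(), array.numel() * array.element_size()
    # CuPy: .data is a MemoryPointer whose .ptr is the address; nbytes is the size.
    return array.data.ptr, array.nbytes
```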
Changho Hwang
6a19b19ece
Fix NVLS support (#258)
* Do not compile nvls_test with ROCm
* Fix multi-node tests
2024-02-06 23:24:13 +00:00
Saeed Maleki
91d592dcc0
NVLS support (#250)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-02-04 20:46:10 -08:00
Binyang Li
422c81f0f8
Remove make pylib-copy command (#249)
Fix #216
Remove `make pylib-copy`
2024-01-19 12:29:15 -08:00
Binyang Li
163cba08c8
Update interface to let user change fifo size (#243)
Related to this issue: https://github.com/microsoft/mscclpp/issues/242
The user may use more threads than the number specified in `fifo_size`
to interact with the FIFO, which leads to unexpected behavior.
Update the interface to let users change the FIFO size on demand.
2024-01-09 22:14:36 -08:00
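The toy model below illustrates the failure mode described above. It assumes each producer thread needs one free FIFO slot before the consumer drains anything. `BoundedFifo` and `try_push` are hypothetical names. The real MSCCL++ FIFO is shared between device and host and is not modeled here, so this only illustrates the capacity arithmetic.

```python
class BoundedFifo:
    """Toy fixed-capacity FIFO; each push needs one free slot."""

    def __init__(self, size):
        self.size = size
        self.head = 0  # total number of pushes so far
        self.tail = 0  # total number of pops so far

    def try_push(self):
        if self.head - self.tail >= self.size:
            return False  # full: this producer cannot proceed until a pop
        self.head += 1
        return True

    def pop(self):
        if self.tail == self.head:
            raise IndexError("pop from empty FIFO")
        self.tail += 1


fifo = BoundedFifo(size=4)
# Six producers each push once before the consumer pops anything.
results = [fifo.try_push() for _ in range(6)]
print(results)  # [True, True, True, True, False, False]

# A FIFO sized for the number of producers accepts every push.
bigger = BoundedFifo(size=6)
results_big = [bigger.try_push() for _ in range(6)]
```

Letting the user choose the size means the capacity can be raised to match the number of threads that push concurrently.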
Changho Hwang
544ff0c21d
ROCm support (#213)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-11-24 16:41:56 +08:00
Changho Hwang
dab19e00c1
Templatize Dockerfiles & update workflows (#223)
Images are now built by a script from a shared Dockerfile template
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-11-22 13:29:12 -08:00
Changho Hwang
15f6dcca49
Update documentation (#217)
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-11-22 12:58:04 -08:00
Changho Hwang
7bd66a938c
Robust correctness test (#221)
Co-authored-by: Aashaka Shah <aashaka96@gmail.com>
2023-11-22 12:06:50 +08:00
Saeed Maleki
70eb6d7328
Fixing the bug in allreduce1 (#220)
2023-11-18 10:34:52 -08:00
Saeed Maleki
1d1199703a
Auto-tune single-node AllReduce (#219)
Single-node auto-tuner + graph plotter + bug fix for an illegal memory access
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2023-11-17 21:42:05 +08:00
Changho Hwang
060fda12e6
mscclpp-test in Python (#204)
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Esha Choukse <eschouks@microsoft.com>
2023-11-16 12:45:25 +08:00
Changho Hwang
4cdb100265
Release GIL for Python APIs with wait (#190)
2023-11-14 21:11:01 +08:00
Changho Hwang
3521fb0280
Clear minor warnings (#214)
Clear warnings from the clang compiler.
2023-11-14 09:28:48 +08:00
Saeed Maleki
85e8017535
Atomic for semaphores instead of fences (#188)
Co-authored-by: Pratyush Patel <pratyushpatel.1995@gmail.com>
Co-authored-by: Esha Choukse <eschouks@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2023-10-13 18:57:08 +08:00
Saeed Maleki
148681b4bc
Fix a pytest bug (#196)
2023-10-13 16:39:43 +08:00
Changho Hwang
8c0f9e84d0
v0.3.0 (#171)
2023-10-11 22:35:54 +08:00
Changho Hwang
11ac824cc7
Align interfaces of put/get/putPackets/getPackets (#185)
2023-10-07 22:18:26 +08:00
Changho Hwang
6c0ee72916
Construct ProxyChannel with shared pointers (#184)
2023-09-18 05:46:23 +00:00
Changho Hwang
3aa72098d9
Add poll() for semaphores (#181)
2023-09-15 07:40:44 +00:00
Binyang2014
097aa8843a
Fix pytest unstable issue (#170)
- Remove `#include <cstdint>` from `poll.hpp` so that it only contains
device-side code
- Fix a compilation issue that caused pytest to fail randomly: reuse
the compiled result for the same kernel with different arguments
2023-09-06 17:09:04 -07:00
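The second fix reuses one compiled result for the same kernel across different arguments. That amounts to memoizing compilation on a key that excludes runtime arguments. The minimal sketch below uses `functools.lru_cache`. `compile_kernel` and `launch` are hypothetical stand-ins, and the "compilation" just builds a string.

```python
import functools


@functools.lru_cache(maxsize=None)
def compile_kernel(source_file, kernel_name, macros):
    """Hypothetical stand-in for an expensive nvcc/hipcc invocation.

    The cache key is (file, kernel name, compile-time macros); runtime
    kernel arguments are deliberately not part of it.
    """
    return f"module<{source_file}:{kernel_name}:{' '.join(macros)}>"


def launch(source_file, kernel_name, *args, macros=()):
    # tuple() keeps the cache key hashable even if a list is passed.
    module = compile_kernel(source_file, kernel_name, tuple(macros))
    return module, args


launch("allreduce.cu", "allreduce_kernel", 1024, 8)
launch("allreduce.cu", "allreduce_kernel", 4096, 8)  # reuses the cached module
print(compile_kernel.cache_info().misses)  # 1
```

Changing a compile-time macro produces a new cache key, so only those launches trigger a fresh compilation.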
Olli Saarikivi
828be48b21
Add Context and Endpoint classes to enable non-Communicator use-cases (#166)
This PR implements and closes #137. The new `Endpoint` and `Context`
classes expose the connection establishing functionality from
`Communicator`, which now is only responsible for tying together the
bootstrapper with a context.
The largest breaking change here is that
`Communicator.connectOnSetup(...)` now returns the `Connection` wrapped
inside a `NonblockingFuture`. This is because, with the way `Context` is
implemented, a `Connection` is now fully initialized on construction.
Smaller breaking API changes are that `RegisteredMemory` no longer has a
`rank()` function (as there may be no concept of rank), and similarly
`Connection` no longer has `remoteRank()` and `tag()` functions. These are
replaced by the `remoteRankOf` and `tagOf` functions in `Communicator`.
A new `EndpointConfig` class is introduced to avoid duplication of the
IB configuration parameters in the APIs of `Context` and `Communicator`.
The usual usage pattern of just passing in a `Transport` still works due
to an implicit conversion into `EndpointConfig`.
Miscellaneous changes:
- Cleans up how the PIMPL pattern is applied by making both the `Impl`
struct and the `pimpl_` pointers private for all relevant classes in the
core API.
- Enables ctest to be run from the build root directory.
2023-09-06 13:10:04 +08:00
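The toy model below illustrates the deferred-connection pattern described above. It assumes that `connectOnSetup` only queues a request and that `setup()` completes all queued requests at once. `NonblockingFuture`, `Connection`, and `ToyCommunicator` are hypothetical Python stand-ins, not the C++ classes. They mirror the described split: the connection itself carries no rank or tag, and the communicator answers `remote_rank_of` and `tag_of` queries.

```python
class NonblockingFuture:
    """Holds a value that becomes available only after it is set."""

    def __init__(self):
        self._ready = False
        self._value = None

    def ready(self):
        return self._ready

    def get(self):
        if not self._ready:
            raise RuntimeError("value not ready yet; call setup() first")
        return self._value

    def _set(self, value):
        self._value = value
        self._ready = True


class Connection:
    """Rank-agnostic connection: it stores no remote rank or tag."""


class ToyCommunicator:
    def __init__(self):
        self._pending = []  # (future, remote_rank, tag) awaiting setup()
        self._info = {}     # connection -> (remote_rank, tag)

    def connect_on_setup(self, remote_rank, tag):
        future = NonblockingFuture()
        self._pending.append((future, remote_rank, tag))
        return future

    def setup(self):
        # Fulfil every queued request in one batch.
        for future, remote_rank, tag in self._pending:
            conn = Connection()
            self._info[conn] = (remote_rank, tag)
            future._set(conn)
        self._pending.clear()

    def remote_rank_of(self, conn):
        return self._info[conn][0]

    def tag_of(self, conn):
        return self._info[conn][1]


comm = ToyCommunicator()
future = comm.connect_on_setup(remote_rank=1, tag=0)
was_ready_before_setup = future.ready()  # False: nothing has run yet
comm.setup()
conn = future.get()
```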
Binyang2014
858e381829
Pytest (#162)
Port Python tests to mscclpp.
To start pytest, run
`mpirun -tag-output -np 8 pytest ./python/test/test_mscclpp.py -x`
---------
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Saeed Maleki <30272783+saeedmaleki@users.noreply.github.com>
2023-09-01 21:22:11 +08:00
Saeed Maleki
8d1b984bed
Change device handle interfaces & others (#142)
* Changed device handle interfaces
* Changed proxy service interfaces
* Move device code into separate files
* Fixed FIFO polling issues
* Add configuration arguments in several interface functions
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: root <root@a100-saemal0.qxveptpukjsuthqvv514inp03c.gx.internal.cloudapp.net>
2023-08-16 20:00:56 +08:00
Olli Saarikivi
4865b2017b
Add Python get_include() (#141)
Introduces `mscclpp.get_include()` in the Python module.
The extension module is now named _mscclpp so that we can have
Python code in the mscclpp module.
Also does some miscellaneous cleanup.
2023-07-25 10:23:16 -07:00
Binyang2014
9a488f0da2
Update Python binding (#136)
Update Python bindings for `device_handle`
2023-07-24 03:00:33 +00:00
Saeed Maleki
e7d5e652df
Python bindings (#125)
Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-19 15:35:54 +08:00
Madan Musuvathi
c042d9af54
Merge branch 'cpp-api' into saemal/api-extension
2023-04-13 22:32:38 +00:00
Crutcher Dunnavant
272097fb9b
[python] switch to python setup.py build and wheels
2023-04-12 12:40:25 -07:00
Crutcher Dunnavant
d9077e5795
[python] switch to setup.py to build package
2023-04-12 12:29:17 -07:00
Felipe Petroski Such
516349c282
fix write_all
2023-04-12 12:29:17 -07:00
Crutcher Dunnavant
b93cfa3ca4
registeredmemory wip
2023-04-12 12:29:17 -07:00
Crutcher Dunnavant
19962a8002
format
2023-04-12 12:29:17 -07:00
Crutcher Dunnavant
b25bcf5f93
register buffers
2023-04-12 12:29:17 -07:00
Crutcher Dunnavant
00d382dbf7
format
2023-04-07 19:12:05 -07:00
Crutcher Dunnavant
34464b40bb
register buffers
2023-04-07 19:11:50 -07:00
Crutcher Dunnavant
44a8a539ad
types
2023-04-07 12:08:32 -07:00
Crutcher Dunnavant
d014693288
cleanup tests
2023-04-07 11:37:24 -07:00
Crutcher Dunnavant
68eff98bbc
update ci.sh
2023-04-07 11:27:45 -07:00