Saeed Maleki
91d592dcc0
NVLS support. (#250)
...
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2024-02-04 20:46:10 -08:00
Saeed Maleki
c4785c9591
Improve debugging messages (#195)
...
Debugging information to understand what connections are being made.
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2023-10-13 16:55:52 +08:00
Olli Saarikivi
828be48b21
Add Context and Endpoint classes to enable non-Communicator use-cases (#166)
...
This PR implements and closes #137. The new `Endpoint` and `Context`
classes expose the connection-establishing functionality from
`Communicator`, which is now only responsible for tying together the
bootstrapper with a context.
The largest breaking change here is that
`Communicator.connectOnSetup(...)` now returns the `Connection` wrapped
inside a `NonblockingFuture`. This is because with the way `Context` is
implemented a `Connection` is now fully initialized on construction.
Smaller breaking API changes: `RegisteredMemory` no longer has a
`rank()` function (as there may be no concept of rank), and similarly
`Connection` no longer has `remoteRank()` and `tag()` functions. The
latter are replaced by the `remoteRankOf` and `tagOf` functions in
`Communicator`.
A new `EndpointConfig` class is introduced to avoid duplication of the
IB configuration parameters in the APIs of `Context` and `Communicator`.
The usual usage pattern of just passing in a `Transport` still works due
to an implicit conversion into `EndpointConfig`.
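The implicit conversion described above can be illustrated with a minimal, self-contained sketch. The types below are hypothetical stand-ins and not the actual mscclpp declarations: the enumerator names, the `ibMaxCqSize` and `ibMaxSendWr` fields, their default values, and `createEndpoint` are all assumptions made for illustration. The only point shown is that a non-explicit constructor lets callers keep passing a bare `Transport`.

```cpp
// Hypothetical stand-ins for illustration only; not the mscclpp API.
enum class Transport { CudaIpc, IB0 };

struct EndpointConfig {
  Transport transport;
  int ibMaxCqSize = 1024;  // assumed default, chosen for the example
  int ibMaxSendWr = 256;   // assumed default, chosen for the example

  // Deliberately not `explicit`: a Transport converts to EndpointConfig
  // implicitly, so existing call sites that pass a Transport still compile.
  EndpointConfig(Transport t) : transport(t) {}

  // Full form for callers that want to tune the IB parameters.
  EndpointConfig(Transport t, int maxCqSize, int maxSendWr)
      : transport(t), ibMaxCqSize(maxCqSize), ibMaxSendWr(maxSendWr) {}
};

struct Endpoint {
  EndpointConfig config;
};

// Takes the config type only; there is no separate Transport overload.
Endpoint createEndpoint(EndpointConfig config) { return Endpoint{config}; }
```

Under these assumptions, `createEndpoint(Transport::CudaIpc)` and `createEndpoint(EndpointConfig(Transport::IB0, 4096, 512))` both compile against the same single function, which is how one config type can serve several APIs without repeating the IB parameters in each signature.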
Miscellaneous changes:
- Cleans up how the PIMPL pattern is applied by making both the `Impl`
  struct and the `pimpl_` pointers private for all relevant classes in the
  core API.
- Enables ctest to be run from the build root directory.
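As a rough sketch of the PIMPL cleanup mentioned above, the following shows a class whose `Impl` struct and `pimpl_` pointer are both private. `Session` is a hypothetical class invented for this example, and the use of `std::unique_ptr` is an assumption; the actual classes in the core API may hold their implementation differently.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical class for illustration; not part of mscclpp.
class Session {
 public:
  explicit Session(std::string name);
  ~Session();  // defined below, where Impl is a complete type
  Session(Session&&) noexcept;
  Session& operator=(Session&&) noexcept;

  const std::string& name() const;

 private:
  struct Impl;                   // private: users cannot name Session::Impl
  std::unique_ptr<Impl> pimpl_;  // private: users cannot reach the internals
};

// Everything below would normally live in the .cc file.
struct Session::Impl {
  std::string name;
};

Session::Session(std::string name)
    : pimpl_(std::make_unique<Impl>(Impl{std::move(name)})) {}
Session::~Session() = default;
Session::Session(Session&&) noexcept = default;
Session& Session::operator=(Session&&) noexcept = default;

const std::string& Session::name() const { return pimpl_->name; }
```

Keeping both members private means the public header exposes only the member functions, so the layout of `Impl` can change without affecting code that uses the class.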
2023-09-06 13:10:04 +08:00
Saeed Maleki
8d1b984bed
Change device handle interfaces & others (#142)
...
* Changed device handle interfaces
* Changed proxy service interfaces
* Move device code into separate files
* Fixed FIFO polling issues
* Add configuration arguments in several interface functions
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: root <root@a100-saemal0.qxveptpukjsuthqvv514inp03c.gx.internal.cloudapp.net>
2023-08-16 20:00:56 +08:00
Saeed Maleki
e7d5e652df
Python bindings (#125)
...
Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-19 15:35:54 +08:00
Changho Hwang
21eed722af
Add license comments (#106)
2023-06-25 12:40:12 +08:00
Changho Hwang
60b3dd5a61
Bug fixes & resolve warnings (#107)
...
* Fix a bug in host hashing
* Fix a bug in `HostEpoch::wait()`
* Remove misc warnings
2023-06-16 09:31:23 +00:00
Changho Hwang
c4a5958dfc
Fix hanging bootstrap issues (#100)
...
* Rewrite socket interfaces and error handling in C++ style
* Fix bootstrap hanging bugs
* Misc code cleanup
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-06-15 11:29:49 +08:00
Changho Hwang
9cee6c4a74
Cleanup old files and functions (#86)
2023-06-01 17:34:57 +08:00
Olli Saarikivi
4e4d1972e3
Cuda smart pointers
2023-05-16 16:16:00 -07:00
Saeed Maleki
2691784b88
working -- at least for single node
2023-05-12 20:21:58 +00:00
Olli Saarikivi
9f6c48cbf9
Format all files
2023-05-11 00:23:14 +00:00
Olli Saarikivi
ccf45b33a2
Delete old init code and other C-style code
2023-05-10 22:03:42 +00:00
Olli Saarikivi
beaf2aea39
Move public headers under include/
2023-05-10 20:46:49 +00:00
Olli Saarikivi
75a2af8de2
Add GoogleTest with CTest integration + some tests
...
Also rename addSetup to onSetup to unify naming.
2023-05-10 18:46:55 +00:00
Saeed Maleki
1769138568
Host Epoch + Error code
2023-05-09 23:10:12 +00:00
Binyang2014
8650dbaff8
Add exception class for mscclpp (#67)
2023-05-06 16:27:25 +08:00
Olli Saarikivi
4a41c19e72
Fix performance bug and base pointer offset
2023-05-03 19:40:23 +00:00
Olli Saarikivi
39666f999f
Quick fix
2023-05-03 19:20:45 +00:00
Saeed Maleki
54d1e1872c
testing writes with signal is passing
2023-05-02 23:53:31 +00:00
Saeed Maleki
a4e6ffe2bc
epoch creation
2023-05-02 21:39:43 +00:00
Olli Saarikivi
358c3d62b8
Generalize connectionSetup() into setup()
2023-05-02 20:06:30 +00:00
Olli Saarikivi
04e878489d
Work on a channel service
2023-04-28 22:50:38 +00:00
Binyang Li
750c40b987
Fix
2023-04-28 10:48:56 +00:00
Binyang Li
cbefe38fd4
add conn write test
2023-04-28 09:12:21 +00:00
Saeed Maleki
cbfc21851d
registered buffer test
2023-04-27 22:25:03 +00:00
Saeed Maleki
82c27625e6
ipc uses a base ptr now
2023-04-27 21:33:15 +00:00
Saeed Maleki
afc5887da2
moving the debug info into other levels
2023-04-27 20:32:06 +00:00
Saeed Maleki
e18e26dcc7
tests for host hash
2023-04-27 20:09:47 +00:00
Saeed Maleki
aaa3f0e945
host hashes in communicator
2023-04-27 19:17:19 +00:00
Olli Saarikivi
06c6df2350
Separate out Transport and TransportFlags
2023-04-27 19:06:35 +00:00
Saeed Maleki
8eda6369ee
testing connection setup
2023-04-27 06:08:35 +00:00
Changho Hwang
b0c7e86909
Communicator owns IB contexts
2023-04-27 05:01:07 +00:00
Saeed Maleki
c24896b62f
bootstrap to the communicator
2023-04-27 04:23:44 +00:00
Saeed Maleki
7913d90158
Merge branch 'olli/api-extension' of https://github.com/microsoft/mscclpp into olli/api-extension
2023-04-27 04:16:41 +00:00
Saeed Maleki
7641038246
wip
2023-04-27 04:15:24 +00:00
Changho Hwang
08e80f1754
IB: completely replaced with C++ interfaces
2023-04-27 04:01:46 +00:00
Olli Saarikivi
47d4606f13
Add registerMemory
2023-04-27 00:33:24 +00:00
Olli Saarikivi
7c87ca3005
Missing functions and TODOs
2023-04-27 00:01:38 +00:00
Olli Saarikivi
5443ed1ec2
ConnectionSetup stuff
2023-04-26 18:07:17 +00:00
Olli Saarikivi
d746201287
WIP builds, but doesn't link
2023-04-26 17:46:47 +00:00
Olli Saarikivi
e4ee2eba25
WIP Connection in C++
2023-04-25 00:41:45 +00:00
Changho Hwang
35ade686ff
IB in cpp style WIP
2023-04-23 14:47:07 +00:00
Olli Saarikivi
9fbb0debdd
C++ API changes
2023-04-19 22:02:23 +00:00
Olli Saarikivi
83c7ba1afb
C++ API working, allgather_test_cpp passing
2023-04-19 17:11:21 +00:00
Olli Saarikivi
a0f1d36026
Start HostConnection implementation
...
Add declarations on the C-side for functions to enable multiple buffer
registrations per connection.
2023-04-14 15:57:47 +00:00
Olli Saarikivi
45172bec88
Implement mscclpp::Communicator using C-style API
2023-04-14 14:21:53 +00:00
Olli Saarikivi
0eec1d438b
Move over C++ API work to new branch
2023-04-13 18:38:38 +00:00