Binyang Li
163cba08c8
Update interface to let user change fifo size (#243)
...
Related to this issue:
https://github.com/microsoft/mscclpp/issues/242
The user may use more threads than the number specified in `fifo_size`
to interact with the FIFO, which leads to unexpected behavior.
Update the interface to let users change the FIFO size as needed.
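
One way to picture why the thread count matters is sketched below. It assumes a ring-buffer FIFO in which every pushing thread takes a monotonically increasing ticket and writes to slot `ticket % fifoSize`; that layout is an assumption made for illustration and is not taken from the mscclpp sources. `slotsCollide` is a hypothetical helper, not an mscclpp function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of mscclpp): reports whether `pending`
// in-flight pushes, holding consecutive tickets starting at `firstTicket`,
// would map two pushes onto the same slot of a ring with `fifoSize` slots.
// Assumes slot = ticket % fifoSize, which is an illustrative layout only.
bool slotsCollide(std::uint64_t firstTicket, std::size_t pending, std::size_t fifoSize) {
  if (fifoSize == 0) return pending > 0;  // no slots: any push collides
  std::vector<bool> taken(fifoSize, false);
  for (std::size_t i = 0; i < pending; ++i) {
    std::size_t slot = static_cast<std::size_t>((firstTicket + i) % fifoSize);
    if (taken[slot]) return true;  // two in-flight pushes share one slot
    taken[slot] = true;
  }
  return false;
}
```

Under this assumption, `slotsCollide(0, 128, 128)` is false while `slotsCollide(0, 129, 128)` is true, which is why a caller who launches more pushing threads than slots would want to raise the FIFO size.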
2024-01-09 22:14:36 -08:00
Changho Hwang
544ff0c21d
ROCm support (#213)
...
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-11-24 16:41:56 +08:00
Saeed Maleki
8d1b984bed
Change device handle interfaces & others (#142)
...
* Changed device handle interfaces
* Changed proxy service interfaces
* Move device code into separate files
* Fixed FIFO polling issues
* Add configuration arguments in several interface functions
---------
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: root <root@a100-saemal0.qxveptpukjsuthqvv514inp03c.gx.internal.cloudapp.net>
2023-08-16 20:00:56 +08:00
Changho Hwang
4114d65c60
Documents & minor updates (#119)
...
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-07-07 17:35:05 +08:00
Changho Hwang
6ec585f3d8
Packet copy for IB (#109)
...
* Extend channels to support LL with IB
* Rename classes and interfaces
2023-06-28 10:39:31 -07:00
Changho Hwang
21eed722af
Add license comments (#106)
2023-06-25 12:40:12 +08:00
Changho Hwang
798631bd52
Update unit tests (#81)
2023-06-08 09:58:05 +00:00
Changho Hwang
9cee6c4a74
Cleanup old files and functions (#86)
2023-06-01 17:34:57 +08:00
Binyang2014
a3cf48cc5d
Rewrite mscclpp-test with cpp style API (#77)
...
- Rewrite mscclpp-test with cpp style API
- Add SM copy
- Add new sendRecv test
2023-05-19 14:14:19 +08:00
Olli Saarikivi
9f6c48cbf9
Format all files
2023-05-11 00:23:14 +00:00
Olli Saarikivi
ccf45b33a2
Delete old init code and other C-style code
2023-05-10 22:03:42 +00:00
Changho Hwang
08e80f1754
IB: completely replaced with C++ interfaces
2023-04-27 04:01:46 +00:00
Changho Hwang
dd0883b84f
Lint
2023-04-12 09:25:35 +00:00
Changho Hwang
bc729cd481
Move MRs / MR infos to mscclppHostIBConn & cleanup
2023-04-12 09:05:42 +00:00
Changho Hwang
fd3f928108
remove hostFifo & rename devFifo to just fifo
2023-04-12 08:08:19 +00:00
Madan Musuvathi
9124856ea4
first version hostConn
2023-04-12 01:36:06 +00:00
Changho Hwang
7a0e64813a
Add fifo for host connections
2023-04-11 12:28:45 +00:00
Changho Hwang
35acdf796c
Add mscclppProxyFifo
2023-04-11 11:28:40 +00:00
Saeed Maleki
ee6c2deb44
Merge branch 'main' into saemal/api-extension
2023-04-11 01:43:13 +00:00
Saeed Maleki
b6179224aa
lint
2023-04-11 01:36:37 +00:00
Saeed Maleki
48102a0858
removing unnecessary flags
2023-04-11 01:22:40 +00:00
Changho Hwang
a1ae982c61
Merge signalEpochId with proxySignalEpochId
2023-04-10 14:05:25 +00:00
Saeed Maleki
426e78997c
name changes + documentation for clarity
2023-04-09 02:20:54 +00:00
Ziyue Yang
5f0b58abda
fix lint
2023-04-08 07:16:32 +00:00
Ziyue Yang
09de60854e
fix lint
2023-04-08 07:15:25 +00:00
Ziyue Yang
748d3d1596
separate flag and data
2023-04-08 07:12:46 +00:00
Ziyue Yang
f68eeba2d4
change clock collection approach
2023-04-08 05:29:34 +00:00
Changho Hwang
949a9cd0a3
Optional use of gdrcopy (#48)
...
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-04-07 13:36:59 +08:00
Ziyue Yang
352a10a33d
NPKit: improve event collection for async requests (#45)
2023-04-06 16:21:34 +08:00
Saeed Maleki
1731911d00
removing extra stream and destroying created ones
2023-04-02 02:07:41 +00:00
Saeed Maleki
4c6616e7b9
lint
2023-04-01 19:20:50 +00:00
Saeed Maleki
8927dd4d72
great allgather numbers with the current binding mechanism
2023-04-01 18:54:42 +00:00
Binyang Li
af5825b474
bind numa node to communicator
2023-03-31 08:05:49 +00:00
Changho Hwang
b58eae4037
Minor changes
2023-03-30 07:11:41 +00:00
Saeed Maleki
e2cfd5ac83
a lot of documentation
2023-03-30 00:37:33 +00:00
Binyang Li
d725e45f13
fix
2023-03-28 14:53:08 +00:00
Binyang Li
9c633a9633
bug fix
2023-03-28 14:40:51 +00:00
Binyang Li
487030887b
refactor
2023-03-28 12:22:43 +00:00
Saeed Maleki
17e144c774
a typo in p2p proxy
2023-03-28 08:07:54 +00:00
Saeed Maleki
81b18cd9f9
a bit of clean up
2023-03-28 06:08:12 +00:00
Saeed Maleki
fa26bdd9fc
no gdr copy anywhere in the code except for the files that are not compiled
2023-03-28 05:40:40 +00:00
Saeed Maleki
33af4bfb67
no gdr copy anywhere in the code except for the files that are not compiled
2023-03-28 05:36:31 +00:00
Saeed Maleki
d9ba953fb0
gdrcopy is not initialized
2023-03-28 04:56:06 +00:00
Saeed Maleki
952d852256
both head and tail are OK to be used only by the GPU
2023-03-28 04:24:45 +00:00
Ziyue Yang
b234cf5012
NPKit: add DMA events and fix bandwidth calculation (#33)
2023-03-28 09:58:32 +08:00
Saeed Maleki
19bf369dc1
link format correction
2023-03-27 20:40:15 +00:00
Saeed Maleki
3fb9383621
Merge pull request #24 from microsoft/madanm-apipush
...
simplified API for CUDA level communication calls.
2023-03-24 10:42:07 -07:00
Ziyue Yang
f92b428cba
Port NPKit
2023-03-24 06:41:16 +00:00
Saeed Maleki
777e93ee47
merged with main
2023-03-24 02:35:15 +00:00
Saeed Maleki
b75f9e6d8a
implementing new API
2023-03-22 00:29:10 +00:00