| Author | Commit | Message | Date |
|---|---|---|---|
| Saeed Maleki | 82c27625e6 | ipc uses a base ptr now | 2023-04-27 21:33:15 +00:00 |
| Changho Hwang | 08e80f1754 | IB: completely replaced with C++ interfaces | 2023-04-27 04:01:46 +00:00 |
| Olli Saarikivi | 83c7ba1afb | C++ API working, allgather_test_cpp passing | 2023-04-19 17:11:21 +00:00 |
| Olli Saarikivi | 65597e1f63 | Fix a copy-paste mistake | 2023-04-14 23:35:10 +00:00 |
| Olli Saarikivi | 46790d79e8 | Implement C API buffer registration support | 2023-04-14 23:20:42 +00:00 |
| Changho Hwang | 7c2108d135 | fix | 2023-04-12 18:05:27 +00:00 |
| Changho Hwang | 1d3ea7bb83 | fix | 2023-04-12 17:25:54 +00:00 |
| Changho Hwang | ca1f803692 | Rename remote MR infos | 2023-04-12 09:33:14 +00:00 |
| Changho Hwang | dd0883b84f | Lint | 2023-04-12 09:25:35 +00:00 |
| Changho Hwang | 63a5be6953 | Move ibQp to mscclppHostIBConn | 2023-04-12 09:20:05 +00:00 |
| Changho Hwang | bc729cd481 | Move MRs / MR infos to mscclppHostIBConn & cleanup | 2023-04-12 09:05:42 +00:00 |
| Changho Hwang | fd3f928108 | remove hostFifo & rename devFifo to just fifo | 2023-04-12 08:08:19 +00:00 |
| Saeed Maleki | edc3c237ed | deleteing hostconn | 2023-04-12 04:09:12 +00:00 |
| Madan Musuvathi | 9124856ea4 | first version hostConn | 2023-04-12 01:36:06 +00:00 |
| Changho Hwang | 7a0e64813a | Add fifo for host connections | 2023-04-11 12:28:45 +00:00 |
| Changho Hwang | 35acdf796c | Add mscclppProxyFifo | 2023-04-11 11:28:40 +00:00 |
| Changho Hwang | d2c2ae72a7 | Some cleanup | 2023-04-11 08:45:22 +00:00 |
| Saeed Maleki | b6179224aa | lint | 2023-04-11 01:36:37 +00:00 |
| Saeed Maleki | 48102a0858 | removing unnecessary flags | 2023-04-11 01:22:40 +00:00 |
| Changho Hwang | a1ae982c61 | Merge signalEpochId with proxySignalEpochId | 2023-04-10 14:05:25 +00:00 |
| Saeed Maleki | 426e78997c | name changes + documentation for clarity | 2023-04-09 02:20:54 +00:00 |
| Felipe Petroski Such | cc8c30f958 | error checking | 2023-04-07 15:38:48 -07:00 |
| Felipe Petroski Such | 38cd87cdcc | add memory region functions | 2023-04-07 15:38:48 -07:00 |
| Changho Hwang | 949a9cd0a3 | Optional use of gdrcopy (#48); Co-authored-by: Saeed Maleki <saemal@microsoft.com> | 2023-04-07 13:36:59 +08:00 |
| Saeed Maleki | cd3cd2c157 | lint | 2023-04-06 03:20:21 +00:00 |
| Saeed Maleki | 08275e93d7 | added barrier API + pushed one after mscclppsetup | 2023-04-06 03:15:54 +00:00 |
| Saeed Maleki | 1731911d00 | removing extra stream and destroying created ones | 2023-04-02 02:07:41 +00:00 |
| Saeed Maleki | 4c6616e7b9 | lint | 2023-04-01 19:20:50 +00:00 |
| Saeed Maleki | 8927dd4d72 | great allgather numbers with the current binding mechanism | 2023-04-01 18:54:42 +00:00 |
| Binyang Li | af5825b474 | bind numa node to communicator | 2023-03-31 08:05:49 +00:00 |
| Saeed Maleki | be5e422021 | merged with main | 2023-03-29 23:03:12 +00:00 |
| Binyang2014 | 62279b0063 | Add mscclppSetBootstrapConnTimeout (#34) | 2023-03-28 14:01:56 +08:00 |
| Saeed Maleki | 33af4bfb67 | no gdr copy anywhere in the code except for the files that are not compiled | 2023-03-28 05:36:31 +00:00 |
| Saeed Maleki | d9ba953fb0 | gdrcopy is not initialized | 2023-03-28 04:56:06 +00:00 |
| Saeed Maleki | e7cccbf897 | both head and tail are on OK to be only used by GPU | 2023-03-28 04:26:39 +00:00 |
| Saeed Maleki | 952d852256 | both head and tail are on OK to be only used by GPU | 2023-03-28 04:24:45 +00:00 |
| Saeed Maleki | 43c52367fb | merged with main and simplified the callback requirements | 2023-03-27 23:41:27 +00:00 |
| Saeed Maleki | 19bf369dc1 | link format correction | 2023-03-27 20:40:15 +00:00 |
| Changho Hwang | 8fc8f5b4fe | Lint | 2023-03-27 14:09:26 +00:00 |
| Changho Hwang | 8e4146aba9 | Add mscclppSetLogHandler | 2023-03-27 13:33:07 +00:00 |
| Saeed Maleki | 0898214f0a | added mscclppGetErrorString | 2023-03-24 22:57:14 +00:00 |
| Saeed Maleki | 3fb9383621 | Merge pull request #24 from microsoft/madanm-apipush; simplified API for CUDA level communication calls. | 2023-03-24 10:42:07 -07:00 |
| Saeed Maleki | 56b599b5e7 | a bit of api change and clean up on docs | 2023-03-24 17:41:04 +00:00 |
| Changho Hwang | 551eae0ba1 | Update docs | 2023-03-24 09:28:12 +00:00 |
| Ziyue Yang | f92b428cba | Port NPKit | 2023-03-24 06:41:16 +00:00 |
| Changho Hwang | 05fde6c6f3 | minor changes | 2023-03-24 04:51:20 +00:00 |
| Saeed Maleki | 777e93ee47 | merged with main | 2023-03-24 02:35:15 +00:00 |
| Madan Musuvathi | e6ee81e4fa | fixed the order of remote rank and tag in mscclppConnect API | 2023-03-23 21:09:04 +00:00 |
| Madan Musuvathi | 72edabe2a6 | added GetDevConn api to retrieve a connection from remoteRank and tag | 2023-03-23 21:03:30 +00:00 |
| Madan Musuvathi | 896539b236 | Comm owns all state including devcons | 2023-03-22 22:43:32 +00:00 |