Crutcher Dunnavant | d014693288 | cleanup tests | 2023-04-07 11:37:24 -07:00
Crutcher Dunnavant | 68eff98bbc | update ci.sh | 2023-04-07 11:27:45 -07:00
Crutcher Dunnavant | e65def8657 | bug | 2023-04-07 11:27:45 -07:00
Crutcher Dunnavant | 7753c38eb1 | working on connect | 2023-04-07 11:27:45 -07:00
Changho Hwang | b6ea0ca266 | IB unit test (#47) | 2023-04-07 21:45:14 +08:00
Changho Hwang | b7461facff | Fix Makefile | 2023-04-07 13:09:56 +00:00
Changho Hwang | 949a9cd0a3 | Optional use of gdrcopy (#48); Co-authored-by: Saeed Maleki <saemal@microsoft.com> | 2023-04-07 13:36:59 +08:00
Saeed Maleki | 6c1ebed569 | combining ./python and ./ lint formats into makefile | 2023-04-06 23:26:56 +00:00
Ziyue Yang | 352a10a33d | NPKit: improve event collection for async requests (#45) | 2023-04-06 16:21:34 +08:00
Saeed Maleki | cd3cd2c157 | lint | 2023-04-06 03:20:21 +00:00
Saeed Maleki | 08275e93d7 | added barrier API + pushed one after mscclppsetup | 2023-04-06 03:15:54 +00:00
Saeed Maleki | ef851d2557 | Merge pull request #44 from microsoft/crutcher-fixformat; Fixup formating for python | 2023-04-05 16:28:37 -07:00
Crutcher Dunnavant | f7e330da21 | Fixup formating for python | 2023-04-05 21:47:08 +00:00
Saeed Maleki | fa7d2ad877 | Merge pull request #43 from microsoft/crutcher-bootstrap; [python] Pull in bootstrap all gather, log callbacks. | 2023-04-04 18:43:22 -07:00
Saeed Maleki | 5fe87e5da6 | lint fixes | 2023-04-05 01:41:21 +00:00
Crutcher Dunnavant | 151b29f70c | docs and format | 2023-04-04 18:55:08 +00:00
Crutcher Dunnavant | 659a88a767 | remove env hook | 2023-04-04 18:31:08 +00:00
Crutcher Dunnavant | aaf93c858d | extract level | 2023-04-04 18:29:45 +00:00
Crutcher Dunnavant | 0df50830e1 | log callbacks | 2023-04-04 17:57:46 +00:00
Crutcher Dunnavant | 423affeaa6 | all gather bytes, json, pickle | 2023-04-03 23:39:06 +00:00
Crutcher Dunnavant | 17e1885981 | allocation fixes | 2023-04-03 23:39:06 +00:00
Crutcher Dunnavant | 8cac41c8ac | [python] working on bootstrap all gather bug | 2023-04-03 23:39:06 +00:00
Saeed Maleki | 2c6460ce72 | bug fix for allgather0 | 2023-04-03 04:36:20 +00:00
Saeed Maleki | bfbdaf6b05 | Merge pull request #41 from microsoft/saemal/allgather_hier; Saemal/allgather hier | 2023-04-01 20:37:43 -07:00
Saeed Maleki | 5ff64d36f4 | documents for allgather2 + refactoring local allgather | 2023-04-02 03:36:22 +00:00
Saeed Maleki | 0887cfe768 | no need for remapping anymore | 2023-04-02 02:35:08 +00:00
Saeed Maleki | 5cf3f3c524 | Merge pull request #39 from microsoft/binyli/numabindAPI; bind proxy threads and host allocation are now done automatically and closest to the device | 2023-04-01 19:16:43 -07:00
Saeed Maleki | 1731911d00 | removing extra stream and destroying created ones | 2023-04-02 02:07:41 +00:00
Saeed Maleki | 4c6616e7b9 | lint | 2023-04-01 19:20:50 +00:00
Saeed Maleki | 8927dd4d72 | great allgather numbers with the current binding mechanism | 2023-04-01 18:54:42 +00:00
Saeed Maleki | 701255959e | lint | 2023-03-31 23:34:43 +00:00
Saeed Maleki | 97dadd8d64 | merged with main | 2023-03-31 23:32:01 +00:00
Binyang Li | 8e4edd4d83 | fix lint | 2023-03-31 08:32:45 +00:00
Binyang Li | af5825b474 | bind numa node to communicator | 2023-03-31 08:05:49 +00:00
Saeed Maleki | 44e8760af7 | allgather kernel2 | 2023-03-31 06:31:25 +00:00
Changho Hwang | fe1d7fee9e | Bug Fix: null-termination in logging | 2023-03-31 05:25:07 +00:00
Saeed Maleki | fef0bff945 | a third kernel for allgather cross-node | 2023-03-30 23:24:04 +00:00
Saeed Maleki | 29254439e5 | Merge pull request #38 from microsoft/saemal/removing_gdrcopy; removing gdrcopy and adding flush functionality | 2023-03-30 13:17:47 -07:00
Changho Hwang | b58eae4037 | Minor changes | 2023-03-30 07:11:41 +00:00
Saeed Maleki | e2cfd5ac83 | a lot of documentation | 2023-03-30 00:37:33 +00:00
Saeed Maleki | be5e422021 | merged with main | 2023-03-29 23:03:12 +00:00
Saeed Maleki | 4be1dcc5d6 | Merge pull request #37 from microsoft/saemal/removing_gdrcopy_flush; flush capability added in this PR | 2023-03-29 15:44:30 -07:00
Saeed Maleki | 629d59a9c0 | bug fix -- flush doesn't need to increment the epoch | 2023-03-29 22:21:24 +00:00
Saeed Maleki | debd110874 | fused flush instructions | 2023-03-29 22:17:02 +00:00
Saeed Maleki | 42b11c5c9a | fast flush | 2023-03-29 20:50:01 +00:00
Saeed Maleki | d97bee6973 | flush mechanism | 2023-03-29 17:31:20 +00:00
Bin Wang | 7880be8ee2 | Fix the 2 GiB limit in allgather test. (#36) | 2023-03-29 19:02:43 +08:00
Saeed Maleki | 7a0962e4be | Merge pull request #35 from microsoft/binyli/refactor; Refactor mscclppProxyService | 2023-03-28 13:46:16 -07:00
Binyang Li | d725e45f13 | fix | 2023-03-28 14:53:08 +00:00
Binyang Li | 9c633a9633 | bug fix | 2023-03-28 14:40:51 +00:00