Changho Hwang
|
8c0f9e84d0
|
v0.3.0 (#171)
|
2023-10-11 22:35:54 +08:00 |
|
Changho Hwang
|
497a9e0c82
|
Add backup workflows (#189)
|
2023-10-07 15:13:49 +08:00 |
|
Saeed Maleki
|
e7d5e652df
|
Python bindings (#125)
Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
Co-authored-by: Binyang Li <binyli@microsoft.com>
|
2023-07-19 15:35:54 +08:00 |
|
Changho Hwang
|
bb7b85a810
|
2-node AllReduce improvements (#118)
* Added `get()` interfaces to `SmChannel`
* Improved 2-node (8 GPUs/node) AllReduce: algbw 139 GB/s for 1 GB (kernel 3) and 99 GB/s for 48 MB (kernel 4)
* Fixed a FIFO perf bug
* Several fixes & validations in mscclpp-test
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
|
2023-07-07 07:05:46 +00:00 |
|
Binyang2014
|
2640578b22
|
Add performance check for mscclpp-test (#110)
- Add ndmv4 perf baseline
- Change mscclpp-test to output perf numbers to a JSON file
- Add Python script to check the perf results against the baseline
|
2023-06-21 07:42:53 +00:00 |
|
Changho Hwang
|
5a4885ccbb
|
Misc updates (#95)
|
2023-06-12 13:53:43 +08:00 |
|
Changho Hwang
|
798631bd52
|
Update unit tests (#81)
|
2023-06-08 09:58:05 +00:00 |
|
Changho Hwang
|
8d54bf3301
|
Update CI (#79)
|
2023-05-21 11:45:41 -07:00 |
|