Binyang2014
|
56bdbc2f32
|
Enable tests for both cuda11 and cuda12 (#124)
Update pipeline: enable tests for both cuda11 and cuda12
|
2023-07-10 13:19:14 +08:00 |
|
Changho Hwang
|
bb7b85a810
|
2-node AllReduce improvements (#118)
* Added `get()` interfaces to `SmChannel`
* Improved 2-node (8 GPUs/node) AllReduce: algbw 139 GB/s for 1 GB (kernel 3) and 99 GB/s for 48 MB (kernel 4)
* Fixed a FIFO perf bug
* Several fixes & validations in mscclpp-test
---------
Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
|
2023-07-07 07:05:46 +00:00 |
|
Binyang2014
|
2640578b22
|
Add performance check for mscclpp-test (#110)
- Add ndmv4 perf baseline
- Change mscclpp-test to write perf numbers to a JSON file
- Add a Python script to check the perf results against the baseline
|
2023-06-21 07:42:53 +00:00 |
|
Binyang2014
|
8efacae332
|
Update pipeline (#103)
Update Azure pipeline:
- Use the mscclpp:base-cuda12.1 image for building and testing
- Add mp-ut tests for multi-node runs
|
2023-06-14 20:14:57 +08:00 |
|