2-node AllReduce improvements (#118)

* Added `get()` interfaces to `SmChannel`
* Improved 2-node (8 GPUs/node) AllReduce: algbw of 139 GB/s for 1 GB
(kernel 3) and 99 GB/s for 48 MB (kernel 4)
* Fixed a FIFO perf bug
* Several fixes & validations in mscclpp-test

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
Author: Changho Hwang
Date: 2023-07-07 15:05:46 +08:00
Committed by GitHub
Parent: 6ec585f3d8
Commit: bb7b85a810
16 changed files with 1171 additions and 133 deletions

@@ -14,7 +14,9 @@ jobs:
   uses: actions/checkout@v3
 - name: Install ClangFormat
-  run: sudo apt-get install -y clang-format
+  run: |
+    sudo apt-get update
+    sudo apt-get install -y clang-format
 - name: Run cpplint
   run: |