Commit Graph

12 Commits

Author SHA1 Message Date
Copilot
93f6eeaa6b Remove GTest dependency, add code coverage, and refactor unit tests and CI pipelines (#744)
- Removes the GTest dependency, replacing it with a minimal custom
framework (`test/framework.*`) that covers only what the tests actually
use — a unified `TEST()` macro with SFINAE-based fixture auto-detection,
`EXPECT_*`/`ASSERT_*` assertions, environments, and setup/teardown.
- `--exclude-perf-tests` flag and substring-based negative filtering
- `MSCCLPP_ENABLE_COVERAGE` CMake option with gcov/lcov; CI uploads to
Codecov
- Merges standalone `test/perf/` into main test targets
- Refactors Azure pipelines to reduce redundancies & make more readable

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Changho Hwang <changhohwang@microsoft.com>
2026-03-24 23:34:38 -04:00
Binyang Li
25435acf5d Add new algos for GB200 (#747)
- Add new algos (allreduce_rsag, allreduce_rsag_pipeline and
allreduce_rsag_zero_copy) for GB200.
- Add IB stub for non-IB env
- Provides example for algorithm tunning with different nblocks/nthreads

Perf for allreduce_rsag
```
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong 
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)         
     1048576        262144     float     sum      -1    25.16   41.67   62.51       0    23.73   44.18   66.27       0
     2097152        524288     float     sum      -1    26.06   80.47  120.71       0    25.31   82.86  124.29       0
     4194304       1048576     float     sum      -1    31.09  134.93  202.39       0    30.75  136.39  204.58       0
     8388608       2097152     float     sum      -1    45.52  184.29  276.43       0    45.13  185.87  278.80       0
    16777216       4194304     float     sum      -1    75.73  221.53  332.30       0    75.51  222.18  333.27       0
    33554432       8388608     float     sum      -1   137.25  244.48  366.72       0   137.22  244.54  366.81       0
    67108864      16777216     float     sum      -1   271.34  247.32  370.99       0   270.86  247.76  371.65       0
   134217728      33554432     float     sum      -1   534.25  251.22  376.84       0   534.43  251.14  376.71       0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 264.454 
#
# Collective test concluded: all_reduce_perf
```

perf for allreduce_rsag_pipeline
```
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong 
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)         
     1048576        262144     float     sum      -1    61.57   17.03   25.55       0    61.51   17.05   25.57       0
     2097152        524288     float     sum      -1    61.31   34.20   51.31       0    61.23   34.25   51.38       0
     4194304       1048576     float     sum      -1    61.62   68.06  102.10       0    61.84   67.83  101.74       0
     8388608       2097152     float     sum      -1    61.97  135.37  203.06       0    61.89  135.53  203.30       0
    16777216       4194304     float     sum      -1    63.15  265.65  398.48       0    62.89  266.76  400.15       0
    33554432       8388608     float     sum      -1   100.63  333.46  500.19       0    99.76  336.34  504.51       0
    67108864      16777216     float     sum      -1   180.04  372.75  559.13       0   179.75  373.34  560.01       0
   134217728      33554432     float     sum      -1   339.60  395.23  592.84       0   338.16  396.91  595.36       0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 304.665 
#
# Collective test concluded: all_reduce_perf
```

perf for allreduce_rsag_zero_copy
```
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw  #wrong     time   algbw   busbw  #wrong 
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)             (us)  (GB/s)  (GB/s)         
     1048576        262144     float     sum      -1    14.99   69.93  104.90       0    14.44   72.61  108.92       0
     2097152        524288     float     sum      -1    16.19  129.56  194.33       0    15.85  132.32  198.48       0
     4194304       1048576     float     sum      -1    21.19  197.98  296.97       0    20.64  203.20  304.81       0
     8388608       2097152     float     sum      -1    31.04  270.27  405.41       0    30.68  273.44  410.16       0
    16777216       4194304     float     sum      -1    50.34  333.26  499.89       0    50.15  334.51  501.77       0
    33554432       8388608     float     sum      -1    89.58  374.56  561.84       0    88.65  378.48  567.73       0
    67108864      16777216     float     sum      -1   165.69  405.03  607.54       0   163.64  410.10  615.16       0
   134217728      33554432     float     sum      -1   323.19  415.28  622.93       0   318.01  422.05  633.07       0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 414.619 
#
# Collective test concluded: all_reduce_perf
```

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>
Co-authored-by: Qinghua Zhou <qinghuazhou@microsoft.com>
Co-authored-by: Caio Rocha <caiorocha@microsoft.com>
2026-02-24 16:43:23 -08:00
Binyang Li
9eb958183c upgrade codeql to v3 (#676) 2025-11-06 16:58:19 -08:00
Changho Hwang
f7d1fb4492 Exclude irrelevant files from workflow triggers (#663) 2025-10-23 15:52:19 -07:00
Changho Hwang
def68ced64 Add CUDA 12.8 images (#488) 2025-03-29 00:31:26 +00:00
Binyang Li
f18a440feb trigger ci for release branches (#426) 2024-12-21 00:05:13 +00:00
Changho Hwang
2127a3ba29 Improve CMake options (#376)
* Let all CMake option names start with `MSCCLPP_`
* Explain the `MSCCLPP_BUILD_PYTHON_BINDINGS` option in readme

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-11-22 01:54:11 +00:00
Changho Hwang
8a330f9135 Update ROCm CI (#357)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2024-09-20 17:57:02 +00:00
Changho Hwang
5fa5bd2706 Check nvidia_peermem during runtime (#234) 2023-12-25 12:02:10 +08:00
Changho Hwang
544ff0c21d ROCm support (#213)
Co-authored-by: Binyang Li <binyli@microsoft.com>
2023-11-24 16:41:56 +08:00
Changho Hwang
dab19e00c1 Templatize Dockerfiles & update workflows (#223)
Now build images by a script with a shared Dockerfile template

---------

Co-authored-by: Binyang Li <binyli@microsoft.com>
Co-authored-by: Saeed Maleki <saemal@microsoft.com>
2023-11-22 13:29:12 -08:00
Changho Hwang
f68820436c Explicit build dependency on nvidia_peermem (#201) 2023-10-23 04:29:30 +00:00