mscclpp/include
Binyang Li d1869011c2 Add device semaphore API (#523)
Add a deviceSemaphore structure and implement a new NVLS-based algorithm to
show how to use these APIs. Current performance of the NVLS non-zero-copy version:
```
#
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
        1024           512      half     sum      -1     6.10    0.17    0.29      0     5.65    0.18    0.32      0
        2048          1024      half     sum      -1     5.94    0.35    0.60      0     5.85    0.35    0.61      0
        4096          2048      half     sum      -1     6.11    0.67    1.17      0     5.97    0.69    1.20      0
        8192          4096      half     sum      -1     6.22    1.32    2.31      0     6.17    1.33    2.33      0
       16384          8192      half     sum      -1     6.68    2.45    4.29      0     6.52    2.51    4.39      0
       32768         16384      half     sum      -1     8.02    4.09    7.15      0     7.66    4.28    7.49      0
       65536         32768      half     sum      -1     8.09    8.10   14.18      0     7.91    8.29   14.51      0
      131072         65536      half     sum      -1     9.58   13.68   23.93      0     9.61   13.64   23.86      0
      262144        131072      half     sum      -1    12.60   20.81   36.42      0    12.28   21.35   37.37      0
      524288        262144      half     sum      -1    14.51   36.12   63.22      0    14.09   37.21   65.12      0
     1048576        524288      half     sum      -1    19.45   53.92   94.36      0    19.29   54.35   95.12      0
     2097152       1048576      half     sum      -1    31.00   67.66  118.40      0    30.80   68.08  119.14      0
     4194304       2097152      half     sum      -1    44.71   93.80  164.16      0    44.66   93.91  164.34      0
     8388608       4194304      half     sum      -1    62.96  133.24  233.17      0    62.49  134.24  234.91      0
    16777216       8388608      half     sum      -1    105.1  159.68  279.45      0    104.4  160.74  281.29      0
    33554432      16777216      half     sum      -1    169.9  197.55  345.71      0    169.8  197.64  345.87      0
    67108864      33554432      half     sum      -1    298.1  225.12  393.96      0    298.1  225.09  393.91      0
   134217728      67108864      half     sum      -1    552.9  242.77  424.84      0    553.7  242.39  424.18      0
   268435456     134217728      half     sum      -1   1055.8  254.24  444.91      0   1056.9  253.98  444.47      0
   536870912     268435456      half     sum      -1   2040.1  263.15  460.52      0   2045.1  262.52  459.40      0
  1073741824     536870912      half     sum      -1   3996.9  268.65  470.13      0   4007.7  267.92  468.86      0
```
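For readers unfamiliar with the `algbw` and `busbw` columns, the sketch below recomputes both for the last row of the table. It rests on assumptions not stated in the commit message: that the benchmark is an all-reduce, that 8 ranks took part (inferred from the busbw/algbw ratio of about 1.75), and that GB/s means 10^9 bytes per second. The function names are illustrative only and are not part of mscclpp.

```python
# Minimal sketch: recompute algbw and busbw for one row of the table.
# Assumptions (not stated in the commit message): the collective is an
# all-reduce, 8 ranks participated, and 1 GB = 1e9 bytes.

def alg_bw_gbps(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth: message size divided by elapsed time, in GB/s."""
    return size_bytes / (time_us * 1e-6) / 1e9


def allreduce_bus_bw_gbps(alg_bw: float, n_ranks: int) -> float:
    """Bus bandwidth for all-reduce: algbw scaled by 2 * (n - 1) / n."""
    return alg_bw * 2 * (n_ranks - 1) / n_ranks


# Last out-of-place row of the table: 1073741824 bytes in 3996.9 us.
size_bytes = 1073741824
time_us = 3996.9

alg = alg_bw_gbps(size_bytes, time_us)
bus = allreduce_bus_bw_gbps(alg, n_ranks=8)  # n_ranks=8 is an assumption
print(f"algbw = {alg:.2f} GB/s, busbw = {bus:.2f} GB/s")
```

Small differences from the reported values are expected, because the times in the table are rounded.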
2025-05-20 09:32:38 -07:00