mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-11 17:00:22 +00:00

Binyang Li a707273701 Torch integration (#692 )

Reorganize current native algorithm implementation and DSL algorithm
implementation.
Provide unified API for DSL algo and native algo and provide interface
to tune the algo
Provide interface for pytorch integration with native API and DSL

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>

2026-01-21 20:32:24 -08:00

C++ Examples

Tutorials

Check out our tutorials for step-by-step guides on using MSCCL++.

(mscclpp-test)=

mscclpp-test

NOTE: mscclpp-test is NOT a performance benchmark. If you want to get the latest performance numbers, please use the Python benchmark or the NCCL APIs instead.

mscclpp-test is a set of C++ implementation examples. It requires MPI to be installed on the system; pass the MPI installation path to the CMake build system via the MPI_HOME environment variable.

$ MPI_HOME=/path/to/mpi cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j allgather_test_perf allreduce_test_perf
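Because the build depends on MPI_HOME, it can help to check the variable before invoking CMake. The following is a minimal sketch, not part of the repository; the helper name check_mpi_home is hypothetical, and it only verifies that the variable is set and points at an existing directory.

```shell
# Hypothetical helper (not part of mscclpp): fail early when MPI_HOME
# is unset or does not point at a directory.
check_mpi_home() {
  if [ -z "${MPI_HOME:-}" ]; then
    echo "MPI_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$MPI_HOME" ]; then
    echo "MPI_HOME is not a directory: $MPI_HOME" >&2
    return 1
  fi
  return 0
}

# Intended use (commented out so the sketch has no side effects):
# check_mpi_home && cmake -DCMAKE_BUILD_TYPE=Release ..
```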

For example, the following command runs the allreduce5 algorithm on 8 GPUs, sweeping message sizes from 3MB to 48MB and doubling the size at each step (-f 2). You can try a different algorithm by changing the -k 5 option to another value (e.g., -k 3 runs allreduce3). The available algorithms are listed in the source code: allreduce_test.cu and allgather_test.cu.

$ mpirun --bind-to numa -np 8 ./bin/allreduce_test_perf -b 3m -e 48m -G 100 -n 100 -w 20 -f 2 -k 5
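To see which message sizes such a sweep covers, the sketch below lists them for a start size, end size, and step factor. It is an illustration only: the helper sweep_sizes is hypothetical, sizes are in MB, and it assumes the sweep includes both endpoints and multiplies the size by the factor at each step, which may not match every detail of how the test binary handles its -b, -e, and -f options.

```shell
# Hypothetical helper (not part of mscclpp-test): print the sizes visited by
# a multiplicative sweep from $1 to $2 (inclusive) with step factor $3.
# The factor must be greater than 1, otherwise the loop never terminates.
sweep_sizes() {
  size=$1
  max=$2
  factor=$3
  out=""
  while [ "$size" -le "$max" ]; do
    # Append the current size, separated by a space after the first entry.
    out="$out${out:+ }$size"
    size=$((size * factor))
  done
  echo "$out"
}

sweep_sizes 3 48 2   # prints: 3 6 12 24 48
```

Under these assumptions, the command above would measure five message sizes.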

NOTE: a few algorithms impose a condition on the total data size, for example that it be a multiple of 3. If the condition is not met, the command fails with a corresponding error.
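A size can be checked against such a condition before launching a run. This is a minimal sketch under stated assumptions: the helper is_multiple_of is hypothetical, the divisor 3 is taken from the example in the note, and the "m" suffix is assumed to mean 1024 * 1024 bytes. The actual conditions depend on the algorithm and are defined in the test sources.

```shell
# Hypothetical helper (not part of mscclpp-test): succeed when the byte
# count $1 is a multiple of $2.
is_multiple_of() {
  [ $(( $1 % $2 )) -eq 0 ]
}

# Assumes "3m" means 3 * 1024 * 1024 bytes.
if is_multiple_of $((3 * 1024 * 1024)) 3; then
  echo "3m: multiple of 3"
fi
if ! is_multiple_of $((4 * 1024 * 1024)) 3; then
  echo "4m: not a multiple of 3"
fi
```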

Check the help message for more details.

$ ./bin/allreduce_test_perf --help
USAGE: allreduce_test_perf
        [-b,--minbytes <min size in bytes>]
        [-e,--maxbytes <max size in bytes>]
        [-i,--stepbytes <increment size>]
        [-f,--stepfactor <increment factor>]
        [-n,--iters <iteration count>]
        [-w,--warmup_iters <warmup iteration count>]
        [-c,--check <0/1>]
        [-T,--timeout <time in seconds>]
        [-G,--cudagraph <num graph launches>]
        [-a,--average <0/1/2/3> report average iteration time <0=RANK0/1=AVG/2=MIN/3=MAX>]
        [-k,--kernel_num <kernel number of commnication primitive>]
        [-o, --output_file <output file name>]
        [-h,--help]
