mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-13 09:46:00 +00:00

Author	SHA1	Message	Date
Changho Hwang	d0c709ea82	Fix Codecov token usage in coverage upload step	2026-02-23 14:45:29 -08:00
Changho Hwang	6c2bc8f4b3	coverage fix	2026-02-23 11:32:50 -08:00
Changho Hwang	04ebd9ba6e	fix coverage file path	2026-02-23 10:39:39 -08:00
Changho Hwang	c4afbe12d9	Merge branch 'main' into copilot/remove-gtest-use-custom-framework	2026-02-23 10:25:22 -08:00
mahdiehghazim	2a6f1c1192	Mahdieh/switchchannel test clean (#751 ) This PR adds example code for switch channel testing. It validates the switch channel in single-node and multi-node environments. A description of the algorithms and an explanation of the code still need to be added under doc. Example outputs: rank0: ./bidir_switch_channel 10.0.5.233:45571 0 0 Rank 0 (GPU 0): Preparing for tests ... Rank 0 (GPU 0): bytes 4096, elapsed 0.0062328 ms/iter, BW 0.657169 GB/s Rank 0 (GPU 0): bytes 4.1943e+06, elapsed 0.0164577 ms/iter, BW 254.854 GB/s Rank 0 (GPU 0): bytes 1.34218e+08, elapsed 0.33628 ms/iter, BW 399.125 GB/s Rank 0: Succeed! rank1: ./bidir_switch_channel 10.0.5.233:45571 1 0 Rank 1 (GPU 0): Preparing for tests ... Rank 1: Succeed!	2026-02-20 22:46:32 -05:00
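The BW figures in the switch channel example output are consistent with bandwidth computed as bytes divided by elapsed time per iteration, with 1 GB taken as 1e9 bytes. The C++ sketch below reproduces that arithmetic from the logged values; the function name is hypothetical, and this is not the example program's actual code.

```cpp
#include <cassert>

// Hypothetical helper: bandwidth in GB/s (1 GB = 1e9 bytes) from a message
// size in bytes and an elapsed time in milliseconds per iteration.
double bandwidthGBps(double bytes, double elapsedMs) {
  assert(elapsedMs > 0.0);
  return bytes / (elapsedMs * 1e-3) / 1e9;
}
```

For instance, `bandwidthGBps(4096, 0.0062328)` evaluates to roughly 0.657, in line with the rank 0 output in that commit message.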
Binyang Li	3962574bcb	Address installation issue in some env (#750 ) This pull request updates the way the `nlohmann/json` library is fetched and upgrades it to a newer version in both the main build and test configuration files. This addresses an installation issue in some environments.	2026-02-20 16:11:16 -08:00
Caio Rocha	e2acf7f1c8	Removing MPI Dependency (#743 )	2026-02-20 16:04:12 -08:00
Changho Hwang	41695bab94	Merge branch 'main' into copilot/remove-gtest-use-custom-framework	2026-02-20 14:04:27 -08:00
Changho Hwang	b9609f83a0	add coverage flags	2026-02-20 14:03:54 -08:00
Changho Hwang	caeec7590a	updates	2026-02-20 13:43:32 -08:00
Binyang Li	39865c218b	address flagBuffer ownership issue (#749 ) This pull request updates the handling of the default flag buffer in the C++ and Python bindings to ensure proper memory management when interfacing with Python. It ensures the buffer is not deallocated when ownership is transferred from C++ to Python.	2026-02-20 13:42:29 -08:00
Changho Hwang	dcdd3febd1	update UT CI	2026-02-20 13:35:32 -08:00
Changho Hwang	b64536f28e	Merge branch 'main' into copilot/remove-gtest-use-custom-framework	2026-02-18 20:35:34 -08:00
Changho Hwang	2b4adcc4ad	fix lint	2026-02-18 20:33:57 -08:00
Changho Hwang	b693d1b3fc	lint issue	2026-02-18 20:31:25 -08:00
Changho Hwang	4d9aceac6f	badge	2026-02-18 20:25:50 -08:00
Changho Hwang	bed85b56cb	codecov upload	2026-02-18 20:23:42 -08:00
Changho Hwang	e40c72bd2b	license text update	2026-02-18 20:12:32 -08:00
Changho Hwang	4afbf780ed	minor	2026-02-18 19:54:37 -08:00
Changho Hwang	d2efc2fd3b	coverage update	2026-02-18 19:48:29 -08:00
Changho Hwang	b6ce0f2ede	simplify	2026-02-18 19:16:21 -08:00
Changho Hwang	30b9891180	simplifying	2026-02-18 18:35:33 -08:00
Binyang Li	4701ae3a95	Update dtype name (#748 ) - Change FP8_E4M3/FP8_E5M2 to FLOAT8_E4M3/FLOAT8_E5M2 - Add torch.uint8 to DataType.uint8 mapping	2026-02-18 10:35:44 -08:00
Binyang Li	d0d5a8c034	Add new CI pipeline for RCCL test (#746 ) Adds RCCL allreduce/allgather tests to the CI pipeline. Fixes a hang introduced by PR #741. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-13 10:50:10 -08:00
Qinghua Zhou	edc9c38751	Support uint8 data type for Allreduce (#736 ) Support uint8 data type for Allreduce. Current limitation: uint8 is not supported for NVLS. Performance results with RCCL-test with MSCCLPP on MI300X:

#                                                 out-of-place                     in-place
#     size      count     type redop root    time  algbw  busbw #wrong    time  algbw  busbw #wrong
#      (B) (elements)                        (us) (GB/s) (GB/s)           (us) (GB/s) (GB/s)
      1024        512     half   sum   -1    5.39   0.19   0.33      0    5.45   0.19   0.33      0
      2048       1024     half   sum   -1    5.53   0.37   0.65      0    5.63   0.36   0.64      0
      4096       2048     half   sum   -1    5.55   0.74   1.29      0    5.56   0.74   1.29      0
      8192       4096     half   sum   -1     5.8   1.41   2.47      0    5.84    1.4   2.46      0
     16384       8192     half   sum   -1    6.57   2.49   4.36      0    6.56    2.5   4.37      0
     32768      16384     half   sum   -1    8.02   4.09   7.15      0    8.06   4.07   7.11      0
     65536      32768     half   sum   -1    8.77   7.47  13.07      0    8.82   7.43     13      0
    131072      65536     half   sum   -1    9.61  13.64  23.87      0    9.78   13.4  23.45      0
    262144     131072     half   sum   -1   11.68  22.44  39.27      0    12.1  21.67  37.93      0
    524288     262144     half   sum   -1   13.77  38.08  66.64      0   13.87  37.79  66.13      0
   1048576     524288     half   sum   -1   19.11  54.87  96.03      0   19.27  54.42  95.24      0
   2097152    1048576     half   sum   -1    24.1     87 152.26      0   24.24  86.52 151.41      0
   4194304    2097152     half   sum   -1   37.16 112.87 197.52      0   37.44 112.03 196.06      0
   8388608    4194304     half   sum   -1   61.53 136.33 238.58      0   61.68 135.99 237.99      0
  16777216    8388608     half   sum   -1   108.8 154.22 269.88      0   109.2  153.6 268.79      0
  33554432   16777216     half   sum   -1   197.8 169.68 296.94      0   198.6 168.92 295.61      0
  67108864   33554432     half   sum   -1   384.6 174.51 305.39      0   385.1 174.27 304.98      0
 134217728   67108864     half   sum   -1   754.1 177.99 311.48      0   754.9 177.78 311.12      0
 268435456  134217728     half   sum   -1  1491.8 179.94 314.89      0  1493.2 179.77  314.6      0
 536870912  268435456     half   sum   -1  2979.6 180.18 315.31      0  2983.9 179.92 314.87      0

#                                                 out-of-place                     in-place
#     size      count     type redop root    time  algbw  busbw #wrong    time  algbw  busbw #wrong
#      (B) (elements)                        (us) (GB/s) (GB/s)           (us) (GB/s) (GB/s)
      1024       1024 fp8_e4m3   sum   -1     5.4   0.19   0.33      0    5.45   0.19   0.33      0
      2048       2048 fp8_e4m3   sum   -1     5.5   0.37   0.65      0     5.6   0.37   0.64      0
      4096       4096 fp8_e4m3   sum   -1    5.61   0.73   1.28      0    5.68   0.72   1.26      0
      8192       8192 fp8_e4m3   sum   -1    5.96   1.38   2.41      0    5.98   1.37    2.4      0
     16384      16384 fp8_e4m3   sum   -1    6.49   2.52   4.42      0    6.58   2.49   4.36      0
     32768      32768 fp8_e4m3   sum   -1    8.09   4.05   7.09      0    8.15   4.02   7.03      0
     65536      65536 fp8_e4m3   sum   -1    8.58   7.64  13.37      0     8.7   7.53  13.18      0
    131072     131072 fp8_e4m3   sum   -1    9.44  13.88  24.29      0    9.62  13.63  23.85      0
    262144     262144 fp8_e4m3   sum   -1   10.12   25.9  45.32      0   10.37  25.27  44.22      0
    524288     524288 fp8_e4m3   sum   -1   13.73  38.19  66.82      0   13.89  37.74  66.04      0
   1048576    1048576 fp8_e4m3   sum   -1   18.66   56.2  98.34      0   18.92  55.41  96.97      0
   2097152    2097152 fp8_e4m3   sum   -1   24.54  85.46 149.56      0   24.63  85.16 149.03      0
   4194304    4194304 fp8_e4m3   sum   -1   37.79 110.98 194.21      0   38.05 110.22 192.88      0
   8388608    8388608 fp8_e4m3   sum   -1   62.22 134.82 235.94      0   62.63 133.94  234.4      0
  16777216   16777216 fp8_e4m3   sum   -1   109.9 152.62 267.09      0   110.4  151.9 265.83      0
  33554432   33554432 fp8_e4m3   sum   -1   201.1 166.82 291.94      0   202.3 165.84 290.22      0
  67108864   67108864 fp8_e4m3   sum   -1     390 172.06 301.11      0   390.2 171.99 300.99      0
 134217728  134217728 fp8_e4m3   sum   -1   763.9  175.7 307.47      0   764.2 175.62 307.34      0
 268435456  268435456 fp8_e4m3   sum   -1  1509.5 177.83  311.2      0  1510.1 177.76 311.08      0
 536870912  536870912 fp8_e4m3   sum   -1  3010.2 178.35 312.11      0  3014.2 178.11  311.7      0

#                                                 out-of-place                     in-place
#     size      count     type redop root    time  algbw  busbw #wrong    time  algbw  busbw #wrong
#      (B) (elements)                        (us) (GB/s) (GB/s)           (us) (GB/s) (GB/s)
      1024       1024 fp8_e5m2   sum   -1    5.41   0.19   0.33      0    5.44   0.19   0.33      0
      2048       2048 fp8_e5m2   sum   -1     5.5   0.37   0.65      0    5.67   0.36   0.63      0
      4096       4096 fp8_e5m2   sum   -1    5.61   0.73   1.28      0    5.69   0.72   1.26      0
      8192       8192 fp8_e5m2   sum   -1    5.96   1.37    2.4      0       6   1.36   2.39      0
     16384      16384 fp8_e5m2   sum   -1    6.63   2.47   4.32      0    6.59   2.49   4.35      0
     32768      32768 fp8_e5m2   sum   -1    8.07   4.06    7.1      0    8.16   4.02   7.03      0
     65536      65536 fp8_e5m2   sum   -1    8.62   7.61  13.31      0    8.73   7.51  13.14      0
    131072     131072 fp8_e5m2   sum   -1    9.43   13.9  24.33      0     9.6  13.66   23.9      0
    262144     262144 fp8_e5m2   sum   -1   10.11  25.94  45.39      0   10.38  25.26  44.21      0
    524288     524288 fp8_e5m2   sum   -1   13.73  38.19  66.84      0   13.87  37.79  66.13      0
   1048576    1048576 fp8_e5m2   sum   -1   18.65  56.22  98.39      0   18.93  55.38  96.92      0
   2097152    2097152 fp8_e5m2   sum   -1   24.54  85.47 149.57      0   24.63  85.16 149.03      0
   4194304    4194304 fp8_e5m2   sum   -1   37.84 110.83 193.96      0   38.01 110.36 193.12      0
   8388608    8388608 fp8_e5m2   sum   -1   62.32 134.61 235.58      0   62.55 134.12 234.71      0
  16777216   16777216 fp8_e5m2   sum   -1     110 152.58 267.01      0   110.3 152.12 266.21      0
  33554432   33554432 fp8_e5m2   sum   -1   201.1  166.9 292.07      0   201.8 166.26 290.96      0
  67108864   67108864 fp8_e5m2   sum   -1     390 172.07 301.12      0   390.5 171.87 300.78      0
 134217728  134217728 fp8_e5m2   sum   -1   763.9 175.69 307.46      0   764.5 175.56 307.23      0
 268435456  268435456 fp8_e5m2   sum   -1  1509.4 177.84 311.22      0  1509.8  177.8 311.14      0
 536870912  536870912 fp8_e5m2   sum   -1    3013 178.18 311.82      0    3018 177.89 311.31      0

#                                                 out-of-place                     in-place
#     size      count     type redop root    time  algbw  busbw #wrong    time  algbw  busbw #wrong
#      (B) (elements)                        (us) (GB/s) (GB/s)           (us) (GB/s) (GB/s)
      1024       1024    uint8   sum   -1    5.46   0.19   0.33      0    5.46   0.19   0.33      0
      2048       2048    uint8   sum   -1    5.54   0.37   0.65      0    5.63   0.36   0.64      0
      4096       4096    uint8   sum   -1    5.61   0.73   1.28      0    5.63   0.73   1.27      0
      8192       8192    uint8   sum   -1     5.9   1.39   2.43      0     5.9   1.39   2.43      0
     16384      16384    uint8   sum   -1     6.6   2.48   4.35      0    6.64   2.47   4.32      0
     32768      32768    uint8   sum   -1    8.99   3.65   6.38      0    8.99   3.64   6.38      0
     65536      65536    uint8   sum   -1    9.44   6.94  12.15      0    9.58   6.84  11.98      0
    131072     131072    uint8   sum   -1   11.72  11.18  19.57      0   11.83  11.08   19.4      0
    262144     262144    uint8   sum   -1   12.29  21.32  37.31      0   12.45  21.05  36.84      0
    524288     524288    uint8   sum   -1   13.87   37.8  66.15      0   13.93  37.64  65.88      0
   1048576    1048576    uint8   sum   -1   19.11  54.88  96.04      0    19.3  54.33  95.08      0
   2097152    2097152    uint8   sum   -1   24.38  86.01 150.51      0   24.52  85.53 149.67      0
   4194304    4194304    uint8   sum   -1   37.52 111.78 195.61      0   37.76 111.08 194.39      0
   8388608    8388608    uint8   sum   -1    62.4 134.44 235.26      0   62.56  134.1 234.67      0
  16777216   16777216    uint8   sum   -1   110.2 152.22 266.39      0   110.3 152.04 266.08      0
  33554432   33554432    uint8   sum   -1   199.8 167.94  293.9      0   197.5 169.88 297.29      0
  67108864   67108864    uint8   sum   -1   386.3 173.73 304.03      0   378.4 177.37 310.39      0
 134217728  134217728    uint8   sum   -1     758 177.07 309.87      0   741.1 181.12 316.95      0
 268435456  268435456    uint8   sum   -1  1500.1 178.95 313.16      0  1466.2 183.09  320.4      0
 536870912  536870912    uint8   sum   -1  2991.7 179.45 314.04      0  2924.8 183.56 321.23      0

--------- Co-authored-by: Qinghua Zhou <qinghuahzhou@microsoft.com>	2026-02-13 10:49:25 -08:00
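In the rccl-tests results for this commit, `algbw` is the message size divided by the measured time, and for allreduce `busbw` follows the nccl-tests convention busbw = algbw * 2(n-1)/n, where n is the number of ranks. The ratio 315.31 / 180.18 ≈ 1.75 = 2*7/8 suggests n = 8 (a single MI300X node), although the commit does not state the rank count. A minimal C++ sketch of both formulas, with hypothetical function names:

```cpp
#include <cassert>

// algbw in GB/s: bytes divided by time in microseconds, with 1 GB = 1e9 bytes.
double algBwGBps(double bytes, double timeUs) {
  assert(timeUs > 0.0);
  return bytes / (timeUs * 1e-6) / 1e9;
}

// Allreduce bus bandwidth per the nccl-tests convention: each rank moves
// 2(n-1)/n of the buffer, so busbw = algbw * 2(n-1)/n.
double allreduceBusBwGBps(double algBw, int nRanks) {
  assert(nRanks > 0);
  return algBw * 2.0 * (nRanks - 1) / nRanks;
}
```

With the last half-precision out-of-place row (536870912 bytes in 2979.6 us), this gives an algbw of about 180.18 GB/s and, assuming 8 ranks, a busbw of about 315.3 GB/s, matching the reported values.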
Binyang Li	bd68319e3e	Refactor algo selection logic and introduce symmetric_memory env (#741 ) This PR refactors the algorithm selection logic in MSCCL++ and introduces support for symmetric memory configuration through environment variables. 1. Algorithm Selection Refactoring: Uses a separate class for algorithm selection, which makes it possible to introduce more complex selection logic based on message size, architecture, whether CUDA graph is enabled, and the memory allocation method. 2. Symmetric Memory Support: Introduces a symmetricMemory parameter in algorithm context key generation, and removes the disableChannelCache env because it is ambiguous. 3. New args for build_default_algorithms: Adds flag_buffer and flag_buffer_size args to build_default_algorithms, so that a unified flag buffer can be shared by different algorithms. This avoids application hangs when switching algorithms for different message sizes. --------- Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com> Co-authored-by: Qinghua Zhou <qinghuazhou@microsoft.com> Co-authored-by: Caio Rocha <caiorocha@microsoft.com>	2026-02-12 19:06:18 -08:00
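The selection refactor in this commit can be illustrated with a small selector class whose decision depends on the message size and whose context key includes the symmetric-memory flag. This is a hypothetical C++ sketch: `SelectionContext`, `AlgorithmSelector`, the algorithm names, and the 1 MiB threshold are invented for illustration and are not MSCCL++'s API or tuning.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical inputs to selection (the commit also lists architecture,
// CUDA graph usage, and memory allocation method, omitted here).
struct SelectionContext {
  std::size_t messageBytes;
  bool symmetricMemory;
};

// Hypothetical selector: one class owns the decision, so more criteria can be
// added later without changing call sites.
class AlgorithmSelector {
 public:
  explicit AlgorithmSelector(std::size_t smallMessageLimit) : smallLimit_(smallMessageLimit) {}

  // Illustrative rule only: small messages use a latency-oriented algorithm,
  // larger ones a bandwidth-oriented one.
  std::string select(const SelectionContext& ctx) const {
    return ctx.messageBytes <= smallLimit_ ? "allreduce_packet" : "allreduce_bandwidth";
  }

  // Context key that includes the symmetric-memory flag, so the same size
  // with and without symmetric memory maps to different cached contexts.
  std::string contextKey(const SelectionContext& ctx) const {
    return select(ctx) + "|" + std::to_string(ctx.messageBytes) + "|sym=" + (ctx.symmetricMemory ? "1" : "0");
  }

 private:
  std::size_t smallLimit_;
};
```

Keeping the threshold as a constructor argument is only a convenience for the sketch; a real selector could consult any of the criteria the commit lists.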
Caio Rocha	dff3bc7bbb	Support Fusion for ReadPutPacket Operation at DSL (#742 ) This adds support in the DSL for fusing ReadPutPacket operations, which reduces the overhead of reading the same packet data from the scratch buffer multiple times. Fusion occurs when two rppkt operations with the same src_buffer are executed consecutively: rppkt(src, dst0) + rppkt(src, dst1) -> rppkt(src, [dst0, dst1]) Co-authored-by: Binyang Li <binyli@microsoft.com>	2026-02-12 17:27:20 -08:00
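The fusion rule in this commit can be sketched as a single pass over an operation list that merges consecutive rppkt operations sharing a source buffer into one operation with several destinations. This C++ sketch uses a hypothetical `Op` struct rather than the DSL's real types, and it also merges runs longer than two, which the commit message does not specify.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical IR node: operation name, one source buffer, one or more destinations.
struct Op {
  std::string name;
  std::string src;
  std::vector<std::string> dsts;
};

// Merge consecutive "rppkt" ops that read the same src:
// rppkt(src, dst0) + rppkt(src, dst1) -> rppkt(src, [dst0, dst1]).
std::vector<Op> fuseReadPutPackets(const std::vector<Op>& ops) {
  std::vector<Op> out;
  for (const Op& op : ops) {
    bool canFuse = !out.empty() && op.name == "rppkt" && out.back().name == "rppkt" &&
                   out.back().src == op.src;
    if (canFuse) {
      // Append this op's destinations to the previous rppkt instead of emitting a new op.
      out.back().dsts.insert(out.back().dsts.end(), op.dsts.begin(), op.dsts.end());
    } else {
      out.push_back(op);
    }
  }
  return out;
}
```

Because fusion only looks at the immediately preceding op, any intervening operation (or a different source) breaks the run, which keeps the transformation order-preserving.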
copilot-swe-agent[bot]	7003fec763	Simplify filter matching to use substring matching - Remove complex wildcard pattern matching (*, ?, negative patterns) - Use simple substring matching with find() - Simpler implementation, easier to understand and maintain - Still supports --gtest_filter for basic test name filtering Note: For advanced filtering like wildcards, users can use multiple test runs with different substring filters. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 08:36:24 +00:00
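With the simplified matcher in this commit, a test runs when the filter string appears anywhere in its name. A minimal C++ sketch, assuming the full name is formed as `Suite.Name` as in GoogleTest and that an empty filter selects every test (both are assumptions; the function name is hypothetical):

```cpp
#include <cassert>
#include <string>

// Substring filter: an empty filter matches every test; otherwise the filter
// must occur somewhere in "Suite.Name".
bool matchesFilter(const std::string& suite, const std::string& name, const std::string& filter) {
  if (filter.empty()) return true;
  const std::string fullName = suite + "." + name;
  return fullName.find(filter) != std::string::npos;
}
```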
copilot-swe-agent[bot]	e26f8abbd4	Address PR review comments 1. Add missing includes to fifo_perf_tests.cu - Add #include <cassert> - Add #include <unordered_map> 2. Fix license header capitalization (4 files) - "license." → "License." in test/framework.{hpp,cc} - "license." → "License." in test/unit/{unit_tests_main.cc,fifo_perf_tests.cu} 3. Fix double MPI_Init issue - Check MPI_Initialized() before calling MPI_Init - Prevents double initialization when mp_unit_tests already inits MPI 4. Fix coverage flags for CUDA compilation - Use generator expressions to apply --coverage only to C++ language - Prevents breaking CUDA compilation with host-only flags 5. Fix environment memory leak - Delete environment objects after TearDown() - Clear environments_ vector 6. Implement proper GTEST_SKIP handling - Create SkipException class - Handle skipped tests separately from failures - Report skipped test count 7. Implement GTest-style filter pattern matching - Support wildcards (* and ?) - Support negative patterns (-Pattern) - Support colon-separated patterns (Foo:Bar) - Compatible with existing CI usage like --gtest_filter=-Ib Verified builds successfully with Docker. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 08:32:28 +00:00
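The GoogleTest-style filter described in item 7 (wildcards, negative patterns, colon-separated lists) can be sketched as follows. This follows GoogleTest's documented filter semantics, in which a filter has the form `POS1:POS2-NEG1:NEG2` and each pattern must match the entire `Suite.Name`; the commit does not say whether the framework matched whole names or substrings, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Glob match: '*' matches any sequence (including empty), '?' matches one character.
bool globMatch(const std::string& pattern, const std::string& text) {
  std::size_t p = 0, t = 0, starP = std::string::npos, starT = 0;
  while (t < text.size()) {
    if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t])) {
      ++p;
      ++t;
    } else if (p < pattern.size() && pattern[p] == '*') {
      starP = p++;  // remember the star; first let it match the empty string
      starT = t;
    } else if (starP != std::string::npos) {
      p = starP + 1;  // backtrack: let the last star absorb one more character
      t = ++starT;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match empty
  return p == pattern.size();
}

// Split on ':' and drop empty pieces.
std::vector<std::string> splitPatterns(const std::string& s) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (start <= s.size()) {
    std::size_t end = s.find(':', start);
    if (end == std::string::npos) end = s.size();
    if (end > start) out.push_back(s.substr(start, end - start));
    start = end + 1;
  }
  return out;
}

// Select a test if it matches any positive pattern (an empty positive part
// means "*") and matches no negative pattern.
bool testSelected(const std::string& fullName, const std::string& filter) {
  const std::size_t dash = filter.find('-');
  std::vector<std::string> positive = splitPatterns(filter.substr(0, dash));
  std::vector<std::string> negative;
  if (dash != std::string::npos) negative = splitPatterns(filter.substr(dash + 1));
  if (positive.empty()) positive.push_back("*");

  bool included = false;
  for (const auto& pat : positive) included = included || globMatch(pat, fullName);
  for (const auto& pat : negative) {
    if (globMatch(pat, fullName)) return false;
  }
  return included;
}
```

Under these whole-name semantics, excluding every IB test requires a pattern such as `-Ib*` rather than `-Ib`.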
copilot-swe-agent[bot]	50f6a24b69	Remove test/perf/ directory completely - Remove test/perf/ directory (fifo_test.cu, framework.{cc,hpp}, CMakeLists.txt) - Remove add_subdirectory(perf) from test/CMakeLists.txt - Performance tests now integrated into unit_tests as fifo_perf_tests.cu - Fix mp_unit_tests.cc to use framework functions without ::testing:: namespace - Fix bootstrap_tests.cc ErrorCode comparison to use ASSERT_TRUE - Fix switch_channel_tests.cu to not use streaming with ASSERT_EQ - Add missing #include <unistd.h> to executor_tests.cc All perf test functionality is now in unit_tests and can be filtered with --exclude-perf-tests flag. The standalone test/perf/ directory is no longer needed. Verified builds: - unit_tests: ✅ - mp_unit_tests: ✅ Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 03:16:00 +00:00
copilot-swe-agent[bot]	ba0451ae68	Remove build2 CMake artifacts from repository Another set of accidentally committed build artifacts in build2/ directory. The .gitignore pattern build_*/ should prevent these in the future. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:37:12 +00:00
copilot-swe-agent[bot]	b59196b8a5	Integrate perf tests into unit_tests and add CI coverage step - Add unit_tests_main.cc with main() function for unit_tests executable - Create fifo_perf_tests.cu as PERF_TEST for unit_tests - Add fifo_perf_tests.cu to unit_tests sources - Fix errors_tests.cc to use ASSERT_TRUE for ErrorCode comparisons - Fix core_tests.cc to use ASSERT_TRUE for TransportFlags comparisons - Add Azure pipeline step for Debug build with coverage - Add step to run mp_unit_tests --exclude-perf-tests with coverage The perf tests are now part of unit_tests and can be filtered out for coverage reporting. CI now includes Debug build with coverage collection for non-performance tests. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:35:46 +00:00
copilot-swe-agent[bot]	7e4365f014	Add performance test filtering and remove HTML coverage - Add isPerfTest field to TestInfoInternal struct - Add --exclude-perf-tests command line argument - Add PERF_TEST and PERF_TEST_F macros for marking performance tests - Update runAllTests to filter performance tests when requested - Remove genhtml dependency and HTML report generation - Keep only coverage.info file generation with lcov Performance tests can now be excluded with: ./build/bin/unit_tests --exclude-perf-tests ./build/bin/mp_unit_tests --exclude-perf-tests Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:29:50 +00:00
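Excluding performance tests amounts to tagging each registered test and skipping tagged entries when the flag is present. A minimal C++ sketch: the `isPerfTest` field, the `runAllTests` name, and the `--exclude-perf-tests` flag come from this commit, while the `TestEntry` struct, both function signatures, and the helper `excludePerfRequested` are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Hypothetical registry entry; a PERF_TEST-style macro would register its test
// with isPerfTest = true, while TEST would leave it false.
struct TestEntry {
  std::string name;
  std::function<void()> body;
  bool isPerfTest;
};

// Returns true if --exclude-perf-tests appears among the arguments.
bool excludePerfRequested(int argc, const char* const argv[]) {
  for (int i = 1; i < argc; ++i) {
    if (std::strcmp(argv[i], "--exclude-perf-tests") == 0) return true;
  }
  return false;
}

// Runs every registered test except, when requested, those tagged as perf tests.
// Returns the number of tests that ran.
int runAllTests(const std::vector<TestEntry>& tests, bool excludePerf) {
  int ran = 0;
  for (const TestEntry& t : tests) {
    if (excludePerf && t.isPerfTest) continue;  // e.g. skipped in coverage runs
    t.body();
    ++ran;
  }
  return ran;
}
```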
copilot-swe-agent[bot]	6da12fade1	Comprehensive plan for refactoring Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:27:17 +00:00
copilot-swe-agent[bot]	b1f458eca3	Convert test framework identifiers from snake_case to camelCase Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:25:28 +00:00
copilot-swe-agent[bot]	305d15717e	Remove PerfTestResult and reuse TestResult directly - Add nlohmann::ordered_json metrics field to TestResult struct - Add nlohmann/json.hpp include to test/framework.hpp - Link test_framework with nlohmann_json::nlohmann_json - Replace PerfTestResult with TestResult in test/perf/framework.cc - Move perf utility functions to utils namespace for consistency - Remove duplicate PerfTestResult struct definition This consolidates the two similar structs into one, reducing code duplication while maintaining all necessary fields for both unit tests (passed/failure_message) and performance tests (metrics). Verified build succeeds with Docker: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "cd /workspace/build && make -j4 fifo_test" Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 02:21:10 +00:00
copilot-swe-agent[bot]	403b2fb3db	Remove unnecessary CMake build artifacts from PR - Remove build_test/ directory containing CMake cache and build files - Update .gitignore to exclude build_*/ pattern to prevent future accidents These CMake artifacts (CMakeCache.txt, CMakeFiles/, generated headers) were accidentally committed and should never be in version control. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 01:55:40 +00:00
copilot-swe-agent[bot]	4823583587	Move FailHelper and SkipHelper into mscclpp::test namespace - Move helper classes inside namespace for proper access - Remove duplicate class definitions outside namespace - Tests can now build in Docker with CUDA toolkit installed - Remaining issues: ErrorCode and TransportFlags need operator<< for EXPECT_EQ Successfully building with Docker: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "mkdir build && cd build && cmake -DMSCCLPP_BYPASS_GPU_CHECK=ON \ -DMSCCLPP_USE_CUDA=ON .. && make -j4" Note: Some unit tests (errors_tests.cc, core_tests.cc) need operator<< defined for ErrorCode and TransportFlags to compile with custom framework. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 01:17:08 +00:00
copilot-swe-agent[bot]	0eae34c53d	Fix test framework for building with Docker - Make MPI REQUIRED for test builds (clearer error messages) - Add project include directories to test_framework library - Fix core_tests.cc to use custom framework correctly - Fix mp_unit_tests.hpp to use mscclpp::test namespace - Add FAIL() macro with streaming support for test messages - Building tests now works in Docker environment with GPU bypass Tests can now be built using: docker run --rm -v $(pwd):/workspace -w /workspace \ ghcr.io/microsoft/mscclpp/mscclpp:base-dev-cuda12.4 bash -c \ "mkdir build && cd build && cmake -DMSCCLPP_BYPASS_GPU_CHECK=ON \ -DMSCCLPP_USE_CUDA=ON .. && make -j" Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 01:13:29 +00:00
copilot-swe-agent[bot]	5657e4a321	Initial plan for fixing test build with GPU bypass Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 01:02:59 +00:00
copilot-swe-agent[bot]	1818709de0	Fix CodeQL workflow by disabling test builds The recent removal of GTest and introduction of a custom test framework requires an MPI dependency, which is not needed for CodeQL analysis. Disable test building in CodeQL workflows to fix the build failures. CodeQL only needs to analyze the core library code, not the tests. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:59:34 +00:00
copilot-swe-agent[bot]	a10aff559f	Address code review feedback - Move PerfTestResult struct definition outside vector declaration - Move getCurrentTimestamp to anonymous namespace - Add documentation for GTEST_SKIP macro explaining RAII pattern Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:28:00 +00:00
copilot-swe-agent[bot]	3d8a2e7349	Add --gtest_filter support to framework Support --gtest_filter command line argument for test filtering, compatible with Azure pipeline configurations. Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:25:43 +00:00
copilot-swe-agent[bot]	eafa6fbfaf	Add custom test framework and code coverage support - Move test framework from test/perf/ to test/ for shared use - Add GTest-compatible macros (TEST, TEST_F, EXPECT_, ASSERT_, etc.) - Remove GTest dependency from CMakeLists.txt - Add test_framework library for unit and mp_unit tests - Add code coverage support with lcov (MSCCLPP_ENABLE_COVERAGE option) - Update perf tests to use shared framework Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:24:03 +00:00
copilot-swe-agent[bot]	1e32e17c1e	Address code review comments - Remove duplicate static getMPIRank() and getMPISize() functions - Add full namespace qualification to GTEST_SKIP macro Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:22:04 +00:00
copilot-swe-agent[bot]	e227fdc1ef	Convert mp_unit tests from gtest to framework.hpp - Modified test/mp_unit/mp_unit_tests.hpp to use ../framework.hpp instead of gtest/gtest.h - Enhanced test/framework.hpp with GTest-compatible APIs: - Added Environment base class for global test setup/teardown - Added TestInfo and UnitTest classes for test metadata access - Added GTEST_SKIP macro support via SkipHelper class - Added namespace alias 'testing' for compatibility - Added InitGoogleTest and AddGlobalTestEnvironment helper functions - Updated test/framework.cc with implementations for new classes - All mp_unit test files now use framework.hpp through mp_unit_tests.hpp - Formatting applied via lint.sh Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:21:04 +00:00
copilot-swe-agent[bot]	c881bc5e16	Replace gtest/gtest.h with framework.hpp in all unit tests Co-authored-by: chhwang <8018170+chhwang@users.noreply.github.com>	2026-02-11 00:17:18 +00:00
copilot-swe-agent[bot]	e711b62ab7	Initial plan	2026-02-11 00:12:09 +00:00
Changho Hwang	42be3660e0	Add a new IB stack impl that doesn't use RDMA atomics (#728 ) * Added a configurable InfiniBand (IB) signaling mode. The `EndpointConfig::Ib::Mode` enum selects the mode (`Default`, `Host`, `HostNoAtomic`). `Default` is equivalent to `Host` unless overridden by the environment variable `MSCCLPP_IBV_MODE`. `Host` corresponds to the previous implementation, which uses RDMA atomics for signaling, while `HostNoAtomic` uses write-with-immediate instead. * Corresponding updates to the Python bindings and API.	2026-02-10 01:07:53 +00:00
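The mode resolution described in this commit can be sketched as follows. The mode names `Default`, `Host`, and `HostNoAtomic` come from the commit; how `MSCCLPP_IBV_MODE` is parsed is not described, so this sketch takes the environment override as an already-parsed optional value. The `IbMode` enum and `resolveIbMode` function are hypothetical, and keeping an explicitly requested mode unchanged is an assumption. Requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>

// Mirrors the three mode names from the commit; not the library's declaration.
// Host signals with RDMA atomics; HostNoAtomic uses write-with-immediate.
enum class IbMode { Default, Host, HostNoAtomic };

// Hypothetical resolver: an explicitly requested mode is kept as-is, while
// Default takes the environment override if it names a concrete mode,
// otherwise falls back to Host.
IbMode resolveIbMode(IbMode requested, std::optional<IbMode> envOverride) {
  if (requested != IbMode::Default) return requested;
  if (envOverride && *envOverride != IbMode::Default) return *envOverride;
  return IbMode::Host;
}
```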
Binyang Li	c12822a7af	create CI pipeline for rocm (#718 ) Create CI pipeline for AMD GPU.	2026-02-09 16:55:16 -08:00


941 Commits