mscclpp/test/python/ext/ep/test_internode_multirank.py at 7650e699a0fac2422a7d7ac86b19f3aad05bea59

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-24 14:54:51 +00:00

qinghuazhou 13babbfff2 test/ext/ep: HT — scale combine tolerance with bf16 ulp

At 16 nodes (64 ranks) with topk=8, expected combine values reach
rank*8 = 504, while intermediate partial sums (rank*7 etc.) cross the
bf16 ulp=2 boundary at 256. With the test pattern x = rank*ones and
weights = 1, this produces deterministic +/-1 round-off on certain
ranks (odd local_rank on nodes >= 9), tripping the previous 1e-2
absolute tolerance even though the kernel is correct.
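The round-off described above can be checked without a GPU. The following is a minimal sketch in pure Python, not code from the test file: `to_bf16`, `bf16_ulp`, `NUM_RANKS`, and `TOPK` are hypothetical names, and it only checks whether the partial sum rank*7 is representable in bf16. It does not model the kernel's actual accumulation order.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bf16 value (ties to even).

    Hypothetical helper: bf16 is the top 16 bits of an IEEE float32,
    so we round the float32 bit pattern and clear the low 16 bits.
    NaN/inf inputs are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Rounding bias: 0x7FFF plus the lowest kept bit gives ties-to-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_ulp(x: float) -> float:
    """Spacing between adjacent bf16 values around a nonzero x."""
    _, exp = math.frexp(abs(x))  # abs(x) lies in [2**(exp-1), 2**exp)
    return 2.0 ** (exp - 8)      # bf16 keeps 8 significant bits


NUM_RANKS = 64  # 16 nodes x 4 ranks, as in the commit message
TOPK = 8

# Ranks whose partial sum rank*(TOPK-1) cannot be stored exactly in bf16.
inexact_ranks = [
    r for r in range(NUM_RANKS)
    if to_bf16(r * (TOPK - 1)) != r * (TOPK - 1)
]

print(bf16_ulp(255.0), bf16_ulp(256.0))  # ulp doubles at 256
print(inexact_ranks)
```

Under these assumptions the inexact ranks are the odd ranks from 37 upward, since 37*7 = 259 is the first odd partial sum above 256.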

Use tol = max(1e-2, max_exp / 64), a relative bound of 2^-6 that is a
few bf16 ulps wide (bf16 keeps 8 significant bits) and scales with the
magnitude of the expected combined output. The previous tight bound is
preserved for small-scale runs where max_exp < 0.64.
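As a rough illustration of the tolerance rule, the sketch below evaluates it at the two scales the message mentions. `combine_tolerance` is a hypothetical name; the commit message gives only the formula, not how the test file wraps it.

```python
def combine_tolerance(max_exp: float) -> float:
    """Absolute tolerance for the combine check (hypothetical wrapper).

    Implements tol = max(1e-2, max_exp / 64) from the commit message,
    where max_exp is the largest expected combined value.
    """
    return max(1e-2, max_exp / 64)


# 64 ranks, topk=8: the largest expected value is 63 * 8 = 504.
large_scale_tol = combine_tolerance(63 * 8)

# Small expected values fall back to the previous 1e-2 bound.
small_scale_tol = combine_tolerance(0.5)

print(large_scale_tol, small_scale_tol)
```

At max_exp = 504 the tolerance is 7.875, which comfortably covers a +/-1 round-off while remaining proportional to the output magnitude.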

2026-05-12 05:37:42 +00:00

24 KiB
