mscclpp/python/requirements_rocm6.txt at 4b2168549ad8bd0c26fb6182cbc243e84d8d65f9 - mscclpp - Public git mirror

microsoft/mscclpp

mirror of https://github.com/microsoft/mscclpp.git synced 2026-05-11 17:00:22 +00:00

Binyang Li 8896cd909a Add ROCm FP8 E4M3B15 support (#774)

## Summary

Add ROCm (gfx942) support for the FP8 E4M3B15 data type, including
optimized conversion routines between FP8 E4M3B15 and FP16/FP32 using
inline assembly.

Extends the allpair packet and fullmesh allreduce kernels to support
higher-precision accumulation (e.g., FP16/FP32) when reducing FP8 data,
improving numerical accuracy.

Adds Python tests to verify that higher-precision accumulation is at
least as accurate as native FP8 accumulation across all algorithm
variants.
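
The accuracy property these tests check can be illustrated with a minimal NumPy sketch. This is not the mscclpp kernel or test code: NumPy has no FP8 E4M3B15 dtype, so `float16` stands in for the low-precision storage type and `float32` for the higher-precision accumulator, and `reduce_contributions` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np


def reduce_contributions(buffers, acc_dtype):
    """Sum the rows of `buffers` element-wise, rounding to `acc_dtype`
    after every addition, then cast the result back to the storage dtype.

    Hypothetical helper for illustration; not part of mscclpp.
    """
    acc = np.zeros(buffers.shape[1], dtype=acc_dtype)
    for buf in buffers:
        # Each partial sum is rounded to the accumulator precision.
        acc = (acc + buf.astype(acc_dtype)).astype(acc_dtype)
    return acc.astype(buffers.dtype)


# Assumption: float16 plays the role of the low-precision storage type.
storage_dtype = np.float16
buffers = np.full((4096, 4), 0.1, dtype=storage_dtype)

# Reference sum computed in float64 from the stored (already rounded) values.
exact = buffers.astype(np.float64).sum(axis=0)

native = reduce_contributions(buffers, storage_dtype)  # accumulate in float16
wide = reduce_contributions(buffers, np.float32)       # accumulate in float32

err_native = np.abs(native.astype(np.float64) - exact).max()
err_wide = np.abs(wide.astype(np.float64) - exact).max()

print(f"max error, native accumulation: {err_native}")
print(f"max error, wide accumulation:   {err_wide}")
```

In this setup the low-precision running sum stops growing once a single contribution becomes smaller than half the spacing between representable values, while the wider accumulator keeps every partial sum exact and rounds only once at the end.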

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-04-08 09:53:45 -07:00

10 lines

195 B

Plaintext

 mpi4py
 cupy
 prettytable
 netifaces
 pytest
 numpy
 matplotlib
 sortedcontainers @ git+https://github.com/grantjenks/python-sortedcontainers.git@3ac358631f58c1347f1d6d2d92784117db0f38ed
 blake3
 pybind11