mscclpp/test/python/ext
Qinghua Zhou 5d16ac958e EP GB200 (4 GPUs/node) support
- configs.cuh: NUM_MAX_NVL_PEERS 8 -> 4
- internode.cu: introduce NvlPackT (uint64_t for 8 peers, uint32_t for 4)
  to handle packed-bool loads of is_token_in_rank; relax the SourceMeta
  static_assert; replace the 4 sites coupled to uint64_t
- buffer.hpp/buffer.cc: relax the NUM_MAX_NVL_PEERS assert to accept 4 or 8;
  read the MSCCLPP_EP_LOCAL_WORLD_SIZE env var to override rdma_rank/nvl_rank
  partitioning when the local world size != NUM_MAX_NVL_PEERS
- CMakeLists.txt (ext/ep): fix rpath and install
- pyproject.toml: set MSCCLPP_BUILD_EXT_EP=ON
- src/core/atomicadd_kernel.cu, kernels/buffer.cuh, kernels/utils.cuh:
  related EP fixes
- test_internode_multirank.py: NUM_MAX_NVL_PEERS=4, rank % 4
2026-05-08 01:42:21 +00:00
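The NvlPackT change can be pictured with a small host-side Python sketch. It assumes, as the "packed-bool loads" wording suggests, that each row of is_token_in_rank holds one bool byte per NVL peer and that the kernel reads the whole row with a single integer load. The pack width then has to equal NUM_MAX_NVL_PEERS bytes, so a fixed uint64_t load would presumably read past a 4-byte row on a 4-GPU node. The names PACK_FORMAT, load_packed_flags and unpack_flags are hypothetical and do not come from the source.

```python
import struct

# Hypothetical stand-in for NvlPackT: one byte per bool, so the pack width in
# bytes equals the number of NVL peers (uint64 for 8 peers, uint32 for 4).
PACK_FORMAT = {8: "<Q", 4: "<I"}


def load_packed_flags(row: bytes, num_nvl_peers: int) -> int:
    """Read num_nvl_peers bool bytes as one little-endian unsigned integer."""
    fmt = PACK_FORMAT[num_nvl_peers]
    if len(row) != struct.calcsize(fmt):
        raise ValueError("row length must equal the pack width")
    (packed,) = struct.unpack(fmt, row)
    return packed


def unpack_flags(packed: int, num_nvl_peers: int) -> list[bool]:
    """Byte i of the packed value holds the bool for NVL peer i."""
    return [bool((packed >> (8 * i)) & 0xFF) for i in range(num_nvl_peers)]


# One token's row on a 4-GPU node: the token is routed to peers 0 and 3.
row = bytes([1, 0, 0, 1])
packed = load_packed_flags(row, 4)
print(hex(packed), unpack_flags(packed, 4))
```

Choosing the pack type from the peer count keeps a single load per row in both configurations instead of hard-coding an 8-byte width.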
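The MSCCLPP_EP_LOCAL_WORLD_SIZE override can be sketched the same way. This assumes the default split is rdma_rank = rank // NUM_MAX_NVL_PEERS and nvl_rank = rank % NUM_MAX_NVL_PEERS (consistent with the rank % 4 used in the updated test) and that the env var simply replaces the divisor. The helper partition_rank is hypothetical, and the actual buffer.cc logic may differ.

```python
import os
from typing import Mapping, Optional, Tuple

NUM_MAX_NVL_PEERS = 4  # value set in configs.cuh by this commit


def partition_rank(rank: int, env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Split a global rank into (rdma_rank, nvl_rank).

    Hypothetical sketch: the local world size defaults to NUM_MAX_NVL_PEERS
    and can be overridden by MSCCLPP_EP_LOCAL_WORLD_SIZE.
    """
    env = os.environ if env is None else env
    local_world_size = int(env.get("MSCCLPP_EP_LOCAL_WORLD_SIZE", NUM_MAX_NVL_PEERS))
    if local_world_size <= 0:
        raise ValueError("MSCCLPP_EP_LOCAL_WORLD_SIZE must be positive")
    return rank // local_world_size, rank % local_world_size


# Default (4 GPUs per node): global rank 5 maps to rdma_rank 1, nvl_rank 1.
print(partition_rank(5, env={}))
# Override (2 GPUs per node): global rank 5 maps to rdma_rank 2, nvl_rank 1.
print(partition_rank(5, env={"MSCCLPP_EP_LOCAL_WORLD_SIZE": "2"}))
```

Passing an explicit env mapping keeps the helper testable without touching the process environment.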