ext/ep: add pragma once to event.hpp and update validation docs

- Add #pragma once to src/ext/ep/event.hpp; including it more than once in
  the same TU (e.g. transitively) would otherwise redefine EventHandle.
- python/mscclpp/ext/ep/buffer.py: low-latency internode is now validated
  on 2x H100x8; remove the 'untested on multi-node H100' note.
- src/ext/ep/kernels/internode_ll.cu: replace the untested-on-multi-node
  WARNING with the current validated-on-2x-H100x8 status.
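
A minimal sketch of the failure the first bullet guards against, under
stated assumptions: #pragma once cannot be demonstrated inside a single
file, so this uses the equivalent macro include guard, and the header text
is pasted inline twice to mimic two include paths reaching the same
translation unit. The EventHandle body and the names EP_EVENT_HPP_SKETCH
and make_handle are hypothetical stand-ins, not the contents of event.hpp.

```cpp
#include <cassert>

// First "inclusion" of a hypothetical event.hpp, pasted inline.
#ifndef EP_EVENT_HPP_SKETCH
#define EP_EVENT_HPP_SKETCH
struct EventHandle {  // stand-in type; the real members are not shown in this commit
  int id;
};
#endif

// Second "inclusion" of the same header text in the same translation unit.
// The guard macro is already defined, so the preprocessor skips this copy.
// Without the guard, this would be a redefinition error for EventHandle.
#ifndef EP_EVENT_HPP_SKETCH
#define EP_EVENT_HPP_SKETCH
struct EventHandle {
  int id;
};
#endif

// Compiles only because exactly one definition of EventHandle survived preprocessing.
inline EventHandle make_handle(int id) { return EventHandle{id}; }
```

Deleting both #ifndef/#define/#endif groups should make the compiler
reject the second struct definition, which is the error a once-only header
avoids.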

Addresses Copilot review comments on PR #796.
Qinghua Zhou
2026-05-06 03:24:34 +00:00
parent c641487c55
commit 23e8ce6dbe
3 changed files with 7 additions and 5 deletions


@@ -17,8 +17,9 @@ Current status (see ``src/ext/ep/README.md``):
 * Internode HT (MSCCL++ PortChannel + MemoryChannel) dispatch and combine:
   ported and validated on 2 nodes x 8 H100 GPUs with
   ``test/python/ext/ep/test_internode_multirank.py``.
-* Internode low-latency kernels: structural port (NVSHMEM/IBGDA ->
-  MSCCL++ PortChannel), **untested on multi-node H100**.
+* Internode low-latency kernels (NVSHMEM/IBGDA -> MSCCL++ PortChannel):
+  ported and validated on 2 nodes x 8 H100 GPUs with
+  ``test/python/ext/ep/test_low_latency_multirank.py``.
 """
 from __future__ import annotations