Mirror of https://github.com/microsoft/mscclpp.git (synced 2026-05-12 09:17:06 +00:00)
Annotate the two known issues flagged by Copilot review on PR #796:

- atomicadd_kernel.cu: launching the atomicAdd kernel from a separate CUDA context while `dst` is a CUDA-IPC mapping registered in the primary context is technically UB; it works in practice on current drivers but should be revisited.
- context.cc: `CudaIpcStream::sync()` deliberately skips `proxyAtomicStream_` to avoid deadlocking the proxy thread, with the side effect that `Connection::flush()` does not order pending remote atomicAdd ops on the CUDA-IPC transport.

Both behaviors were cherry-picked from DeepEP branch `chhwang/dev-atomic-add-cleanup` and should be revisited before this lands on `main`.
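
The second issue can be illustrated with a small host-only toy model. This is a sketch under stated assumptions, not the mscclpp implementation: `ToyStream`, `ToyIpcStream`, and all of their methods are hypothetical names, and a plain FIFO of callbacks stands in for a CUDA stream. It only shows why a `sync()` that skips the atomic stream leaves atomicAdd ops unordered with respect to a flush.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a CUDA stream: a FIFO of deferred operations.
class ToyStream {
 public:
  void enqueue(std::function<void()> op) { ops_.push(std::move(op)); }

  // Runs everything queued so far, similar in spirit to a stream synchronize.
  void drain() {
    while (!ops_.empty()) {
      ops_.front()();
      ops_.pop();
    }
  }

  std::size_t pending() const { return ops_.size(); }

 private:
  std::queue<std::function<void()>> ops_;
};

// Toy model of the behavior described above (assumption: one stream for
// copies, one separate stream for proxy-issued atomic adds).
class ToyIpcStream {
 public:
  void memcpyAsync(int& dst, int value) {
    copyStream_.enqueue([&dst, value] { dst = value; });
  }

  void atomicAddAsync(int& dst, int value) {
    atomicStream_.enqueue([&dst, value] { dst += value; });
  }

  // Waits for copies only; the atomic stream is intentionally skipped,
  // mirroring the deadlock-avoidance trade-off noted in the message.
  void sync() { copyStream_.drain(); }

  // Separate drain for the atomic stream, not reachable through sync().
  void drainAtomics() { atomicStream_.drain(); }

  std::size_t pendingAtomics() const { return atomicStream_.pending(); }

 private:
  ToyStream copyStream_;
  ToyStream atomicStream_;
};
```

In this model, a caller that enqueues one copy and one atomic add and then calls `sync()` sees the copy applied while the atomic add is still pending. Any fix would need either a separate ordering point for the atomic stream or a `sync()` that can wait on it without blocking the proxy thread.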