Files
mscclpp/python
Changho Hwang b4dde38db8 FIFO improvements (#557)
* Revert `MSCCLPP_FIFO_USE_TAIL_REPLICA=1` back to the default.
* Optimize `FifoDeviceHandle`.
* Do not use `cudaHostAllocWriteCombined` that increases latency.
* Pin host memory for `Host2DeviceSemaphore::outboundSemaphore_`.
* Fix proxy NUMA binding issues.
* Prevent graph capture inside proxy threads.
* Now `CudaIpcConnection` skips stream sync when unnecessary.
* Now any type of connection needs to hold a shared pointer to the
context for memory safety.
* Now a context should be always managed by a shared pointer for memory
safety.
* Minor docs & interface improvements.
* Minor fix in `mscclpp-test` correctness test.
2025-06-24 09:50:28 -07:00
..
2025-06-24 09:50:28 -07:00
2025-06-24 09:50:28 -07:00
2024-03-27 11:53:09 -07:00
2024-03-27 11:53:09 -07:00
2024-09-15 22:30:54 +00:00