mscclpp/test/deploy
Binyang Li 06f31994dc Fix performance issue introduced in PR: 499 (#505)
1. Use `fence+relaxed` instead of `release` for the FIFO. `fence+relaxed` is
more efficient on A100.
2. Update the deviceSyncer. The previous version could not handle a change in
the number of thread blocks correctly. Use three counters to solve this issue:
reset the previous counter before syncing on the current counter.
3. Introduce relaxedWait, which can be paired with relaxedSignal in cases that
do not need a memory-visibility guarantee.
2025-04-22 14:03:37 -07:00
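To illustrate item 1 of the commit message, here is a minimal host-side sketch of the two publication patterns in standard C++. It is an analogy only, not the mscclpp FIFO code: `Slot`, `publishRelease`, `publishFenceRelaxed`, `consume`, and `roundTrip` are hypothetical names, and the real change targets GPU-side atomics, whose performance this example does not measure.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical single-slot queue; not the mscclpp FIFO layout.
struct Slot {
  int payload = 0;                // plain (non-atomic) data being published
  std::atomic<uint64_t> tail{0};  // becomes 1 once the payload is ready
};

// Variant A: one release store publishes the payload.
inline void publishRelease(Slot& s, int value) {
  s.payload = value;
  s.tail.store(1, std::memory_order_release);
}

// Variant B: a release fence followed by a relaxed store.
// The fence orders the payload write before the flag write.
inline void publishFenceRelaxed(Slot& s, int value) {
  s.payload = value;
  std::atomic_thread_fence(std::memory_order_release);
  s.tail.store(1, std::memory_order_relaxed);
}

// Consumer: poll with relaxed loads, then issue a single acquire fence
// so the payload read happens after the flag was observed.
inline int consume(Slot& s) {
  while (s.tail.load(std::memory_order_relaxed) == 0) {
    std::this_thread::yield();
  }
  std::atomic_thread_fence(std::memory_order_acquire);
  return s.payload;
}

// Runs one producer thread against the calling thread as consumer.
inline int roundTrip(bool useFence, int value) {
  Slot s;
  std::thread producer([&s, useFence, value] {
    if (useFence) {
      publishFenceRelaxed(s, value);
    } else {
      publishRelease(s, value);
    }
  });
  int got = consume(s);
  producer.join();
  return got;
}
```

Both variants give the consumer the same visibility guarantee for `payload`; which one is faster depends on the hardware. The commit message reports that the fence-plus-relaxed form is more efficient on A100.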