Files
mscclpp/include
Binyang Li f1b2c9df12 Fix performance downgrade issue & update doc (#229)
For push function, we only need to make sure the instruction `st.global`
will be executed after the while loop. Since there is a Write-After-Read
hazard for `trigger.fst` (Check `this->triggers[curFifoHead % size].fst
!= 0` first then write value to `triggers[curFifoHead % size]`), we can
expect the compiler and hardware can handle this situation correctly.
Remove the `release.sys` there.

BTW, `st.global.release.sys.v2.u64` will cause perf regression issue.
Previous we use `st.global.release.cta.v2.u64`, but seems not necessary.
2023-12-04 10:20:10 -08:00
..
2023-07-25 10:23:16 -07:00