mirror of https://github.com/microsoft/mscclpp.git
synced 2026-05-13 01:36:10 +00:00
* Pass the op type as a template parameter
* Use the all-pairs algorithm for ~10KB
* Don't write channel handles on the shared memory for small sizes
* A reduction bug fix & cleanup
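
For readers unfamiliar with the first item, here is a minimal host-side sketch of what passing an op type as a template parameter can look like. It is not the code from this commit: `Op`, `applyOp`, `reduceInto`, and `reduceDispatch` are hypothetical names chosen for illustration, and the actual change presumably applies to GPU kernels rather than `std::vector`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical op tags for illustration; not taken from the mscclpp source.
enum class Op { Sum, Min };

// The op is a template parameter, so the condition below is a compile-time
// constant in each instantiation and the compiler can fold the branch away.
template <Op OpType, typename T>
inline T applyOp(const T& a, const T& b) {
  if (OpType == Op::Sum) {
    return a + b;
  }
  return a < b ? a : b;
}

// Element-wise reduction of src into dst over the shorter of the two lengths.
template <Op OpType, typename T>
void reduceInto(std::vector<T>& dst, const std::vector<T>& src) {
  const std::size_t n = dst.size() < src.size() ? dst.size() : src.size();
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = applyOp<OpType>(dst[i], src[i]);
  }
}

// The runtime op value is mapped to a template instantiation once,
// outside the per-element loop.
template <typename T>
void reduceDispatch(Op op, std::vector<T>& dst, const std::vector<T>& src) {
  switch (op) {
    case Op::Sum: reduceInto<Op::Sum>(dst, src); break;
    case Op::Min: reduceInto<Op::Min>(dst, src); break;
  }
}
```

The design point in this sketch is that the runtime `switch` runs once per call, while the inner loop is specialized per op, instead of the op being checked for every element.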