Improve allreduce performance on a single node

New numbers (latencies in us; lower is better):

| n_ctx | size | target latency (us) | allreduce5 (us) | allreduce6 (us) |
|-------|---------|---------------------|-----------------|-----------------|
| 1 | 24.0kB | 7.7 | | 7.23 |
| 2 | 48.0kB | 7.7 | | 7.69 |
| 4 | 96.0kB | 8 | | 8.34 |
| 8 | 192.0kB | 12.6 | | 9.75 |
| 12 | 288.0kB | 13 | | 11.34 |
| 16 | 384.0kB | 13.3 | | 12.99 |
| 768 | 18.0MB | 158.7 | 160.3 | |
| 896 | 21.0MB | 184.5 | 183.8 | |
| 1024 | 24.0MB | 209.5 | 207.5 | |
| 1152 | 27.0MB | 234.3 | 231.9 | |
| 1280 | 30.0MB | 260 | 255.6 | |
| 1408 | 33.0MB | 284.9 | 278.7 | |
| 1536 | 36.0MB | 310.3 | 302.0 | |
| 1664 | 39.0MB | 336.2 | 325.3 | |
| 1792 | 42.0MB | 361.4 | 348.8 | |
| 1920 | 45.0MB | 384.6 | 372.2 | |
| 2048 | 48.0MB | 409.1 | 395.4 | |

---------

Co-authored-by: Changho Hwang <changhohwang@microsoft.com>