mirror of https://github.com/ROCm/composable_kernel.git
synced 2026-05-15 18:42:06 +00:00
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
[ROCm/composable_kernel commit: 0b8f117f1a]
3.9 KiB