composable_kernel/example/ck_tile/05_reduce
ClementLinCF 8549197d6e [CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779)
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations

[ROCm/composable_kernel commit: 0b8f117f1a]
2025-01-12 20:50:32 -08:00