composable_kernel/example/ck_tile/05_reduce
ClementLinCF 0b8f117f1a [CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779)
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
2025-01-12 20:50:32 -08:00
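
One possible reading of the second bullet in the commit note above, sketched as host-side arithmetic. This is not CK code. It assumes each block covers a fixed number of independent work items, one item per thread per pass, and the names `passes_needed` and `surplus_thread_slots` and the figure of 256 items are hypothetical, chosen only for illustration. Under that assumption, a block with more threads than items leaves thread slots with nothing unique to compute.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model, not taken from composable_kernel:
// a block of `block_size` threads walks over `work_items` independent items,
// one item per thread per pass.
inline int passes_needed(int work_items, int block_size)
{
    assert(work_items >= 0 && block_size > 0);
    return (work_items + block_size - 1) / block_size; // ceiling division
}

// Thread slots, summed over all passes, that have no unique item to process.
// In a real kernel such slots would either idle or repeat work done elsewhere.
inline int surplus_thread_slots(int work_items, int block_size)
{
    return passes_needed(work_items, block_size) * block_size - work_items;
}

// Print the model's numbers for one (work_items, block_size) pair.
inline void report(int work_items, int block_size)
{
    std::printf("items=%d block=%d passes=%d surplus_slots=%d\n",
                work_items,
                block_size,
                passes_needed(work_items, block_size),
                surplus_thread_slots(work_items, block_size));
}
```

Calling `report(256, 256)` and `report(256, 512)` compares the two block sizes named in the commit under the assumed 256-item tile. The model only counts unused thread slots; it does not by itself account for the reported 2x speedup, which would need to be measured on the actual example.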