composable_kernel/example/ck_tile/05_reduce/reduce.cpp
ClementLinCF 0b8f117f1a [CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779)
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
2025-01-12 20:50:32 -08:00

3.9 KiB