composable_kernel/example/ck_tile
ClementLinCF 0b8f117f1a [CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779)
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
2025-01-12 20:50:32 -08:00