composable_kernel/example/ck_tile
Po Yen Chen 144377ae38 [CK_TILE] FMHA forward batch_prefill optimization for low CU utilization (#2251)
* Add constraint on traits/tile/pipeline

* Use kM0=128 if max_seqlen_q == 8192

* Re-format codegen script

* Remove redundant attr name postfix

* Fix import error: default field in dataclass

* Use kK0=64 & kK1=64 to hide latency

* Use CU utilization to decide tile size

[ROCm/composable_kernel commit: 28cd0dffc9]
2025-05-29 18:36:33 +09:00
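
The "Use CU utilization to decide tile size" item can be illustrated with a rough sketch. This is not the code from the commit: the function names `cu_utilization` and `pick_km0`, the candidate tile list, the 0.8 threshold, and the one-workgroup-per-CU-per-wave model are all assumptions made for illustration. The idea shown is only that a large kM0 tile yields few workgroups, which can leave compute units idle on small problems, so a smaller tile may be preferable when utilization would otherwise be low.

```python
import math


def cu_utilization(num_workgroups: int, num_cus: int) -> float:
    """Fraction of CU slots kept busy across all waves.

    Assumption: each CU runs one workgroup at a time, so the grid
    executes in ceil(num_workgroups / num_cus) waves.
    """
    if num_workgroups <= 0:
        return 0.0
    waves = math.ceil(num_workgroups / num_cus)
    return num_workgroups / (waves * num_cus)


def pick_km0(batch, nhead, seqlen_q, num_cus,
             candidates=(128, 64, 32, 16), min_util=0.8):
    """Hypothetical heuristic: prefer the largest kM0 whose estimated
    CU utilization reaches min_util; otherwise fall back to the
    candidate with the highest utilization (largest tile wins ties)."""
    best = None
    for km0 in sorted(candidates, reverse=True):
        # One workgroup per (batch, head, kM0-row tile of the query).
        workgroups = batch * nhead * math.ceil(seqlen_q / km0)
        util = cu_utilization(workgroups, num_cus)
        if util >= min_util:
            return km0
        if best is None or util > best[1]:
            best = (km0, util)
    return best[0]


if __name__ == "__main__":
    # 304 is only an illustrative CU count.
    print(pick_km0(batch=1, nhead=8, seqlen_q=1024, num_cus=304))
    print(pick_km0(batch=8, nhead=32, seqlen_q=8192, num_cus=304))
```

Under these assumptions, the small problem selects a smaller tile because kM0=128 would launch only 64 workgroups, while the large problem keeps kM0=128 because the grid already fills the device.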