composable_kernel/example
Po Yen Chen a7e36b2781 [CK_TILE] FMHA forward batch_prefill optimization for low CU utilization (#2251)
* Add constraint on traits/tile/pipeline

* Use kM0=128 if max_seqlen_q == 8192

* Re-format codegen script

* Remove redundant attr name postfix

* Fix import error: default field in dataclass

* Use kK0=64 & kK1=64 to hide latency

* Use CU utilization to decide tile size
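
A minimal sketch of how a tile size might be chosen from CU utilization, written in Python to match the codegen script. It is not taken from the commit: the names `cu_utilization` and `pick_km0`, the candidate list, the 0.8 threshold, and the workgroup-count formula (`batch * nhead * ceil(max_seqlen_q / kM0)`) are all assumptions made for illustration only.

```python
from math import ceil


def cu_utilization(num_workgroups: int, num_cus: int) -> float:
    """Fraction of CU slots kept busy over all waves of workgroups.

    Hypothetical helper: assumes one workgroup per CU at a time.
    """
    if num_workgroups <= 0:
        return 0.0
    waves = ceil(num_workgroups / num_cus)
    return num_workgroups / (waves * num_cus)


def pick_km0(batch, nhead, max_seqlen_q, num_cus,
             candidates=(128, 64, 32, 16), min_util=0.8):
    """Return the largest candidate kM0 whose utilization reaches min_util.

    Falls back to the candidate with the highest utilization if none does.
    Candidates and threshold are illustrative assumptions, not CK values.
    """
    best_km0, best_util = None, -1.0
    for km0 in candidates:  # ordered largest tile first
        # Assumed grid size: one workgroup per (batch, head, kM0-row tile).
        workgroups = batch * nhead * ceil(max_seqlen_q / km0)
        util = cu_utilization(workgroups, num_cus)
        if util >= min_util:
            return km0
        if util > best_util:
            best_km0, best_util = km0, util
    return best_km0
```

Under these assumptions, a long sequence produces enough workgroups to fill the device even with the largest tile, while a short sequence falls back to a smaller tile so that more workgroups are launched.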

[ROCm/composable_kernel commit: 28cd0dffc9]
2025-05-29 18:36:33 +09:00
..
2024-05-10 09:41:39 -07:00