Files
composable_kernel/example/ck_tile/01_fmha/codegen/ops
Po Yen Chen ad9863fe05 [CK_TILE] Low CU utilization optimization for fMHA fwd kernels (#2402)
* Wrap tile size mapping as class method

* Warp pipeline generating as class method

* Add constraint as kernel dispatching criteria

* Support mutltiple tile size for a (hdim, hdim_v) combination

* Use smaller tile size if CU utilization is low

* Use integar as the key of the tile size map

* Fix type error

* Simply override parent class method return value

* Add attribute to eliminate warnging

* Allow using environment variables to turn on/off custom factory

* Unify param naming style

* Add missing HIP runtime include directive

* Fix os.environ.get() usage
2025-07-09 22:01:33 +08:00
..