* Wrap tile size mapping as class method
* Wrap pipeline generation as class method
* Add constraint as a kernel dispatching criterion
* Support multiple tile sizes for a (hdim, hdim_v) combination
* Use smaller tile size if CU utilization is low
* Use integers as the keys of the tile size map
* Fix type error
* Simply override the parent class method's return value
* Add attribute to eliminate warning
* Allow turning the custom factory on/off via environment variables
* Unify param naming style
* Add missing HIP runtime include directive
* Fix os.environ.get() usage
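A minimal Python sketch of how several of the changes above could fit together: an integer-keyed tile size map exposed through a class method, a subclass that only overrides the returned value, a fallback to a smaller tile when the grid would under-fill the CUs, and an environment variable parsed explicitly (a raw `os.environ.get()` string such as "0" is truthy). This is an illustration under assumptions, not the code in this commit: `TileSize`, `TileSizeFactory`, `CustomTileSizeFactory`, `select_tile`, `get_factory`, the tile values and the `FMHA_USE_CUSTOM_FACTORY` variable are all hypothetical.

```python
import math
import os
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TileSize:
    # Hypothetical fields: tile extent along the query (m0) and key (n0) sequence.
    m0: int
    n0: int


def env_flag(name: str, default: bool = False) -> bool:
    # os.environ.get() returns a str (or None), so "0" would be truthy if
    # used directly; compare against explicit "on" spellings instead.
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "on", "yes")


class TileSizeFactory:
    # Integer keys (hdim, hdim_v) -> candidate tile sizes, largest first.
    # Values are illustrative only, not a real kernel tile table.
    TILE_SIZES: Dict[Tuple[int, int], List[TileSize]] = {
        (64, 64): [TileSize(128, 64), TileSize(64, 64)],
        (128, 128): [TileSize(128, 128), TileSize(64, 128)],
    }

    @classmethod
    def get_tile_sizes(cls, hdim: int, hdim_v: int) -> List[TileSize]:
        return cls.TILE_SIZES[(hdim, hdim_v)]


class CustomTileSizeFactory(TileSizeFactory):
    # Only the returned value differs from the parent class method.
    @classmethod
    def get_tile_sizes(cls, hdim: int, hdim_v: int) -> List[TileSize]:
        return [TileSize(64, 64)]


def select_tile(candidates: List[TileSize], seqlen_q: int, batch: int,
                nhead: int, num_cus: int) -> TileSize:
    # Try the largest tile first; fall back to a smaller one when the
    # resulting grid has fewer work-groups than there are CUs.
    for tile in candidates:
        num_workgroups = math.ceil(seqlen_q / tile.m0) * batch * nhead
        if num_workgroups >= num_cus:
            return tile
    return candidates[-1]


def get_factory():
    # Hypothetical variable name; "1"/"on"/"true"/"yes" enables the custom factory.
    if env_flag("FMHA_USE_CUSTOM_FACTORY"):
        return CustomTileSizeFactory
    return TileSizeFactory
```

For example, with `(hdim, hdim_v) = (128, 128)`, `seqlen_q=512`, `batch=1`, `nhead=8` and 48 CUs, the 128-row tile yields only 32 work-groups, so `select_tile` falls back to `TileSize(64, 128)`, which yields 64.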
[ROCm/composable_kernel commit: ad9863fe05]