* Add Rmsnorm2dFwdPipelineModelSensitiveT5Pass
* Update rmsnorm2d_fwd_pipeline_model_sensitive_pass
   1. Add BlockReduce2dTreeCrossWarpSync
* Add Rmsnorm2dFusedModelSensitiveEnum
* Update patch
   1. Revert generate.py
   2. Remove a comment in generate.py
   3. Update the tree cross-warp reduce
* Refactor RMSNorm model enum and introduce T5-like option
* Update the number of stages (n stage) for the cross-warp reduce
* Add a new cmdline option to RMSNorm for testing the new pipeline
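The tree cross-warp reduce (BlockReduce2dTreeCrossWarpSync) and the stage-count update can be pictured with a host-side C++ sketch. It makes three assumptions:
* Each warp has already reduced its own elements to a single partial.
* The cross-warp step combines those partials pairwise.
* The number of stages is ceil(log2(num_warps)).

`num_tree_stages` and `tree_cross_warp_reduce` are hypothetical names, not CK's API. The barrier a GPU version would need between stages appears only as a comment.

```cpp
#include <vector>

// Hypothetical helper: number of pairwise stages needed to fold
// num_warps partials into one, i.e. ceil(log2(num_warps)).
constexpr int num_tree_stages(int num_warps) {
    int stages = 0;
    for (int n = 1; n < num_warps; n <<= 1) ++stages;
    return stages;
}

// Hypothetical host-side simulation of a tree-shaped cross-warp reduce.
// partials[w] holds warp w's result after its intra-warp reduce
// (on a GPU these would sit in shared memory / LDS).
float tree_cross_warp_reduce(std::vector<float> partials) {
    const int num_warps = static_cast<int>(partials.size());
    const int stages = num_tree_stages(num_warps);
    int stride = 1;
    for (int s = 0; s < stages; ++s) {
        // In each stage, warp w (a multiple of 2*stride) absorbs warp w + stride.
        for (int w = 0; w + stride < num_warps; w += 2 * stride) {
            partials[w] += partials[w + stride];
        }
        stride <<= 1;
        // A GPU version would place a barrier (e.g. __syncthreads()) here
        // so the next stage reads completed partials.
    }
    return partials.empty() ? 0.0f : partials[0];
}
```

Compared with one warp summing every partial serially, the tree form has only log2(num_warps) dependent steps. The cost is one synchronization per stage, which is why the stage count matters.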
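The T5-like option presumably reproduces a model-specific order of operations. The sketch below assumes it follows Hugging Face's `T5LayerNorm`, which:
* normalizes in fp32,
* casts the normalized value to the weight dtype,
* then multiplies by the weight.

Under that assumption, the T5-like path differs from a default RMSNorm by one extra rounding step before the multiply by gamma. `RmsnormModel`, `round_to_bf16` and `rmsnorm_row` are hypothetical names, not the enum or functions added by this commit, and bf16 stands in for the weight and output dtype.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a model-sensitive enum; not CK's actual type.
enum class RmsnormModel { Default, T5Like };

// Round an fp32 value to the nearest bf16 value (ties to even), returned as fp32.
// NaN/Inf inputs are not handled in this sketch.
float round_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const std::uint32_t lsb = (bits >> 16) & 1u;  // lowest mantissa bit that is kept
    bits += 0x7FFFu + lsb;                        // round to nearest, ties to even
    bits &= 0xFFFF0000u;                          // drop the 16 low bits
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// RMSNorm over one row: y = x / sqrt(mean(x^2) + eps) * gamma, output rounded to bf16.
std::vector<float> rmsnorm_row(const std::vector<float>& x,
                               const std::vector<float>& gamma,
                               float eps, RmsnormModel model) {
    std::vector<float> y(x.size());
    if (x.empty()) return y;
    float sum_sq = 0.0f;  // fp32 accumulation of squares
    for (float v : x) sum_sq += v * v;
    const float inv_rms =
        1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
    for (std::size_t i = 0; i < x.size(); ++i) {
        float normed = x[i] * inv_rms;  // normalized value in fp32
        if (model == RmsnormModel::T5Like) {
            // Assumed T5-style step: cast to the weight dtype before scaling.
            normed = round_to_bf16(normed);
        }
        y[i] = round_to_bf16(normed * gamma[i]);  // store the output as bf16
    }
    return y;
}
```

Because the T5-like path rounds twice, its results can differ from the default path in the last bf16 bit. That difference is a plausible reason to keep the two pipelines separate and to add a command-line switch for testing them.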
---------
Co-authored-by: Clement Lin <clement.lin@amd.com>
Co-authored-by: ClementLinCF <162283536+ClementLinCF@users.noreply.github.com>
[ROCm/composable_kernel commit: 3499fe67ff]