* WIP: refactoring
* Swap operation/data nested loops order
* Improve memory coalescing
* Add comments
* Enforce same identity element for the reduce operations
* Re-add compile time constant
* Add comment and re-add __builtin_amdgcn_readfirstlane(0) to the loop init
---------
Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>
[ROCm/composable_kernel commit: 91e32f305f]