mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-14 18:17:44 +00:00)
* Use smaller width for lse_accum dist tensor
* Update pipeline comment
* Fix wrong distribution for lse_accum
* Remove duplicate dim in lse_accum dist encoding
* Choose kBlockSize of the fmha splitkv combine kernel based on kM0
* Remove assumption of MPerThread=1
* Add log<4> and log<8> specializations
* Enlarge occupancy array
* Fix vector size for small tiles
* Add support for kMaxSplits=8
* Re-format gemm.hpp
* Use 16x16x16 warp gemm for fwd_splitkv
* Centralize policy code changes
* Leave fp8/bf8 tile settings unchanged
[ROCm/composable_kernel commit: 95e722a3b3]