Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-16 02:54:21 +00:00)
- Added MaxNumBlocks template parameter to all kernel traits
- Propagated through pipeline problem and pipeline
- Added compile-time kNeedsRebasing check with if constexpr blocks
- Created small-cache optimized instantiations (MaxNumBlocks=100000)
- Added runtime dispatch logic for small vs large cache
- 3.7% performance improvement for small caches vs runtime check
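
The pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: only `MaxNumBlocks`, `kNeedsRebasing`, and the value 100000 come from the commit message. The names `KernelTraits`, `BlockStride`, `block_offset`, and `dispatch_block_offset` are hypothetical, and so is the assumption that rebasing is needed only when the largest possible element offset does not fit in a 32-bit index.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical traits type. Only MaxNumBlocks and kNeedsRebasing are names
// taken from the commit message; BlockStride is an illustrative stand-in.
template <std::int64_t MaxNumBlocks_, std::int64_t BlockStride_>
struct KernelTraits
{
    static constexpr std::int64_t MaxNumBlocks = MaxNumBlocks_;
    static constexpr std::int64_t BlockStride  = BlockStride_;

    // Assumption: rebasing is required only when the largest element offset
    // could exceed the range of a 32-bit signed index.
    static constexpr bool kNeedsRebasing =
        MaxNumBlocks * BlockStride > std::numeric_limits<std::int32_t>::max();
};

// The branch is resolved at compile time, so the small-cache instantiation
// contains no rebasing code and no runtime check.
template <typename Traits>
std::int64_t block_offset(std::int64_t block_id)
{
    if constexpr (Traits::kNeedsRebasing)
    {
        // Large-cache path: keep the arithmetic in 64 bits.
        return block_id * Traits::BlockStride;
    }
    else
    {
        // Small-cache path: 32-bit arithmetic is safe by construction.
        const auto id     = static_cast<std::int32_t>(block_id);
        const auto stride = static_cast<std::int32_t>(Traits::BlockStride);
        return id * stride;
    }
}

constexpr std::int64_t kSmallCacheMaxBlocks = 100000; // value from the commit message
constexpr std::int64_t kStride              = 4096;   // illustrative stride

using SmallTraits = KernelTraits<kSmallCacheMaxBlocks, kStride>;
using LargeTraits = KernelTraits<std::numeric_limits<std::int32_t>::max(), kStride>;

static_assert(!SmallTraits::kNeedsRebasing, "small cache should skip rebasing");
static_assert(LargeTraits::kNeedsRebasing, "large cache should keep rebasing");

// Runtime dispatch: pick the instantiation once, based on the cache size.
std::int64_t dispatch_block_offset(std::int64_t num_blocks, std::int64_t block_id)
{
    if (num_blocks <= kSmallCacheMaxBlocks)
    {
        return block_offset<SmallTraits>(block_id);
    }
    return block_offset<LargeTraits>(block_id);
}
```

The design choice is to pay for one size comparison at dispatch time rather than a check inside the kernel's inner loop, at the cost of compiling two instantiations.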