mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-29 19:28:33 +00:00
- Pipeline: remove kPageBlockSize >= kN0 static_assert; QK dequant now precomputes tile_k_pages[] and indexes per-column. page_size >= kN0 stays on the original single-page fast path (kPagesPerTile==1).
- Codegen: add page_size=64 to SUPPORTED_PAGE_SIZE; drop per_token_head from the page_size < tile.F_bn0 filter (kv_blockscale still filtered).
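
For illustration only, here is a minimal Python sketch of the indexing idea described above. It is not the kernel code. The function names, the `page_table` layout, and the assumptions that tiles start on kN0 boundaries and that the smaller of page_size and kN0 divides the larger are all hypothetical. Only kN0, tile_k_pages, kPagesPerTile, per_token_head, and kv_blockscale come from the commit message.

```python
from typing import List, Tuple


def pages_per_tile(page_size: int, kN0: int) -> int:
    """Pages spanned by one kN0-wide tile (plays the role of kPagesPerTile)."""
    if page_size >= kN0:
        return 1  # single-page fast path
    # Assumption of this sketch: page_size divides kN0 evenly.
    assert kN0 % page_size == 0
    return kN0 // page_size


def precompute_tile_k_pages(page_table: List[int], tile_start: int,
                            page_size: int, kN0: int) -> List[int]:
    """Look up, once per tile, the physical page of every page the tile touches."""
    first_logical_page = tile_start // page_size
    count = pages_per_tile(page_size, kN0)
    return [page_table[first_logical_page + i] for i in range(count)]


def column_location(tile_k_pages: List[int], tile_start: int, col: int,
                    page_size: int) -> Tuple[int, int]:
    """Map a column inside the tile to (physical page, offset within page)."""
    token = tile_start + col
    local_page = token // page_size - tile_start // page_size
    return tile_k_pages[local_page], token % page_size


def is_combo_generated(page_size: int, bn0: int, quant_mode: str) -> bool:
    """Hypothetical codegen filter: only kv_blockscale is still skipped
    when the page is smaller than the tile; per_token_head is allowed."""
    return not (page_size < bn0 and quant_mode == "kv_blockscale")


# Example: 64-token pages with a 128-column tile starting at token 128.
page_table = [7, 3, 9, 1]  # logical page -> physical page (made-up values)
small = precompute_tile_k_pages(page_table, 128, 64, 128)
large = precompute_tile_k_pages(page_table, 128, 256, 128)
```

With page_size=64 and kN0=128 the tile spans two pages, so each column picks its page from the precomputed list. With page_size=256 the list has a single entry and every column resolves to the same page, which matches the unchanged fast path.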