composable_kernel/include
msaffari-amd ee3ada6e4a [AITERKER-112] PER_TOKEN_HEAD: support page_size < kN0 via cross-page dequant
- Pipeline: remove the kPageBlockSize >= kN0 static_assert; QK dequant now
  precomputes tile_k_pages[] and indexes it per column. page_size >= kN0 stays
  on the original single-page fast path (kPagesPerTile == 1).
- Codegen: add page_size=64 to SUPPORTED_PAGE_SIZE; drop per_token_head from
  the page_size < tile.F_bn0 filter (kv_blockscale still filtered).
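The following is a minimal, hypothetical C++ sketch of cross-page dequant when page_size < kN0. Only kN0, kPageBlockSize, kPagesPerTile and tile_k_pages[] come from the commit message. The block_table mapping, the [physical_page][token_in_page][head] scale layout, the function names and the constant values are assumptions for illustration, not the actual CK pipeline code. The page lookup happens once per tile, and each column then selects its page and in-page offset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed values; in the real pipeline these are template parameters.
constexpr int kN0 = 128;            // tile width along the K sequence axis (tokens)
constexpr int kPageBlockSize = 64;  // tokens per KV-cache page (page_size < kN0 here)
// 1 => single-page fast path; otherwise the tile spans several pages.
constexpr int kPagesPerTile = kPageBlockSize >= kN0 ? 1 : kN0 / kPageBlockSize;

// The sketch assumes tiles never straddle a page boundary unevenly.
static_assert(kPageBlockSize >= kN0 ? kPageBlockSize % kN0 == 0
                                    : kN0 % kPageBlockSize == 0,
              "sketch assumes page size and kN0 divide one another");

// Once per tile: resolve the physical page of every page the tile touches.
// block_table (hypothetical) maps logical page index -> physical page index.
std::array<int, kPagesPerTile> precompute_tile_k_pages(
    const std::vector<int>& block_table, int n0 /* first token of the tile */) {
    std::array<int, kPagesPerTile> tile_k_pages{};
    const int first_logical_page = n0 / kPageBlockSize;
    for (int p = 0; p < kPagesPerTile; ++p) {
        assert(first_logical_page + p < static_cast<int>(block_table.size()));
        tile_k_pages[p] = block_table[first_logical_page + p];
    }
    return tile_k_pages;
}

// Per column: fetch the per-token, per-head K scale for column c of the tile.
// Hypothetical scale layout: [physical_page][token_in_page][head].
float k_scale_for_column(const std::vector<float>& k_scales,
                         const std::array<int, kPagesPerTile>& tile_k_pages,
                         int n0, int c, int head, int num_heads) {
    const int token = n0 + c;
    const int page_in_tile = token / kPageBlockSize - n0 / kPageBlockSize;
    const int offset_in_page = token % kPageBlockSize;
    const int phys_page = tile_k_pages[page_in_tile];
    const std::size_t idx =
        (static_cast<std::size_t>(phys_page) * kPageBlockSize + offset_in_page) *
            num_heads + head;
    return k_scales[idx];
}

// Dequantize one row of raw QK^T accumulators: S = sq * sk * (q_int . k_int),
// where q_scale is this row's per-token-head Q scale.
void dequant_qk_row(std::array<float, kN0>& s_row, float q_scale,
                    const std::vector<float>& k_scales,
                    const std::array<int, kPagesPerTile>& tile_k_pages,
                    int n0, int head, int num_heads) {
    for (int c = 0; c < kN0; ++c)
        s_row[c] *= q_scale * k_scale_for_column(k_scales, tile_k_pages, n0, c,
                                                 head, num_heads);
}
```

With kPageBlockSize = 64 and kN0 = 128, each tile spans two pages. Columns 0 to 63 read their scales from tile_k_pages[0] and columns 64 to 127 from tile_k_pages[1]. When page_size >= kN0, page_in_tile is always 0, which corresponds to the single-page fast path.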
2026-05-20 14:21:12 +00:00