composable_kernel/example
root 65a3f88ad8 Fix CK-UA mixed batch: use max_seqlen_q for tier selection
The decode grid (num_kv_heads, num_seqs) assumes each seq has <= kBlockQ
tokens. In mixed batches (decode + prefill), avg_q is low, but some
seqs have hundreds of tokens, and the decode grid truncates them. Add
max_seqlen_q to args and check it in select_tile_tier to force the
medium tier (1D grid with Q tile iteration) for mixed batches.
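
A minimal sketch of the tier check described above, not the actual CK
code. The enum TileTier, the struct TierArgs, the field avg_seqlen_q, and
the kBlockQ value of 16 are assumptions for illustration; only
max_seqlen_q, kBlockQ, and select_tile_tier are names from this commit,
and the real selector may have more tiers and different thresholds.

```cpp
#include <cstdint>

// Hypothetical tier enum: the real code may define additional tiers.
enum class TileTier { Decode, Medium };

// Hypothetical argument struct holding only the fields used here.
struct TierArgs {
    std::int32_t avg_seqlen_q;  // average query tokens per sequence
    std::int32_t max_seqlen_q;  // longest query in the batch (field added by this fix)
};

// Assumed value for illustration; the real tile size may differ.
constexpr std::int32_t kBlockQ = 16;

inline TileTier select_tile_tier(const TierArgs& a) {
    // The decode grid handles at most kBlockQ query tokens per sequence,
    // so it is only safe when the longest sequence fits, not just the average.
    if (a.avg_seqlen_q <= kBlockQ && a.max_seqlen_q <= kBlockQ) {
        return TileTier::Decode;
    }
    // Otherwise fall back to the 1D grid that iterates over Q tiles.
    return TileTier::Medium;
}
```

With these assumed values, a batch with avg_seqlen_q = 2 and
max_seqlen_q = 300 selects the medium tier even though the average alone
would have qualified for the decode grid.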

362/362 no-window shapes now pass.

Made-with: Cursor
2026-04-01 18:09:48 +00:00