Commit Graph

6 Commits

Author SHA1 Message Date
Damien Lejeune
b686143624 Add SWA (sliding-window attention) decode dispatcher to support the GPT-OSS shape + update smoke test 2026-05-08 14:38:16 +00:00
Damien Lejeune
f438cef286 Add smoke tests for SWA edge cases and performance gating 2026-05-08 11:30:48 +00:00
Damien Lejeune
5afd97ff5b Add SWA implementation + instances 2026-05-08 08:52:25 +00:00
Damien Lejeune
c132e6fc18 Prepare the interface to support SWA 2026-05-07 13:52:56 +00:00
root
65a3f88ad8 Fix CK-UA mixed batch: use max_seqlen_q for tier selection
The decode grid (num_kv_heads, num_seqs) assumes each seq has <= kBlockQ
tokens. For mixed batches (decode + prefill), avg_q is low but some seqs
have hundreds of tokens, causing truncation. Added max_seqlen_q to the
args; select_tile_tier now checks it and forces the medium tier (1D grid
with Q-tile iteration) for mixed batches.

362/362 no-window shapes now pass.

Made-with: Cursor
2026-04-01 18:09:48 +00:00
root
4c5e290378 Add unified attention (42_unified_attention) and topk_softmax_decode
Squashed from the aghamari/unified-attention-decode-opt branch.

42_unified_attention: a CK tile paged-KV attention kernel optimized for
decode, with 4-tier dispatch (tiny/small/medium/large), 16x16 MFMA, a
2D decode grid, and head-group merging. Supports hdim=64 GQA-8 and
hdim=128 MHA with block_size=32.

topk_softmax_decode: a fused top-k + softmax kernel for M=1 MoE decode.

Made-with: Cursor
2026-04-01 16:24:04 +00:00