Commit Graph

3 Commits

Author | SHA1 | Message | Date
Damien Lejeune | 5afd97ff5b | Adding SWA implementation + instances | 2026-05-08 08:52:25 +00:00
Damien Lejeune | 076d505826 | Add smoke test for SWA | 2026-05-07 14:32:03 +00:00
root | 4c5e290378 | Add unified attention (42_unified_attention) and topk_softmax_decode | 2026-04-01 16:24:04 +00:00

Squashed from aghamari/unified-attention-decode-opt branch.

42_unified_attention: CK tile paged-KV attention kernel optimized for
decode with 4-tier dispatch (tiny/small/medium/large), 16x16 MFMA,
2D decode grid, head-group merging. Supports hdim=64 GQA-8 and
hdim=128 MHA with block_size=32.

topk_softmax_decode: fused topk + softmax kernel for M=1 MoE decode.

Made-with: Cursor