root
|
4c5e290378
|
Add unified attention (42_unified_attention) and topk_softmax_decode
Squashed from aghamari/unified-attention-decode-opt branch.
42_unified_attention: CK tile paged-KV attention kernel optimized for
decode with 4-tier dispatch (tiny/small/medium/large), 16x16 MFMA,
2D decode grid, head-group merging. Supports hdim=64 GQA-8 and
hdim=128 MHA with block_size=32.
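The 4-tier dispatch presumably routes each decode request to a kernel variant sized for its KV context length; a minimal sketch of that selection logic, with hypothetical thresholds (the real cutoffs are not stated in this message):

```python
def pick_decode_tier(context_len: int) -> str:
    """Map a request's KV context length to a dispatch tier.

    The tier names (tiny/small/medium/large) come from the commit message;
    the length thresholds below are illustrative placeholders, not the
    kernel's actual cutoffs.
    """
    if context_len <= 512:
        return "tiny"
    if context_len <= 2048:
        return "small"
    if context_len <= 8192:
        return "medium"
    return "large"
```

Binning by context length lets each tier pick a tile shape and grid layout that keeps the MFMA units busy without over-provisioning short sequences.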
topk_softmax_decode: fused top-k + softmax kernel for M=1 MoE decode.
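For reference, the fused top-k + softmax operation for a single decode token (M=1) amounts to selecting the k largest expert logits and normalizing only over that subset; a plain-Python sketch of the intended math (function name and signature are illustrative, not the kernel's API):

```python
import math

def topk_softmax_decode(logits, k):
    """Fused top-k + softmax for one token's expert logits (M=1).

    Returns (expert_ids, routing_weights): the indices of the k largest
    logits and a softmax computed over just those k values.
    """
    # Indices of the k largest logits, in descending order of value.
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits[i] for i in idx)
    exps = [math.exp(logits[i] - m) for i in idx]
    s = sum(exps)
    return idx, [e / s for e in exps]
```

Fusing the two steps avoids materializing a full softmax over all experts when only the top-k routing weights are needed.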
Made-with: Cursor
|
2026-04-01 16:24:04 +00:00 |
|