cd7ba6e2e8 Add unified attention (42_unified_attention)
Squashed from aghamari/unified-attention-decode-opt branch.

CK tile paged-KV attention kernel optimized for decode, with 4-tier
dispatch (tiny/small/medium/large), 16x16 MFMA tiles, a 2D decode
grid, and head-group merging. Supports hdim=64 GQA-8 and hdim=128
MHA with block_size=32.

Made-with: Cursor
2026-04-01 16:39:15 +00:00