4c5e290378
Add unified attention (42_unified_attention) and topk_softmax_decode
Squashed from aghamari/unified-attention-decode-opt branch.
42_unified_attention: CK tile paged-KV attention kernel optimized for
decode with 4-tier dispatch (tiny/small/medium/large), 16x16 MFMA,
2D decode grid, head-group merging. Supports hdim=64 GQA-8 and
hdim=128 MHA with block_size=32.
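The tiered dispatch described above can be sketched in Python: pick a kernel variant from the KV sequence length. The tier names (tiny/small/medium/large) come from the commit message, but the length thresholds below are hypothetical, chosen only to illustrate the dispatch shape; the real kernel's boundaries are not stated here.

```python
def pick_tier(kv_len, tiers=((64, "tiny"), (256, "small"), (1024, "medium"))):
    """Map a KV sequence length to a dispatch tier.

    `tiers` is a tuple of (upper_bound, name) pairs in ascending order;
    anything beyond the last bound falls into the "large" tier.
    The thresholds here are hypothetical placeholders.
    """
    for bound, name in tiers:
        if kv_len <= bound:
            return name
    return "large"
```

In a real kernel launcher each tier would select a different tile/grid configuration; here the function only returns the tier name.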
topk_softmax_decode: fused topk + softmax kernel for M=1 MoE decode.
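A reference version of the fused topk + softmax routing step for a single decode token (M=1) can be sketched with NumPy. This is a numerical sketch of the operation the kernel fuses, not the kernel itself: select the top-k router logits, then softmax over only the selected logits so the expert weights sum to 1.

```python
import numpy as np

def topk_softmax_decode(logits, k):
    """Top-k expert selection + softmax over the selected logits.

    logits: shape (E,) router logits for one token (M=1 decode).
    Returns (weights, indices), with indices sorted by descending logit
    and weights summing to 1.
    """
    idx = np.argpartition(logits, -k)[-k:]        # top-k indices, unordered
    idx = idx[np.argsort(logits[idx])[::-1]]      # order by descending logit
    x = logits[idx] - logits[idx].max()           # numerically stable softmax
    w = np.exp(x)
    w /= w.sum()
    return w, idx
```

Fusing these two steps avoids materializing the full-width softmax over all experts when only k of them receive nonzero weight.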
Made-with: Cursor
2026-04-01 16:24:04 +00:00