Amir Ghamarian 4af7e472a3 Add unified attention kernel on top of CK develop
Cherry-picked all unified attention files from aghamari/unified-attention-decode-opt
onto CK develop (046d3ac27). Includes:
- Unified attention pipeline, kernel, and block masking
- All kernel tiers: large (8-warp), medium (4-warp), small (2-warp), tiny (1-warp)
- block_size=32 support with bs32 narrow tier (2-warp, 16x16 MFMA, kBlockM=32)
- int32 overflow fix (long_index_t for KV cache strides)
- BlockSize_ template parameter for flexible page block sizes
- Example binary and 40 instance files
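
To illustrate why the int32 overflow fix matters, here is a minimal sketch of a paged KV cache offset computation. It is not the code from this commit: the aliases, the helper names, and the cache dimensions are assumptions chosen for illustration, and long_index_t is assumed to be a 64-bit signed integer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical aliases; the 64-bit width of long_index_t is an assumption.
using index_t      = std::int32_t;
using long_index_t = std::int64_t;

// Element offset of a page block in a paged KV cache.
// The block index is widened BEFORE the multiply, so the product is
// computed in 64-bit and cannot wrap at 2^31.
constexpr long_index_t kv_block_offset(index_t block_idx, long_index_t block_stride)
{
    return static_cast<long_index_t>(block_idx) * block_stride;
}

// True if an offset cannot be represented in a 32-bit index.
constexpr bool exceeds_int32(long_index_t offset)
{
    return offset > std::numeric_limits<index_t>::max();
}

// Illustrative sizes (assumed): block_size=32, 8 KV heads, head_dim=128.
constexpr long_index_t kBlockStride = 32LL * 8 * 128; // elements per page block

// Block 70000 starts past the int32 range, so a 32-bit stride product
// would overflow, while the 64-bit computation stays exact.
static_assert(kv_block_offset(70000, kBlockStride) == 2293760000LL);
static_assert(exceeds_int32(kv_block_offset(70000, kBlockStride)));
static_assert(!exceeds_int32(kv_block_offset(1000, kBlockStride)));
```

With these assumed sizes, any block index of 65536 or above yields an offset that no longer fits in 32 bits, which is the kind of case a 64-bit stride type guards against.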

Made-with: Cursor
2026-03-30 17:37:12 +00:00