Files
composable_kernel/example
juuso-oskari fb0d729fbb Collapse CK-UA traits into single kernel_traits<V, DType, IsMask> template
Replace 4 near-identical *_kernel_traits classes (~400 lines of repeated
shape/policy plumbing) with one templated `unified_attention_kernel_traits`
parameterized by `KernelVariant V`. The 6 dispatch_<variant> helpers in
unified_attention.cpp collapse into a single `dispatch_variant<V>` function
template that fans out over (dtype, mask).

Per-variant compile-time knobs (BlockM, BlockSize, warp count, MFMA shape,
pipeline policy, decode-grid flag) now live in one variant_config<V>
specialization each. "What's different between variants" is readable
top-to-bottom in a single block of code, and each instance .cpp shrinks to
a one-line `INST_UNIFIED_ATTENTION_DISPATCH(V, DTYPE, IS_MASK)` macro.

No behavior change. Correctness suite: 236/240 (same 4 known
num_blocks=32768 + d=128 MHA int32-overflow failures as baseline).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 10:35:15 +00:00
..
2026-01-14 07:31:45 -08:00