Files
composable_kernel/example
juuso-oskari 3ab4df37e2 Refactor CK-UA dispatcher around KernelVariant + select_config
The previous dispatcher was a 4-deep nested-if cascade that picked one
of seven DISPATCH_* macros based on (hdim, num_queries_per_kv, dtype,
mask, tile_tier, use_bs32). The macro names hid both the traits class
and the dispatch path, so reasoning about "what kernel runs for shape
X" required reading the whole file.

Replace it with two named layers:

  1. KernelVariant enum -- a flat list of every compiled instance.
  2. select_config(args) -- the only place runtime decisions live;
     reads the problem and emits a KernelConfig{variant, ...}.

The final switch over the variant calls into per-variant dispatch
helpers that fan out over (dtype, mask) via the existing DISPATCH_*
macros. Behaviour is unchanged: each old (hdim, nqpkv, tier, p32) tuple
maps 1:1 to a KernelVariant, and the same instance is launched.

Follow-up commits in this series will:
  - add a dedicated d=128 MHA decode variant
  - delete the _p32 ("bs32") family now that the multi-page-tile fix
    in the pipeline makes kBlockN independent of page_size

Test: ua-test-scripts/test_unified_attention_ck_correctness.py
      stays at 236/240 (same 4 pre-existing int32-overflow failures).
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 09:27:59 +00:00
..
2026-01-14 07:31:45 -08:00