[CK] Add rocm_ck schema engine: Signature, resolve(),
ArchProperties (#7179)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
## Summary
A `Signature` is a directed compute graph: tensors are nodes, operators
are edges. Shared names between operator outputs and inputs form the
graph structure. `resolve()` walks this graph at compile time
(`consteval`), inferring dtype, rank, and layout for every tensor —
invalid configs become compiler errors, not runtime crashes.
**Key design decisions:**
- **Operators teach the system about tensors.** `GemmOp` implies rank 2
and Row/Col/Row layout. `AddOp` and `ReluOp` propagate from connected
slots. The dtype cascade fills in the rest: per-tensor → signature-wide
→ error.
- **Adding a new op is zero lines in the resolution engine** if it's
structurally binary (`lhs/rhs/out`) or unary (`in/out`) — C++20 concepts
handle dispatch automatically. Only ops with special semantics need
explicit branches.
- **TargetSet is a compile-time bitset over GPU targets.** The wave tile
validation table is the single source of truth for valid instruction
shapes, traced from CK Tile's WarpGemmDispatcher. FP8 tiles are
available on gfx942+ via IterateK composition, not gfx950-only.
**Reading order:** signature.hpp (data model) → arch_properties.hpp
(TargetSet, wave tiles) → resolve.hpp (resolution engine).
3 new headers, 3 unit tests (including diamond DAG coverage), 3
compile-fail tests. Introduces tests/compile_fail/ infrastructure.
**Stack**: PR 2 of 3 porting the rocm_ck constexpr schema from
experimental to production.
1. Foundation types — DataType, Layout, Args, Ops (#7114)
2. **This PR** — Schema engine (graph resolution)
3. Spec factories — GemmSpec, makeSpec() (#7180 )
Note: We also removed `FmhaBwdOp` for clarity, since that was introduced
early and doesn't have tests set up.
**Depends on**: #7114
## Test plan
- [x] ctest --test-dir build --output-on-failure — unit tests +
compile-fail tests pass
- [x] Compile-fail tests correctly reject: mixed CDNA+RDNA TargetSet,
conflicting layouts, empty quantization scale names