[CK] Add rocm_ck spec factories: GemmSpec, makeSpec() (#7180)
## What this PR does
This is the third PR in the rocm_ck schema stack:
1. **#7150** — Foundation types (DataType, Layout, Args, Ops)
2. **#7163** — Schema engine (Signature, resolve(), ArchProperties)
3. **#7180 (this)** — Spec factories (GemmSpec, makeSpec())

`makeSpec()` is the bridge between user intent and kernel instantiation.
It takes a **Signature** (WHAT to compute: operator graph, dtypes,
layouts) and a **GemmAlgorithm** (HOW to compute it: tile sizes,
pipeline, partitioning) and produces a validated `GemmSpec`, a
structural type that can be used directly as a non-type template parameter.

The key property: **every constraint is enforced at compile time.** An
invalid GEMM configuration is a compile error, not a runtime crash or
silent corruption. The 33 compile-fail tests are the executable
specification of what's allowed.
## What's interesting
**Physical tensor table.** Not every tensor in a compute graph needs
device memory. The intermediate result of `C = A * B` in a fused
GEMM+Add+ReLU lives only in registers. `makeSpec()` walks the operator
chain and determines which tensors are physical (and therefore need Args
slots) and which are intermediate. The output is a fixed-layout table:
`[lhs, rhs, output, D0?, D1?, scale?]`.
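
A sketch of the slot assignment. It assumes the operator chain is a flat list that starts with the GEMM, that each binary elementwise op (`Add`, `Mul`) pulls in one external D tensor, and that a hypothetical `Scale` op pulls in the scale tensor. `OpKind`, `Slot`, `PhysicalTable` and `physicalTensors()` are illustrative names, not the PR's API:

```cpp
// Sketch only (C++20): hypothetical op kinds and slot rules.
#include <array>
#include <cstddef>

enum class OpKind { Gemm, Add, Mul, Relu, Scale };

// Fixed slot layout: [lhs, rhs, output, D0?, D1?, scale?]
enum Slot : std::size_t { kLhs, kRhs, kOutput, kD0, kD1, kScale, kNumSlots };

struct PhysicalTable {
    std::array<bool, kNumSlots> present{};  // slot needs an Args entry
    std::size_t num_d = 0;                  // number of D tensors in use
};

template <std::size_t N>
constexpr PhysicalTable physicalTensors(const std::array<OpKind, N>& chain) {
    if (N == 0 || chain[0] != OpKind::Gemm)
        throw "chain must start with a GemmOp";
    PhysicalTable t;
    // GEMM inputs and the final output always live in device memory.
    t.present[kLhs] = t.present[kRhs] = t.present[kOutput] = true;
    for (std::size_t i = 1; i < N; ++i) {
        switch (chain[i]) {
        case OpKind::Gemm:
            throw "only one GemmOp per chain";
        case OpKind::Add:
        case OpKind::Mul:
            // Binary op: one operand is the running intermediate (registers),
            // the other is an external D tensor that needs a slot.
            if (t.num_d == 2) throw "at most two D tensors";
            t.present[kD0 + t.num_d] = true;
            ++t.num_d;
            break;
        case OpKind::Scale:
            t.present[kScale] = true;
            break;
        default:
            // Unary ops (e.g. Relu) only touch register-resident intermediates.
            break;
        }
    }
    return t;
}

// Fused GEMM+Add+ReLU: C = A * B and the Add result never reach memory.
constexpr auto kGemmAddRelu =
    physicalTensors(std::array{OpKind::Gemm, OpKind::Add, OpKind::Relu});
static_assert(kGemmAddRelu.present[kD0] && !kGemmAddRelu.present[kD1]);
static_assert(!kGemmAddRelu.present[kScale]);
```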

**Epilogue composition.** Instead of a combinatorial explosion of named
patterns (GemmAdd, GemmAddRelu, GemmMulSilu, ...), the epilogue is a
composable chain of ops. `{GemmOp, AddOp, ReluOp}` produces
`epilogue_ops = {Add, Relu}` with the bias tensor automatically slotted
as D0. Two consecutive `AddOp`s fold into a single Add with two D tensors
via CK Tile's parameter pack.
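
The chain-to-epilogue projection and the Add folding can be sketched as follows. `OpKind`, `EpilogueChain`, `buildEpilogue()` and the `num_d` field are hypothetical, and treating `Mul` as consuming a D tensor is an assumption. How the folded Add maps onto CK Tile's parameter pack is not shown:

```cpp
// Sketch only (C++20): hypothetical names; CK Tile wiring not shown.
#include <array>
#include <cstddef>

enum class OpKind { Gemm, Add, Mul, Relu, Silu };

// One epilogue stage: the op and how many external D tensors it reads.
struct EpilogueOp {
    OpKind kind;
    int num_d;
};

template <std::size_t N>
struct EpilogueChain {
    std::array<EpilogueOp, N> ops{};
    std::size_t size = 0;
};

template <std::size_t N>
constexpr EpilogueChain<N> buildEpilogue(const std::array<OpKind, N>& chain) {
    EpilogueChain<N> out;
    for (std::size_t i = 1; i < N; ++i) {  // chain[0] is the GemmOp
        const OpKind k = chain[i];
        // Fold an AddOp into a directly preceding Add: one more D tensor.
        if (k == OpKind::Add && out.size > 0 &&
            out.ops[out.size - 1].kind == OpKind::Add) {
            ++out.ops[out.size - 1].num_d;
            continue;
        }
        const bool binary = (k == OpKind::Add || k == OpKind::Mul);
        out.ops[out.size++] = EpilogueOp{k, binary ? 1 : 0};
    }
    return out;
}

// {GemmOp, AddOp, ReluOp} -> epilogue {Add (1 D tensor), Relu}
constexpr auto kAddRelu =
    buildEpilogue(std::array{OpKind::Gemm, OpKind::Add, OpKind::Relu});
static_assert(kAddRelu.size == 2);
static_assert(kAddRelu.ops[0].kind == OpKind::Add && kAddRelu.ops[0].num_d == 1);

// {GemmOp, AddOp, AddOp} -> a single Add with two D tensors
constexpr auto kAddAdd =
    buildEpilogue(std::array{OpKind::Gemm, OpKind::Add, OpKind::Add});
static_assert(kAddAdd.size == 1 && kAddAdd.ops[0].num_d == 2);
```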

**Signature/Algorithm split.** The same Signature can pair with multiple
GemmAlgorithms to produce different tuning variants without changing the
mathematical result. This is the foundation for the dispatcher: one
operation description, many tile configurations.
## New types
| Type | Role |
|------|------|
| `GemmSpec` | Validated NTTP kernel descriptor: physical tensors, tile geometry, epilogue chain |
| `GemmAlgorithm` | User-facing tuning input: tile sizes, pipeline, partitioning, padding |
| `EpilogueOp` | NTTP-compatible projection of the Op variant for epilogue chains |
| `Dim3` | M x N x K triple for tile geometry |
## Test coverage
- **69 unit tests** — happy paths, layouts, dtypes, quantization,
epilogue chains, algorithm variants
- **33 compile-fail tests** — one per constraint (tile divisibility,
INT8 rules, pipeline restrictions, etc.)
- **6 schema compatibility baselines** — frozen specs that break if the
schema changes
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>