Fixes https://github.com/NVIDIA/cutlass/issues/3268
A `@cute.struct` instance captured into an `scf.if` branch or `scf.while`
body makes the DSL trace fail with:

    DSLRuntimeError: The 'if' statement encountered a user-defined Python
    object, which cannot be automatically converted into an dynamic
    expression.
This blocks the natural warp-specialization pattern, where each
`if warp_idx == <role>:` branch reads its tile from a shared storage
struct.
A struct instance is fully described by its `base` pointer, which is
already DynamicExpression-aware via `_Pointer`; every field instance is
re-derived from `base` plus static offsets on construction. Implement the
DynamicExpression protocol on each decorated class by forwarding
`__get_mlir_types__` and `__extract_mlir_values__` to `base`, and
`__new_from_mlir_values__` to a fresh decorator invocation that
re-derives the fields from a rebuilt base pointer.
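To make the forwarding concrete, here is a minimal pure-Python sketch that runs without cutlass-dsl. `FakePointer`, `struct_like`, `Storage`, and `pass_through_region` are hypothetical stand-ins, not the DSL's actual `_Pointer` or `@cute.struct` implementation; only the three protocol method names come from the description above. Integers play the role of MLIR values, and a region boundary is modeled as "flatten to values, then rebuild from block arguments".

```python
from typing import Dict, List


class FakePointer:
    """Hypothetical stand-in for the DSL's `_Pointer`; an int plays the MLIR value."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __add__(self, offset: int) -> "FakePointer":
        # Pointer arithmetic: base + static byte offset yields a new pointer.
        return FakePointer(self.value + offset)

    # DynamicExpression-style protocol (method names as described in this PR).
    def __get_mlir_types__(self) -> List[str]:
        return ["!ptr"]

    def __extract_mlir_values__(self) -> List[int]:
        return [self.value]

    def __new_from_mlir_values__(self, values: List[int]) -> "FakePointer":
        return FakePointer(values[0])


def struct_like(offsets: Dict[str, int]):
    """Hypothetical decorator: each field is derived from base + a static offset."""

    def wrap(cls):
        def __init__(self, base: FakePointer) -> None:
            self.base = base
            # Re-derive every field from the base pointer on construction.
            for name, off in offsets.items():
                setattr(self, name, base + off)

        # Forward type/value extraction to the base pointer only.
        def __get_mlir_types__(self):
            return self.base.__get_mlir_types__()

        def __extract_mlir_values__(self):
            return self.base.__extract_mlir_values__()

        # Rebuild the base from new values, then re-run construction.
        def __new_from_mlir_values__(self, values):
            return cls(self.base.__new_from_mlir_values__(values))

        cls.__init__ = __init__
        cls.__get_mlir_types__ = __get_mlir_types__
        cls.__extract_mlir_values__ = __extract_mlir_values__
        cls.__new_from_mlir_values__ = __new_from_mlir_values__
        return cls

    return wrap


def pass_through_region(obj):
    """Model an scf.if/scf.while boundary: flatten to values, rebuild from block args."""
    block_args = list(obj.__extract_mlir_values__())  # real IR: fresh SSA block args
    return obj.__new_from_mlir_values__(block_args)


@struct_like({"a_tile": 0, "b_tile": 1024})
class Storage:
    pass


storage = Storage(FakePointer(4096))
inside = pass_through_region(storage)
print(inside.a_tile.value, inside.b_tile.value)  # 4096 5120
```

In this sketch the struct contributes only its base pointer's values to the region, however many fields it declares; the fields are recomputed from the rebuilt base on the other side, which is why nothing beyond the pointer needs to cross the boundary.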
Tested in Docker on cutlass-dsl 4.5.1 with six new unit tests in
`test/python/CuTeDSL/test_struct_in_if.py` covering:
* the original failing case (`storage.get_tensor` inside a dynamic `if`),
* regression: plain non-branched struct usage still works,
* a nested struct (struct-of-struct) inside a dynamic `if`,
* `if`/`else` with both branches accessing the struct,
* `if`/`elif`/`elif`/`else` (the actual warp-specialization shape),
* an `scf.while` body capturing the struct.
* Release 3.3.0
Adds support for mixed-precision GEMMs on Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to the Python interface
Enhancements to sub-byte type handling in CuTe
Several other bug fixes and performance improvements.
* minor doc update