cutlass

Mirror of https://github.com/NVIDIA/cutlass.git, last synced 2026-06-29 10:57:06 +00:00

Author	SHA1	Message	Date
George Karpenkov	1732ed7da3	[CuTeDSL] Make @cute.struct instances flattenable across scf.if / scf.while (#3270) Fixes https://github.com/NVIDIA/cutlass/issues/3268 A `@cute.struct` instance captured into an `scf.if` branch or `scf.while` body fails the DSL trace with: DSLRuntimeError: The 'if' statement encountered a user-defined Python object, which cannot be automatically converted into an dynamic expression. This blocks the natural warp-specialization pattern, where each `if warp_idx == <role>:` branch reads its tile from a shared storage struct. A struct instance is fully described by its `base` pointer (already DynamicExpression-aware via `_Pointer`); every field instance is re-derived from `base + static offsets` on construction. Implement the DynamicExpression protocol on each decorated class by forwarding `__get_mlir_types__` / `__extract_mlir_values__` to `base`, and `__new_from_mlir_values__` to a fresh decorator invocation that re-derives the fields from a rebuilt base pointer. Tested in Docker on cutlass-dsl 4.5.1 with six new unit tests in test/python/CuTeDSL/test_struct_in_if.py covering: * the original failing case (storage.get_tensor inside dynamic if), * regression: plain non-branched struct usage still works, * nested struct (struct-of-struct) inside a dynamic if, * if/else with both branches accessing the struct, * if/elif/elif/else (the actual warp-specialization shape), * scf.while body capturing the struct.	2026-05-29 08:34:48 +08:00
Junkai-Wu	0d2b201e8c	v4.3.5 update. (#2934) * v4.3.5 update. * Update copyright to 2026	2026-01-08 15:02:56 -05:00
Junkai-Wu	6a35b4d22f	v4.2 tag release. (#2638)	2025-09-15 12:21:53 -04:00
Junkai-Wu	a49a78ffef	v4.2 release. (#2587) * Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line. * v4.2 release.	2025-08-22 18:11:24 -04:00
Junkai-Wu	a1aaf2300a	v4.1 release	2025-07-03 08:07:53 -04:00
Yujia Zhai	62750a2b75	v3.9 (#2185) * v3.8 update x * fix blackwell gg * doc change * doc change * doc change --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2025-03-21 01:52:23 -04:00
Yujia Zhai	b78588d163	CUTLASS 3.7 (#2045) * CUTLASS 3.7 * clean up changelog --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-01-18 09:53:07 -05:00
Yujia Zhai	3d261a5974	3.6.0 update (#2005) * 3.6.0 update * doc and swap stuff --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-12-25 01:34:40 -05:00
Xinyu Yang	f3a3bfcbf2	add maximum support (#1833)	2024-10-23 12:44:56 -04:00
Vijay Thakkar	7d49e6c7e2	Updates for CUTLASS 3.5.0 (#1468)	2024-04-11 21:33:40 -04:00
Vijay Thakkar	629f4653c3	CUTLASS 3.5.0 (#1411)	2024-03-19 17:51:04 -04:00
ANIKET SHIVAM	751eb9a885	Update license year (#1306)	2024-01-16 14:37:22 -05:00
ANIKET SHIVAM	2f589ffa76	Updates for 3.4 release. (#1305)	2024-01-16 13:42:51 -05:00
Pradeep Ramani	8236f30675	CUTLASS 3.4.0 (#1286) * CUTLASS 3.4.0 * Update CHANGELOG.md --------- Co-authored-by: Pradeep Ramani <prramani@nvidia.com>	2023-12-29 15:21:31 -05:00
Pradeep Ramani	c008b4aea8	CUTLASS 3.3.0 (#1167) * Release 3.3.0 Adds support for mixed precision GEMMs On Hopper and Ampere Adds support for < 16B aligned GEMMs on Hopper Enhancements to EVT Enhancements to Python interface Enhancements to Sub-byte type handling in CuTe Several other bug-fixes and performance improvements. * minor doc update	2023-11-02 11:09:05 -04:00
ANIKET SHIVAM	90d3b0fb18	CUTLASS 3.2.1 (#1113) * Updates for 3.2.1 release. * Minor fix in gemm op profiler for raster order. * Add scheduler mapping for raster order in the kernels.	2023-09-26 17:24:26 -04:00
ANIKET SHIVAM	a88c41cf8d	Updates for 3.2 release (#1065)	2023-08-25 23:05:46 -04:00
ANIKET SHIVAM	4575443d44	CUTLASS 3.2 (#1024) * CUTLASS 3.2	2023-08-07 20:50:32 -04:00
ANIKET SHIVAM	d572cc1aab	CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-04-14 23:19:34 -04:00

19 Commits
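The 1732ed7da3 commit message describes making a struct flattenable by forwarding the DynamicExpression protocol to its `base` pointer and rebuilding every field from `base + static offsets`. Below is a minimal, self-contained Python sketch of that idea, assuming nothing from cutlass-dsl; it is not the CuTe DSL implementation. `FakeValue`, `FakePointer`, `flattenable_struct`, the `OFFSETS` field map, and `simulate_region_capture` are all hypothetical stand-ins. Only the three dunder method names come from the commit message, and their exact signatures in the DSL are assumed here.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for an MLIR SSA value: a name plus an integer
# "address", so the sketch runs without cutlass-dsl installed.
@dataclass(frozen=True)
class FakeValue:
    name: str
    addr: int


class FakePointer:
    """Hypothetical pointer that already speaks the protocol (like `_Pointer`)."""

    def __init__(self, value: FakeValue):
        self.value = value

    def __get_mlir_types__(self) -> List[str]:
        return ["!ptr"]  # placeholder type string, not a real MLIR type

    def __extract_mlir_values__(self) -> List[FakeValue]:
        return [self.value]

    def __new_from_mlir_values__(self, values: List[FakeValue]) -> "FakePointer":
        return FakePointer(values[0])

    def __add__(self, offset: int) -> "FakePointer":
        v = self.value
        return FakePointer(FakeValue(f"{v.name}+{offset}", v.addr + offset))


def flattenable_struct(cls):
    """Hypothetical decorator: each field is a static byte offset from `base`."""
    offsets = dict(cls.OFFSETS)  # field name -> static offset

    def __init__(self, base: FakePointer):
        self.base = base
        # Every field is re-derived from base + static offset on construction.
        for field, off in offsets.items():
            setattr(self, field, base + off)

    # Flattening is forwarded entirely to the base pointer.
    def __get_mlir_types__(self):
        return self.base.__get_mlir_types__()

    def __extract_mlir_values__(self):
        return self.base.__extract_mlir_values__()

    def __new_from_mlir_values__(self, values):
        # Rebuild the base pointer, then construct a fresh instance,
        # which re-derives all fields from it.
        return type(self)(self.base.__new_from_mlir_values__(values))

    cls.__init__ = __init__
    cls.__get_mlir_types__ = __get_mlir_types__
    cls.__extract_mlir_values__ = __extract_mlir_values__
    cls.__new_from_mlir_values__ = __new_from_mlir_values__
    return cls


def simulate_region_capture(obj):
    """Mimic a tracer at an scf.if / scf.while boundary: flatten the object,
    swap each value for a fresh block argument, and rebuild the object."""
    values = obj.__extract_mlir_values__()
    block_args = [FakeValue(f"%arg{i}", v.addr) for i, v in enumerate(values)]
    return obj.__new_from_mlir_values__(block_args)


@flattenable_struct
class Storage:
    OFFSETS = {"tile_a": 0, "tile_b": 4096}


s = Storage(FakePointer(FakeValue("%smem", 1024)))
inner = simulate_region_capture(s)
print(inner.tile_b.value)  # FakeValue(name='%arg0+4096', addr=5120)
```

Because the base pointer is the struct's only dynamic state, the struct flattens to a single value, so a region boundary has to thread just one block argument; the fields come back for free on reconstruction since their offsets are static.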