v4.3.3 update (#2869)

* v4.3.3 update.

* fix print_layout printf format in device code (#2688)

* fix print_layout printf format in device code

* Replace %.*s format specifier with explicit loop
* Remove unused delim variable

The printf format %.*s, whose precision (the maximum number of characters to print) is passed as a run-time argument, does not work correctly
in CUDA device code, causing a literal %.*s to appear in the output.

Fixes #2496
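The explicit-loop replacement can be sketched as follows. This is a minimal sketch, not the actual change in print_tensor.hpp. It assumes a print_layout-style border made of a '+' corner followed by a run of dashes per cell. The names emit_border, print_border and border_string are hypothetical. The character sink is a template parameter, so the same logic can print with one printf("%c") call per character or be checked on the host.

```cpp
#include <cassert>
#include <stdio.h>
#include <string>

// Hypothetical helper (not CuTe's actual code): emit the border of a
// print_layout-style table, e.g. "+----+----+", one character at a time.
// Only fixed format strings are involved downstream, so nothing depends on
// the "%.*s" run-time precision that misbehaves in CUDA device printf.
// When compiling with nvcc, emit_border and print_border would additionally
// be marked __host__ __device__.
template <class PutChar>
void emit_border(int num_cells, int cell_width, PutChar put) {
  assert(num_cells >= 0 && cell_width >= 0);
  for (int n = 0; n < num_cells; ++n) {
    put('+');                              // corner at the left of each cell
    for (int i = 0; i < cell_width; ++i) {
      put('-');                            // cell edge, one dash per column
    }
  }
  put('+');                                // closing corner
  put('\n');
}

// Prints the border with one printf("%c") call per character instead of a
// single printf("%.*s", width, delim) call on a fixed delimiter string.
inline void print_border(int num_cells, int cell_width) {
  emit_border(num_cells, cell_width, [](char c) { printf("%c", c); });
}

// Host-only convenience for checking the output
// (std::string is not available in device code).
inline std::string border_string(int num_cells, int cell_width) {
  std::string out;
  emit_border(num_cells, cell_width, [&out](char c) { out.push_back(c); });
  return out;
}
```

For example, border_string(2, 4) yields "+----+----+\n". The trade-off is one printf call per character, which is usually acceptable for debug output.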

* Update include/cute/util/print_tensor.hpp

Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>

* Update include/cute/util/print_tensor.hpp

Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>

---------

Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>

* Support PDL for SM90 Array TMA GEMM

* Update changelog

---------

Co-authored-by: Amin Sedaghat <35748194+Aminsed@users.noreply.github.com>
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
Junkai-Wu
2025-12-11 13:26:17 +08:00
committed by GitHub
parent 5c149f52a4
commit 5873443bb6
26 changed files with 810 additions and 213 deletions


@@ -133,7 +133,7 @@ def get_option_registry():
         this._option_registry = OptionRegistry(device_cc())
     return this._option_registry
 
-this.__version__ = '4.3.2'
+this.__version__ = '4.3.3'
 
 from cutlass_cppgen.backend import create_memory_pool
 from cutlass_cppgen.emit.pytorch import pytorch