Update the release note for 4.5 dev (#3154)

Authored by brandonsun; committed by GitHub on 2026-04-08 10:02:46 +08:00
Parent: a221da7ccf
Commit: bd01dd3651
2 changed files with 0 additions and 14 deletions

@@ -5,13 +5,6 @@
## [4.5.0](https://github.com/NVIDIA/cutlass/tree/main) (2026-03-27)
### CuTe DSL
* New features
  - Auto-deduced shared memory (smem) size when launching kernels
    - The launch config `smem` now defaults to `None`, which auto-calculates the kernel's shared memory usage. This is recommended unless manual control is required.
    - A warning is raised when a manually set shared memory size is insufficient or exceeds the GPU maximum.
    - The default calculation matches CUDA C++ static shared memory behavior, i.e., all allocations are summed.
    - An additional launch option, `smem_merge_branch_allocs`, merges shared memory allocations across mutually exclusive code branches. It is recommended for inlined mega-kernels to reduce the total footprint.
* Bug fixes and improvements
  - Improved source code correlation for profiling/debugging
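The two sizing modes described above can be modeled with a small Python sketch. This is not CuTe DSL code: `Alloc`, `Seq`, `Branch`, and `smem_footprint` are hypothetical names, and the model ignores any alignment or padding a real allocator would likely add. It only illustrates why summing every allocation (the default) and merging mutually exclusive branches give different totals.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical IR nodes for illustration only; they are not part of CuTe DSL.
@dataclass
class Alloc:
    nbytes: int  # one shared memory allocation (alignment/padding ignored)

@dataclass
class Seq:
    body: List["Node"]  # code that runs in order; its allocations accumulate

@dataclass
class Branch:
    arms: List["Node"]  # mutually exclusive paths (e.g. if/else); only one runs

Node = Union[Alloc, Seq, Branch]

def smem_footprint(node: Node, merge_branch_allocs: bool = False) -> int:
    """Estimate total shared memory bytes for a kernel body.

    Default: every allocation is summed, mirroring CUDA C++ static __shared__
    variables. With merge_branch_allocs=True, mutually exclusive arms reuse
    the same space, so a Branch costs only as much as its largest arm.
    """
    if isinstance(node, Alloc):
        return node.nbytes
    if isinstance(node, Seq):
        return sum(smem_footprint(n, merge_branch_allocs) for n in node.body)
    if isinstance(node, Branch):
        sizes = [smem_footprint(a, merge_branch_allocs) for a in node.arms]
        return max(sizes, default=0) if merge_branch_allocs else sum(sizes)
    raise TypeError(f"unknown node: {node!r}")

# A hypothetical inlined mega-kernel that takes either path A or path B.
kernel = Seq([
    Alloc(1024),                                # scratch used by both paths
    Branch([Seq([Alloc(32768), Alloc(16384)]),  # path A: 48 KiB
            Seq([Alloc(40960)])]),              # path B: 40 KiB
])
print(smem_footprint(kernel))                            # 1024 + 49152 + 40960 = 91136
print(smem_footprint(kernel, merge_branch_allocs=True))  # 1024 + max(49152, 40960) = 50176
```

Taking the maximum over arms is only safe because the arms never execute together, which is the condition the release note places on `smem_merge_branch_allocs`.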

@@ -46,13 +46,6 @@ To get started quickly - please refer :
# What's New in CUTLASS 4.5
### CuTe DSL
* New features
  - Auto-deduced shared memory (smem) size when launching kernels
    - The launch config `smem` now defaults to `None`, which auto-calculates the kernel's shared memory usage. This is recommended unless manual control is required.
    - A warning is raised when a manually set shared memory size is insufficient or exceeds the GPU maximum.
    - The default calculation matches CUDA C++ static shared memory behavior, i.e., all allocations are summed.
    - An additional launch option, `smem_merge_branch_allocs`, merges shared memory allocations across mutually exclusive code branches. It is recommended for inlined mega-kernels to reduce the total footprint.
* Bug fixes and improvements
  - Improved source code correlation for profiling/debugging
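The warning behavior for a manually set `smem` value can be sketched in the same spirit. `resolve_smem` is a hypothetical helper, not the CuTe DSL launch API, and the 227 KiB device limit in the usage lines is an assumed value; the real checks and messages in CuTe DSL may differ.

```python
import warnings
from typing import Optional

def resolve_smem(required: int, smem: Optional[int], device_max: int) -> int:
    """Pick the shared memory size to launch with (hypothetical helper).

    smem=None means "auto": use the computed requirement. A manual value is
    kept as given, but a warning is emitted if it is smaller than what the
    kernel allocates or larger than the device allows.
    """
    if smem is None:
        return required
    if smem < required:
        warnings.warn(f"smem={smem} bytes is less than the {required} bytes the kernel allocates")
    if smem > device_max:
        warnings.warn(f"smem={smem} bytes exceeds the device maximum of {device_max} bytes")
    return smem

# Assumed device limit of 227 KiB (232448 bytes) for illustration.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    size = resolve_smem(required=50176, smem=32768, device_max=232448)
print(size, len(caught))  # 32768 1
```

Here the manual value is kept and only flagged, which matches the note's wording that a warning is raised rather than the size being overridden.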