mirror of https://github.com/NVIDIA/cutlass.git (synced 2026-05-11 17:00:05 +00:00)
Update the release note for 4.5 dev (#3154)
@@ -5,13 +5,6 @@

## [4.5.0](https://github.com/NVIDIA/cutlass/tree/main) (2026-03-27)

### CuTe DSL
* New features
  - Auto-deduced smem size for launching kernels
    - Launch config `smem` now defaults to `None` for auto-calculating kernel shared memory usage, which is recommended unless manual control is required.
    - Warnings will be raised when the manually set shared memory size is insufficient or exceeds the GPU maximum.
    - The default shared memory usage calculation aligns with CUDA C++ static shared memory behavior, i.e. summing all allocations additively.
    - An additional launch option `smem_merge_branch_allocs` is provided to merge shared memory allocations across mutually exclusive code branches, which is recommended for inlined mega-kernels to reduce total footprint.

* Bug fixing and improvements
  - Improved source code correlation for profiling/debugging

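
To make the difference between the default additive calculation and `smem_merge_branch_allocs` concrete, the following is a minimal pure-Python model. It is not the CuTe DSL implementation: the `Alloc` and `Branch` types and both functions are hypothetical names invented for illustration, and alignment padding is ignored. It assumes that merging means charging only the largest arm of a set of mutually exclusive branches.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Alloc:
    """Hypothetical: one shared memory allocation of `nbytes` bytes."""
    nbytes: int


@dataclass
class Branch:
    """Hypothetical: mutually exclusive arms, only one of which executes."""
    arms: List[List["Node"]] = field(default_factory=list)


Node = Union[Alloc, Branch]


def additive_bytes(nodes: List[Node]) -> int:
    """Sum every allocation, including those in all branch arms."""
    total = 0
    for node in nodes:
        if isinstance(node, Alloc):
            total += node.nbytes
        else:
            total += sum(additive_bytes(arm) for arm in node.arms)
    return total


def merged_bytes(nodes: List[Node]) -> int:
    """Charge each set of exclusive arms only for its largest arm."""
    total = 0
    for node in nodes:
        if isinstance(node, Alloc):
            total += node.nbytes
        else:
            total += max((merged_bytes(arm) for arm in node.arms), default=0)
    return total


# A kernel with one unconditional buffer and two mutually exclusive arms.
kernel = [
    Alloc(1024),
    Branch(arms=[[Alloc(16384)], [Alloc(8192), Alloc(4096)]]),
]

print(additive_bytes(kernel))  # 29696
print(merged_bytes(kernel))    # 17408
```

In this model the merged footprint is smaller because the two arms never coexist, which is the situation the release note describes for inlined mega-kernels.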
@@ -46,13 +46,6 @@ To get started quickly - please refer :

# What's New in CUTLASS 4.5

### CuTe DSL
* New features
  - Auto-deduced smem size for launching kernels
    - Launch config `smem` now defaults to `None` for auto-calculating kernel shared memory usage, which is recommended unless manual control is required.
    - Warnings will be raised when the manually set shared memory size is insufficient or exceeds the GPU maximum.
    - The default shared memory usage calculation aligns with CUDA C++ static shared memory behavior, i.e. summing all allocations additively.
    - An additional launch option `smem_merge_branch_allocs` is provided to merge shared memory allocations across mutually exclusive code branches, which is recommended for inlined mega-kernels to reduce total footprint.

* Bug fixing and improvements
  - Improved source code correlation for profiling/debugging

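
The `smem=None` default and the two warning conditions described above can be sketched as a small stand-alone check. This is only an illustration of the described behavior, not the DSL's code: `resolve_smem`, its parameters, the warning class, the message text, and the device limit used below are all assumptions.

```python
import warnings
from typing import Optional


def resolve_smem(smem: Optional[int], required: int, device_max: int) -> int:
    """Hypothetical helper: pick the shared memory size for a launch.

    smem       -- user-supplied size in bytes, or None for auto-deduction
    required   -- bytes the kernel's allocations are calculated to need
    device_max -- largest shared memory size the GPU allows per block
    """
    if smem is None:
        # Auto mode: use the calculated requirement.
        return required
    if smem < required:
        warnings.warn(
            f"smem={smem} is smaller than the {required} bytes the kernel needs",
            RuntimeWarning,
        )
    if smem > device_max:
        warnings.warn(
            f"smem={smem} exceeds the device maximum of {device_max} bytes",
            RuntimeWarning,
        )
    # Manual mode: the user's value is kept; the warnings are advisory.
    return smem


ILLUSTRATIVE_MAX = 227 * 1024  # assumed limit, for demonstration only

print(resolve_smem(None, 4096, ILLUSTRATIVE_MAX))  # 4096
```

Passing an explicit `smem` below `required` or above `device_max` emits a warning in this sketch while still returning the requested value, which mirrors the note's wording that warnings are raised rather than errors.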