diff --git a/CHANGELOG.md b/CHANGELOG.md
index b97d4ccb8..b6af0b9d0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,13 +5,6 @@
 ## [4.5.0](https://github.com/NVIDIA/cutlass/tree/main) (2026-03-27)
 
 ### CuTe DSL
-* New features
-  - Auto-deduced smem size for launching kernels
-    - Launch config `smem` now defaults to `None` for auto-calculating kernel shared memory usage, which is recommended unless manual control is required.
-    - Warnings will be raised when the manually set shared memory size is insufficient or exceeds the GPU maximum.
-    - The default shared memory usage calculation aligns with CUDA C++ static shared memory behavior, i.e. summing all allocations additively.
-    - An additional launch option `smem_merge_branch_allocs` is provided to merge shared memory allocations across mutually exclusive code branches, which is recommended for inlined mega-kernels to reduce total footprint.
-
 * Bug fixing and improvements
   - Improved source code correlation for profiling/debugging
 
diff --git a/README.md b/README.md
index af62b1727..c410237e8 100644
--- a/README.md
+++ b/README.md
@@ -46,13 +46,6 @@ To get started quickly - please refer :
 # What's New in CUTLASS 4.5
 
 ### CuTe DSL
-* New features
-  - Auto-deduced smem size for launching kernels
-    - Launch config `smem` now defaults to `None` for auto-calculating kernel shared memory usage, which is recommended unless manual control is required.
-    - Warnings will be raised when the manually set shared memory size is insufficient or exceeds the GPU maximum.
-    - The default shared memory usage calculation aligns with CUDA C++ static shared memory behavior, i.e. summing all allocations additively.
-    - An additional launch option `smem_merge_branch_allocs` is provided to merge shared memory allocations across mutually exclusive code branches, which is recommended for inlined mega-kernels to reduce total footprint.
-
 * Bug fixing and improvements
   - Improved source code correlation for profiling/debugging
 