cutlass

mirror of https://github.com/NVIDIA/cutlass.git synced 2026-06-29 10:57:06 +00:00

Author	SHA1	Message	Date
Haicheng Wu	1afc6d355b	port nvrtc change to version.h update to 4.3.6 alignment-related miscalculation for pipeline stages Allow larger library on 64bit platform add more changelog items remove whitespace	2026-06-20 06:42:52 -07:00
Junkai-Wu	4faf1a1568	v4.3.5 update. (#2935) * v4.3.5 update. * Update copyright to 2026.	2026-01-08 15:02:14 -05:00
Junkai-Wu	7233a05f24	v4.3.4 update. (#2893)	2025-12-21 11:49:35 -05:00
Junkai-Wu	5873443bb6	v4.3.3 update (#2869) * v4.3.3 update. * fix print_layout printf format in device code (#2688) * fix print_layout printf format in device code * Replace %.s format specifier with explicit loop Remove unused delim variable The printf format %.s with dynamic width does not work correctly in CUDA device code, causing literal %.s to appear in output. Fixes #2496 * Update include/cute/util/print_tensor.hpp Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com> * Update include/cute/util/print_tensor.hpp Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com> --------- Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com> * Support PDL for SM90 Array TMA GEMM * Update changelog --------- Co-authored-by: Amin Sedaghat <35748194+Aminsed@users.noreply.github.com> Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>	2025-12-11 00:26:17 -05:00
Junkai-Wu	ff35fa561d	v4.3.2 update. (#2840)	2025-12-04 10:14:50 -05:00
Haicheng Wu	10d4651439	Bump version from 4.2.0 to 4.3.1	2025-12-01 19:17:19 -08:00
Junkai-Wu	5fd9685dce	v4.3.1 update (#2818) * Blockscaled Ragged Contiguous Grouped Gemm for MoEs (#2790) * Adding blockscaled ragged contiguous grouped gemm for MoEs * cleaning up the example * introduction to example improved --------- Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com> * v4.3.1 update. --------- Co-authored-by: Shreya Gaur <48754356+Shreya-gaur@users.noreply.github.com> Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com>	2025-11-27 09:48:55 -05:00
Haicheng Wu	ddaf12c1b1	Bump version from 4.2.0 to 4.3.0	2025-11-24 16:35:27 -05:00
Junkai-Wu	8cd5bef43a	v4.3 tag release update. (#2789)	2025-11-20 20:49:44 -05:00
Junkai-Wu	b1d6e2c9b3	v4.3 update. (#2709) * v4.3 update. * Update the cute_dsl_api changelog's doc link * Update version to 4.3.0 * Update the example link * Update doc to encourage user to install DSL from requirements.txt --------- Co-authored-by: Larry Wu <larwu@nvidia.com>	2025-10-21 14:26:30 -04:00
Junkai-Wu	6a35b4d22f	v4.2 tag release. (#2638)	2025-09-15 12:21:53 -04:00
Junkai-Wu	a49a78ffef	v4.2 release. (#2587) * Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line. * v4.2 release.	2025-08-22 18:11:24 -04:00
Yujia Zhai	b78588d163	CUTLASS 3.7 (#2045) * CUTLASS 3.7 * clean up changelog --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-01-18 09:53:07 -05:00
ANIKET SHIVAM	751eb9a885	Update license year (#1306)	2024-01-16 14:37:22 -05:00
ANIKET SHIVAM	2f589ffa76	Updates for 3.4 release. (#1305)	2024-01-16 13:42:51 -05:00
Pradeep Ramani	8236f30675	CUTLASS 3.4.0 (#1286) * CUTLASS 3.4.0 * Update CHANGELOG.md --------- Co-authored-by: Pradeep Ramani <prramani@nvidia.com>	2023-12-29 15:21:31 -05:00
Pradeep Ramani	c008b4aea8	CUTLASS 3.3.0 (#1167) * Release 3.3.0 Adds support for mixed precision GEMMs On Hopper and Ampere Adds support for < 16B aligned GEMMs on Hopper Enhancements to EVT Enhancements to Python interface Enhancements to Sub-byte type handling in CuTe Several other bug-fixes and performance improvements. * minor doc update	2023-11-02 11:09:05 -04:00

17 Commits