cutlass

mirror of https://github.com/NVIDIA/cutlass.git synced 2026-05-13 17:55:42 +00:00

Author	SHA1	Message	Date
Junkai-Wu	d4bbf728ca	v4.4 tag release update. (#3032)	2026-02-13 23:27:58 -05:00
Junkai-Wu	9fba3195f9	v4.4 update. (#2979)	2026-01-24 11:46:17 -05:00
Brian K. Ryu	147f5673d0	New RMS Norm example with unit tests (#2917) * Add rmsnorm example * Address reviewer comments. (1) use the cute.runtime definition directly. (2) use the nvvm_wrapper's warp reduce directly * Separate out reduce.py * Change copyright notice years	2026-01-13 09:05:31 +08:00
Junkai-Wu	0d2b201e8c	v4.3.5 update. (#2934) * v4.3.5 update. * Update copyright to 2026	2026-01-08 15:02:56 -05:00
questa-quan-wang	2aee73922c	Minor fix for testing of blockscaled dense GEMM with TMA prefetch (#2930) * new example with TMA prefetch feature targeting for DRAM latency bound cases * minor fix to resitrct as 100a arch * typo * apply arch for whole pytest --------- Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com> Co-authored-by: Questa Wang <questaw@umbriel-b200-145.ipp4a1.colossus.nvidia.com>	2026-01-05 16:36:03 +08:00
questa-quan-wang	3f4c086d09	new example with TMA prefetch feature targeting for DRAM latency bound cases (#2881) Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>	2025-12-23 15:29:48 +08:00
Linfeng Zheng	f6402fcd5e	add pytest support for tutorial gemm (#2826) * add pytest support for tutorial gemm * add license	2025-12-05 08:45:01 -05:00
Junkai-Wu	8cd5bef43a	v4.3 tag release update. (#2789)	2025-11-20 20:49:44 -05:00
Junkai-Wu	b1d6e2c9b3	v4.3 update. (#2709) * v4.3 update. * Update the cute_dsl_api changelog's doc link * Update version to 4.3.0 * Update the example link * Update doc to encourage user to install DSL from requirements.txt --------- Co-authored-by: Larry Wu <larwu@nvidia.com>	2025-10-21 14:26:30 -04:00
Junkai-Wu	7a6d4ee099	v4.2.1 update. (#2666)	2025-09-23 13:25:43 -04:00
Junkai-Wu	6a35b4d22f	v4.2 tag release. (#2638)	2025-09-15 12:21:53 -04:00
Junkai-Wu	a49a78ffef	v4.2 release. (#2587) * Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line. * v4.2 release.	2025-08-22 18:11:24 -04:00
Inoday Yadav	42e7c546c4	Add movmatrix support (movmatrix.sync.aligned.m8n8.trans.b16) (#2562)	2025-08-19 22:22:02 -04:00
Srinath Kailasa	3b054767b3	Fix typo (#2514)	2025-07-30 22:14:54 -04:00
kf-zhang	26b7450023	support fp16 accmulator for sm89 fp8 mma (#2378) * add support for sm89 in cute and the unit tests * support fp16 accmulator for sm89 fp8 mma * format code	2025-07-30 22:12:08 -04:00
Junkai-Wu	a1aaf2300a	v4.1 release	2025-07-03 08:07:53 -04:00
Junkai-Wu	8bdbfca682	v4.0 update. (#2371)	2025-06-06 02:39:20 -04:00
Kihiro Bando	f115c3f854	Release v4.0.0 (#2294)	2025-05-13 15:55:29 -04:00
Yujia Zhai	331a1f5b3f	cutlass 3.9 update (#2255) * cutlass 3.9 update * rebase * fixes out of shared memory for blockwise Blackwell * doc format * fix issue 2253 * disable host ref by default * fix sm120 smem capacity --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-04-24 15:42:40 -04:00
kf-zhang	19cc2a5feb	add support for sm89 in cute and the unit tests (#2177) * add support for sm89 in cute and the unit tests * rebase v3.9 and format code * minor fix --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-04-10 14:16:36 -04:00
Yujia Zhai	79fc51f4b8	v3.9 update (#2213) Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-04-03 02:10:16 -04:00
Yujia Zhai	6f4921858b	v3.9 update (#2203) * v3.9 update * voidD --------- Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-04-02 15:11:18 -04:00
Yujia Zhai	62750a2b75	v3.9 (#2185) * v3.8 update x * fix blackwell gg * doc change * doc change * doc change --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2025-03-21 01:52:23 -04:00
Tyler Michael Smith	8c4d1dc47d	Treat negative zero as equivalent to positive zero in sm90_sparse_gemm_compressor.hpp (#2110) * Treat negative zero as zero in the sparse gemm compressor Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * format Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * Apply patch Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * sm90_sparse_gemm_compressor.hpp * test/unit/transform/CMakeLists.txt * test/unit/transform/device/sm90_sparse_gemm_compressor_legacy.hpp * include/cutlass/numeric_types.h --------- Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2025-03-21 01:44:17 -04:00
Yujia Zhai	b84e9802d8	update 3.8 v2 (#2112) * update 3.8 v2 * update 3.8 --------- Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-02-19 22:03:14 -05:00
Yujia Zhai	833f6990e0	v3.8.0 update (#2082) * 3.8 update * fix Markus' name --------- Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-02-06 21:33:40 -05:00
mihir-awatramani	389e493055	CUTLASS 3.8 Release (#2059) * CUTLASS 3.8 Release * update * Update README.md * Revert "Update README.md" This reverts commit `b353e36fe8`. * update * update --------- Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-01-25 02:44:06 -05:00
Yujia Zhai	b78588d163	CUTLASS 3.7 (#2045) * CUTLASS 3.7 * clean up changelog --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-01-18 09:53:07 -05:00
Yujia Zhai	3d261a5974	3.6.0 update (#2005) * 3.6.0 update * doc and swap stuff --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-12-25 01:34:40 -05:00
Lain	80243e0b8c	add {uint4, uint2, int2} => {fp16, bf16} conversion (#1966)	2024-12-03 14:03:43 -05:00
侯奇	12626bcfe4	Update gemm_f16n_f16t_f32t_tensor_op_f32_sm80.cu with include "cutlass/gemm/device/gemm_universal.h" (#1569) fix compile with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`	2024-10-23 12:56:36 -04:00
Xinyu Yang	f3a3bfcbf2	add maximum support (#1833)	2024-10-23 12:44:56 -04:00
Yujia Zhai	cc3c29a81a	CUTLASS 3.6.0 (#1850) * v3.6 * update changelog * update readme * fix typo * fixing typos * hopper gemm with weight prefetch --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-10-09 15:33:27 -04:00
Alexander Zinoviev	477a677317	Fix typos in test/unit/conv/cache_testbed_output.h (#1652) Co-authored-by: Alexander Zinoviev <azinoviev@tesla.com>	2024-10-07 12:39:11 -04:00
Junkai-Wu	dbdae514e0	Support for TMA Epilogue for Group Gemm and add pingpong ptr array & Group Gemm (#1795)	2024-09-11 00:07:31 -04:00
Aleksandar Samardžić	e1976daacc	Add support for mixed 4-bit/8-bit data types GEMM (#1413) * Add support for mixed 4-bit/8-bit data types GEMM * fix ( and ) --------- Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-08-29 23:11:06 -04:00
Aleksandar Samardžić	3f084f7f3c	Add couple configs into generator.py for mixed input MM (#1350) * Add couple configs into generator.py for mixed input MM * change one unit test name; reenable 128x32 in the profiler * Added U8/BF16 tests. --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2024-08-16 00:59:29 -04:00
Haicheng Wu	8d8cfdf375	update 3.5.1 readme/changelog	2024-08-14 21:12:44 -07:00
Mark Hoemmen	19b4c5e065	Fix isnan namespace qualification in cutlass/functional.h (#1679) * Fix unrelated MSVC build warnings * Fix use of isnan in functional.h Correct namespace qualification of isnan in functional.h so that it invokes cutlass::isnan for half_t, instead of converting half_t to float and invoking std::isnan (on host, or ::isnan on device).	2024-08-05 14:28:13 -04:00
Vijay Thakkar	be60a0b272	CUTLASS 3.5.1 (#1623) * CUTLASS 3.5.1 * updates, optimizations, fixes	2024-07-29 08:46:24 -04:00
Daniel Richard G	d6580c3dc0	Support use of external/system GTest installation (#1469) * Support use of system/external GTest installation * Create working directory for tests explicitly	2024-07-10 11:07:57 -04:00
Alexander Zinoviev	dbfced05e7	Fix typos in convolution tests (#1433)	2024-07-10 11:00:52 -04:00
Vijay Thakkar	7d49e6c7e2	Updates for CUTLASS 3.5.0 (#1468)	2024-04-11 21:33:40 -04:00
reed	19f3cc33f1	Fix uint128 operator add (#1400) * fix uint128 operator add for 64-bit hilo implemenation * add uint128 test for operator add * make clang happy --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-04-02 13:32:18 -04:00
Vijay Thakkar	629f4653c3	CUTLASS 3.5.0 (#1411)	2024-03-19 17:51:04 -04:00
ANIKET SHIVAM	bbe579a9e3	Updates for CUTLASS 3.4.1 (#1346) * Updates for CUTLASS 3.4.1 * minor epi change	2024-02-15 15:48:34 -05:00
Aleksandar Samardžić	ca37d632c9	Remove sparse GEMM with row broadcasted bias vector (#1302) This reverts commit `d3e72719b4`. Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs>	2024-01-17 14:06:27 -05:00
Chengquan Jiang	362abbf274	Support ElementD to be void for tma (#1153) * Support void D with AuxStore * refine get_element_aux	2024-01-16 18:15:42 -05:00
ANIKET SHIVAM	751eb9a885	Update license year (#1306)	2024-01-16 14:37:22 -05:00
ANIKET SHIVAM	2f589ffa76	Updates for 3.4 release. (#1305)	2024-01-16 13:42:51 -05:00


110 Commits