Junkai-Wu
d4bbf728ca
v4.4 tag release update. ( #3032 )
2026-02-13 23:27:58 -05:00
Junkai-Wu
9fba3195f9
v4.4 update. ( #2979 )
2026-01-24 11:46:17 -05:00
Brian K. Ryu
147f5673d0
New RMS Norm example with unit tests ( #2917 )
...
* Add rmsnorm example
* Address reviewer comments. (1) use the cute.runtime definition directly. (2) use the nvvm_wrapper's warp reduce directly
* Separate out reduce.py
* Change copyright notice years
2026-01-13 09:05:31 +08:00
Junkai-Wu
0d2b201e8c
v4.3.5 update. ( #2934 )
...
* v4.3.5 update.
* Update copyright to 2026
2026-01-08 15:02:56 -05:00
questa-quan-wang
2aee73922c
Minor fix for testing of blockscaled dense GEMM with TMA prefetch ( #2930 )
...
* new example with TMA prefetch feature targeting for DRAM latency bound cases
* minor fix to restrict to 100a arch
* typo
* apply arch for whole pytest
---------
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com >
Co-authored-by: Questa Wang <questaw@umbriel-b200-145.ipp4a1.colossus.nvidia.com >
2026-01-05 16:36:03 +08:00
questa-quan-wang
3f4c086d09
new example with TMA prefetch feature targeting for DRAM latency bound cases ( #2881 )
...
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com >
2025-12-23 15:29:48 +08:00
Linfeng Zheng
f6402fcd5e
add pytest support for tutorial gemm ( #2826 )
...
* add pytest support for tutorial gemm
* add license
2025-12-05 08:45:01 -05:00
Junkai-Wu
8cd5bef43a
v4.3 tag release update. ( #2789 )
2025-11-20 20:49:44 -05:00
Junkai-Wu
b1d6e2c9b3
v4.3 update. ( #2709 )
...
* v4.3 update.
* Update the cute_dsl_api changelog's doc link
* Update version to 4.3.0
* Update the example link
* Update doc to encourage user to install DSL from requirements.txt
---------
Co-authored-by: Larry Wu <larwu@nvidia.com >
2025-10-21 14:26:30 -04:00
Junkai-Wu
7a6d4ee099
v4.2.1 update. ( #2666 )
2025-09-23 13:25:43 -04:00
Junkai-Wu
6a35b4d22f
v4.2 tag release. ( #2638 )
2025-09-15 12:21:53 -04:00
Junkai-Wu
a49a78ffef
v4.2 release. ( #2587 )
...
* Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line.
* v4.2 release.
2025-08-22 18:11:24 -04:00
Inoday Yadav
42e7c546c4
Add movmatrix support (movmatrix.sync.aligned.m8n8.trans.b16) ( #2562 )
2025-08-19 22:22:02 -04:00
Srinath Kailasa
3b054767b3
Fix typo ( #2514 )
2025-07-30 22:14:54 -04:00
kf-zhang
26b7450023
support fp16 accumulator for sm89 fp8 mma ( #2378 )
...
* add support for sm89 in cute and the unit tests
* support fp16 accumulator for sm89 fp8 mma
* format code
2025-07-30 22:12:08 -04:00
Junkai-Wu
a1aaf2300a
v4.1 release
2025-07-03 08:07:53 -04:00
Junkai-Wu
8bdbfca682
v4.0 update. ( #2371 )
2025-06-06 02:39:20 -04:00
Kihiro Bando
f115c3f854
Release v4.0.0 ( #2294 )
2025-05-13 15:55:29 -04:00
Yujia Zhai
331a1f5b3f
cutlass 3.9 update ( #2255 )
...
* cutlass 3.9 update
* rebase
* fixes out of shared memory for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-04-24 15:42:40 -04:00
kf-zhang
19cc2a5feb
add support for sm89 in cute and the unit tests ( #2177 )
...
* add support for sm89 in cute and the unit tests
* rebase v3.9 and format code
* minor fix
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-04-10 14:16:36 -04:00
Yujia Zhai
79fc51f4b8
v3.9 update ( #2213 )
...
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-04-03 02:10:16 -04:00
Yujia Zhai
6f4921858b
v3.9 update ( #2203 )
...
* v3.9 update
* voidD
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-04-02 15:11:18 -04:00
Yujia Zhai
62750a2b75
v3.9 ( #2185 )
...
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
2025-03-21 01:52:23 -04:00
Tyler Michael Smith
8c4d1dc47d
Treat negative zero as equivalent to positive zero in sm90_sparse_gemm_compressor.hpp ( #2110 )
...
* Treat negative zero as zero in the sparse gemm compressor
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
* format
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
* Apply patch
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
* sm90_sparse_gemm_compressor.hpp
* test/unit/transform/CMakeLists.txt
* test/unit/transform/device/sm90_sparse_gemm_compressor_legacy.hpp
* include/cutlass/numeric_types.h
---------
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
2025-03-21 01:44:17 -04:00
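Why the negative-zero change above matters can be illustrated with a small sketch. It is a Python model, not the CUTLASS C++ code, and it assumes (the commit message does not say) that the zero test in question inspects raw fp16 bit patterns. The helper names `fp16_bits`, `is_zero_bitwise`, and `is_zero_ignoring_sign` are hypothetical.

```python
import struct

def fp16_bits(x: float) -> int:
    """Return the 16-bit IEEE 754 half-precision encoding of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def is_zero_bitwise(x: float) -> bool:
    """Strict test: only the all-zero bit pattern (+0.0) counts as zero."""
    return fp16_bits(x) == 0

def is_zero_ignoring_sign(x: float) -> bool:
    """Sign-agnostic test: mask off the sign bit, so -0.0 also counts as zero."""
    return (fp16_bits(x) & 0x7FFF) == 0
```

Under the strict test, `-0.0` (encoding `0x8000`) would be treated as a nonzero element; under the sign-agnostic test, both zeros are treated alike, which is the behavior the commit title describes.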
Yujia Zhai
b84e9802d8
update 3.8 v2 ( #2112 )
...
* update 3.8 v2
* update 3.8
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-02-19 22:03:14 -05:00
Yujia Zhai
833f6990e0
v3.8.0 update ( #2082 )
...
* 3.8 update
* fix Markus' name
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-02-06 21:33:40 -05:00
mihir-awatramani
389e493055
CUTLASS 3.8 Release ( #2059 )
...
* CUTLASS 3.8 Release
* update
* Update README.md
* Revert "Update README.md"
This reverts commit b353e36fe8.
* update
* update
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-01-25 02:44:06 -05:00
Yujia Zhai
b78588d163
CUTLASS 3.7 ( #2045 )
...
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-01-18 09:53:07 -05:00
Yujia Zhai
3d261a5974
3.6.0 update ( #2005 )
...
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-12-25 01:34:40 -05:00
Lain
80243e0b8c
add {uint4, uint2, int2} => {fp16, bf16} conversion ( #1966 )
2024-12-03 14:03:43 -05:00
侯奇
12626bcfe4
Update gemm_f16n_f16t_f32t_tensor_op_f32_sm80.cu with include "cutlass/gemm/device/gemm_universal.h" ( #1569 )
...
fix compile with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`
2024-10-23 12:56:36 -04:00
Xinyu Yang
f3a3bfcbf2
add maximum support ( #1833 )
2024-10-23 12:44:56 -04:00
Yujia Zhai
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
...
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-10-09 15:33:27 -04:00
Alexander Zinoviev
477a677317
Fix typos in test/unit/conv/cache_testbed_output.h ( #1652 )
...
Co-authored-by: Alexander Zinoviev <azinoviev@tesla.com >
2024-10-07 12:39:11 -04:00
Junkai-Wu
dbdae514e0
Support for TMA Epilogue for Group Gemm and add pingpong ptr array & Group Gemm ( #1795 )
2024-09-11 00:07:31 -04:00
Aleksandar Samardžić
e1976daacc
Add support for mixed 4-bit/8-bit data types GEMM ( #1413 )
...
* Add support for mixed 4-bit/8-bit data types GEMM
* fix ( and )
---------
Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-08-29 23:11:06 -04:00
Aleksandar Samardžić
3f084f7f3c
Add couple configs into generator.py for mixed input MM ( #1350 )
...
* Add couple configs into generator.py for mixed input MM
* change one unit test name; reenable 128x32 in the profiler
* Added U8/BF16 tests.
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
2024-08-16 00:59:29 -04:00
Haicheng Wu
8d8cfdf375
update 3.5.1 readme/changelog
2024-08-14 21:12:44 -07:00
Mark Hoemmen
19b4c5e065
Fix isnan namespace qualification in cutlass/functional.h ( #1679 )
...
* Fix unrelated MSVC build warnings
* Fix use of isnan in functional.h
Correct namespace qualification of isnan in functional.h
so that it invokes cutlass::isnan for half_t, instead of
converting half_t to float and invoking std::isnan (on host,
or ::isnan on device).
2024-08-05 14:28:13 -04:00
Vijay Thakkar
be60a0b272
CUTLASS 3.5.1 ( #1623 )
...
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
Daniel Richard G
d6580c3dc0
Support use of external/system GTest installation ( #1469 )
...
* Support use of system/external GTest installation
* Create working directory for tests explicitly
2024-07-10 11:07:57 -04:00
Alexander Zinoviev
dbfced05e7
Fix typos in convolution tests ( #1433 )
2024-07-10 11:00:52 -04:00
Vijay Thakkar
7d49e6c7e2
Updates for CUTLASS 3.5.0 ( #1468 )
2024-04-11 21:33:40 -04:00
reed
19f3cc33f1
Fix uint128 operator add ( #1400 )
...
* fix uint128 operator add for 64-bit hilo implementation
* add uint128 test for operator add
* make clang happy
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-04-02 13:32:18 -04:00
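For readers unfamiliar with the "64-bit hilo" representation mentioned in the commit above, here is a minimal sketch of 128-bit addition built from two 64-bit halves. It is a Python model of the general technique, not the CUTLASS source; the names `MASK64` and `add_u128` are hypothetical, and the commit message does not describe what the original bug was.

```python
MASK64 = (1 << 64) - 1  # all ones in the low 64 bits

def add_u128(a, b):
    """Add two unsigned 128-bit values given as (hi, lo) pairs of 64-bit ints.

    The result wraps modulo 2**128, as fixed-width hardware arithmetic would.
    """
    a_hi, a_lo = a
    b_hi, b_lo = b
    lo = (a_lo + b_lo) & MASK64
    # The low half overflowed exactly when the wrapped sum is smaller than an operand.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK64
    return hi, lo
```

The step that is easy to get wrong is propagating the carry from the low half into the high half; for example, adding 1 to `(0, MASK64)` must yield `(1, 0)`.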
Vijay Thakkar
629f4653c3
CUTLASS 3.5.0 ( #1411 )
2024-03-19 17:51:04 -04:00
ANIKET SHIVAM
bbe579a9e3
Updates for CUTLASS 3.4.1 ( #1346 )
...
* Updates for CUTLASS 3.4.1
* minor epi change
2024-02-15 15:48:34 -05:00
Aleksandar Samardžić
ca37d632c9
Remove sparse GEMM with row broadcasted bias vector ( #1302 )
...
This reverts commit d3e72719b4.
Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs >
2024-01-17 14:06:27 -05:00
Chengquan Jiang
362abbf274
Support ElementD to be void for tma ( #1153 )
...
* Support void D with AuxStore
* refine get_element_aux
2024-01-16 18:15:42 -05:00
ANIKET SHIVAM
751eb9a885
Update license year ( #1306 )
2024-01-16 14:37:22 -05:00
ANIKET SHIVAM
2f589ffa76
Updates for 3.4 release. ( #1305 )
2024-01-16 13:42:51 -05:00