jkosaian
e594def95e
Don't access data_ptr of fake tensor. Fix EFC w/o epilogue
2026-01-14 18:00:08 -08:00
jkosaian
e222b2a9b9
Update TVM FFI version
2026-01-13 07:58:48 -08:00
jkosaian
87cab7bae2
2026-01-12 updates
2026-01-12 18:51:25 -08:00
jkosaian
7c09485e25
2026-01-06 updates
2026-01-06 04:25:33 -08:00
jkosaian
dfcb55de16
Fix batch adding for EFC
2025-12-16 14:08:23 -08:00
jkosaian
ead2fbfe13
Initial commit
2025-12-16 10:00:46 -08:00
Haicheng Wu
d4e16f5d4e
Bump version from 4.2.1 to 4.3.3
2025-12-11 23:58:38 -05:00
Junkai-Wu
d3a5492381
v4.3.3 update. (#2868)
2025-12-11 00:26:58 -05:00
Haicheng Wu
c4744f706e
Bump version from 4.2.1 to 4.3.2
2025-12-05 13:45:16 -05:00
Junkai-Wu
bc680c7f67
v4.3.2 update. (#2839)
2025-12-04 10:14:32 -05:00
Haicheng Wu
5e847d37c4
Bump version from 4.2.1 to 4.3.1
2025-12-01 22:13:19 -05:00
Haicheng Wu
f16068b4db
Bump version from 4.2.0 to 4.3.1
2025-12-01 22:12:20 -05:00
Haicheng Wu
1acfe141af
Bump version from 4.2.1 to 4.3.1
2025-12-01 22:11:13 -05:00
Fung Xie
03aa211310
update doc
2025-11-27 17:02:59 -08:00
Junkai-Wu
1de3a576cc
v4.3.1 update. (#2817)
2025-11-27 09:49:30 -05:00
Haicheng Wu
e67e63c331
Bump version from 4.2.1 to 4.3.0
2025-11-24 16:36:06 -05:00
Haicheng Wu
ddaf12c1b1
Bump version from 4.2.0 to 4.3.0
2025-11-24 16:35:27 -05:00
Haicheng Wu
7967ce5f83
Bump version to 4.3.0
2025-11-24 16:34:45 -05:00
Junkai-Wu
8cd5bef43a
v4.3 tag release update. (#2789)
2025-11-20 20:49:44 -05:00
Zekun Fan
a2439551c7
Fixed editable install to depend on CuTeDSL/requirements.txt (#2768)
...
To guarantee wheel version alignment of the source code.
2025-11-14 15:31:49 -08:00
Junkai-Wu
b1d6e2c9b3
v4.3 update. (#2709)
...
* v4.3 update.
* Update the cute_dsl_api changelog's doc link
* Update version to 4.3.0
* Update the example link
* Update doc to encourage user to install DSL from requirements.txt
---------
Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
Haicheng Wu
f874df19ac
4.2.1 update
2025-09-23 13:45:13 -07:00
Junkai-Wu
7a6d4ee099
v4.2.1 update. (#2666)
2025-09-23 13:25:43 -04:00
Jack Kosaian
b234a8c024
Rename python/cutlass to python/cutlass_cppgen (#2652)
2025-09-18 14:26:57 -04:00
Junkai-Wu
8825e8be4f
Add required changes for github pipeline. (#2648)
2025-09-17 22:22:45 -04:00
Junkai-Wu
6a35b4d22f
v4.2 tag release. (#2638)
2025-09-15 12:21:53 -04:00
Harrison Barclay
b2dd65dc86
more robust imports in heuristics.py and heuristics_provider.py (#2596)
2025-08-28 22:32:55 -04:00
Junkai-Wu
a49a78ffef
v4.2 release. (#2587)
...
* Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line.
* v4.2 release.
2025-08-22 18:11:24 -04:00
melonedo
ec18e8043b
Make swizzle in pycute work (#2553)
2025-08-19 22:21:00 -04:00
Haicheng Wu
664c4f7b3e
Update CUTLASS version to 4.1
...
Update CUTLASS version to 4.1.
2025-07-26 20:11:04 -04:00
Junkai-Wu
fd6cfe1ed0
v4.1 release update v2. (#2481)
2025-07-21 22:03:55 -04:00
Colin Peppler
ebe98c549a
cache procedural_name in GemmOperation (#2317)
2025-07-16 22:25:02 -04:00
Junkai-Wu
a1aaf2300a
v4.1 release
2025-07-03 08:07:53 -04:00
brandonsun
5c6bca0441
Update requirements.txt (#2390)
...
Remove the dev suffix in the wheel version
2025-06-10 02:31:49 -04:00
Junkai-Wu
8bdbfca682
v4.0 update. (#2371)
2025-06-06 02:39:20 -04:00
Ruyman
1ec230c4bf
Fix typo (#2299)
...
Needs == for pip to parse the file
2025-05-15 09:38:42 -04:00
Kihiro Bando
f115c3f854
Release v4.0.0 (#2294)
2025-05-13 15:55:29 -04:00
Haicheng Wu
ad7b2f5e84
3.9.2 doc/version (#2279)
...
* 3.9.2 doc/version
* whitespace
2025-05-04 00:00:15 -04:00
Haicheng Wu
f535c33634
3.9.1 doc/version change (#2273)
2025-05-01 00:27:00 -04:00
Michael Lazos
e3cb8a773a
Import cuda, cudart, nvrtc lazily (#2251)
...
* Lazy cuda import
* More lazy cuda import
* More lazy cuda imports
* minor fixes
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-30 23:10:33 -04:00
Michael Lazos
c4bdfe821c
Lazy scipy import (#2250)
2025-04-30 16:10:00 -04:00
Michael Lazos
b3ce7e12b7
Make cc a positional argument (#2249)
2025-04-30 16:09:25 -04:00
Michael Lazos
fe75ead92e
Import pydot lazily (#2248)
2025-04-30 16:08:17 -04:00
Ruoxi
35136f5564
Fix wrong detection of python version for use_rmm. (#2224)
2025-04-30 15:29:33 -04:00
Yujia Zhai
331a1f5b3f
cutlass 3.9 update (#2255)
...
* cutlass 3.9 update
* rebase
* fixes out of shared memory for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-24 15:42:40 -04:00
Yujia Zhai
6f4921858b
v3.9 update (#2203)
...
* v3.9 update
* voidD
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-04-02 15:11:18 -04:00
Yujia Zhai
62750a2b75
v3.9 (#2185)
...
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-03-21 01:52:23 -04:00
Yujia Zhai
afa1772203
truncate name for cutlass profiler (#2124)
...
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-02-21 00:16:56 -05:00
ANIKET SHIVAM
9b3772dfa6
Hopper Grouped GEMM support for FP8 Accum (#2123)
...
* Add support for fp8accum, with profiler extension
* Update .gitignore
* contri
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-02-20 21:55:26 -05:00
Yujia Zhai
b84e9802d8
update 3.8 v2 (#2112)
...
* update 3.8 v2
* update 3.8
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-02-19 22:03:14 -05:00