Commit Graph

104 Commits

Author SHA1 Message Date
Haicheng Wu
954503d44c Bump version to 4.4.0 2026-02-25 00:04:04 -05:00
Haicheng Wu
6c4200f1bc Bump version from 4.3.5 to 4.4.0 2026-02-25 00:03:23 -05:00
Haicheng Wu
de93e8a4ac Bump version from 4.3.5 to 4.4.0 2026-02-25 00:03:04 -05:00
Haicheng Wu
b92b9f0d37 Bump version from 4.3.5 to 4.4.0 2026-02-25 00:02:41 -05:00
Yuan Xiaolan
395ab575f6 Merge branch 'main' into tvm-ffi 2026-02-14 13:35:28 +08:00
Junkai-Wu
d4bbf728ca v4.4 tag release update. (#3032) 2026-02-13 23:27:58 -05:00
drazi
01687cfba1 Merge pull request #3004 from tridao/add-sub-packed-f32x2
[CuTeDSL] Add sub_packed_f32x2 operation
2026-02-13 20:46:26 +08:00
drazi
5c42d0f28c Merge pull request #3021 from tridao/clc_no_multicast
[Cute-DSL] Add option for issue_clc_query without multicast
2026-02-13 20:45:52 +08:00
Tri Dao
244e8d00d5 [Cute-DSL] Add cute.arch.fmin by calling nvvm 2026-02-11 14:23:09 -05:00
Tri Dao
5b83b34afd [Cute-DSL] Add option for issue_clc_query without multicast 2026-02-11 14:19:29 -05:00
Tri Dao
51935551fb [CuTeDSL] Add sub_packed_f32x2 operation
Add subtraction operation for packed f32x2 values, following the same
pattern as the existing add_packed_f32x2 and mul_packed_f32x2 operations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 21:18:46 +07:00
Junkai-Wu
6b3e607b85 v4.4 release update v2. (#2999) 2026-02-03 20:48:31 -05:00
yuanxiaolan
de161925a5 pass in stream=-1 2026-02-03 11:59:14 +08:00
yuanxiaolan
de198b2419 fix tvm-ffi path in from_dlpack 2026-02-03 11:59:13 +08:00
Xiao Song
acb45938e9 Update nvvm API call from nvvm enum to str (#2985) 2026-01-27 17:28:29 +08:00
Junkai-Wu
9fba3195f9 v4.4 update. (#2979) 2026-01-24 11:46:17 -05:00
Aidan Do
3f5bafb326 [Cutlass profiler] Fix SM100 FP8 nosmem epilogue shape_div “Divisibility Condition” for non‑multiple‑of‑64 N tiles (#2946)
* .

* .

* .

* .

* .

* .

* .
2026-01-20 15:27:34 +08:00
Junkai-Wu
0d2b201e8c v4.3.5 update. (#2934)
* v4.3.5 update.

* Update copyright to 2026
2026-01-08 15:02:56 -05:00
Wenxuan Tan
f86feb0aa8 Fix idx2crd docstring (#2914)
* fix idx2crd docstring

* fix

* fix
2026-01-07 13:11:38 -05:00
Junkai-Wu
7f5fe3edf1 v4.3.4 update. (#2892) 2025-12-21 11:49:12 -05:00
Haicheng Wu
d4e16f5d4e Bump version from 4.2.1 to 4.3.3 2025-12-11 23:58:38 -05:00
Junkai-Wu
d3a5492381 v4.3.3 update. (#2868) 2025-12-11 00:26:58 -05:00
Haicheng Wu
c4744f706e Bump version from 4.2.1 to 4.3.2 2025-12-05 13:45:16 -05:00
Junkai-Wu
bc680c7f67 v4.3.2 update. (#2839) 2025-12-04 10:14:32 -05:00
Haicheng Wu
5e847d37c4 Bump version from 4.2.1 to 4.3.1 2025-12-01 22:13:19 -05:00
Haicheng Wu
f16068b4db Bump version from 4.2.0 to 4.3.1 2025-12-01 22:12:20 -05:00
Haicheng Wu
1acfe141af Bump version from 4.2.1 to 4.3.1 2025-12-01 22:11:13 -05:00
Fung Xie
03aa211310 update doc 2025-11-27 17:02:59 -08:00
Junkai-Wu
1de3a576cc v4.3.1 update. (#2817) 2025-11-27 09:49:30 -05:00
Haicheng Wu
e67e63c331 Bump version from 4.2.1 to 4.3.0 2025-11-24 16:36:06 -05:00
Haicheng Wu
ddaf12c1b1 Bump version from 4.2.0 to 4.3.0 2025-11-24 16:35:27 -05:00
Haicheng Wu
7967ce5f83 Bump version to 4.3.0 2025-11-24 16:34:45 -05:00
Junkai-Wu
8cd5bef43a v4.3 tag release update. (#2789) 2025-11-20 20:49:44 -05:00
Zekun Fan
a2439551c7 Fixed editable install to depend on CuTeDSL/requirements.txt (#2768)
To guarantee wheel version alignment of the source code.
2025-11-14 15:31:49 -08:00
Junkai-Wu
b1d6e2c9b3 v4.3 update. (#2709)
* v4.3 update.

* Update the cute_dsl_api changelog's doc link

* Update version to 4.3.0

* Update the example link

* Update doc to encourage user to install DSL from requirements.txt

---------

Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
Haicheng Wu
f874df19ac 4.2.1 update 2025-09-23 13:45:13 -07:00
Junkai-Wu
7a6d4ee099 v4.2.1 update. (#2666) 2025-09-23 13:25:43 -04:00
Jack Kosaian
b234a8c024 Rename python/cutlass to python/cutlass_cppgen (#2652) 2025-09-18 14:26:57 -04:00
Junkai-Wu
8825e8be4f Add required changes for github pipeline. (#2648) 2025-09-17 22:22:45 -04:00
Junkai-Wu
6a35b4d22f v4.2 tag release. (#2638) 2025-09-15 12:21:53 -04:00
Harrison Barclay
b2dd65dc86 more robust imports in heuristics.py and heuristics_provider.py (#2596) 2025-08-28 22:32:55 -04:00
Junkai-Wu
a49a78ffef v4.2 release. (#2587)
* Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line.

* v4.2 release.
2025-08-22 18:11:24 -04:00
melonedo
ec18e8043b Make swizzle in pycute work (#2553) 2025-08-19 22:21:00 -04:00
Haicheng Wu
664c4f7b3e Update CUTLASS version to 4.1
Update CUTLASS version to 4.1.
2025-07-26 20:11:04 -04:00
Junkai-Wu
fd6cfe1ed0 v4.1 release update v2. (#2481) 2025-07-21 22:03:55 -04:00
Colin Peppler
ebe98c549a cache procedural_name in GemmOperation (#2317) 2025-07-16 22:25:02 -04:00
Junkai-Wu
a1aaf2300a v4.1 release 2025-07-03 08:07:53 -04:00
brandonsun
5c6bca0441 Update requirements.txt (#2390)
Remove the dev suffix in the wheel version
2025-06-10 02:31:49 -04:00
Junkai-Wu
8bdbfca682 v4.0 update. (#2371) 2025-06-06 02:39:20 -04:00
Ruyman
1ec230c4bf Fix typo (#2299)
Needs == for pip to parse the file
2025-05-15 09:38:42 -04:00