Junkai-Wu
4faf1a1568
v4.3.5 update. (#2935)
* v4.3.5 update.
* Update copyright to 2026.
2026-01-08 15:02:14 -05:00
Junkai-Wu
7233a05f24
v4.3.4 update. (#2893)
2025-12-21 11:49:35 -05:00
Junkai-Wu
5873443bb6
v4.3.3 update (#2869)
* v4.3.3 update.
* fix print_layout printf format in device code (#2688)
* fix print_layout printf format in device code
* Replace %.*s format specifier with explicit loop
* Remove unused delim variable
The printf format %.*s with dynamic width does not work correctly
in CUDA device code, causing literal %.*s to appear in output.
Fixes #2496
* Update include/cute/util/print_tensor.hpp
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
* Update include/cute/util/print_tensor.hpp
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
---------
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
* Support PDL for SM90 Array TMA GEMM
* Update changelog
---------
Co-authored-by: Amin Sedaghat <35748194+Aminsed@users.noreply.github.com>
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
2025-12-11 00:26:17 -05:00
Junkai-Wu
ff35fa561d
v4.3.2 update. (#2840)
2025-12-04 10:14:50 -05:00
Junkai-Wu
5fd9685dce
v4.3.1 update (#2818)
* Blockscaled Ragged Contiguous Grouped Gemm for MoEs (#2790)
* Adding blockscaled ragged contiguous grouped gemm for MoEs
* cleaning up the example
* introduction to example improved
---------
Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com>
* v4.3.1 update.
---------
Co-authored-by: Shreya Gaur <48754356+Shreya-gaur@users.noreply.github.com>
Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com>
2025-11-27 09:48:55 -05:00
Junkai-Wu
8cd5bef43a
v4.3 tag release update. (#2789)
2025-11-20 20:49:44 -05:00
Zekun Fan
a2439551c7
Fixed editable install to depend on CuTeDSL/requirements.txt (#2768)
To guarantee wheel version alignment of the source code.
2025-11-14 15:31:49 -08:00
Junkai-Wu
b1d6e2c9b3
v4.3 update. (#2709)
* v4.3 update.
* Update the cute_dsl_api changelog's doc link
* Update version to 4.3.0
* Update the example link
* Update doc to encourage user to install DSL from requirements.txt
---------
Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
Junkai-Wu
7a6d4ee099
v4.2.1 update. (#2666)
2025-09-23 13:25:43 -04:00
Junkai-Wu
8825e8be4f
Add required changes for github pipeline. (#2648)
2025-09-17 22:22:45 -04:00
Junkai-Wu
6a35b4d22f
v4.2 tag release. (#2638)
2025-09-15 12:21:53 -04:00
Junkai-Wu
fd6cfe1ed0
v4.1 release update v2. (#2481)
2025-07-21 22:03:55 -04:00
Junkai-Wu
a1aaf2300a
v4.1 release
2025-07-03 08:07:53 -04:00
brandonsun
5c6bca0441
Update requirements.txt (#2390)
Remove the dev suffix in the wheel version
2025-06-10 02:31:49 -04:00
Junkai-Wu
8bdbfca682
v4.0 update. (#2371)
2025-06-06 02:39:20 -04:00
Ruyman
1ec230c4bf
Fix typo (#2299)
Needs == for pip to parse the file
2025-05-15 09:38:42 -04:00
Kihiro Bando
f115c3f854
Release v4.0.0 (#2294)
2025-05-13 15:55:29 -04:00