33 Commits

Author SHA1 Message Date
Junkai-Wu
a221da7ccf v4.5 dev update. (#3153) 2026-04-07 12:16:05 -04:00
Zheng Linfeng
ecb32fe231 [CLI] Fix tutorial issues 2026-03-24 00:12:01 -07:00
Junkai-Wu
1b741cabaa v4.4.2 update. (#3104) 2026-03-17 00:58:19 -04:00
Linfeng Zheng
772fbb264e [CLI] add cutedsl fp16 gemm tutorial from 2 to 6 (#3106)
* [CLI] add fp16 gemm tutorial from 2 to 6

* [CLI] refine comments
2026-03-17 10:11:55 +08:00
Junkai-Wu
057635de5c Remove redundant dsl example. (#3074) 2026-02-26 08:10:59 -05:00
Junkai-Wu
c213bfdfc1 Remove redundant dsl examples. (#3071) 2026-02-25 22:42:01 -05:00
Linfeng Zheng
3476ddb7bd remove mixed_input_fmha_prefill (#3041) 2026-02-18 07:59:01 -05:00
Junkai-Wu
d4bbf728ca v4.4 tag release update. (#3032) 2026-02-13 23:27:58 -05:00
Junkai-Wu
6b3e607b85 v4.4 release update v2. (#2999) 2026-02-03 20:48:31 -05:00
Hua Huang
1cfbb53a23 [CuTeDSL] Fix: SM100 block-scale gemm overlapping accumulator (#2995)
* Fix: SM100 block-scale gemm overlapping accumulator

Signed-off-by: Hua Huang <huah@nvidia.com>

* Also include threads_per_warp fix

Signed-off-by: Hua Huang <huah@nvidia.com>

---------

Signed-off-by: Hua Huang <huah@nvidia.com>
2026-02-03 11:01:41 +08:00
dongxiao
a4eb0e05f6 fix performance issues in cute-dsl examples for 4.4-ctk13.1 release (#2988)
* fix grouped gemm

* fix mixed input gemm

* fix mixed input grouped gemm

* fix version checking

* use advanced compiler options

* fix comment

* rename advanced compiler configs to advanced compiler control

* fix comment

* fix name

* fix name
2026-01-30 13:31:04 +08:00
myu-guo
d252b01300 fix performance regression in cute-dsl examples for 4.4-ctk13.1 release (#2990)
* fix regression with cu13.1

* update
2026-01-30 13:30:49 +08:00
Xiao Song
acb45938e9 Update nvvm API call from nvvm enum to str (#2985) 2026-01-27 17:28:29 +08:00
Junkai-Wu
9fba3195f9 v4.4 update. (#2979) 2026-01-24 11:46:17 -05:00
Brian K. Ryu
147f5673d0 New RMS Norm example with unit tests (#2917)
* Add rmsnorm example

* Address reviewer comments. (1) use the cute.runtime definition directly. (2) use the nvvm_wrapper's warp reduce directly

* Separate out reduce.py

* Change copyright notice years
2026-01-13 09:05:31 +08:00
Junkai-Wu
0d2b201e8c v4.3.5 update. (#2934)
* v4.3.5 update.

* Update copyright to 2026
2026-01-08 15:02:56 -05:00
questa-quan-wang
3f4c086d09 new example with TMA prefetch feature targeting for DRAM latency bound cases (#2881)
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
2025-12-23 15:29:48 +08:00
Junkai-Wu
7f5fe3edf1 v4.3.4 update. (#2892) 2025-12-21 11:49:12 -05:00
dongxiao
331e2f451c add missing condition for sync (#2889) 2025-12-19 11:00:30 +08:00
Linfeng Zheng
f6402fcd5e add pytest support for tutorial gemm (#2826)
* add pytest support for tutorial gemm

* add license
2025-12-05 08:45:01 -05:00
Junkai-Wu
bc680c7f67 v4.3.2 update. (#2839) 2025-12-04 10:14:32 -05:00
Fung Xie
afe2f71522 reorganize examples for tvm-ffi 2025-11-27 17:02:26 -08:00
Fung Xie
739fffce27 fix TVM FFI doc and update example 2025-11-27 17:02:26 -08:00
Junkai-Wu
1de3a576cc v4.3.1 update. (#2817) 2025-11-27 09:49:30 -05:00
Junkai-Wu
8cd5bef43a v4.3 tag release update. (#2789) 2025-11-20 20:49:44 -05:00
Mindy Li
06b6bd7d7b remove cute dsl pdl example. 2025-11-09 21:47:00 -08:00
Linfeng Zheng
2252254ce2 Add tutorial fp16_gemm_1 (#2750)
* Add tutorial fp16_gemm_1

* refine

* refine

* refine

* revert changes in fp16_gemm_0.py
2025-11-06 22:40:09 -05:00
Junkai-Wu
b1d6e2c9b3 v4.3 update. (#2709)
* v4.3 update.

* Update the cute_dsl_api changelog's doc link

* Update version to 4.3.0

* Update the example link

* Update doc to encourage user to install DSL from requirements.txt

---------

Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
Junkai-Wu
6a35b4d22f v4.2 tag release. (#2638) 2025-09-15 12:21:53 -04:00
Linfeng Zheng
9ca7e877b2 fix gqa issue for blackwell fmha.py (#2599) 2025-08-28 11:15:20 -04:00
Junkai-Wu
fd6cfe1ed0 v4.1 release update v2. (#2481) 2025-07-21 22:03:55 -04:00
Junkai-Wu
a1aaf2300a v4.1 release 2025-07-03 08:07:53 -04:00
Kihiro Bando
f115c3f854 Release v4.0.0 (#2294) 2025-05-13 15:55:29 -04:00