Commit Graph

755 Commits

Author SHA1 Message Date
Haicheng Wu
f16068b4db Bump version from 4.2.0 to 4.3.1 2025-12-01 22:12:20 -05:00
Haicheng Wu
1acfe141af Bump version from 4.2.1 to 4.3.1 2025-12-01 22:11:13 -05:00
Haicheng Wu
f11375bf91 Bump CUTLASS patch version to 1 2025-12-01 22:08:52 -05:00
Shreya Gaur
af8d5dfa54 bug fix for example 92 (#2830)
Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com>
Co-authored-by: Shreya Gaur <shgaur@2u2g-spr-0015.ipp4a1.colossus.nvidia.com>
2025-12-01 22:02:59 -05:00
drazi
ec8daf642d Merge pull request #2809 from whatdhack/patch-1
Update notebook title from 'Tour to' to 'Tour of'
2025-11-28 18:07:34 +08:00
drazi
5016493cc0 Merge pull request #2813 from fengxie/ftse/fix/example
Refactor TVM FFI examples and update doc
2025-11-28 09:07:15 +08:00
Fung Xie
8588d099e4 refactored doc 2025-11-27 17:04:20 -08:00
Fung Xie
8fc9bc5dda update doc 2025-11-27 17:03:51 -08:00
Fung Xie
f71892b824 update doc 2025-11-27 17:03:03 -08:00
Fung Xie
03aa211310 update doc 2025-11-27 17:02:59 -08:00
Fung Xie
286781a1fb add requirements.txt 2025-11-27 17:02:27 -08:00
Fung Xie
2664cac685 enhanced the example for tvm-ffi 2025-11-27 17:02:26 -08:00
Fung Xie
b9154d65b3 update examples for tvm-ffi 2025-11-27 17:02:26 -08:00
Fung Xie
afe2f71522 reorganize examples for tvm-ffi 2025-11-27 17:02:26 -08:00
Fung Xie
739fffce27 fix TVM FFI doc and update example 2025-11-27 17:02:26 -08:00
Junkai-Wu
1de3a576cc v4.3.1 update. (#2817) 2025-11-27 09:49:30 -05:00
Shreya Gaur
2052fd3885 Blockscaled Ragged Contiguous Grouped Gemm for MoEs (#2790)
* Adding blockscaled ragged contiguous grouped gemm for MoEs

* cleaning up the example

* introduction to example improved

---------

Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com>
2025-11-26 20:16:49 -05:00
whatdhack
4a55379686 Update notebook title from 'Tour to' to 'Tour of'
Grammar check . LLM's can quickly clean up such issues.
2025-11-24 20:11:14 -08:00
Haicheng Wu
e67e63c331 Bump version from 4.2.1 to 4.3.0 v4.3.0 2025-11-24 16:36:06 -05:00
Haicheng Wu
ddaf12c1b1 Bump version from 4.2.0 to 4.3.0 2025-11-24 16:35:27 -05:00
Haicheng Wu
7967ce5f83 Bump version to 4.3.0 2025-11-24 16:34:45 -05:00
Junkai-Wu
8cd5bef43a v4.3 tag release update. (#2789) 2025-11-20 20:49:44 -05:00
Linfeng Zheng
406e078b29 add a notebook for tour to sol gemm (#2780)
* add tour to sol gemm notebook

* change some typos

* change some typos
2025-11-20 09:41:01 -05:00
Zekun Fan
a2439551c7 Fixed editable install to depend on CuTeDSL/requirements.txt (#2768)
To guarantee wheel version alignment of the source code.
2025-11-14 15:31:49 -08:00
drazi
bd96096d58 Merge pull request #2758 from limin2021/delete-pdl-example
[cute-dsl][fix]remove cute dsl pdl example.
2025-11-10 22:56:57 +08:00
Mindy Li
06b6bd7d7b remove cute dsl pdl example. 2025-11-09 21:47:00 -08:00
Linfeng Zheng
2252254ce2 Add tutorial fp16_gemm_1 (#2750)
* Add tutorial fp16_gemm_1

* refine

* refine

* refine

* revert changes in fp16_gemm_0.py
2025-11-06 22:40:09 -05:00
Ali Hassani
d1ef0e87f2 DistGEMM bug fixes (#2713)
* Blackwell DistGEMM bug fixes

1. If using preferred cluster, there needs to be a branch so that
   the universal GEMM wrapper finds the correct base params.
2. Workspace sizes can change depending on problem shape in Blackwell,
   and DistGEMM was previously using the per-device shape to evaluate
   workspace size instead of the per-gemm shape.
3. Flattened size used to initialize host tensors can overflow (in
   Hopper example as well)
4. Preferred and fallback cluster args need to be set explicitly,
   otherwise if someone modifies the example to use preferred cluster,
   it will just fail.

* Fix example runtimes

* Set default fallback cluster shapes to the static ones
2025-11-06 13:31:24 -05:00
ANIKET SHIVAM
020c700e97 support for K=0 for sm100 GG (#2746) 2025-11-04 11:25:39 -05:00
Haicheng Wu
8afb19d904 update CITATION.cff 2025-10-28 23:42:37 -04:00
Qi Yuhang
b2ca083d2b Fixed compilation error when using StreamK scheduler + PDL. (#2686) 2025-10-21 23:11:14 -04:00
Junkai-Wu
b1d6e2c9b3 v4.3 update. (#2709)
* v4.3 update.

* Update the cute_dsl_api changelog's doc link

* Update version to 4.3.0

* Update the example link

* Update doc to encourage user to install DSL from requirements.txt

---------

Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
Lain
e6e2cc29f5 fix (#2684) 2025-10-15 14:46:38 -04:00
Haicheng Wu
c6aeb9179c Update pyproject.toml
update version to 4.2.1
2025-09-24 01:18:51 -04:00
Haicheng Wu
95a5ff14c0 Update CHANGELOG.md
format change
2025-09-23 17:33:00 -04:00
ANIKET SHIVAM
fb8b43ef05 Merge pull request #2669 from NVIDIA/421_update
4.2.1 update
2025-09-23 14:02:29 -07:00
Haicheng Wu
f874df19ac 4.2.1 update 2025-09-23 13:45:13 -07:00
Junkai-Wu
7a6d4ee099 v4.2.1 update. (#2666) 2025-09-23 13:25:43 -04:00
GTO
2b8dff1f90 Fix bfloat16 epsilon (#2607)
* Fix bfloat16 epsilon

* just use constants

---------

Co-authored-by: Konstantin <konstantin@MacBook-Air.local>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-09-21 23:43:59 -04:00
103yiran
fd0312ddf6 Remove duplicate function calls (#1584) 2025-09-21 23:16:59 -04:00
Aya Z. Ibrahim
64579189ec Feature/add bottom causal mask (#2480)
* Rebase to latest

* update

* upd

* Update fmha_fusion.hpp

* Update fmha_fusion.hpp

fixed flipped logic for isQBegin

* Update fmha_fusion.hpp

* Avoid use of booleans

The current expression is confusing

* fmt

* Update fmha_fusion.hpp

Reproduce error/fix with: 
./77_blackwell_fmha_fp16 --verify --b=1 --q=1013 --k=1024 --h=1 --h_k=1 --mask=causal --causal-type=qend

* add test, format

---------

Co-authored-by: Richard Cai <ricai@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-18 17:11:23 -04:00
Jack Kosaian
b234a8c024 Rename python/cutlass to python/cutlass_cppgen (#2652) 2025-09-18 14:26:57 -04:00
Junkai-Wu
74825181f2 Remove old-version dsl examples. (#2644) 2025-09-17 22:23:30 -04:00
Junkai-Wu
8825e8be4f Add required changes for github pipeline. (#2648) 2025-09-17 22:22:45 -04:00
wbn
7817e47154 Fxied a typo in pipeline descript docs. (#2623) 2025-09-15 22:32:27 -04:00
Asuka
25ccb875b8 Fix: a calculation error in the example of dividing out in the 02_layout_algebra doc (#2635) 2025-09-15 22:31:33 -04:00
Wanshe
29c1ad704a Fix doc cute 03_tensor.md link typo (#2627)
* Update 03_tensor.md fix link typo

change path to relative path

* Update 03_tensor.md

---------

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-15 22:26:43 -04:00
Haicheng Wu
57e3cfb47a doc change for 4.2 (#2639)
* doc change

* fix broken links

* ragged gemm doc update

* move around texts about moe gemm
2025-09-15 22:02:45 -04:00
Haicheng Wu
e7e0adddac Update version.h
change version number to 4.2
2025-09-15 12:40:58 -04:00
Junkai-Wu
6a35b4d22f v4.2 tag release. (#2638) 2025-09-15 12:21:53 -04:00