Haicheng Wu
f16068b4db
Bump version from 4.2.0 to 4.3.1
2025-12-01 22:12:20 -05:00
Haicheng Wu
1acfe141af
Bump version from 4.2.1 to 4.3.1
2025-12-01 22:11:13 -05:00
Haicheng Wu
f11375bf91
Bump CUTLASS patch version to 1
2025-12-01 22:08:52 -05:00
Shreya Gaur
af8d5dfa54
bug fix for example 92 (#2830)
Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com>
Co-authored-by: Shreya Gaur <shgaur@2u2g-spr-0015.ipp4a1.colossus.nvidia.com>
2025-12-01 22:02:59 -05:00
drazi
ec8daf642d
Merge pull request #2809 from whatdhack/patch-1
Update notebook title from 'Tour to' to 'Tour of'
2025-11-28 18:07:34 +08:00
drazi
5016493cc0
Merge pull request #2813 from fengxie/ftse/fix/example
Refactor TVM FFI examples and update doc
2025-11-28 09:07:15 +08:00
Fung Xie
8588d099e4
refactored doc
2025-11-27 17:04:20 -08:00
Fung Xie
8fc9bc5dda
update doc
2025-11-27 17:03:51 -08:00
Fung Xie
f71892b824
update doc
2025-11-27 17:03:03 -08:00
Fung Xie
03aa211310
update doc
2025-11-27 17:02:59 -08:00
Fung Xie
286781a1fb
add requirements.txt
2025-11-27 17:02:27 -08:00
Fung Xie
2664cac685
enhanced the example for tvm-ffi
2025-11-27 17:02:26 -08:00
Fung Xie
b9154d65b3
update examples for tvm-ffi
2025-11-27 17:02:26 -08:00
Fung Xie
afe2f71522
reorganize examples for tvm-ffi
2025-11-27 17:02:26 -08:00
Fung Xie
739fffce27
fix TVM FFI doc and update example
2025-11-27 17:02:26 -08:00
Junkai-Wu
1de3a576cc
v4.3.1 update. (#2817)
2025-11-27 09:49:30 -05:00
Shreya Gaur
2052fd3885
Blockscaled Ragged Contiguous Grouped Gemm for MoEs (#2790)
* Adding blockscaled ragged contiguous grouped gemm for MoEs
* cleaning up the example
* introduction to example improved
---------
Co-authored-by: Shreya Gaur <shgaur@dc2-container-xterm-012.prd.it.nvidia.com>
2025-11-26 20:16:49 -05:00
whatdhack
4a55379686
Update notebook title from 'Tour to' to 'Tour of'
Grammar check. LLMs can quickly clean up such issues.
2025-11-24 20:11:14 -08:00
Haicheng Wu
e67e63c331
Bump version from 4.2.1 to 4.3.0
v4.3.0
2025-11-24 16:36:06 -05:00
Haicheng Wu
ddaf12c1b1
Bump version from 4.2.0 to 4.3.0
2025-11-24 16:35:27 -05:00
Haicheng Wu
7967ce5f83
Bump version to 4.3.0
2025-11-24 16:34:45 -05:00
Junkai-Wu
8cd5bef43a
v4.3 tag release update. (#2789)
2025-11-20 20:49:44 -05:00
Linfeng Zheng
406e078b29
add a notebook for tour to sol gemm (#2780)
* add tour to sol gemm notebook
* change some typos
* change some typos
2025-11-20 09:41:01 -05:00
Zekun Fan
a2439551c7
Fixed editable install to depend on CuTeDSL/requirements.txt (#2768)
To guarantee wheel version alignment of the source code.
2025-11-14 15:31:49 -08:00
drazi
bd96096d58
Merge pull request #2758 from limin2021/delete-pdl-example
[cute-dsl][fix]remove cute dsl pdl example.
2025-11-10 22:56:57 +08:00
Mindy Li
06b6bd7d7b
remove cute dsl pdl example.
2025-11-09 21:47:00 -08:00
Linfeng Zheng
2252254ce2
Add tutorial fp16_gemm_1 (#2750)
* Add tutorial fp16_gemm_1
* refine
* refine
* refine
* revert changes in fp16_gemm_0.py
2025-11-06 22:40:09 -05:00
Ali Hassani
d1ef0e87f2
DistGEMM bug fixes (#2713)
* Blackwell DistGEMM bug fixes
1. If using preferred cluster, there needs to be a branch so that the universal GEMM wrapper finds the correct base params.
2. Workspace sizes can change depending on problem shape in Blackwell, and DistGEMM was previously using the per-device shape to evaluate workspace size instead of the per-gemm shape.
3. Flattened size used to initialize host tensors can overflow (in Hopper example as well).
4. Preferred and fallback cluster args need to be set explicitly; otherwise, if someone modifies the example to use preferred cluster, it will just fail.
* Fix example runtimes
* Set default fallback cluster shapes to the static ones
2025-11-06 13:31:24 -05:00
ANIKET SHIVAM
020c700e97
support for K=0 for sm100 GG (#2746)
2025-11-04 11:25:39 -05:00
Haicheng Wu
8afb19d904
update CITATION.cff
2025-10-28 23:42:37 -04:00
Qi Yuhang
b2ca083d2b
Fixed compilation error when using StreamK scheduler + PDL. (#2686)
2025-10-21 23:11:14 -04:00
Junkai-Wu
b1d6e2c9b3
v4.3 update. (#2709)
* v4.3 update.
* Update the cute_dsl_api changelog's doc link
* Update version to 4.3.0
* Update the example link
* Update doc to encourage user to install DSL from requirements.txt
---------
Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
Lain
e6e2cc29f5
fix (#2684)
2025-10-15 14:46:38 -04:00
Haicheng Wu
c6aeb9179c
Update pyproject.toml
update version to 4.2.1
2025-09-24 01:18:51 -04:00
Haicheng Wu
95a5ff14c0
Update CHANGELOG.md
format change
2025-09-23 17:33:00 -04:00
ANIKET SHIVAM
fb8b43ef05
Merge pull request #2669 from NVIDIA/421_update
4.2.1 update
2025-09-23 14:02:29 -07:00
Haicheng Wu
f874df19ac
4.2.1 update
2025-09-23 13:45:13 -07:00
Junkai-Wu
7a6d4ee099
v4.2.1 update. (#2666)
2025-09-23 13:25:43 -04:00
GTO
2b8dff1f90
Fix bfloat16 epsilon (#2607)
* Fix bfloat16 epsilon
* just use constants
---------
Co-authored-by: Konstantin <konstantin@MacBook-Air.local>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-09-21 23:43:59 -04:00
103yiran
fd0312ddf6
Remove duplicate function calls (#1584)
2025-09-21 23:16:59 -04:00
Aya Z. Ibrahim
64579189ec
Feature/add bottom causal mask (#2480)
* Rebase to latest
* update
* upd
* Update fmha_fusion.hpp
* Update fmha_fusion.hpp
fixed flipped logic for isQBegin
* Update fmha_fusion.hpp
* Avoid use of booleans
The current expression is confusing
* fmt
* Update fmha_fusion.hpp
Reproduce error/fix with:
./77_blackwell_fmha_fp16 --verify --b=1 --q=1013 --k=1024 --h=1 --h_k=1 --mask=causal --causal-type=qend
* add test, format
---------
Co-authored-by: Richard Cai <ricai@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-18 17:11:23 -04:00
Jack Kosaian
b234a8c024
Rename python/cutlass to python/cutlass_cppgen (#2652)
2025-09-18 14:26:57 -04:00
Junkai-Wu
74825181f2
Remove old-version dsl examples. (#2644)
2025-09-17 22:23:30 -04:00
Junkai-Wu
8825e8be4f
Add required changes for github pipeline. (#2648)
2025-09-17 22:22:45 -04:00
wbn
7817e47154
Fixed a typo in pipeline description docs. (#2623)
2025-09-15 22:32:27 -04:00
Asuka
25ccb875b8
Fix: a calculation error in the example of dividing out in the 02_layout_algebra doc (#2635)
2025-09-15 22:31:33 -04:00
Wanshe
29c1ad704a
Fix doc cute 03_tensor.md link typo (#2627)
* Update 03_tensor.md fix link typo
change path to relative path
* Update 03_tensor.md
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-15 22:26:43 -04:00
Haicheng Wu
57e3cfb47a
doc change for 4.2 (#2639)
* doc change
* fix broken links
* ragged gemm doc update
* move around texts about moe gemm
2025-09-15 22:02:45 -04:00
Haicheng Wu
e7e0adddac
Update version.h
change version number to 4.2
2025-09-15 12:40:58 -04:00
Junkai-Wu
6a35b4d22f
v4.2 tag release. (#2638)
2025-09-15 12:21:53 -04:00