Commit Graph

5 Commits

Author SHA1 Message Date
Brian K. Ryu
147f5673d0 New RMS Norm example with unit tests (#2917)
* Add rmsnorm example

* Address reviewer comments. (1) use the cute.runtime definition directly. (2) use the nvvm_wrapper's warp reduce directly

* Separate out reduce.py

* Change copyright notice years
2026-01-13 09:05:31 +08:00
Junkai-Wu
0d2b201e8c v4.3.5 update. (#2934)
* v4.3.5 update.

* Update copyright to 2026
2026-01-08 15:02:56 -05:00
questa-quan-wang
2aee73922c Minor fix for testing of blockscaled dense GEMM with TMA prefetch (#2930)
* new example with TMA prefetch feature targeting for DRAM latency bound cases

* minor fix to restrict to 100a arch

* typo

* apply arch for whole pytest

---------

Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
Co-authored-by: Questa Wang <questaw@umbriel-b200-145.ipp4a1.colossus.nvidia.com>
2026-01-05 16:36:03 +08:00
questa-quan-wang
3f4c086d09 new example with TMA prefetch feature targeting for DRAM latency bound cases (#2881)
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
2025-12-23 15:29:48 +08:00
Linfeng Zheng
f6402fcd5e add pytest support for tutorial gemm (#2826)
* add pytest support for tutorial gemm

* add license
2025-12-05 08:45:01 -05:00