Brian K. Ryu
147f5673d0
New RMS Norm example with unit tests (#2917)
* Add rmsnorm example
* Address reviewer comments: (1) use the cute.runtime definition directly; (2) use the nvvm_wrapper's warp reduce directly
* Separate out reduce.py
* Change copyright notice years
2026-01-13 09:05:31 +08:00
Junkai-Wu
0d2b201e8c
v4.3.5 update. (#2934)
* v4.3.5 update.
* Update copyright to 2026
2026-01-08 15:02:56 -05:00
questa-quan-wang
2aee73922c
Minor fix for testing of blockscaled dense GEMM with TMA prefetch (#2930)
* new example with TMA prefetch feature targeting DRAM-latency-bound cases
* minor fix to restrict to 100a arch
* typo
* apply arch for whole pytest
---------
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
Co-authored-by: Questa Wang <questaw@umbriel-b200-145.ipp4a1.colossus.nvidia.com>
2026-01-05 16:36:03 +08:00
questa-quan-wang
3f4c086d09
new example with TMA prefetch feature targeting DRAM-latency-bound cases (#2881)
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
2025-12-23 15:29:48 +08:00
Linfeng Zheng
f6402fcd5e
add pytest support for tutorial gemm (#2826)
* add pytest support for tutorial gemm
* add license
2025-12-05 08:45:01 -05:00