Hua Huang
|
1cfbb53a23
|
[CuTeDSL] Fix: SM100 block-scale gemm overlapping accumulator (#2995)
* Fix: SM100 block-scale gemm overlapping accumulator
Signed-off-by: Hua Huang <huah@nvidia.com>
* Also include threads_per_warp fix
Signed-off-by: Hua Huang <huah@nvidia.com>
---------
Signed-off-by: Hua Huang <huah@nvidia.com>
|
2026-02-03 11:01:41 +08:00 |
|
Xiao Song
|
acb45938e9
|
Update nvvm API call from nvvm enum to str (#2985)
|
2026-01-27 17:28:29 +08:00 |
|
Junkai-Wu
|
0d2b201e8c
|
v4.3.5 update. (#2934)
* v4.3.5 update.
* Update copyright to 2026
|
2026-01-08 15:02:56 -05:00 |
|
questa-quan-wang
|
3f4c086d09
|
new example with TMA prefetch feature targeting for DRAM latency bound cases (#2881)
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
|
2025-12-23 15:29:48 +08:00 |
|