Mirror of https://github.com/NVIDIA/cutlass.git (synced 2026-04-20 06:48:59 +00:00)
New example with the TMA prefetch feature, targeting DRAM-latency-bound cases (#2881)
Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
examples/python/CuTeDSL/blackwell/dense_gemm_persistent_prefetch.py (normal file, 2147 lines changed; diff suppressed because it is too large)