[CuTeDSL] Update atomic_max_float32 to atomic_fmax in blockscaled GEMM example (#3206)

The internal DSL package replaced atomic_max_float32 with atomic_fmax,
which handles negative floats correctly via sign-bit-aware integer
atomics. Update the example to call the new API, passing mAmax.iterator
directly instead of its llvm_ptr, so that it works with current DSL
wheels.

Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
questa-quan-wang
2026-05-07 15:03:37 +08:00
committed by GitHub
parent cb37157db5
commit ae6bccf341


@@ -1423,9 +1423,8 @@ class Sm100BlockScaledPersistentDenseGemmKernel:
# Global atomic max (accumulates across all tiles for final tensor amax)
# Since we compute absolute values, all values are non-negative
# Use wrapper function for atomic max operation
-_ = cute.arch.atomic_max_float32(
-    ptr=mAmax.iterator.llvm_ptr, value=block_amax
+_ = cute.arch.atomic_fmax(
+    mAmax.iterator, block_amax, sign_bit=False
)
#
# Async arrive accumulator buffer empty