Mirror of https://github.com/NVIDIA/cutlass.git, synced 2026-05-11 08:50:09 +00:00
[CuTeDSL] Update atomic_max_float32 to atomic_fmax in blockscaled GEMM example (#3206)
The internal DSL package refactored atomic_max_float32 to atomic_fmax, which properly handles negative floats via sign-bit-aware integer atomics. Update the example to use the new API so it works with current DSL wheels.

Co-authored-by: Questa Wang <questaw@computelab-frontend-7.nvidia.com>
@@ -1423,9 +1423,8 @@ class Sm100BlockScaledPersistentDenseGemmKernel:
 
 # Global atomic max (accumulates across all tiles for final tensor amax)
 # Since we compute absolute values, all values are non-negative
-# Use wrapper function for atomic max operation
-_ = cute.arch.atomic_max_float32(
-    ptr=mAmax.iterator.llvm_ptr, value=block_amax
+_ = cute.arch.atomic_fmax(
+    mAmax.iterator, block_amax, sign_bit=False
 )
 #
 # Async arrive accumulator buffer empty
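The commit message refers to "sign-bit-aware integer atomics". The sketch below is a plain-Python, non-atomic model of one common way to compute a float32 maximum using only integer comparisons on the raw bit patterns. It is an illustration under stated assumptions, not the DSL's implementation: the helper names (`float_to_bits`, `fmax_via_int`, `running_fmax`) are hypothetical, NaN handling is ignored, and whether `atomic_fmax` works exactly this way internally is not confirmed by this commit.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def to_signed(b: int) -> int:
    """Reinterpret an unsigned 32-bit pattern as a signed 32-bit int."""
    return b - (1 << 32) if b & 0x80000000 else b


def fmax_via_int(stored_bits: int, value: float) -> int:
    """Hypothetical model of a sign-bit-aware float max on raw bit patterns.

    Non-negative floats sort like signed ints, so a signed integer max works.
    Negative floats sort in reverse as unsigned ints, so an unsigned integer
    min selects the larger float. On a GPU each branch would be one integer
    atomic; here it is ordinary, non-atomic Python.
    """
    value_bits = float_to_bits(value)
    if value_bits & 0x80000000:
        # Sign bit set: unsigned min keeps the value closest to zero,
        # or keeps a non-negative stored value unchanged.
        return min(stored_bits, value_bits)
    # Sign bit clear: signed max; the result is non-negative, so the mask
    # only restores the unsigned representation.
    return max(to_signed(stored_bits), to_signed(value_bits)) & 0xFFFFFFFF


def running_fmax(values):
    """Fold fmax_via_int over values, starting from negative infinity."""
    stored = float_to_bits(float("-inf"))
    for v in values:
        stored = fmax_via_int(stored, v)
    return bits_to_float(stored)
```

When every input is non-negative, as with the absolute values accumulated in this kernel, only the signed-max branch is ever taken, which is consistent with the example passing `sign_bit=False`. The exact semantics of that parameter are an assumption here and should be checked against the DSL documentation.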