Mirror of https://github.com/NVIDIA/cutlass.git (synced 2026-05-12 01:10:08 +00:00)

The store(frag, tile_offset) method was computing the pointer offset without dividing by kElementsPerAccess, while the matching load(frag, tile_offset) method does include this division.

Both load_with_pointer_offset and store_with_pointer_offset apply the same byte conversion, so the tile_offset -> pointer_offset calculation must also match.

When kElementsPerAccess > 1, this caused load and store to reference different memory locations for the same logical tile offset.

Fixes #3017

Signed-off-by: Blake Ledden <bledden@users.noreply.github.com>
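
Illustration of the mismatch described above. This is a minimal toy model, not the CUTLASS iterator code: the tile width kTileContiguous, the value chosen for kElementsPerAccess, and the helper names offset_with_division, offset_without_division and to_element_index are all assumptions made for the sketch. It only shows why two offset calculations that feed the same downstream conversion must agree.

```cpp
#include <cstdint>

// Assumed values for illustration only.
constexpr int kElementsPerAccess = 4;   // elements moved per vector access
constexpr int kTileContiguous = 64;     // elements per tile along the contiguous dimension

// Offset in units of vector accesses, as the load() path is described:
// the element offset is divided by kElementsPerAccess.
constexpr std::int64_t offset_with_division(int tile_contiguous) {
  return std::int64_t(tile_contiguous) * kTileContiguous / kElementsPerAccess;
}

// Offset as the store() path is described before the fix: no division.
constexpr std::int64_t offset_without_division(int tile_contiguous) {
  return std::int64_t(tile_contiguous) * kTileContiguous;
}

// Hypothetical stand-in for the conversion shared by the
// *_with_pointer_offset methods: access units back to an element index.
constexpr std::int64_t to_element_index(std::int64_t pointer_offset) {
  return pointer_offset * kElementsPerAccess;
}

// For tile 1, the divided offset lands on element 64 (the start of tile 1),
// while the undivided offset lands on element 256.
static_assert(to_element_index(offset_with_division(1)) == 64, "load-style offset");
static_assert(to_element_index(offset_without_division(1)) == 256, "pre-fix store-style offset");
```

With kElementsPerAccess == 1 the two helpers return the same value, which is consistent with the problem only appearing when kElementsPerAccess > 1.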