fix: Add missing kElementsPerAccess division in RegularTileIterator store (#3049)

The store(frag, tile_offset) method was computing the pointer offset
without dividing by kElementsPerAccess, while the matching load(frag,
tile_offset) method does include this division. Both load_with_pointer_offset
and store_with_pointer_offset apply the same byte conversion, so the
tile_offset -> pointer_offset calculation must also match.

When kElementsPerAccess > 1, this caused load and store to reference
different memory locations for the same logical tile offset.
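
The mismatch can be illustrated with a minimal standalone sketch. It is
not the CUTLASS source: the constants are hypothetical stand-ins for
Shape::kContiguous, Shape::kStrided and ThreadMap::kElementsPerAccess,
the stride value is arbitrary, and the function names are invented for
this example. Only the offset arithmetic from load() and store() is
modeled.

```cpp
#include <cstdint>

// Hypothetical stand-ins for Shape::kContiguous, Shape::kStrided and
// ThreadMap::kElementsPerAccess; the values are for illustration only.
constexpr std::int64_t kContiguous = 64;
constexpr std::int64_t kStrided = 8;
constexpr std::int64_t kElementsPerAccess = 4;

// Offset arithmetic used by load(frag, tile_offset): the contiguous term
// is divided by kElementsPerAccess.
constexpr std::int64_t load_offset(std::int64_t contiguous,
                                   std::int64_t strided,
                                   std::int64_t stride) {
  return contiguous * kContiguous / kElementsPerAccess +
         strided * kStrided * stride;
}

// Offset arithmetic used by store(frag, tile_offset) before this commit:
// the division is missing.
constexpr std::int64_t store_offset_before_fix(std::int64_t contiguous,
                                               std::int64_t strided,
                                               std::int64_t stride) {
  return contiguous * kContiguous + strided * kStrided * stride;
}

// Offset arithmetic used by store(frag, tile_offset) after this commit:
// identical to the load path.
constexpr std::int64_t store_offset_after_fix(std::int64_t contiguous,
                                              std::int64_t strided,
                                              std::int64_t stride) {
  return contiguous * kContiguous / kElementsPerAccess +
         strided * kStrided * stride;
}

// Same logical tile offset (contiguous = 1, strided = 2), arbitrary stride 16.
static_assert(load_offset(1, 2, 16) == 272, "load path");
static_assert(store_offset_before_fix(1, 2, 16) == 320, "old store path differs");
static_assert(store_offset_after_fix(1, 2, 16) == load_offset(1, 2, 16),
              "fixed store path matches load");
```

With these assumed values the old store path lands 48 units past the load
path for the same tile offset. With kElementsPerAccess == 1 the division
is a no-op and both paths agree, which is consistent with the bug only
appearing when kElementsPerAccess > 1.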

Fixes #3017

Signed-off-by: Blake Ledden <bledden@users.noreply.github.com>
Blake Ledden
2026-04-24 20:27:40 -07:00
committed by GitHub
parent 9135a9bb6d
commit 7a9fe055cb


@@ -204,7 +204,8 @@ public:
   void store(Fragment const &frag, TensorCoord const & tile_offset) {
     store_with_pointer_offset(
       frag,
-      tile_offset.contiguous() * Shape::kContiguous + tile_offset.strided() * Shape::kStrided * stride_
+      tile_offset.contiguous() * Shape::kContiguous / ThreadMap::kElementsPerAccess +
+        tile_offset.strided() * Shape::kStrided * stride_
     );
   }