Mirror of https://github.com/NVIDIA/cutlass.git (synced 2026-05-11 08:50:09 +00:00)
fix: Add missing kElementsPerAccess division in RegularTileIterator store (#3049)
The store(frag, tile_offset) method computed its pointer offset without dividing the contiguous term by kElementsPerAccess, while the matching load(frag, tile_offset) method does include this division. Since load_with_pointer_offset and store_with_pointer_offset apply the same byte conversion, the tile_offset -> pointer_offset calculation must also match. Whenever kElementsPerAccess > 1, load and store therefore referenced different memory locations for the same logical tile offset.

Fixes #3017

Signed-off-by: Blake Ledden <bledden@users.noreply.github.com>
@@ -204,7 +204,8 @@ public:
   void store(Fragment const &frag, TensorCoord const & tile_offset) {
     store_with_pointer_offset(
       frag,
-      tile_offset.contiguous() * Shape::kContiguous + tile_offset.strided() * Shape::kStrided * stride_
+      tile_offset.contiguous() * Shape::kContiguous / ThreadMap::kElementsPerAccess +
+      tile_offset.strided() * Shape::kStrided * stride_
     );
   }