mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-03-27 18:47:39 +00:00
* removed the api reference * updating to the latest rocm-docs-core min version * fixed a formatting issue with buffer views * removed reference links from code snippets * removed reference links from code snippets --------- Co-authored-by: John Afaganis <john.afaganis@amd.com>
770 lines
27 KiB
ReStructuredText
770 lines
27 KiB
ReStructuredText
.. _ck_tile_transforms:
|
||
|
||
Individual Transform Operations
|
||
===============================
|
||
|
||
The transformation engine is built from individual transform types that each handle specific coordinate conversions.
|
||
|
||
What Are Transforms?
|
||
--------------------
|
||
|
||
Transform operations convert coordinates between different dimensional spaces. Each transform operates between two :ref:`coordinate spaces <ck_tile_coordinate_systems>`:
|
||
|
||
- **Lower Dimension Space**: The source coordinate system
|
||
- **Upper Dimension Space**: The target coordinate system
|
||
|
||
Transform Direction
|
||
~~~~~~~~~~~~~~~~~~~
|
||
|
||
Transforms work bidirectionally:
|
||
|
||
- **Forward Transform**: Converts coordinates from the lower dimension to the upper dimension
|
||
- **Inverse Transform**: Converts coordinates back from the upper dimension to the lower dimension
|
||
|
||
Zero-Copy Logical Operations
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
**Critical Understanding**: All transform operations happen in **logical coordinate space** only. This is a zero-copy system and there is **no data copying or movement** involved.
|
||
|
||
- **Data Storage**: The actual tensor data remains stored in memory in linear fashion, exactly as specified by the original tensor shape and strides at creation time. See :ref:`ck_tile_buffer_views` for more information about raw memory access.
|
||
- **Logical Mapping**: Transforms create different logical views of the same underlying data and only change how access coordinates are interpreted. See :ref:`ck_tile_tensor_views` for more information about tensor views.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "Tensor Coordinate Transformation"
|
||
US["Lower Dimension Space<br/>Source coordinate system"]
|
||
LS["Upper Dimension Space<br/>Target coordinate system"]
|
||
|
||
DATA["Linear Data in Memory<br/>Layout determined by tensor<br/>shape & strides"]
|
||
end
|
||
|
||
US -->|"Forward Transform"| LS
|
||
LS -->|"Inverse Transform"| US
|
||
|
||
DATA -.->|"Same data,<br/>different views"| US
|
||
DATA -.->|"Same data,<br/>different views"| LS
|
||
|
||
style US fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style LS fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_1.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
Index Calculation Operations
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
The transform system provides two operations for coordinate conversion:
|
||
|
||
- **calculate_lower_index()**: Takes a coordinate from the **upper dimension space** and transforms it to get the corresponding index or coordinate in the **lower dimension space**. This calculates where to find the actual tensor element using the transformed coordinate system.
|
||
|
||
- **calculate_upper_index()**: Takes a coordinate from the **lower dimension space** and transforms it back to get the corresponding coordinate in the **upper dimension space**. This performs the inverse transformation to recover the original coordinate representation.
|
||
|
||
These operations enable bidirectional navigation between different coordinate representations of the same underlying tensor data.
|
||
|
||
Transform System Architecture
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
|
||
subgraph "Transform Types"
|
||
EMB["EmbedTransform<br/>Linear → Multi-D Strided"]
|
||
UNM["MergeTransform<br/>Multi-D → Linear"]
|
||
MRG["UnmergeTransform<br/>Linear → Multi-D"]
|
||
REP["ReplicateTransform<br/>0D → Multi-D Broadcast"]
|
||
OFF["OffsetTransform<br/>Translation"]
|
||
PAS["PassThroughTransform<br/>Identity"]
|
||
PAD["PadTransform<br/>Boundaries"]
|
||
end
|
||
|
||
subgraph "Operations"
|
||
FWD["Forward<br/>calculate_lower_index()"]
|
||
BWD["Backward<br/>calculate_upper_index()"]
|
||
UPD["Update<br/>update_lower_index()"]
|
||
end
|
||
|
||
EMB --> FWD
|
||
UNM --> FWD
|
||
MRG --> FWD
|
||
REP --> FWD
|
||
OFF --> FWD
|
||
PAS --> FWD
|
||
PAD --> FWD
|
||
|
||
style FWD fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_2.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
MergeTransform
|
||
--------------
|
||
|
||
MergeTransform collapses multiple dimensions from the lower coordinate space into a single dimension in the upper coordinate space, effectively reducing the dimensionality of the tensor representation while preserving data relationships. This transform is fundamental to the :ref:`tile distribution system <ck_tile_tile_distribution>`.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "MergeTransform: Multi-D → Linear"
|
||
LS["Lower Coordinate Space<br/>2D: [4, 5]<br/>Coord: (2, 3)"]
|
||
US["Upper Coordinate Space<br/>1D Linear<br/>Index: 13"]
|
||
|
||
DATA["Same Tensor Data<br/>Layout: row-major<br/>Size: 20 elements"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>2×5 + 3 = 13"| US
|
||
US -->|"Inverse Transform<br/>13÷5=2, 13%5=3"| LS
|
||
|
||
DATA -.->|"Multi-dimensional<br/>view"| LS
|
||
DATA -.->|"Linear<br/>view"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_3.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
|
||
**C++ Implementation:**
|
||
|
||
.. code-block:: cpp
|
||
|
||
using namespace ck_tile;
|
||
|
||
// Create MergeTransform for 4x5 tensor (20 elements total)
|
||
auto transform = make_merge_transform(make_tuple(4, 5));
|
||
|
||
// Forward: Lower (2D) → Upper (1D) - Manual calculation
|
||
int row = 2, col = 3;
|
||
int linear_index = row * 5 + col; // Result: 13
|
||
printf("2D coord [%d, %d] → Linear index %d\n", row, col, linear_index);
|
||
printf("Calculation: %d×5 + %d = %d\n", row, col, linear_index);
|
||
|
||
// Inverse: Upper (1D) → Lower (2D) - Using transform
|
||
multi_index<1> upper_coord;
|
||
upper_coord[number<0>{}] = 13;
|
||
|
||
multi_index<2> lower_coord;
|
||
transform.calculate_lower_index(lower_coord, upper_coord);
|
||
|
||
printf("Linear index %d → 2D coord [%d, %d]\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(lower_coord[number<0>{}]),
|
||
static_cast<int>(lower_coord[number<1>{}]));
|
||
printf("Calculation: 13 ÷ 5 = %d remainder %d\n",
|
||
static_cast<int>(lower_coord[number<0>{}]),
|
||
static_cast<int>(lower_coord[number<1>{}]));
|
||
|
||
UnmergeTransform
|
||
----------------
|
||
|
||
UnmergeTransform expands coordinates from a single dimension in the lower coordinate space into multiple dimensions in the upper coordinate space, effectively increasing the dimensionality of the tensor representation while preserving all data relationships.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "UnmergeTransform: Linear → Multi-D"
|
||
LS["Lower Coordinate Space<br/>1D Linear<br/>Index: 14"]
|
||
US["Upper Coordinate Space<br/>3D: [3, 4, 2]<br/>Coord: (1, 3, 0)"]
|
||
|
||
DATA["Same Tensor Data<br/>Layout: row-major<br/>Size: 24 elements"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>14 = 1×8 + 3×2 + 0"| US
|
||
US -->|"Inverse Transform<br/>linearize back"| LS
|
||
|
||
DATA -.->|"Linear<br/>view"| LS
|
||
DATA -.->|"Multi-dimensional<br/>view"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_4.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
**C++ Implementation:**
|
||
|
||
.. code-block:: cpp
|
||
|
||
using namespace ck_tile;
|
||
|
||
// Create UnmergeTransform for 3x4x2 tensor (24 elements total)
|
||
auto transform = make_unmerge_transform(make_tuple(3, 4, 2));
|
||
|
||
// Forward: Lower (1D) → Upper (3D) - Manual calculation
|
||
int linear_index = 14;
|
||
int dim0 = linear_index / (4 * 2); // 14 / 8 = 1
|
||
int remainder = linear_index % (4 * 2); // 14 % 8 = 6
|
||
int dim1 = remainder / 2; // 6 / 2 = 3
|
||
int dim2 = remainder % 2; // 6 % 2 = 0
|
||
|
||
printf("Linear index %d → 3D coord [%d, %d, %d]\n",
|
||
linear_index, dim0, dim1, dim2);
|
||
printf("Calculation: 14 = %d×8 + %d×2 + %d\n", dim0, dim1, dim2);
|
||
|
||
// Inverse: Upper (3D) → Lower (1D) - Using transform
|
||
multi_index<3> upper_coord;
|
||
upper_coord[number<0>{}] = 1;
|
||
upper_coord[number<1>{}] = 3;
|
||
upper_coord[number<2>{}] = 0;
|
||
|
||
multi_index<1> lower_coord;
|
||
transform.calculate_lower_index(lower_coord, upper_coord);
|
||
|
||
printf("3D coord [%d, %d, %d] → Linear index %d\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(upper_coord[number<1>{}]),
|
||
static_cast<int>(upper_coord[number<2>{}]),
|
||
static_cast<int>(lower_coord[number<0>{}]));
|
||
printf("Calculation: %d×8 + %d×2 + %d = %d\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(upper_coord[number<1>{}]),
|
||
static_cast<int>(upper_coord[number<2>{}]),
|
||
static_cast<int>(lower_coord[number<0>{}]));
|
||
|
||
EmbedTransform
|
||
--------------
|
||
|
||
EmbedTransform expands linear indices from the lower coordinate space into multi-dimensional coordinates in the upper coordinate space using configurable strides, enabling flexible strided tensor layouts and sub-tensor views within larger buffers.
|
||
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "EmbedTransform: Linear → Multi-D Strided"
|
||
LS["Lower Coordinate Space<br/>1D Linear<br/>Index: 14"]
|
||
US["Upper Coordinate Space<br/>2D: [2, 3]<br/>Coord: (1, 2)"]
|
||
|
||
DATA["Linear Buffer in Memory"]
|
||
end
|
||
|
||
LS -->|"Forward Transform <br/>Strides: [12, 1] <br/>14 ÷ 12 = 1, 14 % 12 = 2"| US
|
||
US -->|"Inverse Transform<br/>1×12 + 2×1 = 14"| LS
|
||
|
||
DATA -.->|"Linear<br/>index view"| LS
|
||
DATA -.->|"Multi-dimensional<br/>strided view"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_5.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
**C++ Implementation:**
|
||
|
||
.. code-block:: cpp
|
||
|
||
using namespace ck_tile;
|
||
|
||
// Create embed transform for 2x3 tensor with strides [12, 1]
|
||
// This is commonly used in descriptors
|
||
auto transform = make_embed_transform(make_tuple(2, 3), make_tuple(12, 1));
|
||
|
||
// Forward: Linear → 2D (Manual calculation)
|
||
int linear_idx = 14;
|
||
int row = linear_idx / 12; // 14 / 12 = 1
|
||
int remainder = linear_idx % 12; // 14 % 12 = 2
|
||
int col = remainder / 1; // 2 / 1 = 2
|
||
printf("Linear index %d → 2D coord [%d, %d]\n", linear_idx, row, col);
|
||
|
||
// Inverse: 2D → Linear (Using transform)
|
||
multi_index<2> upper_coord;
|
||
upper_coord[number<0>{}] = 1;
|
||
upper_coord[number<1>{}] = 2;
|
||
|
||
multi_index<1> lower_coord;
|
||
transform.calculate_lower_index(lower_coord, upper_coord);
|
||
printf("2D coord [%d, %d] → Linear index %d\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(upper_coord[number<1>{}]),
|
||
static_cast<int>(lower_coord[number<0>{}]));
|
||
|
||
ReplicateTransform
|
||
------------------
|
||
|
||
ReplicateTransform creates a higher-dimensional tensor by replicating (broadcasting) a lower-dimensional tensor. It's essentially a broadcasting operation that takes a tensor with fewer dimensions and logically replicates it across new dimensions without data duplication. An example is taking a scalar (0-dimensional) input and broadcasting it across multiple dimensions, enabling efficient broadcasting patterns where a single value appears at every position in a multi-dimensional coordinate space.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "ReplicateTransform: 0D → Multi-D Broadcasting"
|
||
LS["Lower Coordinate Space<br/>0D: Scalar<br/>Empty coordinate []"]
|
||
US["Upper Coordinate Space<br/>2D: [3, 4]<br/>All coords: (i, j)"]
|
||
|
||
DATA["Single Scalar Value"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>[] → (i,j) for any i,j"| US
|
||
US -->|"Inverse Transform<br/>(i,j) → [] for any i,j"| LS
|
||
|
||
DATA -.->|"One scalar<br/>value"| LS
|
||
DATA -.->|"Broadcasted view<br/>at all positions"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_6.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
**C++ Implementation:**
|
||
|
||
.. code-block:: cpp
|
||
|
||
using namespace ck_tile;
|
||
|
||
// Create replicate transform for 3x4 broadcasting
|
||
auto transform = make_replicate_transform(make_tuple(3, 4));
|
||
|
||
// Inverse: Upper (2D) → Lower (0D) - Using transform
|
||
// Any 2D coordinate maps to empty scalar coordinate
|
||
multi_index<2> upper_coord;
|
||
upper_coord[number<0>{}] = 1;
|
||
upper_coord[number<1>{}] = 2;
|
||
|
||
multi_index<0> lower_coord; // Empty coordinate (0 dimensions)
|
||
transform.calculate_lower_index(lower_coord, upper_coord);
|
||
printf("2D [%d, %d] → Empty scalar [] (always empty)\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(upper_coord[number<1>{}]));
|
||
|
||
// Forward: Scalar → 2D (Conceptual - no coordinate calculation needed)
|
||
// Broadcasting: Single scalar value appears at ALL positions
|
||
printf("Broadcasting: Scalar value appears at every [i,j] where 0≤i<3, 0≤j<4\n");
|
||
printf("Total positions: 3×4 = 12 positions, all contain same scalar value\n");
|
||
|
||
// Test multiple coordinates - all map to empty scalar
|
||
int test_coords[][2] = {{0, 0}, {1, 2}, {2, 3}};
|
||
for(int i = 0; i < 3; i++)
|
||
{
|
||
multi_index<2> test_upper;
|
||
test_upper[number<0>{}] = test_coords[i][0];
|
||
test_upper[number<1>{}] = test_coords[i][1];
|
||
|
||
multi_index<0> test_lower;
|
||
transform.calculate_lower_index(test_lower, test_upper);
|
||
printf("2D [%d, %d] → Empty scalar []\n",
|
||
test_coords[i][0], test_coords[i][1]);
|
||
}
|
||
|
||
OffsetTransform
|
||
---------------
|
||
|
||
OffsetTransform shifts coordinates by a fixed offset, creating a translated view of the coordinate space. It performs translation operations where each coordinate in the upper space is mapped to a coordinate in the lower space by adding a constant offset.
|
||
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "OffsetTransform: 1D → 1D Translation"
|
||
LS["Lower Coordinate Space<br/>1D: [0, 63]<br/>Coord: index + offset"]
|
||
US["Upper Coordinate Space<br/>1D: [0, 47]<br/>Coord: index"]
|
||
|
||
DATA["Linear Buffer in Memory"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>idx → idx + 16"| US
|
||
US -->|"Inverse Transform<br/>idx + 16 → idx"| LS
|
||
|
||
DATA -.->|"Lower<br/>view"| LS
|
||
DATA -.->|"Upper<br/>view"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_7.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
**C++ Implementation:**
|
||
|
||
.. code-block:: cpp
|
||
|
||
using namespace ck_tile;
|
||
|
||
// Create offset transform for coordinate translation
|
||
// CK Tile formula: lower = upper + offset
|
||
auto transform = make_offset_transform(48, 16);
|
||
|
||
// Using Transform: Original → Translated (adds offset)
|
||
multi_index<1> upper_coord;
|
||
upper_coord[number<0>{}] = 5; // Original index 5
|
||
|
||
multi_index<1> lower_coord;
|
||
transform.calculate_lower_index(lower_coord, upper_coord);
|
||
printf("Original index %d → Translated index %d\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(lower_coord[number<0>{}]));
|
||
printf("Calculation: %d + 16 = %d\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(lower_coord[number<0>{}]));
|
||
|
||
// Manual Reverse: Translated → Original (subtract offset)
|
||
int translated_idx = 21;
|
||
int original_idx = translated_idx - 16;
|
||
printf("Translated index %d → Original index %d\n", translated_idx, original_idx);
|
||
|
||
// Test multiple coordinates
|
||
int test_indices[] = {0, 10, 20, 47};
|
||
for(int i = 0; i < 4; i++)
|
||
{
|
||
multi_index<1> test_upper;
|
||
test_upper[number<0>{}] = test_indices[i];
|
||
|
||
multi_index<1> test_lower;
|
||
transform.calculate_lower_index(test_lower, test_upper);
|
||
printf("Original %d → Translated %d\n",
|
||
test_indices[i], static_cast<int>(test_lower[number<0>{}]));
|
||
}
|
||
|
||
PassThroughTransform - Identity
|
||
-------------------------------
|
||
|
||
No-op transform that passes coordinates unchanged. The PassThrough transform is the simplest coordinate transformation in CK Tile, implementing a perfect identity mapping where input coordinates are passed through unchanged to the output. This transform is essential as a placeholder in transformation chains and for dimensions that require no modification.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "PassThroughTransform: 1D → 1D Identity"
|
||
LS["Lower Coordinate Space<br/>1D: [0, 59]<br/>Coord: index"]
|
||
US["Upper Coordinate Space<br/>1D: [0, 59]<br/>Coord: index"]
|
||
|
||
DATA["Linear Buffer in Memory"]
|
||
end
|
||
|
||
LS -.->|"Perfect Identity<br/>idx → idx"| US
|
||
US -.->|"Perfect Identity<br/>idx → idx"| LS
|
||
|
||
DATA -->|"Same buffer<br/>same view"| LS
|
||
DATA -->|"Same buffer<br/>same view"| US
|
||
|
||
style LS fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px
|
||
style US fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_8.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
**C++ Implementation:**
|
||
|
||
.. code-block:: cpp
|
||
|
||
using namespace ck_tile;
|
||
|
||
// Identity transform - no change
|
||
int length = 60;
|
||
|
||
auto transform = make_pass_through_transform(length);
|
||
|
||
printf("Length: %d\n", length);
|
||
|
||
// Forward: Upper → Lower (identity)
|
||
multi_index<1> upper_coord;
|
||
upper_coord[number<0>{}] = 25;
|
||
|
||
multi_index<1> lower_coord;
|
||
transform.calculate_lower_index(lower_coord, upper_coord);
|
||
|
||
printf("\nForward: [%d] → [%d] (unchanged)\n",
|
||
static_cast<int>(upper_coord[number<0>{}]),
|
||
static_cast<int>(lower_coord[number<0>{}]));
|
||
|
||
// Reverse: Lower → Upper (identity)
|
||
multi_index<1> lower_input;
|
||
lower_input[number<0>{}] = 42;
|
||
|
||
multi_index<1> upper_result;
|
||
// Note: PassThrough is bidirectional identity, so we can use same function
|
||
transform.calculate_lower_index(upper_result, lower_input);
|
||
|
||
printf("Reverse: [%d] → [%d] (unchanged)\n",
|
||
static_cast<int>(lower_input[number<0>{}]),
|
||
static_cast<int>(upper_result[number<0>{}]));
|
||
|
||
PadTransform
|
||
------------
|
||
|
||
PadTransform adds padding to tensor dimensions, mapping coordinates from upper dimension space (with padding) to lower dimension space (original data).
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "PadTransform: 1D → 1D with Padding"
|
||
LS["Lower Coordinate Space<br/>1D: [0, 2] (original data)"]
|
||
US["Upper Coordinate Space<br/>1D: [0, 4] (with padding)"]
|
||
|
||
DATA["Tensor Data in Memory"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>idx + left_pad"| US
|
||
US -->|"Inverse Transform<br/>idx - left_pad"| LS
|
||
|
||
DATA -.->|"Original view"| LS
|
||
DATA -.->|"Padded view"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_9.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
**C++ Implementation:**
|
||
|
||
.. code-block:: cpp
|
||
|
||
using namespace ck_tile;
|
||
|
||
// PadTransform for coordinate padding
|
||
int low_length = 3; // Original dimension length
|
||
int left_pad = 1; // Padding on left
|
||
int right_pad = 1; // Padding on right
|
||
|
||
auto transform = make_pad_transform(low_length, left_pad, right_pad);
|
||
|
||
printf("Low length: %d\n", low_length);
|
||
printf("Left pad: %d\n", left_pad);
|
||
printf("Right pad: %d\n", right_pad);
|
||
printf("Upper length: %d (total with padding)\n", low_length + left_pad + right_pad);
|
||
|
||
// Test coordinate mapping
|
||
int test_coords[] = {0, 1, 2, 3, 4};
|
||
for(int i = 0; i < 5; i++)
|
||
{
|
||
multi_index<1> upper;
|
||
upper[number<0>{}] = test_coords[i];
|
||
|
||
multi_index<1> lower;
|
||
transform.calculate_lower_index(lower, upper);
|
||
|
||
int adjusted_idx = static_cast<int>(lower[number<0>{}]);
|
||
bool is_valid = (adjusted_idx >= 0 && adjusted_idx < low_length);
|
||
|
||
printf("Upper %d → Lower %d (%s)\n",
|
||
test_coords[i], adjusted_idx,
|
||
is_valid ? "valid" : "padding");
|
||
}
|
||
|
||
Additional Transform Types
|
||
--------------------------
|
||
|
||
XorTransform
|
||
~~~~~~~~~~~~
|
||
|
||
XorTransform applies a 2D XOR mapping for specialized memory access patterns. It performs XOR operations on coordinates to create transformed memory layouts for specific algorithmic optimizations, particularly useful for avoiding :ref:`LDS bank conflicts <ck_tile_lds_bank_conflicts>`.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "XorTransform: 2D → 2D XOR Mapping"
|
||
LS["Lower Coordinate Space<br/>2D: [4, 8]<br/>XOR-transformed coords"]
|
||
US["Upper Coordinate Space<br/>2D: [4, 8]<br/>Normal coords"]
|
||
|
||
DATA["Same Tensor Data"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>apply XOR reverse"| US
|
||
US -->|"Inverse Transform<br/>apply XOR mapping"| LS
|
||
|
||
DATA -.->|"XOR pattern<br/>view"| LS
|
||
DATA -.->|"Normal<br/>view"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_10.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
SliceTransform
|
||
~~~~~~~~~~~~~~
|
||
|
||
SliceTransform extracts a sub-region from a tensor dimension.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "SliceTransform: 1D → 1D Sub-region"
|
||
LS["Lower Coordinate Space<br/>1D: [0, 9] (original range)"]
|
||
US["Upper Coordinate Space<br/>1D: [0, 4] (slice range)"]
|
||
|
||
DATA["Tensor Data in Memory"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>idx + slice_begin"| US
|
||
US -->|"Inverse Transform<br/>idx - slice_begin"| LS
|
||
|
||
DATA -.->|"Full tensor<br/>view"| LS
|
||
DATA -.->|"Sub-region<br/>view"| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_11.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
ModuloTransform
|
||
~~~~~~~~~~~~~~~
|
||
|
||
ModuloTransform applies cyclic wrapping to coordinates using modulo operations.
|
||
|
||
..
|
||
Original mermaid diagram (edit here, then run update_diagrams.py)
|
||
|
||
.. mermaid::
|
||
|
||
graph TB
|
||
subgraph "ModuloTransform: 1D → 1D Cyclic"
|
||
LS["Lower Coordinate Space<br/>1D: [0, 3] (modulus range)"]
|
||
US["Upper Coordinate Space<br/>1D: [0, 15] (full range)"]
|
||
|
||
DATA["Tensor Data in Memory"]
|
||
end
|
||
|
||
LS -->|"Forward Transform<br/>idx * cycle_count"| US
|
||
US -->|"Inverse Transform<br/>idx % modulus"| LS
|
||
|
||
DATA -.->|" "| LS
|
||
DATA -.->|" "| US
|
||
|
||
style LS fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
|
||
style US fill:#fff3e0,stroke:#f57c00,stroke-width:3px
|
||
style DATA fill:#f0f9ff,stroke:#0284c7,stroke-width:2px,stroke-dasharray: 5 5
|
||
|
||
|
||
|
||
.. image:: diagrams/transforms_12.svg
|
||
:alt: Diagram
|
||
:align: center
|
||
|
||
Summary
|
||
-------
|
||
|
||
Individual transforms provide:
|
||
|
||
- **Modularity**: Each transform does one thing
|
||
- **Composability**: Chain transforms for complex mappings (see :ref:`ck_tile_adaptors`)
|
||
- **Efficiency**: Compile-time optimization in C++
|
||
- **Flexibility**: Handle any coordinate conversion need
|
||
|
||
These transforms enable you to:
|
||
|
||
1. Create custom tensor views
|
||
2. Implement efficient data access patterns
|
||
3. Handle padding and boundaries correctly
|
||
4. Optimize memory layouts for :ref:`GPU access <ck_tile_gpu_basics>`
|
||
|
||
The C++ implementations in Composable Kernel provide:
|
||
|
||
- Zero-overhead abstractions through templates
|
||
- Compile-time composition and optimization
|
||
- Support for complex coordinate transformations
|
||
- Integration with GPU kernel generation
|
||
- Foundation for :ref:`tile windows <ck_tile_tile_window>` and :ref:`load/store traits <ck_tile_load_store_traits>`
|
||
|
||
Next Steps
|
||
----------
|
||
|
||
- :ref:`ck_tile_adaptors` - How to chain transforms together
|
||
- :ref:`ck_tile_descriptors` - Complete tensor descriptions with transforms
|
||
- :ref:`ck_tile_tile_window` - Using transforms for efficient data loading
|
||
- :ref:`ck_tile_thread_mapping` - How transforms enable thread-to-data mapping
|
||
- :ref:`ck_tile_gemm_optimization` - Practical application in GEMM kernels
|