.. _ck_tile_adaptors:

Tensor Adaptors - Chaining Transformations
==========================================

Overview
--------

An individual :ref:`transform <ck_tile_transforms>` describes a single coordinate mapping. A TensorAdaptor chains several transforms together into one composite coordinate transformation. An adaptor can be thought of as a transformation pipeline that reshapes, reorders, or restructures the index space of a tensor.

TensorAdaptors serve as the bridge between individual transforms and the high-level tensor operations used in applications. They provide a composable abstraction that lets developers build complex data access patterns from simple building blocks.

TensorAdaptor Basics
--------------------
A TensorAdaptor encapsulates a sequence of :ref:`coordinate transformations <ck_tile_coordinate_systems>`, managing the flow of coordinates through multiple transform stages:

..
   Original mermaid diagram (edit here, then run update_diagrams.py)

   .. mermaid::

      graph LR
          subgraph "Adaptor Composition"
              subgraph "Single Transform"
                  direction TB
                  I1["Input Coords<br/>[0,1,2]"]
                  T1["Transform<br/>(e.g., Transpose)"]
                  O1["Output Coords<br/>[2,0,1]"]
                  I1 --> T1 --> O1
              end
              subgraph "Chained Transforms"
                  direction TB
                  I2["Input<br/>2D"]
                  T2A["Transform A<br/>(e.g., Merge)"]
                  M2["Intermediate<br/>1D"]
                  T2B["Transform B<br/>(e.g., Pad)"]
                  O2["Output<br/>1D Padded"]
                  I2 --> T2A --> M2 --> T2B --> O2
              end
          end
          style T1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
          style T2A fill:#fff3e0,stroke:#f57c00,stroke-width:2px
          style T2B fill:#fff3e0,stroke:#f57c00,stroke-width:2px

.. image:: diagrams/adaptors_1.svg
   :alt: Diagram
   :align: center

Core Components
~~~~~~~~~~~~~~~

Each TensorAdaptor contains:

- **transforms**: The individual :ref:`transforms <ck_tile_transforms>` to apply
- **lower_dimension_hidden_idss**: For each transform, the hidden dimension IDs on its lower side, which link it to the neighboring transform stages
- **upper_dimension_hidden_idss**: For each transform, the hidden dimension IDs on its upper side
- **bottom_dimension_hidden_ids**: Hidden IDs of the bottom (input) dimensions
- **top_dimension_hidden_ids**: Hidden IDs of the top (output) dimensions

The most important method of a TensorAdaptor is ``calculate_bottom_index``, which computes the bottom (lower) index from a top (upper) index by applying the transforms in reverse order.

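
The reverse-order evaluation can be illustrated outside of CK Tile. The following is a minimal plain C++ sketch, not the CK Tile API. It models the merge-then-pad chain from the diagram above, assuming a 2x3 bottom space and a left pad of 1. The names ``Index2``, ``pad_lower``, ``merge_lower``, and ``calculate_bottom_index_model`` are hypothetical, and the bounds check for the padded region is omitted.

.. code-block:: cpp

   // Plain C++ model of a two-stage adaptor (illustrative only, not CK Tile code):
   //   bottom 2D [2, 3] --merge--> 1D [6] --left pad 1--> top 1D [7]
   struct Index2
   {
       int row;
       int col;
   };

   // Lower-index calculation of the pad stage: remove the left padding.
   constexpr int pad_lower(int top, int left_pad) { return top - left_pad; }

   // Lower-index calculation of the merge stage: split a merged index
   // back into (row, col) for a row length of num_cols.
   constexpr Index2 merge_lower(int merged, int num_cols)
   {
       return Index2{merged / num_cols, merged % num_cols};
   }

   // The pad stage was added last when building the chain, so it is undone first.
   constexpr Index2 calculate_bottom_index_model(int top)
   {
       return merge_lower(pad_lower(top, 1), 3);
   }

   // Top index 6 -> unpadded index 5 -> (row 1, col 2).
   static_assert(calculate_bottom_index_model(6).row == 1, "row of top index 6");
   static_assert(calculate_bottom_index_model(6).col == 2, "col of top index 6");

The model hard-codes the stage order; in an actual adaptor the order follows from the stored transforms and their hidden dimension IDs.
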
Transpose Adaptor: Dimension Reordering
---------------------------------------
The transpose adaptor reorders tensor dimensions according to a permutation pattern. This operation forms the basis for many tensor manipulations in GPU kernels.

.. code-block:: cpp

   // Create transpose adaptor: [0, 1, 2] → [2, 0, 1]
   auto transpose_adaptor = make_identity_tensor_adaptor<3>(); // Start with identity

   // Apply the transpose with transform_tensor_descriptor.
   // original_desc is assumed to be an existing 3D tensor descriptor.
   auto transposed_desc = transform_tensor_descriptor(
       original_desc,
       make_tuple(make_pass_through_transform(original_desc.get_length(2)),
                  make_pass_through_transform(original_desc.get_length(0)),
                  make_pass_through_transform(original_desc.get_length(1))),
       make_tuple(sequence<2>{}, sequence<0>{}, sequence<1>{}), // old dims
       make_tuple(sequence<0>{}, sequence<1>{}, sequence<2>{})  // new dims
   );

   // Alternative: direct coordinate transformation.
   // New dim 0 is old dim 2, new dim 1 is old dim 0, new dim 2 is old dim 1.
   multi_index<3> top_coord{0, 1, 2};
   // An element at old coordinate [0, 1, 2] appears at new coordinate [2, 0, 1].
   // Read the other way, top_coord {0, 1, 2} refers to old coordinate [1, 2, 0].

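
Because a permutation can be read in two directions, it helps to write both mappings down. The following is a small plain C++ sketch, not CK Tile code, that assumes the dimension mapping shown above (new dim 0 is old dim 2, new dim 1 is old dim 0, new dim 2 is old dim 1). ``Index3``, ``top_from_bottom``, and ``bottom_from_top`` are hypothetical names.

.. code-block:: cpp

   // Plain C++ model of the [0, 1, 2] -> [2, 0, 1] transpose (illustrative only).
   struct Index3
   {
       int d0;
       int d1;
       int d2;
   };

   // Where a bottom (old) coordinate shows up in the top (new) space:
   // top dim 0 takes bottom dim 2, top dim 1 takes bottom dim 0,
   // top dim 2 takes bottom dim 1.
   constexpr Index3 top_from_bottom(Index3 b) { return Index3{b.d2, b.d0, b.d1}; }

   // The direction an adaptor evaluates: from a top coordinate back to the bottom one.
   constexpr Index3 bottom_from_top(Index3 t) { return Index3{t.d1, t.d2, t.d0}; }

   // Bottom [0, 1, 2] is seen at top [2, 0, 1].
   static_assert(top_from_bottom(Index3{0, 1, 2}).d0 == 2, "top dim 0");
   static_assert(top_from_bottom(Index3{0, 1, 2}).d1 == 0, "top dim 1");
   static_assert(top_from_bottom(Index3{0, 1, 2}).d2 == 1, "top dim 2");

   // Top [0, 1, 2] refers to bottom [1, 2, 0].
   static_assert(bottom_from_top(Index3{0, 1, 2}).d0 == 1, "bottom dim 0");
   static_assert(bottom_from_top(Index3{0, 1, 2}).d1 == 2, "bottom dim 1");
   static_assert(bottom_from_top(Index3{0, 1, 2}).d2 == 0, "bottom dim 2");

The two functions are inverses of each other, so applying one after the other returns the original coordinate.
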
Single-Stage Adaptors: Custom Transform Chains
----------------------------------------------
Custom adaptors can be created by specifying which transforms to use and how they connect. This provides fine-grained control over the transformation pipeline:

.. code-block:: cpp

   // Create a descriptor that merges 2x3 dimensions into a single dimension
   auto base_desc = make_naive_tensor_descriptor_packed(make_tuple(2, 3));

   // Apply merge transform
   auto merged_desc = transform_tensor_descriptor(
       base_desc,
       make_tuple(make_merge_transform(make_tuple(2, 3))),
       make_tuple(sequence<0, 1>{}), // merge dims 0,1
       make_tuple(sequence<0>{})     // to single dim 0
   );

   // The adaptor is embedded in the descriptor
   // To use it:
   multi_index<1> top_coord{5}; // 1D coordinate
   // This internally calculates: row = 5/3 = 1, col = 5%3 = 2

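
The division and modulo step generalizes to any number of merged dimensions. The following plain C++ sketch, not the CK Tile implementation, splits a merged index back into its components by walking the lengths from the fastest-varying dimension to the slowest. ``MultiIndex``, ``merge_lower_index``, ``kLengths2D``, and ``kLengths3D`` are hypothetical names chosen for illustration.

.. code-block:: cpp

   // Plain C++ model of the lower-index calculation of a merge transform.
   template <int N>
   struct MultiIndex
   {
       int v[N];
   };

   // Split `merged` into N per-dimension indices; the last dimension varies fastest.
   template <int N>
   constexpr MultiIndex<N> merge_lower_index(int merged, const int (&lengths)[N])
   {
       MultiIndex<N> out{};
       for(int d = N - 1; d >= 0; --d)
       {
           out.v[d] = merged % lengths[d];
           merged /= lengths[d];
       }
       return out;
   }

   constexpr int kLengths2D[2] = {2, 3};
   constexpr int kLengths3D[3] = {2, 3, 4};

   // 5 -> (row 1, col 2), matching the comment in the snippet above.
   static_assert(merge_lower_index(5, kLengths2D).v[0] == 1, "row");
   static_assert(merge_lower_index(5, kLengths2D).v[1] == 2, "col");
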
Chaining Adaptors: Building Complex Transformations
---------------------------------------------------
The real power of adaptors comes from chaining multiple transformations together to create advanced data access patterns:

..
   Original mermaid diagram (edit here, then run update_diagrams.py)

   .. mermaid::

      graph LR
          subgraph "Adaptor Chaining Flow"
              subgraph "Adaptor 1"
                  A1I["Bottom Dims<br/>[0,1]"]
                  A1T["Transform:<br/>Merge[2,3]"]
                  A1O["Top Dims<br/>[0]"]
              end
              subgraph "Adaptor 2"
                  A2I["Bottom Dims<br/>[0]"]
                  A2T["Transform:<br/>Unmerge[2,3]"]
                  A2O["Top Dims<br/>[0,1]"]
              end
              subgraph "Chained Result"
                  CI["Input 2D<br/>Bottom[0,1]"]
                  CO["Output 2D<br/>Top[0,1]"]
              end
          end
          A1I --> A1T
          A1T --> A1O
          A1O --> A2I
          A2I --> A2T
          A2T --> A2O
          CI --> A1I
          A2O --> CO
          style A1T fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
          style A2T fill:#fff3e0,stroke:#f57c00,stroke-width:2px
          style CI fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
          style CO fill:#e8f5e9,stroke:#388e3c,stroke-width:2px

.. image:: diagrams/adaptors_2.svg
   :alt: Diagram
   :align: center


.. code-block:: cpp

   // Start with a 2D descriptor
   auto desc1 = make_naive_tensor_descriptor_packed(make_tuple(2, 3));

   // First transformation: merge 2D to 1D
   auto merged_desc = transform_tensor_descriptor(
       desc1,
       make_tuple(make_merge_transform(make_tuple(2, 3))),
       make_tuple(sequence<0, 1>{}), // merge dims 0,1
       make_tuple(sequence<0>{})     // to dim 0
   );

   // Second transformation: unmerge 1D back to 2D
   auto final_desc = transform_tensor_descriptor(
       merged_desc,
       make_tuple(make_unmerge_transform(make_tuple(2, 3))),
       make_tuple(sequence<0>{}),   // from dim 0
       make_tuple(sequence<0, 1>{}) // to dims 0,1
   );

   // The chained transformation is embedded in final_desc
   // Merging [2, 3] and then unmerging with the same lengths
   // should yield the identity transformation on 2D coordinates

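
The identity claim can be checked with a small model. The following plain C++ sketch, not CK Tile code, evaluates the chain from top to bottom for every coordinate of the 2x3 space: the unmerge stage is undone first (2D to 1D), then the merge stage (1D to 2D). The names ``Index2``, ``unmerge_lower``, ``merge_lower``, ``chained_bottom_index``, and ``chain_is_identity`` are hypothetical.

.. code-block:: cpp

   // Plain C++ model of merge [2, 3] followed by unmerge [2, 3] (illustrative only).
   struct Index2
   {
       int row;
       int col;
   };

   constexpr int kRows = 2;
   constexpr int kCols = 3;

   // Lower-index calculation of the unmerge stage: 2D top index -> 1D index.
   constexpr int unmerge_lower(Index2 top) { return top.row * kCols + top.col; }

   // Lower-index calculation of the merge stage: 1D index -> 2D bottom index.
   constexpr Index2 merge_lower(int merged)
   {
       return Index2{merged / kCols, merged % kCols};
   }

   // Top-to-bottom evaluation of the whole chain.
   constexpr Index2 chained_bottom_index(Index2 top)
   {
       return merge_lower(unmerge_lower(top));
   }

   // True when every top coordinate maps to the same bottom coordinate.
   constexpr bool chain_is_identity()
   {
       for(int r = 0; r < kRows; ++r)
       {
           for(int c = 0; c < kCols; ++c)
           {
               const Index2 b = chained_bottom_index(Index2{r, c});
               if(b.row != r || b.col != c)
                   return false;
           }
       }
       return true;
   }

   static_assert(chain_is_identity(), "merge followed by unmerge is the identity");
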
Transform Addition: Extending Existing Adaptors
-----------------------------------------------
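Existing adaptors can be extended with new transforms using ``transform_tensor_adaptor``. When the adaptor is embedded in a descriptor, ``transform_tensor_descriptor`` plays the same role, as in the following example. This pattern is useful for adding padding or other modifications to an existing transformation pipeline: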

.. code-block:: cpp

   // Start with transposed descriptor
   auto base_desc = make_naive_tensor_descriptor(
       make_tuple(3, 4),
       make_tuple(1, 3) // transposed strides
   );

   // Add padding to dimension 0 only
   auto padded_desc = transform_tensor_descriptor(
       base_desc,
       make_tuple(make_pad_transform(3, 1, 1),   // pad dim 0: 3 → 5
                  make_pad_transform(4, 0, 0)),  // keep dim 1: 4 → 4
       make_tuple(sequence<0>{}, sequence<1>{}), // input dims
       make_tuple(sequence<0>{}, sequence<1>{})  // output dims (keep 2D)
   );

   // Access pattern
   multi_index<2> padded_coord{1, 2}; // In padded space
   // Internally calculates: unpadded = [1-1, 2] = [0, 2]
   // Then applies transpose strides: offset = 0*1 + 2*3 = 6

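
The two steps in the comments above (remove the padding, then apply the strides) can be written out as a plain C++ sketch. This is not CK Tile code; it assumes the same lengths ``[3, 4]``, strides ``[1, 3]``, and a left pad of 1 on dimension 0. ``unpad``, ``in_bounds``, and ``memory_offset`` are hypothetical helpers, and the explicit bounds check stands in for whatever validity handling the library performs.

.. code-block:: cpp

   // Plain C++ model of a padded view over a strided 3x4 tensor (illustrative only).
   struct Index2
   {
       int d0;
       int d1;
   };

   constexpr int kLen0     = 3;
   constexpr int kLen1     = 4;
   constexpr int kStride0  = 1; // transposed strides, as in the snippet above
   constexpr int kStride1  = 3;
   constexpr int kLeftPad0 = 1; // dimension 1 has no padding

   // Undo the pad: shift dimension 0 back by its left padding.
   constexpr Index2 unpad(Index2 padded)
   {
       return Index2{padded.d0 - kLeftPad0, padded.d1};
   }

   // A padded coordinate is backed by real data only if it lands inside [3, 4].
   constexpr bool in_bounds(Index2 idx)
   {
       return idx.d0 >= 0 && idx.d0 < kLen0 && idx.d1 >= 0 && idx.d1 < kLen1;
   }

   // Apply the strides of the base descriptor.
   constexpr int memory_offset(Index2 idx)
   {
       return idx.d0 * kStride0 + idx.d1 * kStride1;
   }

   // Padded [1, 2] -> unpadded [0, 2] -> offset 0*1 + 2*3 = 6.
   static_assert(memory_offset(unpad(Index2{1, 2})) == 6, "offset of padded [1, 2]");
   // Padded [0, 2] lies in the padding region and has no backing element.
   static_assert(!in_bounds(unpad(Index2{0, 2})), "padding region");
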
Advanced Patterns
-----------------
Complex Nested Transforms
~~~~~~~~~~~~~~~~~~~~~~~~~
CK Tile supports complex nested transform patterns that enable advanced data layouts:

.. code-block:: cpp

   // Example: 4D tensor with complex transformations
   // Shape: [A, B, C, D] with various transforms
   // (A, B, C, D are assumed to be previously defined dimension lengths)

   // 1. Create base descriptor
   auto base_desc = make_naive_tensor_descriptor_packed(
       make_tuple(A, B, C, D)
   );

   // 2. Apply multiple transformations
   // First: merge first 3 dimensions
   auto step1_desc = transform_tensor_descriptor(
       base_desc,
       make_tuple(make_merge_transform(make_tuple(A, B, C)),
                  make_pass_through_transform(D)),
       make_tuple(sequence<0, 1, 2>{}, sequence<3>{}), // input mapping
       make_tuple(sequence<0>{}, sequence<1>{})        // output: 2D
   );

   // 3. Then unmerge back but with different grouping
   auto step2_desc = transform_tensor_descriptor(
       step1_desc,
       make_tuple(make_unmerge_transform(make_tuple(A*B, C)),
                  make_pass_through_transform(D)),
       make_tuple(sequence<0>{}, sequence<1>{}),   // from 2D
       make_tuple(sequence<0, 1>{}, sequence<2>{}) // to 3D
   );

   // The adaptor chain is embedded in the descriptors
   // CK optimizes these at compile time

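
To see what the regrouping does to a coordinate, the chain can be modeled in plain C++. The sketch below is not CK Tile code and assumes the example lengths A=2, B=3, C=4, D=5. It maps a 3D top index ``[ab, c, d]`` to the 4D bottom index ``[a, b, c, d]`` and then to the packed memory offset. All names (``kA`` to ``kD``, ``TopIndex``, ``BottomIndex``, ``bottom_index``, ``packed_offset``) are hypothetical.

.. code-block:: cpp

   // Plain C++ model of merge(A, B, C) followed by unmerge(A*B, C) (illustrative only).
   constexpr int kA = 2;
   constexpr int kB = 3;
   constexpr int kC = 4;
   constexpr int kD = 5;

   struct TopIndex
   {
       int ab; // range [0, A*B)
       int c;
       int d;
   };

   struct BottomIndex
   {
       int a;
       int b;
       int c;
       int d;
   };

   // Evaluate the chain from top to bottom.
   constexpr BottomIndex bottom_index(TopIndex top)
   {
       // Undo the unmerge (A*B, C): rebuild the merged 1D index.
       const int merged = top.ab * kC + top.c;
       // Undo the merge (A, B, C): split the merged index into a, b, c.
       return BottomIndex{merged / (kB * kC), (merged / kC) % kB, merged % kC, top.d};
   }

   // Offset of a 4D index in the packed [A, B, C, D] base descriptor.
   constexpr int packed_offset(BottomIndex i)
   {
       return ((i.a * kB + i.b) * kC + i.c) * kD + i.d;
   }

   // Top [4, 1, 2] -> merged 17 -> bottom [1, 1, 1, 2] -> offset 87.
   static_assert(bottom_index(TopIndex{4, 1, 2}).a == 1, "a");
   static_assert(bottom_index(TopIndex{4, 1, 2}).b == 1, "b");
   static_assert(packed_offset(bottom_index(TopIndex{4, 1, 2})) == 87, "offset");

In this model the regrouped view addresses the same packed memory as the original 4D view; only the way the first two dimensions are indexed changes.
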
GPU Memory Layout Example
~~~~~~~~~~~~~~~~~~~~~~~~~
The following example shows how adaptors create efficient :ref:`GPU memory access patterns <ck_tile_gpu_basics>`:

.. code-block:: cpp

   // Create descriptor for thread block tile: 64x64
   // Each thread handles an 8x8 sub-tile (vector) of the block
   constexpr auto BlockM = 64;
   constexpr auto BlockN = 64;
   constexpr auto VectorM = 8;
   constexpr auto VectorN = 8;

   // Thread arrangement: 8x8 threads
   constexpr auto ThreadM = BlockM / VectorM; // 8
   constexpr auto ThreadN = BlockN / VectorN; // 8

   // Create block descriptor with proper layout
   auto block_desc = transform_tensor_descriptor(
       make_naive_tensor_descriptor_packed(
           make_tuple(number<BlockM>{}, number<BlockN>{})
       ),
       make_tuple(
           make_unmerge_transform(make_tuple(
               number<ThreadM>{}, number<VectorM>{}
           )),
           make_unmerge_transform(make_tuple(
               number<ThreadN>{}, number<VectorN>{}
           ))
       ),
       make_tuple(sequence<0>{}, sequence<1>{}),      // from 2D
       make_tuple(sequence<0, 2>{}, sequence<1, 3>{}) // to 4D: [TM,TN,VM,VN]
   );

   // This creates the layout:
   // - Dimensions 0,1: thread indices
   // - Dimensions 2,3: vector indices within each thread
   // Elements along dimension 3 are contiguous in memory,
   // which supports vectorized, coalescing-friendly access on GPU

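
The resulting index arithmetic can be sketched in plain C++. This is a model of the layout above, not CK Tile code: it maps a 4D coordinate ``[tm, tn, vm, vn]`` to a row, a column, and an offset in the packed 64x64 tile. ``TileIndex``, ``tile_row``, ``tile_col``, and ``tile_offset`` are hypothetical names.

.. code-block:: cpp

   // Plain C++ model of the [TM, TN, VM, VN] view of a packed 64x64 tile.
   constexpr int kBlockN  = 64;
   constexpr int kVectorM = 8;
   constexpr int kVectorN = 8;

   struct TileIndex
   {
       int tm; // thread index along M
       int tn; // thread index along N
       int vm; // element index along M within the thread's sub-tile
       int vn; // element index along N within the thread's sub-tile
   };

   // Lower index of the unmerge (ThreadM, VectorM) on dimension 0.
   constexpr int tile_row(TileIndex t) { return t.tm * kVectorM + t.vm; }

   // Lower index of the unmerge (ThreadN, VectorN) on dimension 1.
   constexpr int tile_col(TileIndex t) { return t.tn * kVectorN + t.vn; }

   // Offset in the packed row-major 64x64 tile.
   constexpr int tile_offset(TileIndex t)
   {
       return tile_row(t) * kBlockN + tile_col(t);
   }

   // [1, 2, 3, 4] -> row 11, col 20 -> offset 11*64 + 20 = 724.
   static_assert(tile_offset(TileIndex{1, 2, 3, 4}) == 724, "offset");
   // Stepping vn by one moves one element in memory: a thread's row is contiguous.
   static_assert(tile_offset(TileIndex{0, 0, 0, 1}) - tile_offset(TileIndex{0, 0, 0, 0}) == 1,
                 "vn stride");

In this model, neighboring threads along ``tn`` start ``VectorN`` elements apart, so each thread owns a contiguous run of eight elements per row.
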
Common Transform Chains
-----------------------
CK Tile provides several common transform chain patterns used throughout GPU kernels:

**Padding for Convolution**

.. code-block:: cpp

   auto padded = transform_tensor_descriptor(
       input,
       make_tuple(make_pad_transform(H, pad_h, pad_h),
                  make_pad_transform(W, pad_w, pad_w)),
       make_tuple(sequence<0>{}, sequence<1>{}),
       make_tuple(sequence<0>{}, sequence<1>{})
   );

**Dimension Merging for GEMM**

.. code-block:: cpp

   auto merged = transform_tensor_descriptor(
       input,
       make_tuple(make_merge_transform(make_tuple(M, K))),
       make_tuple(sequence<0, 1>{}),
       make_tuple(sequence<0>{})
   );

For complete GEMM optimization strategies, see :ref:`ck_tile_gemm_optimization`.

**Broadcasting for Elementwise Operations**

.. code-block:: cpp

   auto broadcast = transform_tensor_descriptor(
       scalar,
       make_tuple(make_replicate_transform(make_tuple(M, N))),
       make_tuple(sequence<>{}),
       make_tuple(sequence<0, 1>{})
   );

Key Concepts Summary
--------------------

TensorAdaptors are the coordination layer that makes complex tensor operations possible:

- **Identity Adaptor**: Starting point for building transformations
- **Transpose Adaptor**: Dimension reordering with permutation patterns
- **Single-Stage Adaptors**: Custom transform chains with precise control
- **Chained Adaptors**: Complex multi-stage transformation pipelines
- **Transform Addition**: Extending existing adaptors with new transforms

Core concepts to remember:

- **Bottom/Top Dimensions**: Input and output coordinate spaces
- **Hidden Dimensions**: Internal coordinate mappings between transforms
- **Transform Chains**: Sequential application of multiple transforms
- **Coordinate Transformation**: Bidirectional mapping between coordinate spaces
- **Nested Transforms**: Complex multi-level transformation hierarchies

Key C++ Patterns in Composable Kernel
--------------------------------------

1. **Descriptor-Based Adaptors**: In CK, adaptors are typically embedded within :ref:`tensor descriptors <ck_tile_descriptors>` rather than created separately.
2. **Compile-Time Optimization**: When lengths are compile-time constants (for example ``number<N>``), the transform chain is resolved at compile time, so the abstraction itself adds no runtime overhead.
3. **Type Safety**: Template metaprogramming ensures coordinate transformations are type-safe.
4. **GPU Optimization**: Transform chains are designed for efficient GPU memory access patterns. See :ref:`ck_tile_lds_bank_conflicts` for LDS optimization.

TensorAdaptors bridge the gap between low-level transforms and high-level tensor operations, providing the flexibility to create the data layouts and access patterns that efficient GPU computing depends on. They build upon the foundation of :ref:`BufferViews <ck_tile_buffer_views>` and :ref:`TensorViews <ck_tile_tensor_views>`.

Next Steps
----------
- :ref:`ck_tile_descriptors` - How adaptors combine with element space to form complete tensor descriptors
- :ref:`ck_tile_transforms` - Individual transform types and their properties
- :ref:`ck_tile_tile_window` - How adaptors enable efficient data loading patterns
- :ref:`ck_tile_space_filling_curve` - Advanced coordinate mapping techniques for cache optimization
- :ref:`ck_tile_static_distributed_tensor` - How adaptors help manage distributed tensor storage