.. _ck_tile_transforms:

Individual Transform Operations
===============================

The transformation engine is built from individual transform types that each handle specific coordinate conversions. 

What Are Transforms?
--------------------

Transform operations convert coordinates between different dimensional spaces. Each transform operates between two :ref:`coordinate spaces <ck_tile_coordinate_systems>`:

- **Lower Dimension Space**: The source coordinate system
- **Upper Dimension Space**: The target coordinate system

Transform Direction
~~~~~~~~~~~~~~~~~~~

Transforms work bidirectionally:

- **Forward Transform**: Converts coordinates from the lower dimension to the upper dimension
- **Inverse Transform**: Converts coordinates back from the upper dimension to the lower dimension

Zero-Copy Logical Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Critical Understanding**: All transform operations happen in **logical coordinate space** only. This is a zero-copy system and there is **no data copying or movement** involved.

- **Data Storage**: The actual tensor data remains stored in memory in linear fashion, exactly as specified by the original tensor shape and strides at creation time. See :ref:`ck_tile_buffer_views` for more information about raw memory access.
- **Logical Mapping**: Transforms create different logical views of the same underlying data and only change how access coordinates are interpreted. See :ref:`ck_tile_tensor_views` for more information about tensor views.

.. mermaid::

   graph TB
       subgraph "Tensor Coordinate Transformation"
           US["Lower Dimension Space<br/>Source coordinate system"]
           LS["Upper Dimension Space<br/>Target coordinate system"]

           DATA["Linear Data in Memory<br/>Layout determined by tensor<br/>shape & strides"]
       end

       US -->|"Forward Transform"| LS
       LS -->|"Inverse Transform"| US

       DATA -.->|"Same data,<br/>different views"| US
       DATA -.->|"Same data,<br/>different views"| LS

Index Calculation Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform system provides two operations for coordinate conversion:

- **calculate_lower_index()**: Takes a coordinate from the **upper dimension space** and transforms it to get the corresponding index or coordinate in the **lower dimension space**. This calculates where to find the actual tensor element using the transformed coordinate system.

- **calculate_upper_index()**: Takes a coordinate from the **lower dimension space** and transforms it back to get the corresponding coordinate in the **upper dimension space**. This performs the inverse transformation to recover the original coordinate representation.

These operations enable bidirectional navigation between different coordinate representations of the same underlying tensor data.

Transform System Architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. mermaid::

   graph TB

       subgraph "Transform Types"
           EMB["EmbedTransform<br/>Linear → Multi-D Strided"]
           UNM["MergeTransform<br/>Multi-D → Linear"]
           MRG["UnmergeTransform<br/>Linear → Multi-D"]
           REP["ReplicateTransform<br/>0D → Multi-D Broadcast"]
           OFF["OffsetTransform<br/>Translation"]
           PAS["PassThroughTransform<br/>Identity"]
           PAD["PadTransform<br/>Boundaries"]
       end

       subgraph "Operations"
           FWD["Forward<br/>calculate_lower_index()"]
           BWD["Backward<br/>calculate_upper_index()"]
           UPD["Update<br/>update_lower_index()"]
       end

       EMB --> FWD
       UNM --> FWD
       MRG --> FWD
       REP --> FWD
       OFF --> FWD
       PAS --> FWD
       PAD --> FWD

MergeTransform
--------------

MergeTransform collapses multiple dimensions from the lower coordinate space into a single dimension in the upper coordinate space, effectively reducing the dimensionality of the tensor representation while preserving data relationships. This transform is fundamental to the :ref:`tile distribution system <ck_tile_tile_distribution>`.
   
.. mermaid::

   graph TB
       subgraph "MergeTransform: Multi-D → Linear"
           LS["Lower Coordinate Space<br/>2D: [4, 5]<br/>Coord: (2, 3)"]
           US["Upper Coordinate Space<br/>1D Linear<br/>Index: 13"]

           DATA["Same Tensor Data<br/>Layout: row-major<br/>Size: 20 elements"]
       end

       LS -->|"Forward Transform<br/>2×5 + 3 = 13"| US
       US -->|"Inverse Transform<br/>13÷5=2, 13%5=3"| LS

       DATA -.->|"Multi-dimensional<br/>view"| LS
       DATA -.->|"Linear<br/>view"| US


**C++ Implementation:**

.. code-block:: cpp

   using namespace ck_tile;
   
   // Create MergeTransform for 4x5 tensor (20 elements total)
   auto transform = make_merge_transform(make_tuple(4, 5));
   
   // Forward: Lower (2D) → Upper (1D) - Manual calculation
   int row = 2, col = 3;
   int linear_index = row * 5 + col;  // Result: 13
   printf("2D coord [%d, %d] → Linear index %d\n", row, col, linear_index);
   printf("Calculation: %d×5 + %d = %d\n", row, col, linear_index);
   
   // Inverse: Upper (1D) → Lower (2D) - Using transform
   multi_index<1> upper_coord;
   upper_coord[number<0>{}] = 13;
   
   multi_index<2> lower_coord;
   transform.calculate_lower_index(lower_coord, upper_coord);
   
   printf("Linear index %d → 2D coord [%d, %d]\n", 
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(lower_coord[number<0>{}]), 
          static_cast<int>(lower_coord[number<1>{}]));
   printf("Calculation: 13 ÷ 5 = %d remainder %d\n",
          static_cast<int>(lower_coord[number<0>{}]),
          static_cast<int>(lower_coord[number<1>{}]));

UnmergeTransform
----------------

UnmergeTransform expands coordinates from a single dimension in the lower coordinate space into multiple dimensions in the upper coordinate space, effectively increasing the dimensionality of the tensor representation while preserving all data relationships.

.. mermaid::

   graph TB
       subgraph "UnmergeTransform: Linear → Multi-D"
           LS["Lower Coordinate Space<br/>1D Linear<br/>Index: 14"]
           US["Upper Coordinate Space<br/>3D: [3, 4, 2]<br/>Coord: (1, 3, 0)"]

           DATA["Same Tensor Data<br/>Layout: row-major<br/>Size: 24 elements"]
       end

       LS -->|"Forward Transform<br/>14 = 1×8 + 3×2 + 0"| US
       US -->|"Inverse Transform<br/>linearize back"| LS

       DATA -.->|"Linear<br/>view"| LS
       DATA -.->|"Multi-dimensional<br/>view"| US

**C++ Implementation:**

.. code-block:: cpp

   using namespace ck_tile;
   
   // Create UnmergeTransform for 3x4x2 tensor (24 elements total)
   auto transform = make_unmerge_transform(make_tuple(3, 4, 2));
   
   // Forward: Lower (1D) → Upper (3D) - Manual calculation
   int linear_index = 14;
   int dim0 = linear_index / (4 * 2);      // 14 / 8 = 1
   int remainder = linear_index % (4 * 2);  // 14 % 8 = 6
   int dim1 = remainder / 2;               // 6 / 2 = 3
   int dim2 = remainder % 2;               // 6 % 2 = 0
   
   printf("Linear index %d → 3D coord [%d, %d, %d]\n", 
          linear_index, dim0, dim1, dim2);
   printf("Calculation: 14 = %d×8 + %d×2 + %d\n", dim0, dim1, dim2);
   
   // Inverse: Upper (3D) → Lower (1D) - Using transform
   multi_index<3> upper_coord;
   upper_coord[number<0>{}] = 1;
   upper_coord[number<1>{}] = 3;
   upper_coord[number<2>{}] = 0;
   
   multi_index<1> lower_coord;
   transform.calculate_lower_index(lower_coord, upper_coord);
   
   printf("3D coord [%d, %d, %d] → Linear index %d\n",
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(upper_coord[number<1>{}]),
          static_cast<int>(upper_coord[number<2>{}]),
          static_cast<int>(lower_coord[number<0>{}]));
   printf("Calculation: %d×8 + %d×2 + %d = %d\n",
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(upper_coord[number<1>{}]),
          static_cast<int>(upper_coord[number<2>{}]),
          static_cast<int>(lower_coord[number<0>{}]));

EmbedTransform
--------------

EmbedTransform expands linear indices from the lower coordinate space into multi-dimensional coordinates in the upper coordinate space using configurable strides, enabling flexible strided tensor layouts and sub-tensor views within larger buffers.

   
.. mermaid::

   graph TB
       subgraph "EmbedTransform: Linear → Multi-D Strided"
           LS["Lower Coordinate Space<br/>1D Linear<br/>Index: 14"]
           US["Upper Coordinate Space<br/>2D: [2, 3]<br/>Coord: (1, 2)"]

           DATA["Linear Buffer in Memory"]
       end

       LS -->|"Forward Transform <br/>Strides: [12, 1] <br/>14 ÷ 12 = 1, 14 % 12 = 2"| US
       US -->|"Inverse Transform<br/>1×12 + 2×1 = 14"| LS

       DATA -.->|"Linear<br/>index view"| LS
       DATA -.->|"Multi-dimensional<br/>strided view"| US

**C++ Implementation:**

.. code-block:: cpp

   using namespace ck_tile;
   
   // Create embed transform for 2x3 tensor with strides [12, 1]
   // This is commonly used in descriptors
   auto transform = make_embed_transform(make_tuple(2, 3), make_tuple(12, 1));
   
   // Forward: Linear → 2D (Manual calculation)
   int linear_idx = 14;
   int row = linear_idx / 12;        // 14 / 12 = 1
   int remainder = linear_idx % 12;  // 14 % 12 = 2  
   int col = remainder / 1;          // 2 / 1 = 2
   printf("Linear index %d → 2D coord [%d, %d]\n", linear_idx, row, col);
   
   // Inverse: 2D → Linear (Using transform)
   multi_index<2> upper_coord;
   upper_coord[number<0>{}] = 1;
   upper_coord[number<1>{}] = 2;
   
   multi_index<1> lower_coord;
   transform.calculate_lower_index(lower_coord, upper_coord);
   printf("2D coord [%d, %d] → Linear index %d\n",
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(upper_coord[number<1>{}]),
          static_cast<int>(lower_coord[number<0>{}]));

ReplicateTransform
------------------

ReplicateTransform creates a higher-dimensional tensor by replicating (broadcasting) a lower-dimensional tensor. It's essentially a broadcasting operation that takes a tensor with fewer dimensions and logically replicates it across new dimensions without data duplication. An example is taking a scalar (0-dimensional) input and broadcasting it across multiple dimensions, enabling efficient broadcasting patterns where a single value appears at every position in a multi-dimensional coordinate space.

.. mermaid::

   graph TB
       subgraph "ReplicateTransform: 0D → Multi-D Broadcasting"
           LS["Lower Coordinate Space<br/>0D: Scalar<br/>Empty coordinate []"]
           US["Upper Coordinate Space<br/>2D: [3, 4]<br/>All coords: (i, j)"]

           DATA["Single Scalar Value"]
       end

       LS -->|"Forward Transform<br/>[] → (i,j) for any i,j"| US
       US -->|"Inverse Transform<br/>(i,j) → [] for any i,j"| LS

       DATA -.->|"One scalar<br/>value"| LS
       DATA -.->|"Broadcasted view<br/>at all positions"| US
   
**C++ Implementation:**

.. code-block:: cpp

   using namespace ck_tile;
   
   // Create replicate transform for 3x4 broadcasting
   auto transform = make_replicate_transform(make_tuple(3, 4));
   
   // Inverse: Upper (2D) → Lower (0D) - Using transform
   // Any 2D coordinate maps to empty scalar coordinate
   multi_index<2> upper_coord;
   upper_coord[number<0>{}] = 1;
   upper_coord[number<1>{}] = 2;
   
   multi_index<0> lower_coord;  // Empty coordinate (0 dimensions)
   transform.calculate_lower_index(lower_coord, upper_coord);
   printf("2D [%d, %d] → Empty scalar [] (always empty)\n",
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(upper_coord[number<1>{}]));
   
   // Forward: Scalar → 2D (Conceptual - no coordinate calculation needed)
   // Broadcasting: Single scalar value appears at ALL positions
   printf("Broadcasting: Scalar value appears at every [i,j] where 0≤i<3, 0≤j<4\n");
   printf("Total positions: 3×4 = 12 positions, all contain same scalar value\n");
   
   // Test multiple coordinates - all map to empty scalar
   int test_coords[][2] = {{0, 0}, {1, 2}, {2, 3}};
   for(int i = 0; i < 3; i++)
   {
       multi_index<2> test_upper;
       test_upper[number<0>{}] = test_coords[i][0];
       test_upper[number<1>{}] = test_coords[i][1];
       
       multi_index<0> test_lower;
       transform.calculate_lower_index(test_lower, test_upper);
       printf("2D [%d, %d] → Empty scalar []\n", 
              test_coords[i][0], test_coords[i][1]);
   }

OffsetTransform
---------------

OffsetTransform shifts coordinates by a fixed offset, creating a translated view of the coordinate space. It performs translation operations where each coordinate in the upper space is mapped to a coordinate in the lower space by adding a constant offset.


.. mermaid::

   graph TB
       subgraph "OffsetTransform: 1D → 1D Translation"
           LS["Lower Coordinate Space<br/>1D: [0, 63]<br/>Coord: index + offset"]
           US["Upper Coordinate Space<br/>1D: [0, 47]<br/>Coord: index"]

           DATA["Linear Buffer in Memory"]
       end

       LS -->|"Forward Transform<br/>idx → idx + 16"| US
       US -->|"Inverse Transform<br/>idx + 16 → idx"| LS

       DATA -.->|"Lower<br/>view"| LS
       DATA -.->|"Upper<br/>view"| US

**C++ Implementation:**

.. code-block:: cpp

   using namespace ck_tile;
   
   // Create offset transform for coordinate translation
   // CK Tile formula: lower = upper + offset
   auto transform = make_offset_transform(48, 16);
   
   // Using Transform: Original → Translated (adds offset)
   multi_index<1> upper_coord;
   upper_coord[number<0>{}] = 5;  // Original index 5
   
   multi_index<1> lower_coord;
   transform.calculate_lower_index(lower_coord, upper_coord);
   printf("Original index %d → Translated index %d\n",
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(lower_coord[number<0>{}]));
   printf("Calculation: %d + 16 = %d\n",
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(lower_coord[number<0>{}]));
   
   // Manual Reverse: Translated → Original (subtract offset)
   int translated_idx = 21;
   int original_idx = translated_idx - 16;
   printf("Translated index %d → Original index %d\n", translated_idx, original_idx);
   
   // Test multiple coordinates
   int test_indices[] = {0, 10, 20, 47};
   for(int i = 0; i < 4; i++)
   {
       multi_index<1> test_upper;
       test_upper[number<0>{}] = test_indices[i];
       
       multi_index<1> test_lower;
       transform.calculate_lower_index(test_lower, test_upper);
       printf("Original %d → Translated %d\n", 
              test_indices[i], static_cast<int>(test_lower[number<0>{}]));
   }

PassThroughTransform - Identity
-------------------------------

No-op transform that passes coordinates unchanged. The PassThrough transform is the simplest coordinate transformation in CK Tile, implementing a perfect identity mapping where input coordinates are passed through unchanged to the output. This transform is essential as a placeholder in transformation chains and for dimensions that require no modification.

.. mermaid::

   graph TB
       subgraph "PassThroughTransform: 1D → 1D Identity"
           LS["Lower Coordinate Space<br/>1D: [0, 59]<br/>Coord: index"]
           US["Upper Coordinate Space<br/>1D: [0, 59]<br/>Coord: index"]

           DATA["Linear Buffer in Memory"]
       end

       LS -.->|"Perfect Identity<br/>idx → idx"| US
       US -.->|"Perfect Identity<br/>idx → idx"| LS

       DATA -->|"Same buffer<br/>same view"| LS
       DATA -->|"Same buffer<br/>same view"| US
   
**C++ Implementation:**

.. code-block:: cpp

   using namespace ck_tile;
   
   // Identity transform - no change
   int length = 60;
   
   auto transform = make_pass_through_transform(length);
   
   printf("Length: %d\n", length);
   
   // Forward: Upper → Lower (identity)
   multi_index<1> upper_coord;
   upper_coord[number<0>{}] = 25;
   
   multi_index<1> lower_coord;
   transform.calculate_lower_index(lower_coord, upper_coord);
   
   printf("\nForward: [%d] → [%d] (unchanged)\n",
          static_cast<int>(upper_coord[number<0>{}]),
          static_cast<int>(lower_coord[number<0>{}]));
   
   // Reverse: Lower → Upper (identity)  
   multi_index<1> lower_input;
   lower_input[number<0>{}] = 42;
   
   multi_index<1> upper_result;
   // Note: PassThrough is bidirectional identity, so we can use same function
   transform.calculate_lower_index(upper_result, lower_input);
   
   printf("Reverse: [%d] → [%d] (unchanged)\n",
          static_cast<int>(lower_input[number<0>{}]),
          static_cast<int>(upper_result[number<0>{}]));

PadTransform
------------

PadTransform adds padding to tensor dimensions, mapping coordinates from upper dimension space (with padding) to lower dimension space (original data).

.. mermaid::

   graph TB
       subgraph "PadTransform: 1D → 1D with Padding"
           LS["Lower Coordinate Space<br/>1D: [0, 2] (original data)"]
           US["Upper Coordinate Space<br/>1D: [0, 4] (with padding)"]

           DATA["Tensor Data in Memory"]
       end

       LS -->|"Forward Transform<br/>idx + left_pad"| US
       US -->|"Inverse Transform<br/>idx - left_pad"| LS

       DATA -.->|"Original view"| LS
       DATA -.->|"Padded view"| US

**C++ Implementation:**

.. code-block:: cpp

   using namespace ck_tile;
   
   // PadTransform for coordinate padding
   int low_length = 3;   // Original dimension length
   int left_pad = 1;     // Padding on left
   int right_pad = 1;    // Padding on right
   
   auto transform = make_pad_transform(low_length, left_pad, right_pad);
   
   printf("Low length: %d\n", low_length);
   printf("Left pad: %d\n", left_pad);
   printf("Right pad: %d\n", right_pad);
   printf("Upper length: %d (total with padding)\n", low_length + left_pad + right_pad);
   
   // Test coordinate mapping
   int test_coords[] = {0, 1, 2, 3, 4};
   for(int i = 0; i < 5; i++)
   {
       multi_index<1> upper;
       upper[number<0>{}] = test_coords[i];
       
       multi_index<1> lower;
       transform.calculate_lower_index(lower, upper);
       
       int adjusted_idx = static_cast<int>(lower[number<0>{}]);
       bool is_valid = (adjusted_idx >= 0 && adjusted_idx < low_length);
       
       printf("Upper %d → Lower %d (%s)\n", 
              test_coords[i], adjusted_idx, 
              is_valid ? "valid" : "padding");
   }

Additional Transform Types
--------------------------

XorTransform
~~~~~~~~~~~~

XorTransform applies a 2D XOR mapping for specialized memory access patterns. It performs XOR operations on coordinates to create transformed memory layouts for specific algorithmic optimizations, particularly useful for avoiding :ref:`LDS bank conflicts <ck_tile_lds_bank_conflicts>`.

.. mermaid::

   graph TB
       subgraph "XorTransform: 2D → 2D XOR Mapping"
           LS["Lower Coordinate Space<br/>2D: [4, 8]<br/>XOR-transformed coords"]
           US["Upper Coordinate Space<br/>2D: [4, 8]<br/>Normal coords"]

           DATA["Same Tensor Data"]
       end

       LS -->|"Forward Transform<br/>apply XOR reverse"| US
       US -->|"Inverse Transform<br/>apply XOR mapping"| LS

       DATA -.->|"XOR pattern<br/>view"| LS
       DATA -.->|"Normal<br/>view"| US

SliceTransform
~~~~~~~~~~~~~~

SliceTransform extracts a sub-region from a tensor dimension.

.. mermaid::

   graph TB
       subgraph "SliceTransform: 1D → 1D Sub-region"
           LS["Lower Coordinate Space<br/>1D: [0, 9] (original range)"]
           US["Upper Coordinate Space<br/>1D: [0, 4] (slice range)"]

           DATA["Tensor Data in Memory"]
       end

       LS -->|"Forward Transform<br/>idx + slice_begin"| US
       US -->|"Inverse Transform<br/>idx - slice_begin"| LS

       DATA -.->|"Full tensor<br/>view"| LS
       DATA -.->|"Sub-region<br/>view"| US
   
ModuloTransform
~~~~~~~~~~~~~~~

ModuloTransform applies cyclic wrapping to coordinates using modulo operations.

.. mermaid::

   graph TB
       subgraph "ModuloTransform: 1D → 1D Cyclic"
           LS["Lower Coordinate Space<br/>1D: [0, 3] (modulus range)"]
           US["Upper Coordinate Space<br/>1D: [0, 15] (full range)"]

           DATA["Tensor Data in Memory"]
       end

       LS -->|"Forward Transform<br/>idx * cycle_count"| US
       US -->|"Inverse Transform<br/>idx % modulus"| LS

       DATA -.->|" "| LS
       DATA -.->|" "| US

Summary
-------

Individual transforms provide:

- **Modularity**: Each transform does one thing 
- **Composability**: Chain transforms for complex mappings (see :ref:`ck_tile_adaptors`)
- **Efficiency**: Compile-time optimization in C++
- **Flexibility**: Handle any coordinate conversion need

These transforms enable you to:

1. Create custom tensor views
2. Implement efficient data access patterns
3. Handle padding and boundaries correctly
4. Optimize memory layouts for :ref:`GPU access <ck_tile_gpu_basics>`

The C++ implementations in Composable Kernel provide:

- Zero-overhead abstractions through templates
- Compile-time composition and optimization
- Support for complex coordinate transformations
- Integration with GPU kernel generation
- Foundation for :ref:`tile windows <ck_tile_tile_window>` and :ref:`load/store traits <ck_tile_load_store_traits>`

Next Steps
----------

- :ref:`ck_tile_adaptors` - How to chain transforms together
- :ref:`ck_tile_descriptors` - Complete tensor descriptions with transforms
- :ref:`ck_tile_tile_window` - Using transforms for efficient data loading
- :ref:`ck_tile_thread_mapping` - How transforms enable thread-to-data mapping
- :ref:`ck_tile_gemm_optimization` - Practical application in GEMM kernels