composable_kernel/example/39_permute/README.md
2025-10-16 10:13:27 +00:00

# Tensor Permutation (Dimension Reordering)
## Theory
This example demonstrates **tensor permutation**, which reorders the dimensions of a tensor according to a specified permutation pattern. Permutation underlies many machine learning operations, including tensor layout transformations and data format conversions, and serves as a building block for larger tensor operations.
**Mathematical Formulation:**
Given an input tensor $X$ with shape $[D_0, D_1, ..., D_{n-1}]$ and a permutation pattern $P = [p_0, p_1, ..., p_{n-1}]$ (a reordering of the dimension indices $0, 1, ..., n-1$), the permutation operation produces an output tensor $Y$ with shape $[D_{p_0}, D_{p_1}, ..., D_{p_{n-1}}]$. Output dimension $k$ is input dimension $p_k$, so that:
$$
Y_{i_{p_0}, i_{p_1}, ..., i_{p_{n-1}}} = X_{i_0, i_1, ..., i_{n-1}}
$$
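
The formula can be checked against a small host-side reference. The following is a minimal sketch in plain Python, assuming row-major (C-order) storage; `permute_reference` and `flat_index` are illustrative names and are not part of Composable Kernel.

```python
from itertools import product


def permute_reference(x_flat, shape, perm):
    """Return (y_flat, out_shape) such that
    Y[i_{p_0}, ..., i_{p_{n-1}}] = X[i_0, ..., i_{n-1}].

    Both tensors are stored as flat row-major lists (an assumption of this sketch).
    """
    n = len(shape)
    assert sorted(perm) == list(range(n)), "perm must be a permutation of 0..n-1"
    out_shape = [shape[p] for p in perm]

    def flat_index(idx, dims):
        # Row-major linearization of a multi-dimensional index.
        offset = 0
        for i, d in zip(idx, dims):
            offset = offset * d + i
        return offset

    y_flat = [None] * len(x_flat)
    for idx in product(*(range(d) for d in shape)):
        # Output index k takes the value of input index p_k.
        out_idx = [idx[p] for p in perm]
        y_flat[flat_index(out_idx, out_shape)] = x_flat[flat_index(idx, shape)]
    return y_flat, out_shape


# A 2x3 matrix transposed with P = [1, 0] becomes a 3x2 matrix.
y, out_shape = permute_reference([0, 1, 2, 3, 4, 5], [2, 3], [1, 0])
print(out_shape, y)
```

A reference like this is only practical for small shapes, but it makes the index convention explicit: the permutation pattern lists, for each output dimension, which input dimension it comes from.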
**Algorithmic Background:**
- Permutation is used for matrix transpose, NCHW/NHWC layout conversion, attention head reshaping, and more.
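- Permutation performs no arithmetic, so its performance is bounded by memory bandwidth. An efficient kernel keeps both global-memory reads and writes coalesced, which becomes difficult when the innermost (fastest-varying) dimension of the output differs from that of the input.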
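
To see why access patterns matter, it helps to look at the input strides seen from the output's point of view. The sketch below assumes densely packed row-major tensors; the helper names are illustrative only.

```python
def row_major_strides(shape):
    """Element strides of a densely packed row-major tensor."""
    strides = [1] * len(shape)
    for k in range(len(shape) - 2, -1, -1):
        strides[k] = strides[k + 1] * shape[k + 1]
    return strides


def input_strides_in_output_order(shape, perm):
    """Input stride walked when stepping along each output dimension."""
    in_strides = row_major_strides(shape)
    return [in_strides[p] for p in perm]


# NCHW -> NHWC: stepping along the innermost output dimension (C)
# jumps H*W = 3136 elements in the input.
print(input_strides_in_output_order([32, 256, 56, 56], [0, 2, 3, 1]))
# Matrix transpose: the innermost output dimension walks the input with stride 4096.
print(input_strides_in_output_order([4096, 4096], [1, 0]))
```

When the last entry of this list is not 1, a naive kernel that writes the output contiguously reads the input with a large stride (or the reverse), which is the situation a tiled kernel is meant to avoid.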
## How to Run
### Prerequisites
Follow the instructions in the main [Build Guide](../../README.md#building-ck) before building and running this example.
### Build and run
```bash
cd composable_kernel/example/39_permute
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j
# Example run (matrix transpose)
./permute_xdl --input_shape=4096,4096 --permutation=1,0 --verify=1 --time=1
# Example run (NCHW to NHWC)
./permute_xdl --input_shape=32,256,56,56 --permutation=0,2,3,1 --verify=1 --time=1
```
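
As a sanity check on the two runs above, the expected output shape and the amount of data moved can be estimated by hand. The sketch below assumes 2-byte elements (for example fp16), which the commands do not specify, and the elapsed time passed to `effective_bandwidth_gbps` is a hypothetical value, not a measured result.

```python
from math import prod


def permute_traffic(shape, perm, bytes_per_element=2):
    """Output shape and total bytes moved (one read plus one write per element).

    bytes_per_element=2 is an assumption of this sketch (e.g. fp16).
    """
    out_shape = [shape[p] for p in perm]
    total_bytes = 2 * prod(shape) * bytes_per_element
    return out_shape, total_bytes


def effective_bandwidth_gbps(total_bytes, elapsed_ms):
    """Effective bandwidth in GB/s for a kernel time given in milliseconds."""
    return total_bytes / (elapsed_ms * 1e-3) / 1e9


# Shapes and permutations taken from the example commands above.
for shape, perm in [([4096, 4096], [1, 0]), ([32, 256, 56, 56], [0, 2, 3, 1])]:
    out_shape, total_bytes = permute_traffic(shape, perm)
    print(shape, "->", out_shape, f"{total_bytes} bytes moved")
```

Dividing the bytes moved by a measured kernel time gives an effective bandwidth that can be compared with the device's peak memory bandwidth.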
## Source Code Structure
### Directory Layout
```
example/39_permute/
└── permute_xdl.cpp            # Main example: sets up, runs, and verifies tensor permutation

include/ck/tensor_operation/gpu/device/
└── device_permute.hpp         # Device-level permutation API

include/ck/tensor_operation/gpu/grid/
└── gridwise_permute.hpp       # Grid-level permutation kernel
```
### Key Classes and Functions
- **DevicePermute** (in `device_permute.hpp`):
Device API for tensor permutation.
- **gridwise_permute** (in `gridwise_permute.hpp`):
Implements the tiled/blocking permutation kernel.
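
The tiled/blocking idea can be illustrated on the host for the 2D transpose case. This is a conceptual sketch in plain Python, not the Composable Kernel kernel: the `staged` list stands in for on-chip shared memory, and the function name and tile size are hypothetical.

```python
def tiled_transpose(x_flat, rows, cols, tile=4):
    """Transpose a row-major rows x cols matrix one tile at a time."""
    y_flat = [None] * (rows * cols)
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1 = min(r0 + tile, rows)
            c1 = min(c0 + tile, cols)
            # Stage 1: read the tile row by row, contiguous in the input.
            staged = [
                [x_flat[r * cols + c] for c in range(c0, c1)]
                for r in range(r0, r1)
            ]
            # Stage 2: write the tile so that the inner loop is contiguous
            # in the output (the output is a cols x rows matrix).
            for c in range(c0, c1):
                for r in range(r0, r1):
                    y_flat[c * rows + r] = staged[r - r0][c - c0]
    return y_flat


print(tiled_transpose([0, 1, 2, 3, 4, 5], rows=2, cols=3))
```

Staging through a tile lets both the read and the write touch contiguous runs of memory; the transposed access happens only inside the small staged buffer.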
This example demonstrates how Composable Kernel implements efficient tensor dimension reordering for layout transformations and deep learning operations.