mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-21 13:29:20 +00:00
Merge commit 'e980d4351c43396398a5171e943771624a5a51eb' into develop
This commit is contained in:
64
client_example/23_elementwise_transpose/README.md
Normal file
64
client_example/23_elementwise_transpose/README.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# Client Example: Elementwise Operation with 3D Transpose
|
||||
|
||||
## Theory
|
||||
|
||||
This client example demonstrates **elementwise operations fused with 3D tensor transpose**. This pattern is used in deep learning for applying activation functions or scaling while simultaneously reordering tensor dimensions (e.g., for layout conversion or attention head reshaping).
|
||||
|
||||
**Mathematical Formulation:**
|
||||
- Elementwise: $Z = f(X)$ or $Z = f(X, Y)$
|
||||
- Transpose: $Y_{i_0, i_1, i_2} = Z_{i_{\pi(0)}, i_{\pi(1)}, i_{\pi(2)}}$
|
||||
- $\pi$ is a permutation of the axes.
|
||||
|
||||
**Algorithmic Background:**
|
||||
- The elementwise operation and transpose are fused in a single kernel.
|
||||
- Intermediate results are kept in registers, not written to global memory.
|
||||
- Used for layout conversion with activation, attention head reshaping, and more.
|
||||
|
||||
## How to Run
|
||||
|
||||
### Prerequisites
|
||||
|
||||
Please follow the instructions in the main [Build Guide](../../README.md#building-ck) section as a prerequisite to building and running this example.
|
||||
|
||||
### Build and run
|
||||
```bash
|
||||
cd composable_kernel/client_example/23_elementwise_transpose
|
||||
mkdir build && cd build
|
||||
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
|
||||
make -j
|
||||
|
||||
# Example run (elementwise + 3D transpose)
|
||||
./elementwise_transpose_3d
|
||||
```
|
||||
|
||||
## Source Code Structure
|
||||
|
||||
### Directory Layout
|
||||
```
|
||||
client_example/23_elementwise_transpose/
|
||||
├── elementwise_transpose_3d.cpp # Main client example: elementwise + 3D transpose
|
||||
├── CMakeLists.txt # Build configuration for the example
|
||||
```
|
||||
|
||||
### Key Functions
|
||||
|
||||
- **main()** (in `elementwise_transpose_3d.cpp`):
|
||||
Sets up input tensors, configures elementwise and transpose parameters, launches the fused kernel, and verifies the result.
|
||||
- **Fused kernel invocation**:
|
||||
Uses the Composable Kernel device API to launch the elementwise+transpose operation.
|
||||
|
||||
---
|
||||
|
||||
## Additional Details
|
||||
|
||||
- Supports fusion of elementwise operations with 3D transpose.
|
||||
- Example parameters can be adjusted in the source for different workloads.
|
||||
|
||||
---
|
||||
|
||||
## Related Examples
|
||||
|
||||
- [44_elementwise_permute](../../example/44_elementwise_permute/README.md): Elementwise operation with permutation in the main example directory
|
||||
|
||||
---
|
||||
[Back to Client Examples](../README.md)
|
||||
Reference in New Issue
Block a user