Client Example: GEMM with LayerNorm Fusion
Theory
This client example demonstrates GEMM fused with layer normalization and additional elementwise operations. This pattern is common in transformer feed-forward networks and other architectures where a linear transformation is followed by normalization and activation.
Mathematical Formulation:
- GEMM: $Y = A \times B$
- Additions: $Z = Y + D_0 + D_1$ (bias, residual, etc.)
- Activation: $H = \text{ReLU}(Z)$ (or another activation)
- LayerNorm: $\text{LayerNorm}(H) = \gamma \cdot \dfrac{H - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$

where $\mu, \sigma^2$ are the mean and variance over the normalization axis, and $\gamma, \beta$ are learnable scale and shift parameters.
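The formulation above can be sketched as a plain CPU reference. This is illustrative only: the actual client example performs this computation in a single fused device kernel, and the function name, data layout (row-major), and broadcast shapes of `D0`/`D1` here are assumptions for the sketch, not the CK API.

```cpp
#include <cmath>
#include <vector>

// CPU reference: Y = A*B, Z = Y + D0 + D1, H = ReLU(Z),
// then LayerNorm over the last (feature) dimension N.
// All matrices are row-major; D0/D1 are full M x N tensors here.
std::vector<float> gemm_add_relu_layernorm(
    const std::vector<float>& A, const std::vector<float>& B,
    const std::vector<float>& D0, const std::vector<float>& D1,
    const std::vector<float>& gamma, const std::vector<float>& beta,
    int M, int K, int N, float eps = 1e-5f)
{
    std::vector<float> out(M * N);
    for(int m = 0; m < M; ++m)
    {
        std::vector<float> h(N);
        for(int n = 0; n < N; ++n)
        {
            float y = 0.f;
            for(int k = 0; k < K; ++k)
                y += A[m * K + k] * B[k * N + n];
            float z = y + D0[m * N + n] + D1[m * N + n];
            h[n]    = z > 0.f ? z : 0.f; // ReLU
        }
        // mean and variance over the feature axis of this row
        float mean = 0.f;
        for(float v : h)
            mean += v;
        mean /= N;
        float var = 0.f;
        for(float v : h)
            var += (v - mean) * (v - mean);
        var /= N;
        const float inv_std = 1.f / std::sqrt(var + eps);
        for(int n = 0; n < N; ++n)
            out[m * N + n] = gamma[n] * (h[n] - mean) * inv_std + beta[n];
    }
    return out;
}
```

A fused kernel produces the same result while keeping the intermediate `h` row in registers instead of writing it to global memory.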
Algorithmic Background:
- The GEMM result is kept in registers; the elementwise operations and layer normalization are fused into the GEMM epilogue.
- LayerNorm is typically applied over the last dimension (features).
- This fusion reduces memory traffic and is common in transformer MLP blocks.
How to Run
Prerequisites
Please follow the instructions in the main Build Guide section as a prerequisite to building and running this example.
Build and run
```shell
cd composable_kernel/client_example/03_gemm_layernorm
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run (naive)
./gemm_add_add_layernorm_naive

# Example run (with ReLU and Welford)
./gemm_add_relu_add_layernorm_welford
```
Source Code Structure
Directory Layout
```
client_example/03_gemm_layernorm/
├── gemm_add_add_layernorm_naive.cpp        # GEMM + Add + Add + LayerNorm (naive)
├── gemm_add_relu_add_layernorm_welford.cpp # GEMM + Add + ReLU + Add + LayerNorm (Welford)
└── CMakeLists.txt                          # Build configuration for the example
```
Key Functions
- main() (in each .cpp): Sets up the input matrices, configures the GEMM and epilogue parameters, launches the fused kernel, and verifies the result.
- LayerNorm implementation: Demonstrates both the naive and the numerically stable (Welford) algorithms for computing the mean and variance.
This client example provides variants to demonstrate different levels of fusion and normalization for transformer-style MLP layers.