# N-Dimensional Convolution Forward with Bias

## Theory

This example demonstrates **N-dimensional convolution forward** with bias addition. This is a common pattern in convolutional neural networks (CNNs), where a bias term is added to each output channel after the convolution operation.

**Mathematical Formulation:**

$$
Y[n, c_{out}, o_1, ..., o_n] = \sum_{c_{in}} \sum_{k_1} ... \sum_{k_n} X[n, c_{in}, o_1 + k_1, ..., o_n + k_n] \cdot W[c_{out}, c_{in}, k_1, ..., k_n] + B[c_{out}]
$$

- $X$: [N, C_in, D1, D2, ..., Dn] input tensor
- $W$: [C_out, C_in, K1, K2, ..., Kn] weight tensor
- $B$: [C_out] bias tensor
- $Y$: [N, C_out, O1, O2, ..., On] output tensor

**Algorithmic Background:**

- Composable Kernel implements convolution as an implicit GEMM, with bias addition fused in the epilogue for efficiency.

## How to Run

### Prerequisites

Please follow the instructions in the main [Build Guide](../../README.md#building-ck) section as a prerequisite to building and running this example.

### Build and run

```bash
cd composable_kernel/example/11_convnd_fwd_bias
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run
./convnd_fwd_bias_xdl --verify=1 --time=1
```

## Source Code Structure

### Directory Layout

```
example/11_convnd_fwd_bias/
├── convnd_fwd_bias_xdl.cpp         # Main example: sets up, runs, and verifies N-D convolution with bias

include/ck/tensor_operation/gpu/device/
│   └── device_convnd_fwd_bias.hpp       # Device-level convolution API with bias
include/ck/tensor_operation/gpu/device/impl/
│   └── device_convnd_fwd_bias_impl.hpp  # Implementation
include/ck/tensor_operation/gpu/grid/
    └── gridwise_convnd_fwd_bias.hpp     # Grid-level kernel
```

### Key Classes and Functions

- **DeviceConvNdFwdBias** (in `device_convnd_fwd_bias.hpp`): Device API for N-dimensional convolution with bias.
- **gridwise_convnd_fwd_bias** (in `gridwise_convnd_fwd_bias.hpp`): Implements the tiled/blocking convolution kernel with bias epilogue.

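
To make the formula from the Theory section concrete, the following is a minimal host-side sketch of the 2D case (two spatial dimensions). It is not taken from Composable Kernel: `conv2d_fwd_bias_ref` and its parameter names are hypothetical, and it assumes row-major tensors in the layouts listed above with unit stride, no padding, and no dilation, exactly as the formula is written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not a Composable Kernel API).
// Row-major layouts: in [N, C_in, H, W], wei [C_out, C_in, KH, KW],
// bias [C_out], returned out [N, C_out, OH, OW].
// Assumes unit stride, no padding, no dilation, as in the formula.
std::vector<float> conv2d_fwd_bias_ref(const std::vector<float>& in,
                                       const std::vector<float>& wei,
                                       const std::vector<float>& bias,
                                       std::size_t N, std::size_t C_in,
                                       std::size_t H, std::size_t W,
                                       std::size_t C_out,
                                       std::size_t KH, std::size_t KW)
{
    assert(H >= KH && W >= KW);
    assert(in.size() == N * C_in * H * W);
    assert(wei.size() == C_out * C_in * KH * KW);
    assert(bias.size() == C_out);

    const std::size_t OH = H - KH + 1; // output height
    const std::size_t OW = W - KW + 1; // output width
    std::vector<float> out(N * C_out * OH * OW);

    for(std::size_t n = 0; n < N; ++n)
        for(std::size_t co = 0; co < C_out; ++co)
            for(std::size_t oh = 0; oh < OH; ++oh)
                for(std::size_t ow = 0; ow < OW; ++ow)
                {
                    float acc = 0.0f;
                    // Sum over input channels and the filter window.
                    for(std::size_t ci = 0; ci < C_in; ++ci)
                        for(std::size_t kh = 0; kh < KH; ++kh)
                            for(std::size_t kw = 0; kw < KW; ++kw)
                            {
                                const std::size_t i =
                                    ((n * C_in + ci) * H + (oh + kh)) * W + (ow + kw);
                                const std::size_t j =
                                    ((co * C_in + ci) * KH + kh) * KW + kw;
                                acc += in[i] * wei[j];
                            }
                    // One bias value per output channel.
                    out[((n * C_out + co) * OH + oh) * OW + ow] = acc + bias[co];
                }
    return out;
}
```

For a 1x1x3x3 input holding 1 through 9, a 2x2 filter of ones, and a bias of 0.5, this yields the four values 12.5, 16.5, 24.5, and 28.5.
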
This example demonstrates how Composable Kernel fuses bias addition into the convolution forward pass for efficient CNN layer implementation.
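
The implicit-GEMM view with a fused bias epilogue can be illustrated on the host as follows. This is only a sketch of the index mapping for the 2D case, not the Composable Kernel implementation: the function name `conv2d_fwd_bias_implicit_gemm`, the row-major NCHW-style layouts, and the unit-stride, no-padding assumptions are all illustrative, and the real kernel's tiling and memory layouts are not shown. The convolution is treated as a GEMM with M = N·OH·OW rows, C_out columns, and reduction length K = C_in·KH·KW, where the "A matrix" is never materialized but read directly from the input tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side sketch (not a Composable Kernel API).
// Row-major layouts: in [N, C_in, H, W], wei [C_out, C_in, KH, KW],
// bias [C_out], returned out [N, C_out, OH, OW].
// Assumes unit stride, no padding, no dilation.
std::vector<float> conv2d_fwd_bias_implicit_gemm(const std::vector<float>& in,
                                                 const std::vector<float>& wei,
                                                 const std::vector<float>& bias,
                                                 std::size_t N, std::size_t C_in,
                                                 std::size_t H, std::size_t W,
                                                 std::size_t C_out,
                                                 std::size_t KH, std::size_t KW)
{
    assert(H >= KH && W >= KW);
    assert(in.size() == N * C_in * H * W);
    assert(wei.size() == C_out * C_in * KH * KW);
    assert(bias.size() == C_out);

    const std::size_t OH = H - KH + 1;
    const std::size_t OW = W - KW + 1;

    // GEMM problem sizes implied by the convolution.
    const std::size_t gemmM = N * OH * OW;    // one row per output pixel
    const std::size_t gemmN = C_out;          // one column per output channel
    const std::size_t gemmK = C_in * KH * KW; // reduction over the filter volume

    std::vector<float> out(N * C_out * OH * OW);

    for(std::size_t m = 0; m < gemmM; ++m)
    {
        // Decode the GEMM row index into (n, oh, ow).
        const std::size_t n  = m / (OH * OW);
        const std::size_t oh = (m / OW) % OH;
        const std::size_t ow = m % OW;

        for(std::size_t co = 0; co < gemmN; ++co)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < gemmK; ++k)
            {
                // Decode the reduction index into (ci, kh, kw).
                const std::size_t ci = k / (KH * KW);
                const std::size_t kh = (k / KW) % KH;
                const std::size_t kw = k % KW;

                // A[m, k] is read from the input on the fly (no im2col buffer).
                const float a = in[((n * C_in + ci) * H + (oh + kh)) * W + (ow + kw)];
                // B[k, co] is the weight tensor viewed as a [C_out, K] matrix.
                const float b = wei[co * gemmK + k];
                acc += a * b;
            }
            // Epilogue: add the bias before the single write to the output.
            out[((n * C_out + co) * OH + oh) * OW + ow] = acc + bias[co];
        }
    }
    return out;
}
```

The point of the fused epilogue is visible in the last statement of the inner loop: the bias is added while the accumulator is still local, so the output is written once rather than being written by the convolution and then read back and rewritten by a separate bias pass.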