3D Average Pooling Backward
This example demonstrates the backward pass of 3D average pooling. This operation computes the gradient of the loss with respect to the input of a 3D average pooling layer, which is essential for training 3D convolutional neural networks used in video analysis, medical imaging, and volumetric data processing.
Mathematical Formulation
The backward pass of 3D average pooling distributes the output gradients uniformly across all input positions that contributed to each pooling window.
Given:
- Input tensor `X` with shape `[N, C, D_in, H_in, W_in]`
- Output gradients `dL/dY` with shape `[N, C, D_out, H_out, W_out]`
- Pooling parameters: window size `(pool_d, pool_h, pool_w)`, stride `(stride_d, stride_h, stride_w)`, padding `(pad_d, pad_h, pad_w)`
The backward pass computes input gradients dL/dX with the same shape as X.
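The output extents are tied to the input extents through the pooling parameters. As a minimal sketch, assuming the common floor-mode convention without dilation (the example's own shape computation may differ), each output axis can be derived as below. The helper names `pooled_extent` and `pooled_shape_3d` are hypothetical and not part of the library.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Output extent along one axis for floor-mode pooling without dilation.
// Assumption: stride > 0 and the padded input is at least one window long.
inline std::int64_t pooled_extent(std::int64_t in, std::int64_t pool,
                                  std::int64_t stride, std::int64_t pad)
{
    assert(stride > 0 && in + 2 * pad >= pool);
    return (in + 2 * pad - pool) / stride + 1;
}

// Applies pooled_extent to the (D, H, W) axes in that order.
inline std::array<std::int64_t, 3>
pooled_shape_3d(const std::array<std::int64_t, 3>& in,
                const std::array<std::int64_t, 3>& pool,
                const std::array<std::int64_t, 3>& stride,
                const std::array<std::int64_t, 3>& pad)
{
    return {pooled_extent(in[0], pool[0], stride[0], pad[0]),
            pooled_extent(in[1], pool[1], stride[1], pad[1]),
            pooled_extent(in[2], pool[2], stride[2], pad[2])};
}
```

Under these assumptions, an input of `(D_in, H_in, W_in) = (8, 16, 16)` with a `2 × 2 × 2` window, stride 2 and no padding gives `(D_out, H_out, W_out) = (4, 8, 8)`.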
For 3D average pooling, the gradient is distributed uniformly across all positions in each pooling window:
$$
\frac{\partial L}{\partial X_{ncdhw}} = \sum_{(d',h',w') \,:\, (d,h,w) \in W_{d'h'w'}} \frac{1}{|W_{d'h'w'}|} \cdot \frac{\partial L}{\partial Y_{ncd'h'w'}}
$$

Where $W_{d'h'w'}$ is the pooling window of output position $(d',h',w')$, $|W_{d'h'w'}|$ is its effective size (accounting for padding and boundaries), and the sum runs over all output positions whose pooling windows include the input position $(d,h,w)$.
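To make the sum concrete, the following sketch evaluates the same formula in one dimension, gathering contributions for each input position. It is an illustrative CPU reference, not the library's kernel: the function name `avgpool1d_bwd_gather` is hypothetical, and the divisor is taken to be the number of in-bounds elements of each window, following the description of the effective window size above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Input-centric (gather) 1D average-pooling backward.
// dy holds one gradient per output position; the result has in_len entries.
std::vector<float> avgpool1d_bwd_gather(const std::vector<float>& dy, int in_len,
                                        int pool, int stride, int pad)
{
    std::vector<float> dx(in_len, 0.0f);
    const int out_len = static_cast<int>(dy.size());
    for(int i = 0; i < in_len; ++i)
    {
        for(int o = 0; o < out_len; ++o)
        {
            // Window of output o in input coordinates, clipped to [0, in_len).
            const int start = std::max(o * stride - pad, 0);
            const int end   = std::min(o * stride - pad + pool, in_len);
            if(i < start || i >= end)
                continue; // window o does not contain position i (or is empty)
            // Assumed divisor: number of in-bounds elements in the window.
            dx[i] += dy[o] / static_cast<float>(end - start);
        }
    }
    return dx;
}
```

For example, with `in_len = 4`, `pool = 2`, `stride = 1`, no padding and output gradients `{2, 4, 6}`, the windows are `[0,2)`, `[1,3)` and `[2,4)`, so the input gradients are `{1, 3, 5, 3}`: interior positions receive a share from two overlapping windows.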
Algorithmic Strategy: Parallel Gradient Distribution
The backward pass distributes gradients from output positions to all input positions that contributed to each pooling window.
- Grid Scheduling: The computation can be parallelized over either input or output tensor elements, depending on the implementation strategy.
- Gradient Distribution Algorithm (output-centric approach):
  - Initialize: Set all input gradients to zero.
  - For each output position: Each thread processes one output gradient position `(n, c, d_out, h_out, w_out)`.
  - Calculate Input Window: Determine the 3D input window that contributed to this output position.
  - Effective Window Size: Calculate the actual number of input elements in the window (accounting for padding and boundaries).
  - Distribute Gradient: Add `grad_output / window_size` to each input position in the window (using atomic operations for thread safety).
- Boundary Handling: Careful handling of:
  - Padding: Input positions outside the valid range should not receive gradients
  - Partial Windows: Windows at boundaries may have fewer than `pool_d × pool_h × pool_w` elements
  - Edge Cases: Zero-sized windows or invalid configurations
- Memory Access Optimization:
  - Coalesced reading from output gradients
  - Efficient atomic operations for gradient accumulation
  - Minimized redundant boundary checks
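The output-centric steps above can be sketched as a single-threaded CPU reference. This is not the grid-wise kernel from this example: `Dim3` and `avgpool3d_bwd_scatter` are hypothetical names, the tensors are assumed to be contiguous in NCDHW order, and the divisor is assumed to be the clipped (in-bounds) window size as described in the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Dim3 // hypothetical helper for (depth, height, width) triples
{
    int d, h, w;
};

// Output-centric (scatter) 3D average-pooling backward on NCDHW buffers.
std::vector<float> avgpool3d_bwd_scatter(const std::vector<float>& dy, int N, int C,
                                         Dim3 in, Dim3 out, Dim3 pool, Dim3 stride,
                                         Dim3 pad)
{
    // Initialize: all input gradients start at zero.
    std::vector<float> dx(static_cast<std::size_t>(N) * C * in.d * in.h * in.w, 0.0f);

    for(int nc = 0; nc < N * C; ++nc)
        for(int od = 0; od < out.d; ++od)
            for(int oh = 0; oh < out.h; ++oh)
                for(int ow = 0; ow < out.w; ++ow)
                {
                    // Calculate input window, clipped to the valid input range.
                    const int d0 = std::max(od * stride.d - pad.d, 0);
                    const int d1 = std::min(od * stride.d - pad.d + pool.d, in.d);
                    const int h0 = std::max(oh * stride.h - pad.h, 0);
                    const int h1 = std::min(oh * stride.h - pad.h + pool.h, in.h);
                    const int w0 = std::max(ow * stride.w - pad.w, 0);
                    const int w1 = std::min(ow * stride.w - pad.w + pool.w, in.w);
                    if(d1 <= d0 || h1 <= h0 || w1 <= w0)
                        continue; // zero-sized window: nothing to distribute

                    // Effective window size (assumed: in-bounds elements only).
                    const int count = (d1 - d0) * (h1 - h0) * (w1 - w0);
                    const std::size_t y_idx =
                        ((static_cast<std::size_t>(nc) * out.d + od) * out.h + oh) * out.w + ow;
                    const float g = dy[y_idx] / static_cast<float>(count);

                    // Distribute gradient to every input position in the window.
                    for(int d = d0; d < d1; ++d)
                        for(int h = h0; h < h1; ++h)
                            for(int w = w0; w < w1; ++w)
                            {
                                const std::size_t x_idx =
                                    ((static_cast<std::size_t>(nc) * in.d + d) * in.h + h) * in.w + w;
                                dx[x_idx] += g; // a parallel version needs an atomic add here
                            }
                }
    return dx;
}
```

In a parallel version with one thread per output position, the accumulation into `dx` is the only write that can race, and only when windows overlap (stride smaller than the window along some axis). That is the case the atomic operations in the list are meant to cover.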
Source Code Organization
- `avgpool3d_bwd_xdl.cpp`: The main example file. It sets up the input tensor, output gradients, and pooling parameters, and instantiates the `DeviceAvgpool3dBwd` operation.
- `../../include/ck/tensor_operation/gpu/device/device_avgpool3d_bwd.hpp`: The high-level device interface for 3D average pooling backward operations.
- `../../include/ck/tensor_operation/gpu/grid/gridwise_avgpool3d_bwd.hpp`: The grid-wise kernel implementing the gradient distribution algorithm.
Build and Run
Prerequisites
Ensure the Composable Kernel library is built and installed.
cd /path/to/composable_kernel/build
make -j install
Build the Example
cd /path/to/composable_kernel/example/51_avgpool3d_bwd
mkdir build && cd build
cmake \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
..
make -j
Run the Example
# Run the example with default settings
./avgpool3d_bwd_xdl
# Run with verification, data initialization, and timing
./avgpool3d_bwd_xdl 1 2 1
Comparison with Max Pooling Backward
3D average pooling backward differs from max pooling backward mainly in how gradients are routed back to the input:
| Aspect | Max Pooling | Average Pooling |
|---|---|---|
| Gradient Flow | Sparse (only to argmax positions) | Dense (to all window positions) |
| Distribution | Single position per window | Uniform across window |
| Computation | Requires argmax information | Simple arithmetic division |
| Memory Pattern | Irregular write pattern | Regular, predictable pattern |
| Atomic Operations | Needed for gradient routing | Needed for accumulation |
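The sparse versus dense gradient flow in the table can be illustrated with a small 1D sketch. It assumes non-overlapping windows (stride equal to the window length, no padding) and recomputes the argmax from the forward input for simplicity, whereas a real max pooling backward would typically reuse indices saved by the forward pass. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Max pooling backward: the whole gradient goes to the argmax of each window.
// Assumption: x.size() == dy.size() * pool.
std::vector<float> maxpool1d_bwd(const std::vector<float>& x,
                                 const std::vector<float>& dy, std::size_t pool)
{
    std::vector<float> dx(x.size(), 0.0f);
    for(std::size_t o = 0; o < dy.size(); ++o)
    {
        std::size_t arg = o * pool; // index of the largest element seen so far
        for(std::size_t k = 1; k < pool; ++k)
            if(x[o * pool + k] > x[arg])
                arg = o * pool + k;
        dx[arg] += dy[o];
    }
    return dx;
}

// Average pooling backward: the gradient is split evenly across the window.
// The forward input values are not needed, only the input length.
std::vector<float> avgpool1d_bwd(std::size_t in_len, const std::vector<float>& dy,
                                 std::size_t pool)
{
    std::vector<float> dx(in_len, 0.0f);
    for(std::size_t o = 0; o < dy.size(); ++o)
        for(std::size_t k = 0; k < pool; ++k)
            dx[o * pool + k] += dy[o] / static_cast<float>(pool);
    return dx;
}
```

With forward input `{1, 5, 3, 2}`, output gradients `{4, 6}` and a window of 2, the max pooling sketch yields `{0, 4, 6, 0}` while the average pooling sketch yields `{2, 2, 3, 3}`.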
Applications in 3D Deep Learning
3D average pooling backward is essential for training models that process volumetric data:
- Video Understanding: 3D CNNs for action recognition, video classification, and temporal modeling
- Medical Imaging: 3D segmentation and classification of CT scans, MRI, and other volumetric medical data
- 3D Object Recognition: Processing 3D point clouds, voxel grids, and depth data
- Scientific Computing: Climate modeling, fluid dynamics, and other physics simulations
- Augmented Reality: 3D scene understanding and object tracking in real-time applications