Files

3D Average Pooling Backward

This example demonstrates the backward pass of 3D average pooling. This operation computes the gradient of the loss with respect to the input of a 3D average pooling layer, which is essential for training 3D convolutional neural networks used in video analysis, medical imaging, and volumetric data processing.

Mathematical Formulation

The backward pass of 3D average pooling distributes the output gradients uniformly across all input positions that contributed to each pooling window.

Given:

  • Input tensor X with shape [N, C, D_in, H_in, W_in]
  • Output gradients dL/dY with shape [N, C, D_out, H_out, W_out]
  • Pooling parameters: window size (pool_d, pool_h, pool_w), stride (stride_d, stride_h, stride_w), padding (pad_d, pad_h, pad_w)

The backward pass computes input gradients dL/dX with the same shape as X.

For 3D average pooling, the gradient is distributed uniformly across all positions in each pooling window: \frac{\partial L}{\partial X_{ncdhw}} = \sum_{\text{windows containing } (d,h,w)} \frac{1}{|W|} \cdot \frac{\partial L}{\partial Y_{ncd'h'w'}}

Where |W| is the effective window size (accounting for padding and boundaries), and the sum is over all output positions whose pooling windows include the input position (d,h,w).

Algorithmic Strategy: Parallel Gradient Distribution

The backward pass distributes gradients from output positions to all input positions that contributed to each pooling window.

  1. Grid Scheduling: The computation can be parallelized over either input or output tensor elements, depending on the implementation strategy.

  2. Gradient Distribution Algorithm (output-centric approach):

    • Initialize: Set all input gradients to zero.
    • For each output position: Each thread processes one output gradient position (n, c, d_out, h_out, w_out).
    • Calculate Input Window: Determine the 3D input window that contributed to this output position.
    • Effective Window Size: Calculate the actual number of input elements in the window (accounting for padding and boundaries).
    • Distribute Gradient: Add grad_output / window_size to each input position in the window (using atomic operations for thread safety).
  3. Boundary Handling: Careful handling of:

    • Padding: Input positions outside the valid range should not receive gradients
    • Partial Windows: Windows at boundaries may have fewer than pool_d × pool_h × pool_w elements
    • Edge Cases: Zero-sized windows or invalid configurations
  4. Memory Access Optimization:

    • Coalesced reading from output gradients
    • Efficient atomic operations for gradient accumulation
    • Minimized redundant boundary checks

Source Code Organization

Build and Run

Prerequisites

Ensure the Composable Kernel library is built and installed.

cd /path/to/composable_kernel/build
make -j install

Build the Example

cd /path/to/composable_kernel/example/51_avgpool3d_bwd
mkdir build && cd build

cmake \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
  ..

make -j

Run the Example

# Run the example with default settings
./avgpool3d_bwd_xdl

# Run with verification, data initialization, and timing
./avgpool3d_bwd_xdl 1 2 1

Comparison with Max Pooling Backward

3D average pooling backward differs significantly from max pooling backward:

Aspect Max Pooling Average Pooling
Gradient Flow Sparse (only to argmax positions) Dense (to all window positions)
Distribution Single position per window Uniform across window
Computation Requires argmax information Simple arithmetic division
Memory Pattern Irregular write pattern Regular, predictable pattern
Atomic Operations Needed for gradient routing Needed for accumulation

Applications in 3D Deep Learning

3D average pooling backward is essential for training models that process volumetric data:

  • Video Understanding: 3D CNNs for action recognition, video classification, and temporal modeling
  • Medical Imaging: 3D segmentation and classification of CT scans, MRI, and other volumetric medical data
  • 3D Object Recognition: Processing 3D point clouds, voxel grids, and depth data
  • Scientific Computing: Climate modeling, fluid dynamics, and other physics simulations
  • Augmented Reality: 3D scene understanding and object tracking in real-time applications