mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

3D Average Pooling Backward

This example demonstrates the backward pass of 3D average pooling. This operation computes the gradient of the loss with respect to the input of a 3D average pooling layer, which is essential for training 3D convolutional neural networks used in video analysis, medical imaging, and volumetric data processing.

Mathematical Formulation

The backward pass of 3D average pooling distributes the output gradients uniformly across all input positions that contributed to each pooling window.

Given:

- Input tensor X with shape [N, C, D_in, H_in, W_in]
- Output gradients dL/dY with shape [N, C, D_out, H_out, W_out]
- Pooling parameters: window size (pool_d, pool_h, pool_w), stride (stride_d, stride_h, stride_w), padding (pad_d, pad_h, pad_w)

The backward pass computes input gradients dL/dX with the same shape as X.
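
The relationship between the input and output extents is not spelled out here. A minimal sketch, assuming the common convention of symmetric padding, no dilation, and floor rounding (an assumption, not taken from the CK sources; `pooled_extent` is a hypothetical helper, not a CK function):

```cpp
#include <cassert>

// Hypothetical helper (not part of the CK API): number of pooling windows
// along one axis, assuming symmetric padding, no dilation, and floor rounding.
inline int pooled_extent(int in, int pool, int stride, int pad)
{
    assert(pool > 0 && stride > 0 && pad >= 0);
    assert(in + 2 * pad >= pool); // at least one full window must fit
    return (in + 2 * pad - pool) / stride + 1;
}
```

Applied once per axis, this gives D_out, H_out, and W_out from D_in, H_in, and W_in. For example, `pooled_extent(8, 2, 2, 0)` evaluates to 4.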

For 3D average pooling, the gradient is distributed uniformly across all positions in each pooling window:

$$
\frac{\partial L}{\partial X_{ncdhw}} = \sum_{\text{windows containing } (d,h,w)} \frac{1}{|W|} \cdot \frac{\partial L}{\partial Y_{ncd'h'w'}}
$$

Where |W| is the effective window size (accounting for padding and boundaries), and the sum is over all output positions whose pooling windows include the input position (d,h,w).
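
As a sanity check, consider a hypothetical 1D analogue of the formula: four inputs, window size 2, stride 1, no padding. There are three outputs with gradients g_0, g_1, g_2, and every window has |W| = 2:

```latex
\begin{aligned}
\frac{\partial L}{\partial X_0} &= \tfrac{1}{2}\, g_0, &
\frac{\partial L}{\partial X_1} &= \tfrac{1}{2}\,(g_0 + g_1), \\
\frac{\partial L}{\partial X_2} &= \tfrac{1}{2}\,(g_1 + g_2), &
\frac{\partial L}{\partial X_3} &= \tfrac{1}{2}\, g_2 .
\end{aligned}
```

The interior positions fall inside two overlapping windows and therefore sum two contributions. The four input gradients add up to g_0 + g_1 + g_2, so the total gradient is conserved.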

Algorithmic Strategy: Parallel Gradient Distribution

The kernel's job is to route each output gradient, in parallel, to every input position inside the corresponding pooling window.

1. Grid Scheduling: The computation can be parallelized over either input or output tensor elements, depending on the implementation strategy.
2. Gradient Distribution Algorithm (output-centric approach):
   - Initialize: Set all input gradients to zero.
   - For each output position: Each thread processes one output gradient position (n, c, d_out, h_out, w_out).
   - Calculate Input Window: Determine the 3D input window that contributed to this output position.
   - Effective Window Size: Calculate the actual number of input elements in the window (accounting for padding and boundaries).
   - Distribute Gradient: Add grad_output / window_size to each input position in the window (using atomic operations for thread safety).
3. Boundary Handling: Careful handling of:
   - Padding: Input positions outside the valid range should not receive gradients
   - Partial Windows: Windows at boundaries may have fewer than pool_d × pool_h × pool_w elements
   - Edge Cases: Zero-sized windows or invalid configurations
4. Memory Access Optimization:
   - Coalesced reading from output gradients
   - Efficient atomic operations for gradient accumulation
   - Minimized redundant boundary checks
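
The output-centric steps above can be sketched as a serial CPU reference. This is an illustration under stated assumptions, not the CK kernel: `Pool3d` and `avgpool3d_bwd_ref` are hypothetical names, padding is assumed symmetric, the divisor is the clipped (effective) window size as described above, and only one (n, c) slice is processed. On a GPU, the accumulation marked in the comment is where an atomic add would be required when windows overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CPU reference, not the CK kernel. All names are hypothetical.
struct Pool3d
{
    int pool[3];   // window size   (d, h, w)
    int stride[3]; // stride        (d, h, w)
    int pad[3];    // symmetric pad (d, h, w)
};

// Scatters one [D_out, H_out, W_out] gradient volume into a zero-initialized
// [D_in, H_in, W_in] volume. A loop over n and c would wrap this call.
inline std::vector<float> avgpool3d_bwd_ref(const std::vector<float>& dy,
                                            const int in[3],
                                            const int out[3],
                                            const Pool3d& p)
{
    assert(dy.size() == static_cast<std::size_t>(out[0]) * out[1] * out[2]);

    // Initialize: all input gradients start at zero.
    std::vector<float> dx(static_cast<std::size_t>(in[0]) * in[1] * in[2], 0.0f);

    for(int od = 0; od < out[0]; ++od)
        for(int oh = 0; oh < out[1]; ++oh)
            for(int ow = 0; ow < out[2]; ++ow)
            {
                const int o[3] = {od, oh, ow};
                int lo[3], hi[3];
                long count = 1;
                for(int a = 0; a < 3; ++a)
                {
                    // Input window for this output, clipped to the valid range.
                    const int start = o[a] * p.stride[a] - p.pad[a];
                    lo[a]           = std::max(start, 0);
                    hi[a]           = std::min(start + p.pool[a], in[a]);
                    count *= std::max(hi[a] - lo[a], 0);
                }
                if(count == 0)
                    continue; // window lies entirely in padding

                const std::size_t oi =
                    (static_cast<std::size_t>(od) * out[1] + oh) * out[2] + ow;
                const float share = dy[oi] / static_cast<float>(count);

                for(int d = lo[0]; d < hi[0]; ++d)
                    for(int h = lo[1]; h < hi[1]; ++h)
                        for(int w = lo[2]; w < hi[2]; ++w)
                        {
                            const std::size_t ii =
                                (static_cast<std::size_t>(d) * in[1] + h) * in[2] + w;
                            dx[ii] += share; // would be an atomic add on a GPU
                        }
            }
    return dx;
}
```

With an input of shape [1, 1, 4], a window of (1, 1, 2), stride 1, no padding, and output gradients {2, 4, 6}, the sketch yields input gradients {1, 3, 5, 3}. Some frameworks divide by the full window size instead of the clipped count; which convention a given kernel uses should be checked against its reference implementation.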

Source Code Organization

- avgpool3d_bwd_fp32.cpp, avgpool3d_bwd_fp16.cpp, avgpool3d_bwd_bf16.cpp, avgpool3d_bwd_common.hpp: The example sources in this directory, one entry point per data type (FP32, FP16, BF16) plus a shared header. Together they set up the input tensor, output gradients, and pooling parameters, and instantiate the DeviceAvgpool3dBwd operation.
- ../../include/ck/tensor_operation/gpu/device/device_avgpool3d_bwd.hpp: The high-level device interface for 3D average pooling backward operations.
- ../../include/ck/tensor_operation/gpu/grid/gridwise_avgpool3d_bwd.hpp: The grid-wise kernel implementing the gradient distribution algorithm.

Build and Run

Prerequisites

Ensure the Composable Kernel library is built and installed.

cd /path/to/composable_kernel/build
make -j install

Build the Example

cd /path/to/composable_kernel/example/51_avgpool3d_bwd
mkdir build && cd build

cmake \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
  ..

make -j

Run the Example

# Run the example with default settings
./avgpool3d_bwd_xdl

# Run with verification, data initialization, and timing
./avgpool3d_bwd_xdl 1 2 1

Comparison with Max Pooling Backward

3D average pooling backward differs from max pooling backward in how gradients are routed and what information the kernel needs:

| Aspect | Max Pooling | Average Pooling |
|--------|-------------|-----------------|
| Gradient Flow | Sparse (only to argmax positions) | Dense (to all window positions) |
| Distribution | Single position per window | Uniform across window |
| Computation | Requires argmax information | Simple arithmetic division |
| Memory Pattern | Irregular write pattern | Regular, predictable pattern |
| Atomic Operations | Needed for gradient routing | Needed for accumulation |
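
The sparse-versus-dense contrast in the table can be seen in a small 1D sketch. Both functions are hypothetical helpers written for illustration, not CK code, and they assume non-overlapping windows (stride equal to the window size) and no padding. Note that the max pooling version needs the forward input to locate the argmax, while the average pooling version needs only the input size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D helpers for illustration only. Non-overlapping windows
// (stride == window) and no padding are assumed.

// Max pooling backward: the whole gradient goes to the argmax of each window.
inline std::vector<float> maxpool1d_bwd(const std::vector<float>& x,
                                        const std::vector<float>& dy,
                                        std::size_t window)
{
    assert(window > 0 && x.size() == dy.size() * window);
    std::vector<float> dx(x.size(), 0.0f);
    for(std::size_t o = 0; o < dy.size(); ++o)
    {
        std::size_t arg = o * window; // the first maximum wins on ties
        for(std::size_t i = o * window; i < (o + 1) * window; ++i)
            if(x[i] > x[arg])
                arg = i;
        dx[arg] += dy[o];
    }
    return dx;
}

// Average pooling backward: every position in the window gets an equal share.
inline std::vector<float> avgpool1d_bwd(std::size_t in_size,
                                        const std::vector<float>& dy,
                                        std::size_t window)
{
    assert(window > 0 && in_size == dy.size() * window);
    std::vector<float> dx(in_size, 0.0f);
    for(std::size_t o = 0; o < dy.size(); ++o)
        for(std::size_t i = o * window; i < (o + 1) * window; ++i)
            dx[i] += dy[o] / static_cast<float>(window);
    return dx;
}
```

For x = {1, 3, 2, 0}, dy = {4, 8}, and a window of 2, the max pooling sketch returns {0, 4, 8, 0} while the average pooling sketch returns {2, 2, 4, 4}. Because the windows here do not overlap, no two outputs write to the same input; with overlapping windows a parallel version of either routine would need atomic accumulation.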

Applications in 3D Deep Learning

3D average pooling backward is essential for training models that process volumetric data:

- Video Understanding: 3D CNNs for action recognition, video classification, and temporal modeling
- Medical Imaging: 3D segmentation and classification of CT scans, MRI, and other volumetric medical data
- 3D Object Recognition: Processing 3D point clouds, voxel grids, and depth data
- Scientific Computing: Climate modeling, fluid dynamics, and other physics simulations
- Augmented Reality: 3D scene understanding and object tracking in real-time applications
