
3D Pooling Forward

This example demonstrates a 3D pooling forward operation. Pooling is a fundamental operation in convolutional neural networks that reduces the spatial dimensions of feature maps while retaining important information. 3D pooling extends this concept to three-dimensional data, commonly used in video analysis, medical imaging, and 3D computer vision applications.

Mathematical Formulation

3D pooling operates on 5D tensors with shape [N, C, D, H, W] where:

  • N is the batch size
  • C is the number of channels
  • D, H, W are the depth, height, and width dimensions
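The sketch below makes the [N, C, D, H, W] indexing concrete. It is a minimal C++ example that computes the linear offset of element (n, c, d, h, w). It assumes a densely packed row-major layout in which W varies fastest. The names offsetNCDHW and Shape5 are hypothetical. The example program's actual memory layout may differ, for instance a channels-last arrangement.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape descriptor for a 5D tensor [N, C, D, H, W].
struct Shape5 {
    std::size_t n, c, d, h, w;
};

// Linear offset of (n, c, d, h, w), assuming packed row-major NCDHW storage:
// stride(W) = 1, stride(H) = W, stride(D) = H*W, stride(C) = D*H*W,
// stride(N) = C*D*H*W.
inline std::size_t offsetNCDHW(const Shape5& s, std::size_t n, std::size_t c,
                               std::size_t d, std::size_t h, std::size_t w) {
    assert(n < s.n && c < s.c && d < s.d && h < s.h && w < s.w);
    return (((n * s.c + c) * s.d + d) * s.h + h) * s.w + w;
}
```

Take a shape of [2, 3, 4, 5, 6]. Under this layout, stepping h by one moves 6 elements, and the last element sits at offset 719.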

The operation applies a pooling function over 3D windows of the input tensor.

For each output position (n, c, d_out, h_out, w_out):

\text{Out}_{n,c,d_{out},h_{out},w_{out}} = \text{Pool}\left(\{X_{n,c,d',h',w'} : d' \in W_d,\ h' \in W_h,\ w' \in W_w\}\right)

Where:

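  • W_d, W_h, W_w are the sets of input indices covered by the pooling window along depth, height, and width (not to be confused with the input width W). Along depth, W_d = {d_out · stride_d − pad_d + k : 0 ≤ k < pool_d}, keeping only indices that fall inside the input (0 to D − 1); W_h and W_w are defined analogously.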
  • Pool is the pooling function (e.g., max or average)

With S denoting the set of input values covered by the window:

  • Max Pooling: \text{Pool}(S) = \max(S)
  • Average Pooling: \text{Pool}(S) = \frac{1}{|S|} \sum_{x \in S} x
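Here is a worked example with illustrative values. Consider a 2×2×2 window whose eight valid elements are 1 through 8:

```latex
\begin{aligned}
S &= \{1, 2, 3, 4, 5, 6, 7, 8\}, \quad |S| = 8 \\
\text{Max: } \max(S) &= 8 \\
\text{Average: } \frac{1}{|S|} \sum_{x \in S} x &= \frac{36}{8} = 4.5
\end{aligned}
```

Now suppose the window overhangs the padding, so only the elements {5, 6, 7, 8} are valid. Then |S| = 4, and the average formula gives 26/4 = 6.5. Dividing by the full window size of 8 would instead give 3.25. This is the practical difference between excluding and including padded positions in the average.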

The window positions are determined by:

  • Window size: (pool_d, pool_h, pool_w)
  • Stride: (stride_d, stride_h, stride_w)
  • Padding: (pad_d, pad_h, pad_w)
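The window size, stride, and padding together fix the output extent. The sketch makes three assumptions: pad elements are added on both sides of each dimension, there is no dilation, and division rounds down (floor). Under those assumptions, each output dimension follows out = ⌊(in + 2·pad − pool) / stride⌋ + 1. The C++ code below applies this per dimension. The names pooledLength, pooledDims, and Dims3 are hypothetical. The example program may handle padding or rounding differently, for example with separate left and right pads.

```cpp
#include <cassert>

// Hypothetical per-dimension extent triple (depth, height, width).
struct Dims3 {
    int d, h, w;
};

// Output length along one dimension, assuming `pad` elements on both sides,
// no dilation, and floor rounding: floor((in + 2*pad - pool) / stride) + 1.
inline int pooledLength(int in, int pool, int stride, int pad) {
    const int span = in + 2 * pad - pool;  // how far the window can slide
    assert(stride > 0 && span >= 0 && "window must fit inside the padded input");
    return span / stride + 1;  // integer division floors because span >= 0
}

// Apply the formula independently to depth, height, and width.
inline Dims3 pooledDims(Dims3 in, Dims3 pool, Dims3 stride, Dims3 pad) {
    return {pooledLength(in.d, pool.d, stride.d, pad.d),
            pooledLength(in.h, pool.h, stride.h, pad.h),
            pooledLength(in.w, pool.w, stride.w, pad.w)};
}
```

For instance, take a 16×112×112 input with a 2×2×2 window, stride 2, and no padding. Under these assumptions it yields an 8×56×56 output per channel.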

Algorithmic Strategy: Parallel Window-based Computation

3D pooling is implemented as a parallel algorithm where each thread computes one output element.

  1. Grid Scheduling: The output tensor elements are distributed across GPU threads. Each thread is assigned to compute one element of the output tensor.

  2. Window Processing: For each output position, a thread performs the following steps:

    • Calculate Input Window: Determines the 3D input window for the current output position. Along each dimension, the window starts at output_index × stride − padding and spans the pooling window size.
    • Boundary Handling: Skips window positions that fall in the padding region or outside the input, so that only valid input elements are read.
    • Apply Pooling Function:
      • Max Pooling: Iterates through the window and finds the maximum value.
      • Average Pooling: Iterates through the window, accumulates values, and computes the average.
    • Store Result: Writes the computed result to the output tensor.
  3. Memory Access Optimization: The kernel's memory access patterns are tuned using techniques such as:

    • Coalesced memory access where possible
    • Shared memory for frequently accessed data
    • Efficient handling of boundary conditions
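The per-thread steps above can be emulated on the CPU, which is useful for experimenting or for checking GPU results. The sketch below is a hypothetical reference, not the example's kernel. Each iteration of the outer loop stands in for one GPU thread, which:

  • decodes its flat index into (n, c, d_out, h_out, w_out)
  • walks the window and skips padded positions
  • writes one output value

It assumes a packed NCDHW layout and symmetric padding. It divides the average by the number of valid elements |S|; some implementations divide by the full window size instead. The memory optimizations listed in step 3 are not modeled. PoolOp, Pool3dParams, outLen, and pool3dForward are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference (not Composable Kernel code). Each iteration of
// the outer loop plays the role of one GPU thread computing one output value.
enum class PoolOp { Max, Avg };

struct Pool3dParams {
    int N, C, D, H, W;              // input shape [N, C, D, H, W], packed NCDHW
    int poolD, poolH, poolW;        // window size
    int strideD, strideH, strideW;  // window stride
    int padD, padH, padW;           // padding, assumed equal on both sides
};

// Output length along one dimension (no dilation, floor rounding).
inline int outLen(int in, int pool, int stride, int pad) {
    return (in + 2 * pad - pool) / stride + 1;
}

std::vector<float> pool3dForward(const std::vector<float>& x, const Pool3dParams& p, PoolOp op) {
    const int Do = outLen(p.D, p.poolD, p.strideD, p.padD);
    const int Ho = outLen(p.H, p.poolH, p.strideH, p.padH);
    const int Wo = outLen(p.W, p.poolW, p.strideW, p.padW);
    assert(x.size() == static_cast<std::size_t>(p.N) * p.C * p.D * p.H * p.W);
    std::vector<float> y(static_cast<std::size_t>(p.N) * p.C * Do * Ho * Wo);

    for (std::size_t tid = 0; tid < y.size(); ++tid) {
        // Step 1 (grid scheduling): decode the flat index into (n, c, od, oh, ow).
        std::size_t r = tid;
        const int ow = static_cast<int>(r % Wo); r /= Wo;
        const int oh = static_cast<int>(r % Ho); r /= Ho;
        const int od = static_cast<int>(r % Do); r /= Do;
        const int c = static_cast<int>(r % p.C); r /= p.C;
        const int n = static_cast<int>(r);

        // Step 2 (input window): first input index covered in each dimension.
        const int d0 = od * p.strideD - p.padD;
        const int h0 = oh * p.strideH - p.padH;
        const int w0 = ow * p.strideW - p.padW;

        float acc = (op == PoolOp::Max) ? -std::numeric_limits<float>::infinity() : 0.0f;
        int count = 0;  // |S|: number of valid (non-padding) elements
        for (int kd = 0; kd < p.poolD; ++kd) {
            for (int kh = 0; kh < p.poolH; ++kh) {
                for (int kw = 0; kw < p.poolW; ++kw) {
                    const int d = d0 + kd, h = h0 + kh, w = w0 + kw;
                    // Boundary handling: skip positions in the padding region.
                    if (d < 0 || d >= p.D || h < 0 || h >= p.H || w < 0 || w >= p.W) continue;
                    const std::size_t idx =
                        ((((static_cast<std::size_t>(n) * p.C + c) * p.D + d) * p.H + h) * p.W) + w;
                    acc = (op == PoolOp::Max) ? std::max(acc, x[idx]) : acc + x[idx];
                    ++count;
                }
            }
        }
        // Store result. Average divides by |S|; a window lying entirely in padding
        // yields -inf for max and 0 for average in this sketch.
        y[tid] = (op == PoolOp::Avg && count > 0) ? acc / static_cast<float>(count) : acc;
    }
    return y;
}
```

On a GPU, the loop index tid would typically be derived from the block and thread indices, with a bounds check against the output size. The loop body would stay essentially the same.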

Source Code Organization

Build and Run

Prerequisites

Ensure the Composable Kernel library is built and installed.

cd /path/to/composable_kernel/build
make -j install

Build the Example

cd /path/to/composable_kernel/example/48_pool3d_fwd
mkdir build && cd build

# Set CK_INSTALL_PATH to the prefix where Composable Kernel was installed
cmake \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
  ..

make -j

Run the Example

# Run the example with default settings
./pool3d_fwd_xdl

# Run with verification, data initialization, and timing
./pool3d_fwd_xdl 1 2 1

Applications

3D pooling is essential in several domains that process volumetric or temporal data.

  • Video Analysis: In video understanding tasks, 3D CNNs use 3D pooling to reduce temporal and spatial dimensions while preserving important motion and appearance features.
  • Medical Imaging: 3D medical images (CT scans, MRI) require 3D pooling for feature extraction while maintaining spatial relationships in all three dimensions.
  • 3D Computer Vision: Object detection and segmentation on voxel grids (including voxelized point clouds) use 3D pooling for hierarchical feature learning.
  • Action Recognition: Video action recognition models use 3D pooling to aggregate features across temporal and spatial dimensions.
  • Volumetric Data Processing: Scientific applications processing 3D volumetric data (weather modeling, fluid dynamics) use 3D pooling for multi-scale analysis.