3D Pooling Forward
This example demonstrates a 3D pooling forward operation. Pooling is a fundamental operation in convolutional neural networks that reduces the spatial dimensions of feature maps while retaining important information. 3D pooling extends this concept to three-dimensional data, commonly used in video analysis, medical imaging, and 3D computer vision applications.
Mathematical Formulation
3D pooling operates on 5D tensors with shape [N, C, D, H, W] where:
- N is the batch size
- C is the number of channels
- D, H, W are the depth, height, and width dimensions
The operation applies a pooling function over 3D windows of the input tensor.
For each output position (n, c, d_out, h_out, w_out):
\text{Out}_{ncd_{out}h_{out}w_{out}} = \text{Pool}(\{X_{ncd'h'w'} : d' \in W_d, h' \in W_h, w' \in W_w\})
Where:
- W_d, W_h, W_w define the 3D pooling window
- Pool is the pooling function (e.g., max or average)
Max Pooling: \text{Pool}(S) = \max(S)
Average Pooling: \text{Pool}(S) = \frac{1}{|S|} \sum_{x \in S} x
The window positions are determined by:
- Window size: (pool_d, pool_h, pool_w)
- Stride: (stride_d, stride_h, stride_w)
- Padding: (pad_d, pad_h, pad_w)
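As an illustration of how these three parameters fix the window positions, the sketch below computes the number of output positions along one axis and the first input index of a given window. It assumes symmetric padding and no dilation, which may be narrower than what the library supports. The helper names pooled_extent and window_start are hypothetical and are not part of Composable Kernel.

```cpp
#include <cassert>

// Hypothetical helper (not a Composable Kernel API): number of window
// positions along one axis, assuming symmetric padding and no dilation.
int pooled_extent(int in, int pool, int stride, int pad)
{
    assert(in > 0 && pool > 0 && stride > 0 && pad >= 0);
    assert(in + 2 * pad >= pool); // at least one full window must fit
    // The numerator is non-negative, so integer division acts as floor.
    return (in + 2 * pad - pool) / stride + 1;
}

// Hypothetical helper: first input index covered by the window at output
// index `out`. A negative result means the window starts in the padding.
int window_start(int out, int stride, int pad)
{
    return out * stride - pad;
}
```

For example, a depth of 7 with pool_d = 3, stride_d = 2, and pad_d = 1 gives 4 output positions, and the first window starts at input index -1, so its first element lies in the padding.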
Algorithmic Strategy: Parallel Window-based Computation
3D pooling is implemented as a parallel algorithm where each thread computes one output element.
- Grid Scheduling: The output tensor elements are distributed across GPU threads. Each thread is assigned to compute one element of the output tensor.
- Window Processing: For each output position, a thread:
  - Calculate Input Window: Determines the 3D input window corresponding to the current output position based on stride, padding, and window size.
  - Boundary Handling: Checks for boundary conditions and padding, ensuring that only valid input positions are processed.
  - Apply Pooling Function:
    - Max Pooling: Iterates through the window and finds the maximum value.
    - Average Pooling: Iterates through the window, accumulates values, and computes the average.
  - Store Result: Writes the computed result to the output tensor.
- Memory Access Optimization: The kernel is optimized for memory access patterns, using techniques like:
  - Coalesced memory access where possible
  - Shared memory for frequently accessed data
  - Efficient handling of boundary conditions
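The per-element steps above can be sketched as a plain CPU reference, where each iteration of the outer loop plays the role of one GPU thread. This is not the gridwise kernel from the library. It is a minimal sketch that assumes a contiguous NCDHW layout, symmetric padding, no dilation, and an average that divides by the number of valid (non-padded) elements. The Pool3dParams struct and pool3d_fwd_ref function are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical parameter bundle for this sketch; not a Composable Kernel type.
struct Pool3dParams
{
    int N, C, D, H, W; // input shape, assumed contiguous NCDHW
    int pool[3];       // window size   (d, h, w)
    int stride[3];     // stride        (d, h, w)
    int pad[3];        // symmetric pad (d, h, w), an assumption of this sketch
};

// CPU reference: one outer-loop iteration per output element.
std::vector<float> pool3d_fwd_ref(const std::vector<float>& x, const Pool3dParams& p, bool use_max)
{
    const int in[3] = {p.D, p.H, p.W};
    int out[3];
    for(int i = 0; i < 3; ++i)
        out[i] = (in[i] + 2 * p.pad[i] - p.pool[i]) / p.stride[i] + 1;

    const std::size_t nc_total = static_cast<std::size_t>(p.N) * p.C;
    assert(x.size() == nc_total * p.D * p.H * p.W);
    std::vector<float> y(nc_total * out[0] * out[1] * out[2]);

    for(std::size_t idx = 0; idx < y.size(); ++idx)
    {
        // Decompose the flat output index into (nc, d_out, h_out, w_out).
        std::size_t r = idx;
        const int wo = static_cast<int>(r % out[2]); r /= out[2];
        const int ho = static_cast<int>(r % out[1]); r /= out[1];
        const int dz = static_cast<int>(r % out[0]); r /= out[0];
        const std::size_t nc = r; // combined batch and channel index

        float acc = use_max ? std::numeric_limits<float>::lowest() : 0.0f;
        int count = 0; // number of valid (non-padded) elements visited
        for(int kd = 0; kd < p.pool[0]; ++kd)
            for(int kh = 0; kh < p.pool[1]; ++kh)
                for(int kw = 0; kw < p.pool[2]; ++kw)
                {
                    const int d = dz * p.stride[0] - p.pad[0] + kd;
                    const int h = ho * p.stride[1] - p.pad[1] + kh;
                    const int w = wo * p.stride[2] - p.pad[2] + kw;
                    // Boundary handling: skip positions that fall in the padding.
                    if(d < 0 || d >= p.D || h < 0 || h >= p.H || w < 0 || w >= p.W)
                        continue;
                    const float v = x[((nc * p.D + d) * p.H + h) * p.W + w];
                    acc = use_max ? std::max(acc, v) : acc + v;
                    ++count;
                }
        // Store the result; the average excludes padded positions (assumption).
        y[idx] = use_max ? acc : (count > 0 ? acc / static_cast<float>(count) : 0.0f);
    }
    return y;
}
```

A reference of this shape is mainly useful for checking a GPU result element by element. Whether the average should divide by the full window size or only by the valid elements is a design choice, and the convention used by the library should be checked before comparing outputs near padded borders.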
Source Code Organization
- pool3d_fwd_xdl.cpp: The main example file. It sets up a 3D input tensor, defines pooling parameters (window size, stride, padding), and instantiates the DevicePool3dFwd operation.
- ../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp: The high-level device interface for 3D pooling operations.
- ../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp: The grid-wise kernel implementing the parallel 3D pooling algorithm.
Build and Run
Prerequisites
Ensure the Composable Kernel library is built and installed.
cd /path/to/composable_kernel/build
make -j install
Build the Example
cd /path/to/composable_kernel/example/48_pool3d_fwd
mkdir build && cd build
cmake \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
..
make -j
Run the Example
# Run the example with default settings
./pool3d_fwd_xdl
# Run with verification, data initialization, and timing
./pool3d_fwd_xdl 1 2 1
Applications
3D pooling is essential in several domains that process volumetric or temporal data.
- Video Analysis: In video understanding tasks, 3D CNNs use 3D pooling to reduce temporal and spatial dimensions while preserving important motion and appearance features.
- Medical Imaging: 3D medical images (CT scans, MRI) require 3D pooling for feature extraction while maintaining spatial relationships in all three dimensions.
- 3D Computer Vision: Object detection and segmentation in 3D point clouds or voxel grids use 3D pooling for hierarchical feature learning.
- Action Recognition: Video action recognition models use 3D pooling to aggregate features across temporal and spatial dimensions.
- Volumetric Data Processing: Scientific applications processing 3D volumetric data (weather modeling, fluid dynamics) use 3D pooling for multi-scale analysis.