composable_kernel/example/48_pool3d_fwd/README.md
2025-10-16 10:13:27 +00:00


# 3D Pooling Forward
This example demonstrates a **3D pooling forward operation**. Pooling is a fundamental operation in convolutional neural networks that reduces the spatial dimensions of feature maps while retaining important information. 3D pooling extends this concept to three-dimensional data, commonly used in video analysis, medical imaging, and 3D computer vision applications.
## Mathematical Formulation
3D pooling operates on 5D tensors with shape `[N, C, D, H, W]` where:
- `N` is the batch size
- `C` is the number of channels
- `D`, `H`, `W` are the depth, height, and width dimensions

The operation applies a pooling function over 3D windows of the input tensor.
For each output position `(n, c, d_out, h_out, w_out)`:

$\text{Out}_{ncd_{out}h_{out}w_{out}} = \text{Pool}(\{X_{ncd'h'w'} : d' \in W_d, h' \in W_h, w' \in W_w\})$
Where:
- $W_d$, $W_h$, $W_w$ define the 3D pooling window
- `Pool` is the pooling function (e.g., max or average)

**Max Pooling**: $\text{Pool}(S) = \max(S)$

**Average Pooling**: $\text{Pool}(S) = \frac{1}{|S|} \sum_{x \in S} x$

The window positions are determined by:
- **Window size**: `(pool_d, pool_h, pool_w)`
- **Stride**: `(stride_d, stride_h, stride_w)`
- **Padding**: `(pad_d, pad_h, pad_w)`
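
As a sketch of how these three parameters fix the windows, the depth axis can be written out explicitly. This assumes symmetric padding, no dilation, and floor rounding of the output size, none of which is stated above; the height and width axes follow the same pattern with their own parameters.

$$W_d(d_{out}) = \{\, d_{out} \cdot \text{stride}_d - \text{pad}_d + k \;:\; 0 \le k < \text{pool}_d \,\} \cap [0,\, D - 1]$$

$$D_{out} = \left\lfloor \frac{D + 2\,\text{pad}_d - \text{pool}_d}{\text{stride}_d} \right\rfloor + 1$$

For example, with $D = 8$, $\text{pool}_d = 3$, $\text{stride}_d = 2$, and $\text{pad}_d = 1$, this gives $D_{out} = \lfloor 7 / 2 \rfloor + 1 = 4$, and the window for $d_{out} = 0$ is $\{-1, 0, 1\} \cap [0, 7] = \{0, 1\}$, since the first position falls in the padding.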
## Algorithmic Strategy: Parallel Window-based Computation
3D pooling is implemented as a parallel algorithm where each thread computes one output element.
1. **Grid Scheduling**: The output tensor elements are distributed across GPU threads. Each thread is assigned to compute one element of the output tensor.
2. **Window Processing**: For each output position, a thread:
   - **Calculate Input Window**: Determines the 3D input window corresponding to the current output position based on stride, padding, and window size.
   - **Boundary Handling**: Checks for boundary conditions and padding, ensuring that only valid input positions are processed.
   - **Apply Pooling Function**:
     - **Max Pooling**: Iterates through the window and finds the maximum value.
     - **Average Pooling**: Iterates through the window, accumulates values, and computes the average.
   - **Store Result**: Writes the computed result to the output tensor.
3. **Memory Access Optimization**: The kernel is optimized for memory access patterns, using techniques like:
   - Coalesced memory access where possible
   - Shared memory for frequently accessed data
   - Efficient handling of boundary conditions
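
The per-element steps above can be sketched as a plain CPU reference in C++. This is a minimal illustration, not the Composable Kernel implementation: `PoolOp`, `Pool3dParams`, `pooled_extent`, `pool3d_at`, and `pool3d_fwd` are hypothetical names, the tensor is assumed to be dense and row-major in `[N, C, D, H, W]` order, and the average is assumed to divide by the number of in-bounds elements rather than the full window size. The loop over flat output indices stands in for the grid of GPU threads.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// All names here are hypothetical; none of this is the Composable Kernel API.
enum class PoolOp { Max, Avg };

struct Pool3dParams {
    std::array<int, 3> window; // (pool_d, pool_h, pool_w)
    std::array<int, 3> stride; // (stride_d, stride_h, stride_w)
    std::array<int, 3> pad;    // (pad_d, pad_h, pad_w), assumed symmetric
};

// Output extent along one axis, assuming floor rounding and no dilation.
inline int pooled_extent(int in, int window, int stride, int pad)
{
    assert(stride > 0 && in + 2 * pad >= window);
    return (in + 2 * pad - window) / stride + 1;
}

// Computes one output element, the work the text assigns to a single thread.
// x is assumed to be a dense, row-major [N, C, D, H, W] tensor.
inline float pool3d_at(const std::vector<float>& x, int C, int D, int H, int W,
                       const Pool3dParams& p, PoolOp op,
                       int n, int c, int d_out, int h_out, int w_out)
{
    // Calculate input window: the first coordinate may be negative (in padding).
    const int d0 = d_out * p.stride[0] - p.pad[0];
    const int h0 = h_out * p.stride[1] - p.pad[1];
    const int w0 = w_out * p.stride[2] - p.pad[2];

    float acc = (op == PoolOp::Max) ? std::numeric_limits<float>::lowest() : 0.0f;
    int count = 0; // number of in-bounds elements actually visited

    // Boundary handling: clamp each loop range to the valid input extent.
    for(int d = std::max(d0, 0); d < std::min(d0 + p.window[0], D); ++d)
        for(int h = std::max(h0, 0); h < std::min(h0 + p.window[1], H); ++h)
            for(int w = std::max(w0, 0); w < std::min(w0 + p.window[2], W); ++w)
            {
                const std::size_t idx =
                    (((static_cast<std::size_t>(n) * C + c) * D + d) * H + h) * W + w;
                acc = (op == PoolOp::Max) ? std::max(acc, x[idx]) : acc + x[idx];
                ++count;
            }

    // Assumption: the average divides by the in-bounds count, not the full window.
    if(op == PoolOp::Avg && count > 0)
        acc /= static_cast<float>(count);
    return acc;
}

// Visits every flat output index, mimicking one thread per output element.
inline std::vector<float> pool3d_fwd(const std::vector<float>& x,
                                     int N, int C, int D, int H, int W,
                                     const Pool3dParams& p, PoolOp op)
{
    const int Do = pooled_extent(D, p.window[0], p.stride[0], p.pad[0]);
    const int Ho = pooled_extent(H, p.window[1], p.stride[1], p.pad[1]);
    const int Wo = pooled_extent(W, p.window[2], p.stride[2], p.pad[2]);

    std::vector<float> y(static_cast<std::size_t>(N) * C * Do * Ho * Wo);
    for(std::size_t i = 0; i < y.size(); ++i)
    {
        // Unravel the flat index into (n, c, d_out, h_out, w_out).
        std::size_t r = i;
        const int w_out = static_cast<int>(r % Wo); r /= Wo;
        const int h_out = static_cast<int>(r % Ho); r /= Ho;
        const int d_out = static_cast<int>(r % Do); r /= Do;
        const int c     = static_cast<int>(r % C);  r /= C;
        const int n     = static_cast<int>(r);
        // Store result.
        y[i] = pool3d_at(x, C, D, H, W, p, op, n, c, d_out, h_out, w_out);
    }
    return y;
}
```

As a usage example, a `1x1x2x2x2` input holding the values 1 through 8 with a `2x2x2` window and stride 2 produces a single output element: 8 for max pooling and 4.5 for average pooling. A reference of this kind is mainly useful for checking GPU results; it makes no attempt at the memory access optimizations listed in step 3.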
## Source Code Organization
- [`pool3d_fwd_xdl.cpp`](./pool3d_fwd_xdl.cpp): The main example file. It sets up a 3D input tensor, defines pooling parameters (window size, stride, padding), and instantiates the `DevicePool3dFwd` operation.
- [`../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp`](../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp): The high-level device interface for 3D pooling operations.
- [`../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp`](../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp): The grid-wise kernel implementing the parallel 3D pooling algorithm.
## Build and Run
### Prerequisites
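Ensure the Composable Kernel library is built and installed. The example build step further down refers to the install prefix as `CK_INSTALL_PATH`, so set that variable to wherever the library was installed.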
```bash
cd /path/to/composable_kernel/build
make -j install
```
### Build the Example
```bash
cd /path/to/composable_kernel/example/48_pool3d_fwd
mkdir build && cd build
cmake \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
..
make -j
```
### Run the Example
```bash
# Run the example with default settings
./pool3d_fwd_xdl
# Run with verification enabled (1), data initialization method 2, and kernel timing enabled (1)
./pool3d_fwd_xdl 1 2 1
```
## Applications
3D pooling is essential in several domains that process volumetric or temporal data.
- **Video Analysis**: In video understanding tasks, 3D CNNs use 3D pooling to reduce temporal and spatial dimensions while preserving important motion and appearance features.
- **Medical Imaging**: 3D medical images (CT scans, MRI) require 3D pooling for feature extraction while maintaining spatial relationships in all three dimensions.
- **3D Computer Vision**: Object detection and segmentation in 3D point clouds or voxel grids use 3D pooling for hierarchical feature learning.
- **Action Recognition**: Video action recognition models use 3D pooling to aggregate features across temporal and spatial dimensions.
- **Volumetric Data Processing**: Scientific applications processing 3D volumetric data (weather modeling, fluid dynamics) use 3D pooling for multi-scale analysis.