mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-18 12:00:07 +00:00
94 lines
4.5 KiB
Markdown
94 lines
4.5 KiB
Markdown
# 3D Pooling Forward
|
|
|
|
This example demonstrates a **3D pooling forward operation**. Pooling is a fundamental operation in convolutional neural networks that reduces the spatial dimensions of feature maps while retaining important information. 3D pooling extends this concept to three-dimensional data, commonly used in video analysis, medical imaging, and 3D computer vision applications.
|
|
|
|
## Mathematical Formulation
|
|
|
|
3D pooling operates on 5D tensors with shape `[N, C, D, H, W]` where:
|
|
- `N` is the batch size
|
|
- `C` is the number of channels
|
|
- `D`, `H`, `W` are the depth, height, and width dimensions
|
|
|
|
The operation applies a pooling function over 3D windows of the input tensor.
|
|
|
|
For each output position `(n, c, d_out, h_out, w_out)`:
|
|
$\text{Out}_{ncd_{out}h_{out}w_{out}} = \text{Pool}(\{X_{ncd'h'w'} : d' \in W_d, h' \in W_h, w' \in W_w\})$
|
|
|
|
Where:
|
|
- $W_d$, $W_h$, $W_w$ define the 3D pooling window
|
|
- `Pool` is the pooling function (e.g., max or average)
|
|
|
|
**Max Pooling**: $\text{Pool}(S) = \max(S)$
|
|
**Average Pooling**: $\text{Pool}(S) = \frac{1}{|S|} \sum_{x \in S} x$
|
|
|
|
The window positions are determined by:
|
|
- **Window size**: `(pool_d, pool_h, pool_w)`
|
|
- **Stride**: `(stride_d, stride_h, stride_w)`
|
|
- **Padding**: `(pad_d, pad_h, pad_w)`
|
|
|
|
## Algorithmic Strategy: Parallel Window-based Computation
|
|
|
|
3D pooling is implemented as a parallel algorithm where each thread computes one output element.
|
|
|
|
1. **Grid Scheduling**: The output tensor elements are distributed across GPU threads. Each thread is assigned to compute one element of the output tensor.
|
|
|
|
2. **Window Processing**: For each output position, a thread:
|
|
- **Calculate Input Window**: Determines the 3D input window corresponding to the current output position based on stride, padding, and window size.
|
|
- **Boundary Handling**: Checks for boundary conditions and padding, ensuring that only valid input positions are processed.
|
|
- **Apply Pooling Function**:
|
|
- **Max Pooling**: Iterates through the window and finds the maximum value.
|
|
- **Average Pooling**: Iterates through the window, accumulates values, and computes the average.
|
|
- **Store Result**: Writes the computed result to the output tensor.
|
|
|
|
3. **Memory Access Optimization**: The kernel is optimized for memory access patterns, using techniques like:
|
|
- Coalesced memory access where possible
|
|
- Shared memory for frequently accessed data
|
|
- Efficient handling of boundary conditions
|
|
|
|
## Source Code Organization
|
|
|
|
- [`pool3d_fwd_xdl.cpp`](./pool3d_fwd_xdl.cpp): The main example file. It sets up a 3D input tensor, defines pooling parameters (window size, stride, padding), and instantiates the `DevicePool3dFwd` operation.
|
|
- [`../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp`](../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp): The high-level device interface for 3D pooling operations.
|
|
- [`../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp`](../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp): The grid-wise kernel implementing the parallel 3D pooling algorithm.
|
|
|
|
## Build and Run
|
|
|
|
### Prerequisites
|
|
Ensure the Composable Kernel library is built and installed.
|
|
```bash
|
|
cd /path/to/composable_kernel/build
|
|
make -j install
|
|
```
|
|
|
|
### Build the Example
|
|
```bash
|
|
cd /path/to/composable_kernel/example/48_pool3d_fwd
|
|
mkdir build && cd build
|
|
|
|
cmake \
|
|
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
|
|
-DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
|
|
..
|
|
|
|
make -j
|
|
```
|
|
|
|
### Run the Example
|
|
```bash
|
|
# Run the example with default settings
|
|
./pool3d_fwd_xdl
|
|
|
|
# Run with verification, data initialization, and timing
|
|
./pool3d_fwd_xdl 1 2 1
|
|
```
|
|
|
|
## Applications
|
|
|
|
3D pooling is essential in several domains that process volumetric or temporal data.
|
|
|
|
- **Video Analysis**: In video understanding tasks, 3D CNNs use 3D pooling to reduce temporal and spatial dimensions while preserving important motion and appearance features.
|
|
- **Medical Imaging**: 3D medical images (CT scans, MRI) require 3D pooling for feature extraction while maintaining spatial relationships in all three dimensions.
|
|
- **3D Computer Vision**: Object detection and segmentation in 3D point clouds or voxel grids use 3D pooling for hierarchical feature learning.
|
|
- **Action Recognition**: Video action recognition models use 3D pooling to aggregate features across temporal and spatial dimensions.
|
|
- **Volumetric Data Processing**: Scientific applications processing 3D volumetric data (weather modeling, fluid dynamics) use 3D pooling for multi-scale analysis.
|