* Wrap ck host utitlies in CK namespace.
The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace.
Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace.
There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our to code bases (CK and CKTile) so that they can both coexist. Moreover, estabilishing this compatiblity is essential if we are going to allow the builder to instantiate kernels from either template library.
* Add using declarations to test code.
After moving some of the untils into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compute units for functions that were previously in the global namespace.
* Add using declarations to client examples.
[ROCm/composable_kernel commit: ad57f6ef0b]
3D Pooling Forward
This example demonstrates a 3D pooling forward operation. Pooling is a fundamental operation in convolutional neural networks that reduces the spatial dimensions of feature maps while retaining important information. 3D pooling extends this concept to three-dimensional data, commonly used in video analysis, medical imaging, and 3D computer vision applications.
Mathematical Formulation
3D pooling operates on 5D tensors with shape [N, C, D, H, W] where:
Nis the batch sizeCis the number of channelsD,H,Ware the depth, height, and width dimensions
The operation applies a pooling function over 3D windows of the input tensor.
For each output position (n, c, d_out, h_out, w_out):
\text{Out}_{ncd_{out}h_{out}w_{out}} = \text{Pool}(\{X_{ncd'h'w'} : d' \in W_d, h' \in W_h, w' \in W_w\})
Where:
W_d,W_h,W_wdefine the 3D pooling windowPoolis the pooling function (e.g., max or average)
Max Pooling: \text{Pool}(S) = \max(S)
Average Pooling: \text{Pool}(S) = \frac{1}{|S|} \sum_{x \in S} x
The window positions are determined by:
- Window size:
(pool_d, pool_h, pool_w) - Stride:
(stride_d, stride_h, stride_w) - Padding:
(pad_d, pad_h, pad_w)
Algorithmic Strategy: Parallel Window-based Computation
3D pooling is implemented as a parallel algorithm where each thread computes one output element.
-
Grid Scheduling: The output tensor elements are distributed across GPU threads. Each thread is assigned to compute one element of the output tensor.
-
Window Processing: For each output position, a thread:
- Calculate Input Window: Determines the 3D input window corresponding to the current output position based on stride, padding, and window size.
- Boundary Handling: Checks for boundary conditions and padding, ensuring that only valid input positions are processed.
- Apply Pooling Function:
- Max Pooling: Iterates through the window and finds the maximum value.
- Average Pooling: Iterates through the window, accumulates values, and computes the average.
- Store Result: Writes the computed result to the output tensor.
-
Memory Access Optimization: The kernel is optimized for memory access patterns, using techniques like:
- Coalesced memory access where possible
- Shared memory for frequently accessed data
- Efficient handling of boundary conditions
Source Code Organization
pool3d_fwd_xdl.cpp: The main example file. It sets up a 3D input tensor, defines pooling parameters (window size, stride, padding), and instantiates theDevicePool3dFwdoperation.../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp: The high-level device interface for 3D pooling operations.../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp: The grid-wise kernel implementing the parallel 3D pooling algorithm.
Build and Run
Prerequisites
Ensure the Composable Kernel library is built and installed.
cd /path/to/composable_kernel/build
make -j install
Build the Example
cd /path/to/composable_kernel/example/48_pool3d_fwd
mkdir build && cd build
cmake \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
..
make -j
Run the Example
# Run the example with default settings
./pool3d_fwd_xdl
# Run with verification, data initialization, and timing
./pool3d_fwd_xdl 1 2 1
Applications
3D pooling is essential in several domains that process volumetric or temporal data.
- Video Analysis: In video understanding tasks, 3D CNNs use 3D pooling to reduce temporal and spatial dimensions while preserving important motion and appearance features.
- Medical Imaging: 3D medical images (CT scans, MRI) require 3D pooling for feature extraction while maintaining spatial relationships in all three dimensions.
- 3D Computer Vision: Object detection and segmentation in 3D point clouds or voxel grids use 3D pooling for hierarchical feature learning.
- Action Recognition: Video action recognition models use 3D pooling to aggregate features across temporal and spatial dimensions.
- Volumetric Data Processing: Scientific applications processing 3D volumetric data (weather modeling, fluid dynamics) use 3D pooling for multi-scale analysis.