3D Pooling Forward
This example demonstrates a 3D pooling forward operation. Pooling is a fundamental operation in convolutional neural networks that reduces the spatial dimensions of feature maps while retaining important information. 3D pooling extends this concept to three-dimensional data, commonly used in video analysis, medical imaging, and 3D computer vision applications.
Mathematical Formulation
3D pooling operates on 5D tensors with shape [N, C, D, H, W] where:
- N is the batch size
- C is the number of channels
- D, H, W are the depth, height, and width dimensions
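To make the 5D indexing concrete, here is a minimal sketch of how a coordinate (n, c, d, h, w) could map to a linear memory offset. It assumes a densely packed row-major layout in the order N, C, D, H, W; the layout actually used by the example may differ. `Shape5D` and `offset_ncdhw` are hypothetical names for illustration, not part of Composable Kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not a CK type): extents of a 5D tensor.
struct Shape5D
{
    std::size_t N, C, D, H, W;
};

// Linear offset of element (n, c, d, h, w).
// ASSUMPTION: densely packed row-major storage in which W varies fastest.
inline std::size_t offset_ncdhw(const Shape5D& s,
                                std::size_t n,
                                std::size_t c,
                                std::size_t d,
                                std::size_t h,
                                std::size_t w)
{
    // Every coordinate must lie inside its extent.
    assert(n < s.N && c < s.C && d < s.D && h < s.H && w < s.W);
    return (((n * s.C + c) * s.D + d) * s.H + h) * s.W + w;
}
```

Under this assumed layout, moving one step along W changes the offset by 1, and moving one step along H changes it by W.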
The operation applies a pooling function over 3D windows of the input tensor.
For each output position (n, c, d_out, h_out, w_out):
$$\text{Out}_{ncd_{out}h_{out}w_{out}} = \text{Pool}(\{X_{ncd'h'w'} : d' \in W_d, h' \in W_h, w' \in W_w\})$$
Where:
- W_d, W_h, W_w define the 3D pooling window
- Pool is the pooling function (e.g., max or average)
Max Pooling: $\text{Pool}(S) = \max(S)$
Average Pooling: $\text{Pool}(S) = \frac{1}{|S|} \sum_{x \in S} x$
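As a small worked example of the two pooling functions, consider a single 2×2×2 window that lies entirely inside the input. The eight values below are made up for illustration.

```latex
S = \{1, 2, 3, 4, 5, 6, 7, 8\}, \qquad
\max(S) = 8, \qquad
\frac{1}{|S|} \sum_{x \in S} x = \frac{36}{8} = 4.5
```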
The window positions are determined by:
- Window size: (pool_d, pool_h, pool_w)
- Stride: (stride_d, stride_h, stride_w)
- Padding: (pad_d, pad_h, pad_w)
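The three parameters above fix both the output size and the window position along each dimension. The sketch below shows the usual arithmetic for one dimension. It assumes symmetric padding, no dilation, and floor rounding; the library interface may expose more options. `pooled_length` and `window_start` are hypothetical helper names.

```cpp
#include <cassert>

// Hypothetical helper: number of output positions along one dimension.
// ASSUMPTIONS: the same padding on both sides, no dilation, floor rounding.
inline int pooled_length(int in_len, int pool, int stride, int pad)
{
    assert(pool > 0 && stride > 0 && pad >= 0);
    assert(in_len + 2 * pad >= pool); // at least one window must fit
    return (in_len + 2 * pad - pool) / stride + 1;
}

// Hypothetical helper: first input index covered by the window of output
// position out_idx. The result is negative when the window starts inside
// the padding region.
inline int window_start(int out_idx, int stride, int pad)
{
    return out_idx * stride - pad;
}
```

For example, with an input length of 7, window size 3, stride 2, and padding 1, `pooled_length` gives 4 output positions, and the first window starts at input index -1, that is, one element into the padding.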
Algorithmic Strategy: Parallel Window-based Computation
3D pooling is implemented as a parallel algorithm where each thread computes one output element.
- Grid Scheduling: The output tensor elements are distributed across GPU threads. Each thread is assigned to compute one element of the output tensor.
- Window Processing: For each output position, a thread:
  - Calculate Input Window: Determines the 3D input window corresponding to the current output position based on stride, padding, and window size.
  - Boundary Handling: Checks for boundary conditions and padding, ensuring that only valid input positions are processed.
  - Apply Pooling Function:
    - Max Pooling: Iterates through the window and finds the maximum value.
    - Average Pooling: Iterates through the window, accumulates values, and computes the average.
  - Store Result: Writes the computed result to the output tensor.
- Memory Access Optimization: The kernel is optimized for memory access patterns, using techniques like:
  - Coalesced memory access where possible
  - Shared memory for frequently accessed data
  - Efficient handling of boundary conditions
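The steps above can be sketched as a plain CPU reference in which each iteration of the outer loop does the work that one GPU thread would do for one output element. This is not the Composable Kernel implementation. It assumes a packed N, C, D, H, W layout, symmetric padding, no dilation, and an average taken over valid (non-padding) elements only. `Pool3dParams`, `out_len`, and `pool3d_fwd_ref` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical parameter bundle; the names are illustrative, not the CK API.
struct Pool3dParams
{
    int N, C, D, H, W; // input shape
    int pool[3];       // window size (d, h, w)
    int stride[3];     // stride (d, h, w)
    int pad[3];        // symmetric padding (d, h, w), an assumption
};

inline int out_len(int in, int pool, int stride, int pad)
{
    return (in + 2 * pad - pool) / stride + 1;
}

inline std::vector<float>
pool3d_fwd_ref(const std::vector<float>& x, const Pool3dParams& p, bool use_max)
{
    assert(x.size() == static_cast<std::size_t>(p.N) * p.C * p.D * p.H * p.W);
    const int Do = out_len(p.D, p.pool[0], p.stride[0], p.pad[0]);
    const int Ho = out_len(p.H, p.pool[1], p.stride[1], p.pad[1]);
    const int Wo = out_len(p.W, p.pool[2], p.stride[2], p.pad[2]);
    std::vector<float> y(static_cast<std::size_t>(p.N) * p.C * Do * Ho * Wo);

    // One iteration stands in for one GPU thread.
    for(std::size_t tid = 0; tid < y.size(); ++tid)
    {
        // Grid scheduling: decompose the flat index into (n, c, d_o, h_o, w_o).
        std::size_t r = tid;
        const int w_o = static_cast<int>(r % Wo); r /= Wo;
        const int h_o = static_cast<int>(r % Ho); r /= Ho;
        const int d_o = static_cast<int>(r % Do); r /= Do;
        const int c   = static_cast<int>(r % p.C); r /= p.C;
        const int n   = static_cast<int>(r);

        // Calculate input window: first covered index per dimension.
        const int d0 = d_o * p.stride[0] - p.pad[0];
        const int h0 = h_o * p.stride[1] - p.pad[1];
        const int w0 = w_o * p.stride[2] - p.pad[2];

        float acc = use_max ? std::numeric_limits<float>::lowest() : 0.0f;
        int count = 0;
        for(int kd = 0; kd < p.pool[0]; ++kd)
            for(int kh = 0; kh < p.pool[1]; ++kh)
                for(int kw = 0; kw < p.pool[2]; ++kw)
                {
                    const int d = d0 + kd, h = h0 + kh, w = w0 + kw;
                    // Boundary handling: skip positions that fall in the padding.
                    if(d < 0 || d >= p.D || h < 0 || h >= p.H || w < 0 || w >= p.W)
                        continue;
                    const std::size_t idx =
                        (((static_cast<std::size_t>(n) * p.C + c) * p.D + d) * p.H + h) * p.W + w;
                    acc = use_max ? std::max(acc, x[idx]) : acc + x[idx];
                    ++count;
                }

        // Store result. ASSUMPTION: the average divides by the number of valid
        // elements; dividing by the full window size is another convention.
        y[tid] = use_max ? acc : (count > 0 ? acc / static_cast<float>(count) : 0.0f);
    }
    return y;
}
```

A GPU kernel would replace the outer loop with a thread index, and a reference like this is mainly useful for checking device results on small inputs.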
Source Code Organization
- pool3d_fwd_xdl.cpp: The main example file. It sets up a 3D input tensor, defines pooling parameters (window size, stride, padding), and instantiates the DevicePool3dFwd operation.
- ../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp: The high-level device interface for 3D pooling operations.
- ../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp: The grid-wise kernel implementing the parallel 3D pooling algorithm.
Build and Run
Prerequisites
Ensure the Composable Kernel library is built and installed.
cd /path/to/composable_kernel/build
make -j install
Build the Example
cd /path/to/composable_kernel/example/48_pool3d_fwd
mkdir build && cd build
cmake \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
..
make -j
Run the Example
# Run the example with default settings
./pool3d_fwd_xdl
# Run with verification enabled (1), data initialization mode 2, and timing enabled (1)
./pool3d_fwd_xdl 1 2 1
Applications
3D pooling is essential in several domains that process volumetric or temporal data.
- Video Analysis: In video understanding tasks, 3D CNNs use 3D pooling to reduce temporal and spatial dimensions while preserving important motion and appearance features.
- Medical Imaging: 3D medical images (CT scans, MRI) require 3D pooling for feature extraction while maintaining spatial relationships in all three dimensions.
- 3D Computer Vision: Object detection and segmentation in 3D point clouds or voxel grids use 3D pooling for hierarchical feature learning.
- Action Recognition: Video action recognition models use 3D pooling to aggregate features across temporal and spatial dimensions.
- Volumetric Data Processing: Scientific applications processing 3D volumetric data (weather modeling, fluid dynamics) use 3D pooling for multi-scale analysis.