mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-04 21:51:28 +00:00
94 lines
4.5 KiB
Markdown
# 3D Pooling Forward

This example demonstrates a **3D pooling forward operation**. Pooling is a fundamental operation in convolutional neural networks that reduces the spatial dimensions of feature maps while retaining their most salient features. 3D pooling extends this concept to three-dimensional data and is commonly used in video analysis, medical imaging, and 3D computer vision applications.

## Mathematical Formulation

3D pooling operates on 5D tensors with shape `[N, C, D, H, W]` where:
- `N` is the batch size
- `C` is the number of channels
- `D`, `H`, `W` are the depth, height, and width dimensions

The operation applies a pooling function over 3D windows of the input tensor.

For each output position `(n, c, d_out, h_out, w_out)`:

$\text{Out}_{ncd_{out}h_{out}w_{out}} = \text{Pool}(\{X_{ncd'h'w'} : d' \in W_d, h' \in W_h, w' \in W_w\})$

Where:
- $W_d$, $W_h$, $W_w$ are the sets of input depth, height, and width indices covered by the 3D pooling window at that output position
- `Pool` is the pooling function (e.g., max or average)

**Max Pooling**: $\text{Pool}(S) = \max(S)$

**Average Pooling**: $\text{Pool}(S) = \frac{1}{|S|} \sum_{x \in S} x$

The window positions are determined by:
- **Window size**: `(pool_d, pool_h, pool_w)`
- **Stride**: `(stride_d, stride_h, stride_w)`
- **Padding**: `(pad_d, pad_h, pad_w)`

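
The formulation above does not spell out how the index sets $W_d$, $W_h$, $W_w$ follow from window size, stride, and padding. A common convention, assumed here rather than taken from the kernel source, uses symmetric padding, no dilation, and floor rounding. Under those assumptions, the depth window for output index $d_{out}$ and the output depth $D_{out}$ (a symbol introduced for this sketch) would be:

$$W_d = \{\, d_{out} \cdot \text{stride}_d - \text{pad}_d + k \;:\; 0 \le k < \text{pool}_d \,\} \cap [0, D)$$

$$D_{out} = \left\lfloor \frac{D + 2\,\text{pad}_d - \text{pool}_d}{\text{stride}_d} \right\rfloor + 1$$

The height and width dimensions follow the same pattern. As a worked example, with $D = 8$, $\text{pool}_d = 3$, $\text{stride}_d = 2$, and $\text{pad}_d = 1$:

$$D_{out} = \left\lfloor \frac{8 + 2 - 3}{2} \right\rfloor + 1 = 4, \qquad W_d\big|_{d_{out}=0} = \{-1, 0, 1\} \cap [0, 8) = \{0, 1\}$$

Whether $|S|$ in average pooling counts only the valid positions or the full padded window depends on the implementation; this example's choice is not stated in the text above.
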
## Algorithmic Strategy: Parallel Window-based Computation

3D pooling is implemented as a parallel algorithm where each thread computes one output element.

1. **Grid Scheduling**: The output tensor elements are distributed across GPU threads, one output element per thread.

2. **Window Processing**: For each output position, the assigned thread performs the following steps:
   - **Calculate Input Window**: Determine the 3D input window corresponding to the current output position based on stride, padding, and window size.
   - **Boundary Handling**: Check for boundary conditions and padding so that only valid input positions are processed.
   - **Apply Pooling Function**:
     - **Max Pooling**: Iterate through the window and find the maximum value.
     - **Average Pooling**: Iterate through the window, accumulate the values, and compute the average.
   - **Store Result**: Write the computed result to the output tensor.

3. **Memory Access Optimization**: The kernel is optimized for memory access patterns, using techniques like:
   - Coalesced memory access where possible
   - Shared memory for frequently accessed data
   - Efficient handling of boundary conditions

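
The grid scheduling and window processing steps above can be illustrated with a small CPU reference in C++. This is a minimal sketch, not the Composable Kernel implementation: `Pool3dParams`, `out_extent`, and `pool3d_fwd_ref` are hypothetical names, padding is assumed symmetric, and average pooling is assumed to divide by the number of valid (non-padded) elements. Each iteration of the outer loop plays the role of one GPU thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical parameter bundle for this sketch; not a Composable Kernel type.
struct Pool3dParams
{
    int N, C, D, H, W; // input shape [N, C, D, H, W]
    int pool[3];       // window size (d, h, w)
    int stride[3];     // stride (d, h, w)
    int pad[3];        // symmetric padding (d, h, w); an assumption of this sketch
};

// Output extent along one axis, assuming floor rounding and no dilation.
inline int out_extent(int in, int pool, int stride, int pad)
{
    return (in + 2 * pad - pool) / stride + 1;
}

// CPU reference: one loop iteration per output element ("thread").
std::vector<float> pool3d_fwd_ref(const std::vector<float>& x, const Pool3dParams& p, bool use_max)
{
    const int Do = out_extent(p.D, p.pool[0], p.stride[0], p.pad[0]);
    const int Ho = out_extent(p.H, p.pool[1], p.stride[1], p.pad[1]);
    const int Wo = out_extent(p.W, p.pool[2], p.stride[2], p.pad[2]);
    assert(x.size() == static_cast<std::size_t>(p.N) * p.C * p.D * p.H * p.W);

    std::vector<float> y(static_cast<std::size_t>(p.N) * p.C * Do * Ho * Wo);

    for(std::size_t tid = 0; tid < y.size(); ++tid)
    {
        // Grid scheduling: decompose the linear index into output coordinates.
        std::size_t r = tid;
        const int wo = static_cast<int>(r % Wo); r /= Wo;
        const int ho = static_cast<int>(r % Ho); r /= Ho;
        const int od = static_cast<int>(r % Do); r /= Do; // r is now n * C + c

        // Calculate input window: the start may be negative inside the padding.
        const int d0 = od * p.stride[0] - p.pad[0];
        const int h0 = ho * p.stride[1] - p.pad[1];
        const int w0 = wo * p.stride[2] - p.pad[2];

        float acc = use_max ? -std::numeric_limits<float>::infinity() : 0.0f;
        int count = 0;
        for(int kd = 0; kd < p.pool[0]; ++kd)
            for(int kh = 0; kh < p.pool[1]; ++kh)
                for(int kw = 0; kw < p.pool[2]; ++kw)
                {
                    const int d = d0 + kd, h = h0 + kh, w = w0 + kw;
                    // Boundary handling: skip positions that fall in the padding.
                    if(d < 0 || d >= p.D || h < 0 || h >= p.H || w < 0 || w >= p.W)
                        continue;
                    const float v = x[((r * p.D + d) * p.H + h) * p.W + w];
                    acc = use_max ? std::max(acc, v) : acc + v;
                    ++count;
                }

        // Store result: the average divides by the valid element count (assumption).
        y[tid] = use_max ? acc : (count > 0 ? acc / static_cast<float>(count) : 0.0f);
    }
    return y;
}
```

For a `1 x 1 x 2 x 2 x 2` input holding the values 1 through 8, a `2 x 2 x 2` window with stride 2 and no padding yields a single output: 8 for max pooling and 4.5 for average pooling. The sketch does not model the memory access optimizations listed in step 3.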
## Source Code Organization

- [`pool3d_fwd_xdl.cpp`](./pool3d_fwd_xdl.cpp): The main example file. It sets up a 3D input tensor, defines pooling parameters (window size, stride, padding), and instantiates the `DevicePool3dFwd` operation.
- [`../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp`](../../include/ck/tensor_operation/gpu/device/device_pool3d_fwd.hpp): The high-level device interface for 3D pooling operations.
- [`../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp`](../../include/ck/tensor_operation/gpu/grid/gridwise_pool3d_fwd.hpp): The grid-wise kernel implementing the parallel 3D pooling algorithm.

## Build and Run

### Prerequisites
Ensure the Composable Kernel library is built and installed.
```bash
cd /path/to/composable_kernel/build
make -j install
```

### Build the Example
```bash
cd /path/to/composable_kernel/example/48_pool3d_fwd
mkdir build && cd build

cmake \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
  ..

make -j
```

### Run the Example
```bash
# Run the example with default settings
./pool3d_fwd_xdl

# Run with verification (1), data initialization (2), and timing (1)
./pool3d_fwd_xdl 1 2 1
```

## Applications

3D pooling is essential in several domains that process volumetric or temporal data.

- **Video Analysis**: In video understanding tasks, 3D CNNs use 3D pooling to reduce temporal and spatial dimensions while preserving important motion and appearance features.
- **Medical Imaging**: 3D medical images (CT scans, MRI) require 3D pooling for feature extraction while maintaining spatial relationships in all three dimensions.
- **3D Computer Vision**: Object detection and segmentation in 3D point clouds or voxel grids use 3D pooling for hierarchical feature learning.
- **Action Recognition**: Video action recognition models use 3D pooling to aggregate features across temporal and spatial dimensions.
- **Volumetric Data Processing**: Scientific applications processing 3D volumetric data (weather modeling, fluid dynamics) use 3D pooling for multi-scale analysis.