mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-15 18:42:06 +00:00
86 lines
3.4 KiB
Markdown
86 lines
3.4 KiB
Markdown
# Client Example: Quantization for GEMM and Conv2D
|
|
|
|
## Theory
|
|
|
|
This client example demonstrates **quantized GEMM and 2D convolution** operations, including per-layer and per-channel quantization, and fusion with bias and activation functions. Quantization reduces memory and computation by representing values with lower-precision integer types (e.g., int8), enabling efficient inference in deep learning.
|
|
|
|
**Mathematical Formulation:**
|
|
- Quantized GEMM: $C = \text{dequant}(A_q) \times \text{dequant}(B_q)$
|
|
- Quantized Conv2D: $Y = \text{dequant}(X_q) * \text{dequant}(W_q)$
|
|
- $\text{dequant}(x_q) = (x_q - z) \cdot s$ (scale $s$, zero-point $z$)
|
|
- Per-layer: one scale/zero-point per tensor
|
|
- Per-channel: scale/zero-point per output channel
|
|
|
|
**Algorithmic Background:**
|
|
- Quantized values are dequantized on-the-fly during computation.
|
|
- Accumulation is performed in higher precision for accuracy.
|
|
- Supports bias addition and activation fusion (ReLU, Tanh).
|
|
- Per-channel quantization improves accuracy for convolutional layers.
|
|
|
|
## How to Run
|
|
|
|
### Prerequisites
|
|
|
|
Please follow the instructions in the main [Build Guide](../../README.md#building-ck) section as a prerequisite to building and running this example.
|
|
|
|
### Build and run
|
|
```bash
|
|
cd composable_kernel/client_example/09_quantization
|
|
mkdir build && cd build
|
|
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
|
|
make -j
|
|
|
|
# Example run (GEMM quantization)
|
|
./gemm_quantization
|
|
|
|
# Example run (Conv2D per-layer quantization)
|
|
./conv2d_fwd_perlayer_quantization
|
|
|
|
# Example run (Conv2D per-channel quantization)
|
|
./conv2d_fwd_perchannel_quantization
|
|
|
|
# Example run (Conv2D + bias + ReLU + per-channel quantization)
|
|
./conv2d_fwd_bias_relu_perchannel_quantization
|
|
```
|
|
|
|
## Source Code Structure
|
|
|
|
### Directory Layout
|
|
```
|
|
client_example/09_quantization/
|
|
├── gemm_quantization.cpp # Quantized GEMM
|
|
├── conv2d_fwd_perlayer_quantization.cpp # Conv2D per-layer quantization
|
|
├── conv2d_fwd_perchannel_quantization.cpp # Conv2D per-channel quantization
|
|
├── conv2d_fwd_bias_relu_perlayer_quantization.cpp # Conv2D + bias + ReLU + per-layer quantization
|
|
├── conv2d_fwd_bias_relu_perchannel_quantization.cpp # Conv2D + bias + ReLU + per-channel quantization
|
|
├── conv2d_fwd_bias_tanh_perlayer_quantization.cpp # Conv2D + bias + Tanh + per-layer quantization
|
|
├── conv2d_fwd_bias_tanh_perchannel_quantization.cpp # Conv2D + bias + Tanh + per-channel quantization
|
|
├── CMakeLists.txt # Build configuration for the example
|
|
```
|
|
|
|
### Key Functions
|
|
|
|
- **main()** (in each `.cpp`):
|
|
Sets up input tensors, configures quantization parameters, launches the quantized kernel, and verifies the result.
|
|
- **Quantization kernel invocation**:
|
|
Uses the Composable Kernel device API to launch quantized GEMM or Conv2D with optional bias and activation.
|
|
|
|
---
|
|
|
|
## Additional Details
|
|
|
|
- Supports int8 quantization, per-layer and per-channel scaling.
|
|
- Demonstrates fusion with bias and activation (ReLU, Tanh).
|
|
- Example parameters can be adjusted in the source for different workloads.
|
|
|
|
---
|
|
|
|
## Related Examples
|
|
|
|
- [01_gemm](../01_gemm/README.md): GEMM for quantized matrix multiplication
|
|
- [14_gemm_quantization](../../example/14_gemm_quantization/README.md): GEMM quantization in the main example directory
|
|
- [40_conv2d_fwd_quantization](../../example/40_conv2d_fwd_quantization/README.md): Conv2D quantization in the main example directory
|
|
|
|
---
|
|
[Back to Client Examples](../README.md)
|