2D Convolution Forward with Quantization

Theory

This example demonstrates a 2D forward convolution with quantized weights and/or activations. Quantization reduces memory footprint and compute cost by representing values in lower-precision integer types (e.g., int8), which enables efficient deep learning inference.

Mathematical Formulation:

  • Quantized convolution: Y = \text{dequant}(X_q) * \text{dequant}(W_q), where * denotes 2D convolution
  • X_q, W_q: quantized input and weight tensors (e.g., int8)
  • \text{dequant}(x_q) = (x_q - z) \cdot s (scale s, zero-point z)
  • Y: output tensor (often in higher precision, e.g., float32 or float16)
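
The dequantization formula above can be sketched in a few lines of C++. This is a minimal illustration, not CK code: the QuantParams struct and the dequantize and dequantized_dot helpers are hypothetical names introduced here, and int8 storage with a float result is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not a CK API): affine quantization parameters.
struct QuantParams
{
    float scale;             // s
    std::int32_t zero_point; // z
};

// dequant(x_q) = (x_q - z) * s
inline float dequantize(std::int8_t x_q, QuantParams p)
{
    return static_cast<float>(static_cast<std::int32_t>(x_q) - p.zero_point) * p.scale;
}

// Dot product of two quantized vectors, dequantizing every element before
// it is multiplied. A convolution output element is a sum of this form.
inline float dequantized_dot(const std::vector<std::int8_t>& x_q, QuantParams px,
                             const std::vector<std::int8_t>& w_q, QuantParams pw)
{
    assert(x_q.size() == w_q.size());
    float acc = 0.0f;
    for(std::size_t i = 0; i < x_q.size(); ++i)
    {
        acc += dequantize(x_q[i], px) * dequantize(w_q[i], pw);
    }
    return acc;
}
```

For example, with s = 0.5 and z = 2, the stored value 10 dequantizes to (10 - 2) * 0.5 = 4.0.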

Algorithmic Background:

  • Quantized values are dequantized on the fly during convolution.
  • Accumulation is performed in higher precision for accuracy.
  • Supports symmetric (zero-point z = 0) and asymmetric (z ≠ 0) quantization.
  • Convolution is implemented as implicit GEMM for efficiency.
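
The points above can be sketched as a CPU reference loop. Since dequant(x_q) * dequant(w_q) = s_x * s_w * (x_q - z_x) * (w_q - z_w) for each pair of elements, the sketch accumulates the integer products in int32 and applies the combined scale once per output, which is algebraically equivalent to dequantizing every operand. The loop nest uses the implicit GEMM view (M = N*Ho*Wo, N = K, K = C*Y*X) without materializing an im2col matrix. ConvShape, conv2d_fwd_quantized, the NCHW/KCYX layouts, stride 1, and the absence of padding are assumptions made for illustration; this is not the CK kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical problem description (illustrative names, not CK's).
struct ConvShape
{
    int N, C, Hi, Wi; // input:  batch, channels, height, width (NCHW)
    int K, Y, X;      // weight: output channels, filter height, filter width (KCYX)
    int Ho() const { return Hi - Y + 1; } // stride 1, no padding
    int Wo() const { return Wi - X + 1; }
};

// Returns the output as a row-major [N*Ho*Wo, K] matrix, i.e. NHWK layout.
inline std::vector<float> conv2d_fwd_quantized(const ConvShape& s,
                                               const std::vector<std::int8_t>& in,
                                               float in_scale,
                                               std::int32_t in_zp,
                                               const std::vector<std::int8_t>& wei,
                                               float wei_scale,
                                               std::int32_t wei_zp)
{
    assert(in.size() == static_cast<std::size_t>(s.N * s.C * s.Hi * s.Wi));
    assert(wei.size() == static_cast<std::size_t>(s.K * s.C * s.Y * s.X));

    const int Ho = s.Ho(), Wo = s.Wo();
    const int gemm_m = s.N * Ho * Wo; // one GEMM row per output pixel
    const int gemm_n = s.K;           // one GEMM column per output channel
    const int gemm_k = s.C * s.Y * s.X; // reduction over the filter window

    std::vector<float> out(static_cast<std::size_t>(gemm_m) * gemm_n);

    for(int m = 0; m < gemm_m; ++m)
    {
        // Decode the GEMM row index into (n, ho, wo).
        const int n  = m / (Ho * Wo);
        const int ho = (m / Wo) % Ho;
        const int wo = m % Wo;
        for(int k = 0; k < gemm_n; ++k)
        {
            std::int32_t acc = 0; // higher-precision accumulator
            for(int r = 0; r < gemm_k; ++r)
            {
                // Decode the reduction index into (c, y, x).
                const int c = r / (s.Y * s.X);
                const int y = (r / s.X) % s.Y;
                const int x = r % s.X;
                const std::int32_t a =
                    in[((n * s.C + c) * s.Hi + ho + y) * s.Wi + wo + x] - in_zp;
                const std::int32_t b =
                    wei[((k * s.C + c) * s.Y + y) * s.X + x] - wei_zp;
                acc += a * b;
            }
            // Apply the combined scale once per output element.
            out[static_cast<std::size_t>(m) * gemm_n + k] =
                static_cast<float>(acc) * in_scale * wei_scale;
        }
    }
    return out;
}
```

Passing a zero-point of 0 gives the symmetric case; a nonzero zero-point gives the asymmetric case with no other change to the loop.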

How to Run

Prerequisites

Follow the instructions in the main Build Guide section before building and running this example.

Build and run

cd composable_kernel/example/40_conv2d_fwd_quantization
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run
./conv2d_fwd_quantization_xdl --verify=1 --time=1

Source Code Structure

Directory Layout

example/40_conv2d_fwd_quantization/
└── conv2d_fwd_quantization_xdl.cpp              # Main example: sets up, runs, and verifies quantized conv2d
include/ck/tensor_operation/gpu/device/
└── device_conv2d_fwd_quantization.hpp           # Device-level quantized conv2d API
include/ck/tensor_operation/gpu/device/impl/
└── device_conv2d_fwd_quantization_impl.hpp      # Implementation
include/ck/tensor_operation/gpu/grid/
└── gridwise_conv2d_fwd_quantization.hpp         # Grid-level quantized conv2d kernel
include/ck/tensor_operation/gpu/element/
└── quantization_operations.hpp                  # Quantization/dequantization utilities

Key Classes and Functions

  • DeviceConv2dFwdQuantization (in device_conv2d_fwd_quantization.hpp):
    Device API for quantized 2D convolution.
  • gridwise_conv2d_fwd_quantization (in gridwise_conv2d_fwd_quantization.hpp):
    Implements the tiled (blocked) quantized conv2d kernel.
  • quantization_operations (in quantization_operations.hpp):
    Defines quantization and dequantization functions.
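
To make the last item concrete, the following is a rough sketch of what a pair of element-wise quantize/dequantize functors could look like. The struct names, members, and the round-then-clamp policy are assumptions for illustration only; they are not the interface defined in quantization_operations.hpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical functor: float -> int8 with an affine mapping.
// x_q = clamp(round(x / s) + z, -128, 127)
struct QuantizeToInt8
{
    float scale;             // s, must be positive
    std::int32_t zero_point; // z

    std::int8_t operator()(float x) const
    {
        assert(scale > 0.0f);
        // Round to nearest (ties away from zero), then shift by the zero-point.
        const float q = std::round(x / scale) + static_cast<float>(zero_point);
        // Saturate to the int8 range before narrowing.
        const float clamped = std::min(127.0f, std::max(-128.0f, q));
        return static_cast<std::int8_t>(clamped);
    }
};

// Hypothetical functor: int8 -> float, the inverse mapping (x_q - z) * s.
struct DequantizeFromInt8
{
    float scale;
    std::int32_t zero_point;

    float operator()(std::int8_t x_q) const
    {
        return static_cast<float>(static_cast<std::int32_t>(x_q) - zero_point) * scale;
    }
};
```

A round trip through both functors recovers the input to within half a scale step for values inside the representable range; values outside it saturate at -128 or 127.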

This example demonstrates how Composable Kernel supports efficient quantized convolution for deep learning inference.