mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 11:16:59 +00:00

README.md

Client Example: Quantization for GEMM and Conv2D

Theory

This client example demonstrates quantized GEMM and 2D convolution operations, including per-layer and per-channel quantization, and fusion with bias and activation functions. Quantization represents values with lower-precision integer types (e.g., int8), which reduces memory footprint and bandwidth and enables efficient inference in deep learning.

Mathematical Formulation:

Quantized GEMM: $C = \text{dequant}(A_q) \times \text{dequant}(B_q)$
Quantized Conv2D: $Y = \text{dequant}(X_q) * \text{dequant}(W_q)$
Dequantization: $\text{dequant}(x_q) = (x_q - z) \cdot s$ (scale $s$, zero-point $z$)
Per-layer: one scale/zero-point per tensor
Per-channel: scale/zero-point per output channel
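
As a minimal host-side sketch of the dequantization formula above (not code from this example), the functions below apply $(x_q - z) \cdot s$ with either one scale/zero-point for the whole tensor or one pair per output channel. The function names and the `[K][elements per channel]` weight layout are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// dequant(x_q) = (x_q - z) * s for a single int8 value.
inline float dequantize(std::int8_t x_q, float scale, std::int32_t zero_point)
{
    return static_cast<float>(static_cast<std::int32_t>(x_q) - zero_point) * scale;
}

// Per-layer: a single (scale, zero_point) pair is shared by the whole tensor.
std::vector<float> dequantize_per_layer(const std::vector<std::int8_t>& x_q,
                                        float scale,
                                        std::int32_t zero_point)
{
    std::vector<float> x(x_q.size());
    for(std::size_t i = 0; i < x_q.size(); ++i)
        x[i] = dequantize(x_q[i], scale, zero_point);
    return x;
}

// Per-channel: one (scale, zero_point) pair per output channel k.
// Assumed layout (hypothetical): w_q is [K][elements per channel], contiguous,
// and w_q.size() is a multiple of K = scales.size().
std::vector<float> dequantize_per_channel(const std::vector<std::int8_t>& w_q,
                                          const std::vector<float>& scales,
                                          const std::vector<std::int32_t>& zero_points)
{
    std::vector<float> w(w_q.size());
    const std::size_t K = scales.size();
    if(K == 0)
        return w;
    const std::size_t per_channel = w_q.size() / K;
    for(std::size_t k = 0; k < K; ++k)
        for(std::size_t i = 0; i < per_channel; ++i)
            w[k * per_channel + i] =
                dequantize(w_q[k * per_channel + i], scales[k], zero_points[k]);
    return w;
}
```

With per-channel parameters, two channels that store the same integer values can map back to very different real ranges, which is what lets each output channel use the full int8 range.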

Algorithmic Background:

Quantized values are dequantized on the fly during computation.
Accumulation is performed in higher precision than the int8 inputs (typically int32) for accuracy.
Supports bias addition and activation fusion (ReLU, Tanh).
Per-channel quantization improves accuracy for convolutional layers.
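
The points above can be illustrated with a plain CPU reference, sketched here under stated assumptions rather than taken from the Composable Kernel sources: symmetric quantization (zero-point 0), an int32 bias, ReLU as the fused activation, and a combined output scale `requant_scale` that holds either one entry (per-layer) or one entry per output column (per-channel). All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round to nearest and saturate to the int8 range [-128, 127].
inline std::int8_t saturate_to_int8(float v)
{
    const float r = std::round(v);
    return static_cast<std::int8_t>(std::min(127.0f, std::max(-128.0f, r)));
}

// C_q[m][n] = sat(scale[n] * relu(sum_k A_q[m][k] * B_q[k][n] + bias[n]))
// a_q is M x K and b_q is K x N, both row-major.
std::vector<std::int8_t> gemm_bias_relu_requant(const std::vector<std::int8_t>& a_q,
                                                const std::vector<std::int8_t>& b_q,
                                                const std::vector<std::int32_t>& bias,
                                                const std::vector<float>& requant_scale,
                                                std::size_t M,
                                                std::size_t N,
                                                std::size_t K)
{
    std::vector<std::int8_t> c_q(M * N);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            // Accumulate in int32 so int8 x int8 products cannot overflow.
            std::int32_t acc = 0;
            for(std::size_t k = 0; k < K; ++k)
                acc += static_cast<std::int32_t>(a_q[m * K + k]) *
                       static_cast<std::int32_t>(b_q[k * N + n]);

            acc += bias[n];                          // fused bias
            acc = std::max<std::int32_t>(acc, 0);    // fused ReLU

            // Per-layer: one scale. Per-channel: one scale per output column n.
            const float s = requant_scale.size() == 1 ? requant_scale[0] : requant_scale[n];
            c_q[m * N + n] = saturate_to_int8(s * static_cast<float>(acc));
        }
    }
    return c_q;
}
```

Doing bias, activation, and rescaling inside the same loop that produces the accumulator mirrors the idea of fusion: the int32 intermediate never has to be written out as a separate tensor.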

How to Run

Prerequisites

Follow the instructions in the main Build Guide section before building and running this example.

Build and run

cd composable_kernel/client_example/09_quantization
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run (GEMM quantization)
./gemm_quantization

# Example run (Conv2D per-layer quantization)
./conv2d_fwd_perlayer_quantization

# Example run (Conv2D per-channel quantization)
./conv2d_fwd_perchannel_quantization

# Example run (Conv2D + bias + ReLU + per-channel quantization)
./conv2d_fwd_bias_relu_perchannel_quantization

Source Code Structure

Directory Layout

client_example/09_quantization/
├── gemm_quantization.cpp                            # Quantized GEMM
├── conv2d_fwd_perlayer_quantization.cpp             # Conv2D per-layer quantization
├── conv2d_fwd_perchannel_quantization.cpp           # Conv2D per-channel quantization
├── conv2d_fwd_bias_relu_perlayer_quantization.cpp   # Conv2D + bias + ReLU + per-layer quantization
├── conv2d_fwd_bias_relu_perchannel_quantization.cpp # Conv2D + bias + ReLU + per-channel quantization
├── conv2d_fwd_bias_tanh_perlayer_quantization.cpp   # Conv2D + bias + Tanh + per-layer quantization
├── conv2d_fwd_bias_tanh_perchannel_quantization.cpp # Conv2D + bias + Tanh + per-channel quantization
└── CMakeLists.txt                                   # Build configuration for the example

Key Functions

main() (in each .cpp):
Sets up input tensors, configures quantization parameters, launches the quantized kernel, and verifies the result.
Quantization kernel invocation:
Uses the Composable Kernel device API to launch quantized GEMM or Conv2D with optional bias and activation.
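
To make "configures quantization parameters" more concrete, here is one common way to derive a symmetric int8 scale from float data and quantize with it. This is a sketch of a standard technique, not what the example sources necessarily do (they may simply use fixed scale values); `symmetric_scale` and `quantize` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric scale: maps the largest magnitude in x to 127, with zero-point 0.
// Falls back to 1.0 for an all-zero (or empty) tensor to avoid dividing by zero.
float symmetric_scale(const std::vector<float>& x)
{
    float max_abs = 0.0f;
    for(float v : x)
        max_abs = std::max(max_abs, std::fabs(v));
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

// Quantize with x_q = sat(round(x / scale)), the inverse of dequant for z = 0.
std::vector<std::int8_t> quantize(const std::vector<float>& x, float scale)
{
    std::vector<std::int8_t> x_q(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
    {
        float r = std::round(x[i] / scale);
        r       = std::min(127.0f, std::max(-128.0f, r));
        x_q[i]  = static_cast<std::int8_t>(r);
    }
    return x_q;
}
```

For per-channel quantization, the same helper would be called once per output channel on that channel's slice of the weights, giving one scale per channel.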

Additional Details

Supports int8 quantization, per-layer and per-channel scaling.
Demonstrates fusion with bias and activation (ReLU, Tanh).
Example parameters can be adjusted in the source for different workloads.

Related Examples

01_gemm: GEMM for quantized matrix multiplication
14_gemm_quantization: GEMM quantization in the main example directory
40_conv2d_fwd_quantization: Conv2D quantization in the main example directory

Back to Client Examples
