Files
composable_kernel/example/22_cgemm
John Shumway ad57f6ef0b [CK_BUILDER] Put global CK functions in an the CK namespace (#3232)
* Wrap ck host utitlies in CK namespace.

The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace.

Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace.

There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our to code bases (CK and CKTile) so that they can both coexist. Moreover, estabilishing this compatiblity is essential if we are going to allow the builder to instantiate  kernels from either template library.

* Add using declarations to test code.

After moving some of the untils into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compute units for functions that were previously in the global namespace.

* Add using declarations to client examples.
2025-11-19 11:23:02 +01:00
..

Complex General Matrix-Matrix Multiplication (CGEMM)

This example demonstrates a General Matrix-Matrix Multiplication for complex-valued tensors (CGEMM). This operation is a fundamental building block in many scientific and engineering domains, including signal processing, quantum computing, and electromagnetics, where computations are naturally expressed using complex numbers.

Mathematical Formulation

A complex number z can be represented as z = a + bi, where a is the real part and b is the imaginary part. The multiplication of two complex numbers z1 = a + bi and z2 = c + di is:

z_1 \cdot z_2 = (a+bi)(c+di) = (ac - bd) + (ad + bc)i

A CGEMM operation, D = \alpha \cdot (A \times B) + \beta \cdot C, involves matrices where each element is a complex number. The core matrix multiplication A \times B is defined as:

C_{ik} = \sum_j A_{ij} \cdot B_{jk}

Where each multiplication and addition is a complex operation. This can be broken down into four real-valued GEMM operations:

Let A = A_r + iA_i and B = B_r + iB_i. Then the product C = A \times B is: C = (A_r + iA_i) \times (B_r + iB_i) = (A_r B_r - A_i B_i) + i(A_r B_i + A_i B_r)

This shows that one CGEMM can be decomposed into four real GEMMs and two real matrix additions/subtractions.

Algorithmic Strategy: Fused Complex Arithmetic

A naive implementation would launch six separate real-valued kernels (4 GEMMs, 2 additions). A much more efficient approach, and the one used by Composable Kernel, is to implement CGEMM in a single, fused kernel.

  1. Data Layout: Complex numbers are typically stored in an interleaved format, where the real and imaginary parts of an element are adjacent in memory (e.g., [r1, i1, r2, i2, ...]). The kernel is designed to work efficiently with this layout.

  2. Tiled CGEMM: The kernel uses a standard tiled GEMM algorithm, but the fundamental operations are adapted for complex numbers.

    • Loading: A thread block loads tiles of the complex-valued matrices A and B from global memory into shared memory.
    • Complex Multiply-Accumulate: The core of the algorithm is the multiply-accumulate (MAC) operation. Instead of a single fma instruction, each complex MAC involves multiple real-valued fma instructions to compute the real and imaginary parts of the product, as shown in the mathematical formulation.
      • real_part = (a_r * b_r) - (a_i * b_i)
      • imag_part = (a_r * b_i) + (a_i * b_r)
    • These operations are carefully scheduled to maximize instruction-level parallelism and hide latency. The accumulators for both the real and imaginary parts are held in private registers.
  3. Storing: After the tile is fully computed, the complex-valued result is written from registers back to the output matrix D in global memory.

By fusing the complex arithmetic directly into the GEMM kernel, we avoid launching multiple kernels and storing large intermediate real-valued matrices, which dramatically reduces kernel launch overhead and memory bandwidth requirements.

Source Code Organization

  • cgemm_xdl.cpp: The main example file. It defines complex-valued input matrices and instantiates the DeviceGemm operation, specialized for complex data types.
  • The standard DeviceGemm interface from ../../include/ck/tensor_operation/gpu/device/device_gemm.hpp is used. Composable Kernel overloads this interface for complex types (ck::complex<T>).
  • The grid-wise GEMM kernel is specialized to handle complex types. When the template arguments for data types are ck::complex, the compiler instantiates a version of the kernel where the MAC operations are replaced with the sequence of real-valued operations required for complex multiplication.

Build and Run

Prerequisites

Ensure the Composable Kernel library is built and installed.

cd /path/to/composable_kernel/build
make -j install

Build the Example

cd /path/to/composable_kernel/example/22_cgemm
mkdir build && cd build

cmake \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -DCMAKE_PREFIX_PATH="/opt/rocm;${CK_INSTALL_PATH}" \
  ..

make -j

Run the Example

# Run the example with default settings
./cgemm_xdl

# Run with verification, data initialization, and timing
./cgemm_xdl 1 2 1

Applications

CGEMM is a critical kernel in many high-performance computing applications:

  • Digital Signal Processing (DSP): The Fast Fourier Transform (FFT), a cornerstone of DSP, can be implemented using complex matrix multiplications. Filtering and convolution in the frequency domain also rely on complex arithmetic.
  • Quantum Computing Simulation: The state of a quantum system is described by a vector of complex numbers, and quantum gates are represented by unitary matrices (a special type of complex matrix). Simulating a quantum circuit involves a sequence of CGEMM operations.
  • Electromagnetics and Wave Physics: Simulating the propagation of electromagnetic or acoustic waves often involves solving systems of equations with complex numbers to represent the phase and amplitude of the waves.
  • Communications: Modern communication systems (like 5G and Wi-Fi) use complex modulation schemes (like QAM) where signals are represented by complex numbers.