mirror of https://github.com/ROCm/composable_kernel.git synced 2026-03-30 12:05:52 +00:00

Files

Johannes Graner 0a474aa62f [CI, CK examples] Disable time_kernel for CI tests and examples (#3464 )

* Disable kernel timing in tests

* default time_kernel = false in old CK examples

2026-01-07 16:30:57 +01:00

CMakeLists.txt

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

gemm_mx_bf6.cpp

chore(copyright): update copyright header for example directory (#3273 )

2025-11-24 18:02:41 -08:00

gemm_mx_bf8.cpp

chore(copyright): update copyright header for example directory (#3273 )

2025-11-24 18:02:41 -08:00

gemm_mx_common.hpp

chore(copyright): update copyright header for example directory (#3273 )

2025-11-24 18:02:41 -08:00

gemm_mx_fp4_bpreshuffle.cpp

chore(copyright): update copyright header for example directory (#3273 )

2025-11-24 18:02:41 -08:00

gemm_mx_fp4.cpp

chore(copyright): update copyright header for example directory (#3273 )

2025-11-24 18:02:41 -08:00

gemm_mx_fp6.cpp

MX GEMM - FP6 Example (#2419 )

2025-07-07 10:33:26 -06:00

gemm_mx_fp8_bf8.cpp

chore(copyright): update copyright header for example directory (#3273 )

2025-11-24 18:02:41 -08:00

gemm_mx_fp8.cpp

chore(copyright): update copyright header for example directory (#3273 )

2025-11-24 18:02:41 -08:00

moe_gemm1_xdl_mx_fp4_bns.cpp

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464 )

2026-01-07 16:30:57 +01:00

moe_gemm1_xdl_mx_fp4_bpreshuffle.cpp

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464 )

2026-01-07 16:30:57 +01:00

moe_gemm1_xdl_mx_fp4.cpp

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464 )

2026-01-07 16:30:57 +01:00

moe_gemm2_xdl_mx_fp4_bns.cpp

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464 )

2026-01-07 16:30:57 +01:00

moe_gemm2_xdl_mx_fp4_bpreshuffle.cpp

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464 )

2026-01-07 16:30:57 +01:00

moe_gemm2_xdl_mx_fp4.cpp

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464 )

2026-01-07 16:30:57 +01:00

README.md

[DOCS] Documentation Addition (Readme updates) (#2495 )

2025-10-16 03:10:57 -07:00

README.md

GEMM with Microscaling

This example demonstrates a GEMM operation with microscaling, a quantization technique in which each small block of elements shares its own scale factor. Because the scale adapts to each block rather than to a whole tensor or channel, quantization error is lower and accuracy is better preserved in quantized neural network inference.
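The idea of one scale factor per small block can be sketched on the CPU as follows. This is a minimal illustration, not the library's implementation: the `BlockScaled` struct and both functions are hypothetical names, the default block size of 32 and the power-of-two scale are assumptions the README does not state, and `int8_t` stands in for the FP8/FP6/FP4 element types used by the real examples.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: quantized elements plus one shared scale per block.
struct BlockScaled
{
    std::size_t block_size;
    std::vector<std::int8_t> elems; // quantized elements (int8 stand-in)
    std::vector<int> scale_exp;     // block b has scale 2^scale_exp[b]
};

// Quantize x block by block; x.size() must be a multiple of block_size.
inline BlockScaled quantize_blocks(const std::vector<float>& x, std::size_t block_size = 32)
{
    assert(block_size > 0 && x.size() % block_size == 0);
    BlockScaled q{block_size, std::vector<std::int8_t>(x.size()), {}};
    for(std::size_t b = 0; b < x.size(); b += block_size)
    {
        // Largest magnitude in this block decides the block's scale.
        float amax = 0.f;
        for(std::size_t i = b; i < b + block_size; ++i)
            amax = std::max(amax, std::fabs(x[i]));

        // Smallest power of two 2^e such that amax / 2^e fits in [-127, 127].
        int e = 0;
        if(amax > 0.f)
            e = static_cast<int>(std::ceil(std::log2(amax / 127.f)));
        q.scale_exp.push_back(e);

        for(std::size_t i = b; i < b + block_size; ++i)
        {
            float v = std::nearbyint(std::ldexp(x[i], -e)); // x / 2^e, rounded
            v       = std::max(-127.f, std::min(127.f, v));
            q.elems[i] = static_cast<std::int8_t>(v);
        }
    }
    return q;
}

// Reconstruct approximate floats: element * 2^(scale of its block).
inline std::vector<float> dequantize_blocks(const BlockScaled& q)
{
    std::vector<float> y(q.elems.size());
    for(std::size_t i = 0; i < y.size(); ++i)
        y[i] = std::ldexp(static_cast<float>(q.elems[i]), q.scale_exp[i / q.block_size]);
    return y;
}
```

With a single scale shared by the whole vector, a block of small values next to a block of large values would be rounded on the coarse grid chosen for the large ones; giving each block its own scale avoids that.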

Source Code Organization

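gemm_mx_fp8.cpp and its sibling gemm_mx_*.cpp files (gemm_mx_fp4.cpp, gemm_mx_fp6.cpp, gemm_mx_bf8.cpp, and so on): The main example files, one per data format. Together with the shared header gemm_mx_common.hpp, they set up microscaled matrices with quantized data and scale factors, and instantiate the DeviceGemmMicroscaling operation.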
../../include/ck/tensor_operation/gpu/device/device_gemm_microscaling.hpp: The device interface for GEMM with microscaling support.
The underlying kernel fuses block-wise dequantization into the GEMM computation pipeline, so scale factors are applied during accumulation rather than in a separate pass.
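What "dequantization integrated into the GEMM" means can be pictured with a CPU reference loop that applies the scales once per block along K. This is only a sketch under stated assumptions, not the GPU kernel: `gemm_block_scaled` is a hypothetical function, the memory layout is chosen for simplicity, elements are `int8_t` stand-ins, and each scale is a power of two stored as an exponent.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference GEMM over block-scaled operands (hypothetical layout):
//   a     : M x K elements, row-major
//   a_exp : M x (K / block) scale exponents, one per block of a row
//   b     : N x K elements, i.e. each column of B stored contiguously
//   b_exp : N x (K / block) scale exponents, one per block of a column
// Returns C as M x N floats, row-major.
inline std::vector<float> gemm_block_scaled(const std::vector<std::int8_t>& a,
                                            const std::vector<int>& a_exp,
                                            const std::vector<std::int8_t>& b,
                                            const std::vector<int>& b_exp,
                                            std::size_t M,
                                            std::size_t N,
                                            std::size_t K,
                                            std::size_t block)
{
    assert(block > 0 && K % block == 0);
    const std::size_t num_blocks = K / block;
    assert(a.size() == M * K && b.size() == N * K);
    assert(a_exp.size() == M * num_blocks && b_exp.size() == N * num_blocks);

    std::vector<float> c(M * N, 0.f);
    for(std::size_t m = 0; m < M; ++m)
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.f;
            for(std::size_t kb = 0; kb < num_blocks; ++kb)
            {
                // Multiply-accumulate the raw quantized values of one block.
                std::int32_t partial = 0;
                for(std::size_t k = kb * block; k < (kb + 1) * block; ++k)
                    partial += static_cast<std::int32_t>(a[m * K + k]) *
                               static_cast<std::int32_t>(b[n * K + k]);
                // Dequantize once per block: partial * 2^(expA + expB).
                const int e = a_exp[m * num_blocks + kb] + b_exp[n * num_blocks + kb];
                acc += std::ldexp(static_cast<float>(partial), e);
            }
            c[m * N + n] = acc;
        }
    return c;
}
```

Applying the combined scale to each block's partial sum, instead of dequantizing every element first, keeps the inner loop on the narrow quantized type, which is the property a fused kernel relies on.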

Build and Run

example_gemm_mx_fp8

Custom verification parameters:

# arg1: verification (0=no, 1=CPU)
# arg2: initialization (0=constant values, 1=integer values, 2=decimal values)
# arg3: time kernel (0=no, 1=yes)
# arg4: verbosity (0=no info, 1=verbose info)
# arg5 to arg10: M (multiple of 256), N (multiple of 256), K (multiple of 512), StrideA, StrideB, StrideC
# arg11: KBatch
# arg12: number of warm-up runs before timing
# arg13: number of timed runs
./bin/example_gemm_mx_fp8 1 1 0 1

Custom tensor shapes:

./bin/example_gemm_mx_fp8 1 2 1 0 256 256 512 -1 -1 -1 1 10 10

Run the Example

Custom verification parameters:

# arg1: verification (0=no, 1=CPU)
# arg2: initialization (0=constant values, 1=integer values, 2=decimal values)
# arg3: time kernel (0=no, 1=yes)
# arg4: verbosity (0=no info, 1=verbose info)
# arg5 to arg10: M (multiple of 128), N (multiple of 128), K (multiple of 64), StrideA, StrideB, StrideC
# arg11: KBatch
./bin/example_gemm_mx_fp8 1 1 0 1

Custom tensor shapes:

./bin/example_gemm_mx_fp8 1 2 1 0 128 128 256 -1 -1 -1 1

Default invocation:

# Equivalent to: ./bin/example_gemm_mx_fp8 1 2 0 0
./bin/example_gemm_mx_fp8
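The positional arguments and their defaults can be modelled with a small parser. The sketch below is illustrative and is not the example's actual argument handling: `ExampleConfig`, `parse_args`, and `shape_is_supported` are hypothetical names, the default M, N, K values are placeholders (the README does not state them), and treating any prefix of the argument list as valid is an assumption. The defaults for the first four arguments follow the "1 2 0 0" invocation shown above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the positional arguments documented above.
struct ExampleConfig
{
    int verify      = 1; // arg1: 0=no, 1=CPU
    int init_method = 2; // arg2: 0=constant, 1=integer, 2=decimal values
    int time_kernel = 0; // arg3: 0=no, 1=yes
    int verbosity   = 0; // arg4: 0=no info, 1=verbose info
    int M = 128, N = 128, K = 256; // arg5..7 (placeholder defaults)
    int stride_a = -1, stride_b = -1, stride_c = -1; // arg8..10, -1 as in the commands above
    int k_batch = 1;     // arg11
};

// Override defaults with however many positional arguments were given.
// args excludes the program name; extra arguments are ignored.
inline ExampleConfig parse_args(const std::vector<std::string>& args)
{
    ExampleConfig cfg;
    int* fields[] = {&cfg.verify,   &cfg.init_method, &cfg.time_kernel, &cfg.verbosity,
                     &cfg.M,        &cfg.N,           &cfg.K,
                     &cfg.stride_a, &cfg.stride_b,    &cfg.stride_c,
                     &cfg.k_batch};
    const std::size_t count = sizeof(fields) / sizeof(fields[0]);
    for(std::size_t i = 0; i < args.size() && i < count; ++i)
        *fields[i] = std::stoi(args[i]); // throws on non-numeric input
    return cfg;
}

// Check the "multiple of" constraints listed for M, N, and K.
inline bool shape_is_supported(const ExampleConfig& cfg, int m_mult, int n_mult, int k_mult)
{
    return cfg.M % m_mult == 0 && cfg.N % n_mult == 0 && cfg.K % k_mult == 0;
}
```

For instance, the arguments `1 2 1 0 128 128 256 -1 -1 -1 1` satisfy `shape_is_supported(cfg, 128, 128, 64)` but not the stricter 256/256/512 multiples listed in the first argument table.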