mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-04 21:51:28 +00:00

Cong Ma 82890192dd [CK TILE] Support fp8/fp16 with pk_int4_t as data types for tensors A and B (#2805)

- Add support for tensor A/B in both fp16+pk_int4_t and fp8+pk_int4_t formats
- Implement A(bf8) B(i4) support in universal GEMM
- Use new implementation for i4 to fp8 conversion in Block Scale

2025-09-09 16:40:52 -07:00

Quant GEMM Matrix Multiplication

This folder contains examples of quantized GEMMs built on the ck_tile tile-programming implementation.

- AQuant kernel, where blocks of the A matrix share scales: implemented as a custom GEMM pipeline
- Row-wise and column-wise scaling: scaling is applied in the epilogue
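
To make the difference between the two schemes concrete, here is a minimal CPU reference sketch. It is not taken from ck_tile: the function names, the flat row-major storage, and the scale layouts are assumptions chosen for illustration. It assumes that in AQuant each group of `group` consecutive K elements in a row of A shares one scale, so the scale must be applied to each partial sum inside the K loop, while in the row/column scheme one scale per row of A and one per column of B can be applied once after accumulation, which is why that scaling can live in the epilogue.

```cpp
#include <cstddef>
#include <vector>

// Illustrative reference code only; none of these names are ck_tile APIs.
// A is M x K, B is K x N, C is M x N, all stored row-major in flat vectors.

// AQuant (assumed semantics): every `group` consecutive K elements of a row
// of A share one scale, stored at a_scale[m * (K / group) + g].
std::vector<float> gemm_aquant_ref(const std::vector<float>& a,
                                   const std::vector<float>& a_scale,
                                   const std::vector<float>& b,
                                   std::size_t M, std::size_t N, std::size_t K,
                                   std::size_t group)
{
    const std::size_t groups = K / group; // assumes K is a multiple of group
    std::vector<float> c(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
        for(std::size_t n = 0; n < N; ++n)
            for(std::size_t g = 0; g < groups; ++g)
            {
                // Accumulate one K-group unscaled, then apply its scale.
                float partial = 0.0f;
                for(std::size_t k = g * group; k < (g + 1) * group; ++k)
                    partial += a[m * K + k] * b[k * N + n];
                c[m * N + n] += a_scale[m * groups + g] * partial;
            }
    return c;
}

// Row/column-wise (assumed semantics): one scale per row of A and one per
// column of B, applied once after the full K accumulation (the "epilogue").
std::vector<float> gemm_rowcol_ref(const std::vector<float>& a,
                                   const std::vector<float>& row_scale,
                                   const std::vector<float>& b,
                                   const std::vector<float>& col_scale,
                                   std::size_t M, std::size_t N, std::size_t K)
{
    std::vector<float> c(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
                acc += a[m * K + k] * b[k * N + n];
            c[m * N + n] = row_scale[m] * col_scale[n] * acc; // epilogue step
        }
    return c;
}
```

Because the AQuant scale varies along K, it cannot be factored out of the accumulation, which is consistent with that mode needing its own GEMM pipeline rather than an epilogue-only change.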

build

# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx942) or leave it blank
../script/cmake-ck-dev.sh  ../ <arch>
# Compile the quant kernels
make tile_example_gemm_quant_basic -j
make tile_example_gemm_bquant_basic -j

This produces the executable build/bin/tile_example_gemm_quant_basic (and, for the second target, build/bin/tile_example_gemm_bquant_basic).

example

args:
          -b    batch size (default:1)
          -m    m dimension (default:1024)
          -n    n dimension (default:2048)
          -k    k dimension (default:64)
   -a_layout    Tensor A data layout (default: R)
   -b_layout    Tensor B data layout (default: C)
   -c_layout    Tensor C data layout (default: R)
   -stride_a    Tensor A stride (default:0)
   -stride_b    Tensor B stride (default:0)
   -stride_c    Tensor C stride (default:0)
          -v    0. No validation, 1. Validation on CPU, 2. Validation on GPU (default:1)
          -e    Absolute error tolerance (default:1e-5)
       -prec    data type. fp8/bf8/i4fp8/i4bf8/i4f32fp8/i4f32bf8 (default:fp8)
     -warmup    number of iterations before benchmark the kernel (default:10)
     -repeat    number of iterations to benchmark the kernel (default:100)
      -timer    gpu:gpu timer, cpu:cpu timer (default:gpu)
 -quant_mode    Which quant method to use (aquant, rowcol)
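
The help text above does not say what a stride of 0 means. A common convention, assumed here and not confirmed by this README, is that 0 asks the program to derive a packed leading dimension from the tensor's layout. The helper below is a hypothetical sketch of that convention (the name `resolve_stride` is made up for illustration and is not a ck_tile function).

```cpp
#include <cstddef>

// Hypothetical helper, not a ck_tile API. Assumes stride == 0 means
// "use the packed leading dimension implied by the layout".
// rows/cols are the logical tensor dimensions (for example M x K for A).
std::size_t resolve_stride(std::size_t rows,
                           std::size_t cols,
                           std::size_t stride,
                           bool row_major)
{
    if(stride != 0)
        return stride; // caller supplied an explicit stride
    // Packed row-major steps by the column count between rows;
    // packed column-major steps by the row count between columns.
    return row_major ? cols : rows;
}
```

Under this assumption, the defaults listed above (m=1024, n=2048, k=64, layouts R/C/R) would give a stride of 64 for A (1024 x 64, row-major), 64 for B (64 x 2048, column-major), and 2048 for C (1024 x 2048, row-major).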