# Multiple ABD GEMM

This folder contains an example of Multiple ABD GEMM built on the ck_tile tile-programming implementation.

## build

    # in the root of ck_tile
    mkdir build && cd build
    # you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
    sh ../script/cmake-ck-dev.sh  ../ <arch>
    # The basic pipeline method on the gemm calculation
    make tile_example_gemm_multi_abd_fp16 -j

This will result in an executable build/bin/tile_example_gemm_multi_abd_fp16
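
If you are unsure which value to pass as `<arch>`, one option is to read it from the ROCm `rocminfo` utility. The sketch below is only an illustration: `detect_arch` is a hypothetical helper, and the sample line is an assumed stand-in for real `rocminfo` output, so check the format on your own machine.

```shell
# Print the first gfx target name found on stdin, e.g. "gfx90a".
# detect_arch is a hypothetical helper, not part of ck_tile.
detect_arch() {
    grep -o 'gfx[0-9a-f]*' | head -n 1
}

# Sample text standing in for rocminfo output (assumed format).
sample='  Name:                    gfx90a'
arch=$(printf '%s\n' "$sample" | detect_arch)
echo "detected arch: $arch"

# On a machine with ROCm installed, the same function can read the real tool:
#   arch=$(rocminfo | detect_arch)
#   sh ../script/cmake-ck-dev.sh ../ "$arch"
```

If several GPUs of different architectures are installed, this picks only the first one listed, so passing the architecture by hand may be the safer choice.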

## example

    args:
      -m          M dimension (default: 3840)
      -n          N dimension (default: 4096)
      -k          K dimension (default: 4096)
      -as_layout  Tensor A layout (default: R)
      -bs_layout  Tensor B layout (default: C)
      -ds_layout  Tensor D layout (default: R)
      -e_layout   Tensor E layout (default: R)
      -stride_as  Tensor A strides (default: 0)
      -stride_bs  Tensor B strides (default: 0)
      -stride_e   Tensor E strides (default: 0)
      -stride_ds  Tensor D strides (default: 0)
      -validate   0: no validation, 1: validation on GPU (default: 1)
      -warmup     Number of warm-up iterations run before benchmarking the kernel (default: 10)
      -repeat     Number of iterations used to benchmark the kernel (default: 100)
      -kbatch     kbatch for SplitK (default: 1)