composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-14 10:09:41 +00:00


Mateusz Ozga 2a150508c8 [CK_TILE] Multiple-ABD GEMM example (#2788)

* Multi ABD - initial commit

* Clang-format fix

* block gemm, unify the name of CDataType

* Apply changes to mem-pipeline

* Rollback prefix for DType and Layout

* Gemm Kernel Basic, rename

* WMMA config

* Grouped GEMM

* Clang-format

* Dropout, name

* Review v2

* Move element_wise fn to unary, remove old ones fn

* clang-format

* Fix issue review

* WP operator adjust to universal gemm

* v2 prepare

* Remove unused comment

* Remove vectorsize

* Rollback

* Adjust pipeline for abd

* Shuffle argument

* CI-fail fix quant

* Fix ag_br pipeline

* Failing tests

* Typo

* Single argument support

[ROCm/composable_kernel commit: 30ab1d6a71]

2025-09-19 01:14:11 +02:00

CMakeLists.txt
gemm_multi_abd_fp16.cpp
gemm_multi_abd_fp16.hpp
README.md
run_gemm_multi_abd_fp16_example.inc
utils.hpp

README.md

# Multiple ABD GEMM

This folder contains an example of Multiple ABD GEMM (a GEMM with multiple A, B, and D input tensors) implemented with ck_tile tile programming.
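To make the computation concrete, here is a minimal CPU reference sketch. It assumes two A tensors, two B tensors, and one D tensor. As we understand the multiple-ABD pattern, the A inputs are combined element-wise into a single A operand, the B inputs are combined the same way, the GEMM is computed, and the result is combined element-wise with the D tensors to produce E. The functors `AddA`, `MulB`, `AddD` and the function `reference_multi_abd` are hypothetical and are not the example's actual element-wise operations. The sketch uses `float` instead of fp16 for simplicity, and its layouts follow the defaults `-as_layout R`, `-bs_layout C`, `-ds_layout R`, `-e_layout R`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element-wise operations (assumption): the real example defines its own.
struct AddA { float operator()(float a0, float a1) const { return a0 + a1; } }; // combines A0, A1
struct MulB { float operator()(float b0, float b1) const { return b0 * b1; } }; // combines B0, B1
struct AddD { float operator()(float c, float d0) const { return c + d0; } };   // combines GEMM result with D0

// E[m][n] = cde_op(sum_k a_op(A0, A1)[m][k] * b_op(B0, B1)[k][n], D0[m][n])
// A0/A1: M x K row-major, B0/B1: K x N column-major, D0/E: M x N row-major.
std::vector<float> reference_multi_abd(const std::vector<float>& a0, const std::vector<float>& a1,
                                       const std::vector<float>& b0, const std::vector<float>& b1,
                                       const std::vector<float>& d0,
                                       std::size_t M, std::size_t N, std::size_t K)
{
    const AddA a_op{};
    const MulB b_op{};
    const AddD cde_op{};
    std::vector<float> e(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.f;
            for (std::size_t k = 0; k < K; ++k) {
                const float a = a_op(a0[m * K + k], a1[m * K + k]); // row-major, leading dim K
                const float b = b_op(b0[n * K + k], b1[n * K + k]); // column-major, leading dim K
                acc += a * b;
            }
            e[m * N + n] = cde_op(acc, d0[m * N + n]); // fuse D into the output
        }
    }
    return e;
}
```

For example, with M = N = 2, K = 3, all A entries equal to 1, B0 entries 3, B1 entries 0.5, and D entries 10, every output element is 3 × (2 × 1.5) + 10 = 19.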

build

# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
sh ../script/cmake-ck-dev.sh ../ <arch>
# build the example, which uses the basic pipeline for the GEMM computation
make tile_example_gemm_multi_abd_fp16 -j

This produces the executable build/bin/tile_example_gemm_multi_abd_fp16.

example

args:
          -m  M dimension (Default: 3840)
          -n  N dimension (Default: 4096)
          -k  K dimension (Default: 4096)
  -as_layout  Tensor A layout, R (row-major) or C (column-major) (Default: R)
  -bs_layout  Tensor B layout (Default: C)
  -ds_layout  Tensor D layout (Default: R)
   -e_layout  Tensor E layout (Default: R)
  -stride_as  Tensor A strides (Default: 0)
  -stride_bs  Tensor B strides (Default: 0)
   -stride_e  Tensor E stride (Default: 0)
  -stride_ds  Tensor D strides (Default: 0)
   -validate  0: no validation, 1: validation on GPU (Default: 1)
     -warmup  Number of iterations before benchmarking the kernel (Default: 10)
     -repeat  Number of iterations to benchmark the kernel (Default: 100)
     -kbatch  K-batch count for split-K (Default: 1)
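The stride and `-kbatch` arguments can be illustrated with a small sketch. It assumes that a stride of 0 selects the packed default for the chosen layout: the leading dimension is the number of columns for R and the number of rows for C. It also models split-K as dividing K into `kbatch` chunks whose partial sums are added together. Both `default_stride` and `split_k_dot` are hypothetical helpers, not part of the ck_tile API. The real kernel distributes the chunks across GPU work rather than running them in a serial loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (assumption): a stride of 0 selects the packed default for the
// layout. For a rows x cols matrix, row-major (R) rows are `cols` elements apart and
// column-major (C) columns are `rows` elements apart.
std::size_t default_stride(std::size_t rows, std::size_t cols, std::size_t stride, bool row_major)
{
    if (stride != 0) {
        return stride; // an explicit user-provided stride wins
    }
    return row_major ? cols : rows;
}

// Hypothetical split-K model (requires kbatch >= 1): K is cut into `kbatch` chunks of
// ceil(K / kbatch) elements, each chunk yields a partial dot product, and the partials
// are summed. Here the chunks run serially; a GPU kernel would process them in parallel.
float split_k_dot(const std::vector<float>& a, const std::vector<float>& b, std::size_t kbatch)
{
    const std::size_t K = a.size();
    const std::size_t chunk = (K + kbatch - 1) / kbatch;
    float total = 0.f;
    for (std::size_t kb = 0; kb < kbatch; ++kb) {
        const std::size_t begin = kb * chunk;
        const std::size_t end = std::min(K, begin + chunk); // empty if begin >= K
        float partial = 0.f;
        for (std::size_t k = begin; k < end; ++k) {
            partial += a[k] * b[k];
        }
        total += partial; // accumulate this chunk's contribution
    }
    return total;
}
```

With the defaults (`-m 3840`, `-k 4096`, `-as_layout R`, `-stride_as 0`), `default_stride(3840, 4096, 0, true)` returns 4096. Because floating-point addition is not associative, split-K results can differ slightly from the `kbatch = 1` result in general.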