composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

Files

Mateusz Ozga bd96ac9742 [CK_TILE] Multiple-D GEMM example (#2219)

* Multiple d, initial commit

* Check Ds Layout

* Readme and clang format

* Update branch & conflicts

* Multiple D - fix clang-formatter

* Rename elemetwise_op

* Fix CI

* Code review part1

* Remove printf

* Remove unnecessary comment

* Add new tests with Col layout

* Review part 2

* Added support for Multiple D GEMM

* Update comment

* Remove maybe_unused

* Clang-format

* Review part 3

* Add comment to function

* Add comment to function: another

* Take number of params for a reference function

* Remove additional d param for 0 tensor

* Change name of function

* Fix CI fails

2025-06-13 19:39:11 +02:00

CMakeLists.txt

gemm_multi_d_fp16.cpp

gemm_multi_d_fp16.hpp

README.md

run_gemm_multi_d_fp16_example.inc

utils.hpp

README.md

# Multiple D GEMM

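This folder contains an example of a Multiple-D GEMM built with the ck_tile tile-programming implementation. A Multiple-D GEMM computes E = op(A * B, D0, D1, ...). The GEMM result is combined with one or more additional tensors D by an elementwise operation in the kernel epilogue, rather than in a separate kernel.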
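The computation described above can be sketched as a host-side reference. This is a minimal illustration, not the example's actual reference code. It makes several assumptions:

- fp32 data instead of fp16;
- exactly two D tensors;
- the default layouts (A row-major, B column-major, Ds and E row-major, all packed);
- a hypothetical elementwise operation `AddAdd` that computes E = C + D0 + D1. The operation used by gemm_multi_d_fp16.cpp may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical elementwise op for illustration only: E = C + D0 + D1.
struct AddAdd
{
    float operator()(float c, float d0, float d1) const { return c + d0 + d1; }
};

// Minimal host reference for a Multiple-D GEMM (sketch, not the example's code).
// Layouts (all packed): A is M x K row-major, B is K x N column-major,
// D0, D1 and E are M x N row-major. fp32 is used instead of fp16 for simplicity.
template <typename CDEOp>
void reference_gemm_multi_d(const std::vector<float>& a,
                            const std::vector<float>& b,
                            const std::vector<float>& d0,
                            const std::vector<float>& d1,
                            std::vector<float>& e,
                            std::size_t M,
                            std::size_t N,
                            std::size_t K,
                            CDEOp op)
{
    assert(a.size() == M * K && b.size() == K * N);
    assert(d0.size() == M * N && d1.size() == M * N && e.size() == M * N);

    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t n = 0; n < N; ++n)
        {
            // C(m, n) = sum_k A(m, k) * B(k, n), accumulated in fp32.
            float acc = 0.f;
            for(std::size_t k = 0; k < K; ++k)
                acc += a[m * K + k] * b[n * K + k]; // column-major B: (k, n) lives at n * K + k

            // Epilogue: combine the accumulator with the D tensors and write E once.
            e[m * N + n] = op(acc, d0[m * N + n], d1[m * N + n]);
        }
    }
}
```

Consider M = N = 2 and K = 3, with these inputs:

- A = [[1, 2, 3], [4, 5, 6]], stored row-major;
- B with columns [1, 0, 1] and [0, 1, 0], stored column-major;
- D0 filled with 10 and D1 filled with 100.

The resulting E is {114, 112, 120, 115}. Fusing the D tensors into the epilogue means E is written once, instead of writing C and reading it back in a separate elementwise pass.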

## build

# in the root of the composable_kernel repository
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
sh ../script/cmake-ck-dev.sh ../ <arch>
# build the Multiple-D GEMM example
make tile_example_gemm_multi_d_fp16 -j

This produces the executable build/bin/tile_example_gemm_multi_d_fp16.

## example

args:
          -m  M dimension (Default: 3840)
          -n  N dimension (Default: 4096)
          -k  K dimension (Default: 4096)
   -a_layout  Tensor A layout, R = row-major or C = column-major (Default: R)
   -b_layout  Tensor B layout (Default: C)
  -ds_layout  Tensor Ds layout (Default: R)
   -e_layout  Tensor E layout (Default: R)
   -stride_a  Tensor A stride (Default: 0)
   -stride_b  Tensor B stride (Default: 0)
   -stride_e  Tensor E stride (Default: 0)
  -stride_ds  Tensor Ds strides (Default: 0)
   -validate  0: no validation, 1: validation on GPU (Default: 1)
     -warmup  Number of iterations before benchmarking the kernel (Default: 10)
     -repeat  Number of iterations to benchmark the kernel (Default: 100)
     -kbatch  kbatch for SplitK (Default: 1)
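The layout flags take R (row-major) or C (column-major), and every stride defaults to 0. A common ck_tile convention, assumed here rather than confirmed by this README, is that a stride of 0 is replaced by the packed leading dimension of the tensor. The sketch below illustrates that convention with two hypothetical helpers, `resolve_stride` and `element_offset`; neither is part of ck_tile.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a ck_tile API): resolves a user-supplied stride.
// Assumption: 0 means "packed", i.e. the leading dimension of a dense tensor.
//   Row-major (R):    a row is contiguous, so the stride is the column count.
//   Column-major (C): a column is contiguous, so the stride is the row count.
std::size_t resolve_stride(std::size_t rows, std::size_t cols, std::size_t stride, bool row_major)
{
    const std::size_t packed = row_major ? cols : rows;
    if(stride == 0)
        return packed;
    // A smaller non-zero stride would make consecutive rows/columns overlap.
    assert(stride >= packed);
    return stride;
}

// Hypothetical helper: linear offset of element (r, c) for a given layout and stride.
std::size_t element_offset(std::size_t r, std::size_t c, std::size_t stride, bool row_major)
{
    return row_major ? r * stride + c : c * stride + r;
}
```

Under this assumption, the default sizes (M = 3840, N = 4096, K = 4096) resolve as follows:

- A (M x K, R): stride 4096;
- B (K x N, C): stride 4096;
- E (M x N, R): stride 4096.

If A were column-major instead, its stride would resolve to 3840.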