mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-04 21:51:28 +00:00

Files

Mateusz Ozga 3c93d3c444 CK-Tile Grouped GEMM refactor and post PR fixes (#1756)

* Grouped gemm simple code refactor

* Offset invoker

* Invoke generic Run, and replace name of partitioner variable

* Tests fix type

* Removed namespaces

* Add template param to avoid implicit cast

* Remove generic function

* Constant value

* underline enum to int16_t

* Generalize partitioner function

* Remove whitespaces

* Rename function

* Using support

* Clang-format

* Clang-format

* Fn-partitioner description fn

* Typo

* Typo 2

* Better description

* Better description

* Refactor after review

* Use ctr instead of set fn

* Invoke ctr and typo

* Comments

* Remove unnecessary comment

* Review, remove modulo

2025-01-21 21:06:10 +01:00

CMakeLists.txt

Ck tile grouped GEMM example (#1713)

2024-12-04 21:40:01 +01:00

grouped_gemm.cpp

CK-Tile Grouped GEMM refactor and post PR fixes (#1756)

2025-01-21 21:06:10 +01:00

grouped_gemm.hpp

CK-Tile Grouped GEMM refactor and post PR fixes (#1756)

2025-01-21 21:06:10 +01:00

README.md

Ck tile grouped GEMM example (#1713)

2024-12-04 21:40:01 +01:00

run_grouped_gemm_example.inc

CK-Tile Grouped GEMM refactor and post PR fixes (#1756)

2025-01-21 21:06:10 +01:00

README.md

Grouped CShuffle GEMM

This folder contains an example of Grouped GEMM using the ck_tile tile-programming implementation. Currently, it supports only the basic features of CK Tile GEMM, but it creates placeholders for future support of different GEMM pipelines and different GEMM modules. In the near future, we will gradually migrate all the GEMM features from old CK to CK Tile.
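
For readers new to the term, a grouped GEMM runs several independent matrix multiplications, whose shapes may differ, as one unit of work. The following is a minimal host-side sketch of that idea in plain C++. It does not use the ck_tile API: `GemmProblem`, `reference_gemm`, and `reference_grouped_gemm` are hypothetical names introduced here for illustration, and row-major storage is assumed to match the default `R` layouts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one GEMM problem in the group (not a ck_tile type).
struct GemmProblem
{
    std::size_t M, N, K;
    std::vector<float> a; // M x K, row-major
    std::vector<float> b; // K x N, row-major
};

// Computes C = A * B for a single problem; C is M x N, row-major.
std::vector<float> reference_gemm(const GemmProblem& p)
{
    std::vector<float> c(p.M * p.N, 0.0f);
    for(std::size_t m = 0; m < p.M; ++m)
        for(std::size_t k = 0; k < p.K; ++k)
            for(std::size_t n = 0; n < p.N; ++n)
                c[m * p.N + n] += p.a[m * p.K + k] * p.b[k * p.N + n];
    return c;
}

// Runs every problem in the group; each problem may have its own M, N, K.
std::vector<std::vector<float>> reference_grouped_gemm(const std::vector<GemmProblem>& group)
{
    std::vector<std::vector<float>> results;
    results.reserve(group.size());
    for(const auto& p : group)
        results.push_back(reference_gemm(p));
    return results;
}
```

A GPU implementation would typically process the whole group in a single kernel launch instead of looping on the host; the loop above only shows what is computed, not how the example schedules it.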

build

# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
sh ../script/cmake-ck-dev.sh  ../ <arch>
# build the grouped GEMM example, which uses the basic GEMM pipeline
make tile_example_grouped_gemm -j

This will produce the executable build/bin/tile_example_grouped_gemm.

example

args:
   -a_layout    Tensor A layout (default:R)
   -b_layout    Tensor B layout (default:R)
   -c_layout    Tensor C layout (default:R)
          -v    0. No validation, 1. Validation on CPU
     -warmup    number of iterations before benchmarking the kernel (default:10)
     -repeat    number of iterations to benchmark the kernel (default:100)
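
The `-warmup` and `-repeat` options describe a common benchmarking pattern: run the work a few times untimed, then time a fixed number of iterations. The sketch below illustrates that pattern on the host with `std::chrono`. It is an assumption-laden illustration, not the example's actual timing code: `benchmark_ms` is a hypothetical helper, and reporting the average time per iteration is an assumption made here.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: calls `fn` `warmup` times without timing, then
// `repeat` times with timing, and returns the average milliseconds per
// timed call (0.0 if `repeat` is not positive).
double benchmark_ms(const std::function<void()>& fn, int warmup, int repeat)
{
    // Warm-up iterations are excluded from the measurement.
    for(int i = 0; i < warmup; ++i)
        fn();

    const auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < repeat; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return repeat > 0 ? elapsed.count() / repeat : 0.0;
}
```

With the defaults listed above, this would correspond to `benchmark_ms(fn, 10, 100)`. Timing a GPU kernel would normally rely on device-side events rather than a host clock, so treat this only as a picture of the warm-up and repeat structure.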