Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-14 10:09:41 +00:00)
[ROCm/composable_kernel commit: 9e049a32a1]
# CK Tile GEMM Unified Code Generator

Single source of truth for all GEMM kernel generation.

See also: the main Dispatcher README for installation and core concepts.
## Quick Start

```bash
cd dispatcher/codegen

# Generate standard FP16 kernels
python3 unified_gemm_codegen.py \
    --output-dir ../build/generated_kernels \
    --datatype fp16 \
    --layout rcr \
    --variants standard

# Generate all variants
python3 unified_gemm_codegen.py \
    --output-dir ../build/generated_kernels \
    --variants standard preshuffle multi_d
```
## Using from Python

```python
from ctypes_utils import CodegenRunner, KernelConfig

# Generate from a specific config
config = KernelConfig(tile_m=256, tile_n=256, tile_k=64)
codegen = CodegenRunner()
result = codegen.generate_from_config(config)

# Generate a single variant
result = codegen.generate("preshuffle")

# Generate all variants
results = codegen.generate_all()
```
## Command Line Options

| Option | Values | Description |
|---|---|---|
| `--output-dir` | path | Output directory |
| `--datatype` | `fp16`, `bf16`, `fp32`, `int8` | Data type |
| `--layout` | `rcr`, `rrr`, `crr`, `ccr` | Matrix layouts |
| `--gpu-target` | `gfx942`, `gfx90a`, `gfx950` | Target GPU |
| `--variants` | `standard`, `preshuffle`, `multi_d` | Kernel variants |
| `--preselected` | `fp16_rcr_essential`, etc. | Predefined kernel set |
## Layout Notation

- `R` = Row-major, `C` = Column-major
- Order: A, B, C (e.g., `rcr` = A row-major, B column-major, C row-major)
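The notation above can be restated in a few lines of Python. This is a standalone sketch; the helper name `decode_layout` is illustrative and not part of the codegen API:

```python
# Illustrative helper: map a 3-letter layout string ("rcr", "ccr", ...)
# to per-matrix layouts. Not part of the codegen API -- just a restatement
# of the R/C notation in the order A, B, C.
def decode_layout(layout: str) -> dict:
    names = {"r": "row-major", "c": "column-major"}
    a, b, c = layout.lower()
    return {"A": names[a], "B": names[b], "C": names[c]}

print(decode_layout("rcr"))
# {'A': 'row-major', 'B': 'column-major', 'C': 'row-major'}
```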
## Variants

### Standard

Basic GEMM: C = A × B

### PreShuffle

Optimized weight access with LDS pre-shuffling. Best for large matrices.

### Multi-D

Element-wise fusion: C = op(A × B + D0 + D1 + ...)

Supported ops: `PassThrough`, `MultiDAdd`, `Relu`, `Gelu`, `Sigmoid`, `Tanh`
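As a reference for the Multi-D semantics, here is a minimal pure-Python sketch of C = op(A × B + D0 + D1 + ...), using Relu as the fused op. The generated kernels compute this on the GPU; this only illustrates the math:

```python
# Reference semantics of the Multi-D variant: C = op(A x B + D0 + D1 + ...).
# Pure-Python illustration only, not the generated kernel code.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def multi_d(A, B, ds, op=lambda x: x):
    C = matmul(A, B)
    for D in ds:  # element-wise add of each extra D tensor
        C = [[c + d for c, d in zip(crow, drow)] for crow, drow in zip(C, D)]
    return [[op(c) for c in row] for row in C]  # fused element-wise op

relu = lambda x: max(x, 0.0)
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]   # identity, so A x B == A
D0 = [[-10.0, 1.0], [1.0, -10.0]]
print(multi_d(A, B, [D0], op=relu))
# [[0.0, 3.0], [4.0, 0.0]]
```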
## Output Structure

```
generated_kernels/
├── gemm_fp16_rcr_compv4_..._128x128x32_....hpp
├── gemm_fp16_rcr_compv4_..._preshuffle.hpp
├── gemm_fp16_rcr_compv4_..._multid_Relu_d1.hpp
└── ...
```
## Configuration Files

### arch_specs.json

GPU architecture specifications (single source of truth):

```json
{
  "architectures": {
    "gfx942": {
      "family": "cdna3",
      "warp_size": 64,
      "warp_configs": [[2, 2, 1], [4, 4, 1]],
      ...
    }
  }
}
```

### preselected_kernels.py

Curated kernel sets for common use cases.
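A quick way to inspect the specs from Python. This sketch embeds only the gfx942 fields shown above; in practice you would `json.load` the real `arch_specs.json`:

```python
import json

# Subset of arch_specs.json as shown above; in practice:
#   with open("arch_specs.json") as f: specs = json.load(f)
specs = json.loads("""
{
  "architectures": {
    "gfx942": {
      "family": "cdna3",
      "warp_size": 64,
      "warp_configs": [[2, 2, 1], [4, 4, 1]]
    }
  }
}
""")

# Print each architecture with its family, warp size, and warp configs.
for arch, info in specs["architectures"].items():
    print(arch, info["family"], "warp_size =", info["warp_size"])
    for m, n, k in info["warp_configs"]:
        print("  warp config:", m, "x", n, "x", k)
```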
## Adding New GPU Support

See ADDING_NEW_GPU.md for the complete guide.

Quick steps:

1. Edit `arch_specs.json`
2. Run `python generate_arch_specs.py`
3. Rebuild
## Troubleshooting

| Issue | Solution |
|---|---|
| "Arguments not supported" | Check tile config validity |
| Missing element-wise op | Check `elementwise_ops.hpp` |
| Compilation errors | Verify C++17 and include paths |
More info: see `../README.md` for full documentation.