Yaswanth Raparti 907c6e94ae [CK][CK_TILE] Fix dispatcher cpp tests - registry key mismatch and string assertions (#6528)
## Motivation

The dispatcher C++ tests were failing for two reasons: a registry key
mismatch and outdated string-representation assertions.

## Technical Details
Bug 1 - Registry key mismatch: the registry stored kernels under
get_name() while lookups used encode_identifier(), so every registry
lookup failed. Fixed by changing registry.cpp:58 to store with
encode_identifier().

Bug 2 - String representation changes: the tests checked for the
"persist"/"nopers" substrings, but the code now emits "True"/"False".
Fixed by replacing the brittle substring checks with comparison-based
assertions in test_kernel_key.cpp and test_kernel_key_extended.cpp.
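The key-mismatch in Bug 1 can be illustrated with a minimal Python sketch (the real code is C++ in registry.cpp; the function bodies and kernel structure here are hypothetical stand-ins):

```python
# Hypothetical stand-ins for the two key functions; the real
# implementations live in the dispatcher's C++ sources.
def get_name(kernel):
    # Human-readable name, e.g. "gemm fp16 rcr"
    return kernel["name"]

def encode_identifier(kernel):
    # Canonical lookup key, e.g. "gemm_fp16_rcr"
    return kernel["name"].replace(" ", "_")

kernel = {"name": "gemm fp16 rcr"}
registry = {}

# Before the fix: stored under get_name(), looked up via encode_identifier(),
# so every lookup missed.
registry[get_name(kernel)] = kernel
assert registry.get(encode_identifier(kernel)) is None

# After the fix: store and look up with the same encoder.
registry.clear()
registry[encode_identifier(kernel)] = kernel
assert registry[encode_identifier(kernel)] is kernel
```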

## Test Plan

Ran the C++ tests in the dispatcher.

## Test Result

Validation: all three core C++ tests now pass:
  - test_kernel_key - 6/6 tests passing
  - test_kernel_key_extended - 25/25 tests passing
  - test_registry - 8/8 tests passing
  
 
## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 23:14:02 -06:00

CK Tile Unified Code Generators

Single source of truth for GEMM and Grouped Convolution kernel generation.

See also: Main Dispatcher README for installation and core concepts.

Shared Infrastructure

Both GEMM and Grouped Conv generators share common code via codegen_common.py:

  • TileConfig - Dataclass for tile dimensions
  • TraitConfigBase - Base for kernel trait configurations with arch-aware validation
  • CommonTypeMappings - Dtype-to-C++ type mappings
  • parallel_generate() - Parallel kernel generation with per-kernel progress logging
  • Arch-aware expansion helpers (valid_wave_configs, valid_warp_configs, etc.)
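As a rough illustration of the TileConfig idea, here is a minimal dataclass sketch with an arch-aware validity check. The field names and the validation rule are assumptions for illustration; the actual definition lives in codegen_common.py and may differ:

```python
from dataclasses import dataclass

# Hypothetical sketch of a TileConfig-style dataclass; actual fields
# and validation rules are defined in codegen_common.py.
@dataclass(frozen=True)
class TileConfig:
    tile_m: int
    tile_n: int
    tile_k: int

    def is_valid(self, warp_size: int = 64) -> bool:
        # Example arch-aware check: every tile dimension must be a
        # positive multiple of the target's warp size.
        return all(t > 0 and t % warp_size == 0
                   for t in (self.tile_m, self.tile_n, self.tile_k))

cfg = TileConfig(tile_m=256, tile_n=256, tile_k=64)
print(cfg.is_valid())  # True for warp_size=64
```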

Quick Start

GEMM

cd dispatcher/codegen

# Generate standard FP16 kernels
python3 unified_gemm_codegen.py \
    --output-dir ../build/generated_kernels \
    --datatype fp16 \
    --layout rcr \
    --variants standard

# Generate all variants
python3 unified_gemm_codegen.py \
    --output-dir ../build/generated_kernels \
    --variants standard preshuffle multi_d

Grouped Convolution

cd dispatcher/codegen

# Generate forward FP16 grouped conv kernels
python3 unified_grouped_conv_codegen.py \
    --output-dir ../build/generated_kernels \
    --datatype fp16 \
    --variant forward \
    --ndim-spatial 2

# Generate backward data kernels
python3 unified_grouped_conv_codegen.py \
    --output-dir ../build/generated_kernels \
    --variant backward_data \
    --ndim-spatial 2

Using from Python

from ctypes_utils import CodegenRunner, KernelConfig

# Generate from specific config
config = KernelConfig(tile_m=256, tile_n=256, tile_k=64)
codegen = CodegenRunner()
result = codegen.generate_from_config(config)

# Generate variant
result = codegen.generate("preshuffle")

# Generate all
results = codegen.generate_all()

Command Line Options

Option         Values                           Description
--output-dir   path                             Output directory
--datatype     fp16, bf16, fp32, int8           Data type
--layout       rcr, rrr, crr, ccr               Matrix layouts
--gpu-target   gfx942, gfx90a, gfx950           Target GPU
--variants     standard, preshuffle, multi_d    Kernel variants
--preselected  fp16_rcr_essential, etc.         Predefined kernel set

Layout Notation

  • R = Row-major, C = Column-major
  • Order: A, B, C (e.g., rcr = A row, B col, C row)
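The notation above can be decoded mechanically; a small helper (illustrative only, not part of the codegen package):

```python
# Map each layout letter to its meaning.
LAYOUTS = {"r": "row-major", "c": "column-major"}

def decode_layout(layout: str) -> dict:
    # Letters are in A, B, C order, e.g. "rcr".
    a, b, c = layout.lower()
    return {"A": LAYOUTS[a], "B": LAYOUTS[b], "C": LAYOUTS[c]}

print(decode_layout("rcr"))
# {'A': 'row-major', 'B': 'column-major', 'C': 'row-major'}
```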

Variants

Standard

Basic GEMM: C = A x B

PreShuffle

Optimized weight access with LDS pre-shuffling. Best for large matrices.

Multi-D

Element-wise fusion: C = op(A x B + D0 + D1 + ...)

Supported ops: PassThrough, MultiDAdd, Relu, Gelu, Sigmoid, Tanh
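Scalar reference semantics for the fusion, as a sketch: the op formulas below are the standard mathematical definitions, not taken from CK Tile's elementwise_ops.hpp, and only a subset of the supported ops is shown:

```python
import math

# C = op(A x B + D0 + D1 + ...), element by element.
# Standard op definitions; CK Tile's actual implementations may use
# approximations (e.g. for Gelu, which is omitted here).
OPS = {
    "PassThrough": lambda x: x,
    "Relu":        lambda x: max(x, 0.0),
    "Sigmoid":     lambda x: 1.0 / (1.0 + math.exp(-x)),
    "Tanh":        math.tanh,
}

def multi_d(ab: float, ds, op: str) -> float:
    # ab: one element of A x B; ds: matching elements of D0, D1, ...
    return OPS[op](ab + sum(ds))

print(multi_d(-3.0, [1.0, 1.0], "Relu"))  # -3 + 2 = -1, relu -> 0.0
```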

Output Structure

generated_kernels/
|---- gemm_fp16_rcr_compv4_..._128x128x32_....hpp          # GEMM kernels
|---- gemm_fp16_rcr_compv4_..._preshuffle.hpp
|---- gemm_fp16_rcr_compv4_..._multid_Relu_d1.hpp
|---- grouped_conv_fwd_fp16_nhwgc_..._128x128x32_....hpp   # Grouped conv kernels
+---- ...

Configuration Files

arch_specs.json

GPU architecture specifications (single source of truth):

{
  "architectures": {
    "gfx942": {
      "family": "cdna3",
      "warp_size": 64,
      "warp_configs": [[2, 2, 1], [4, 4, 1]],
      ...
    }
  }
}
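A generator or tooling script might consume this spec roughly as follows; the snippet embeds the gfx942 entry shown above, with field names assumed from that example:

```python
import json

# Mirror of the arch_specs.json excerpt above (gfx942 entry only).
spec = json.loads("""
{
  "architectures": {
    "gfx942": {
      "family": "cdna3",
      "warp_size": 64,
      "warp_configs": [[2, 2, 1], [4, 4, 1]]
    }
  }
}
""")

arch = spec["architectures"]["gfx942"]
for m, n, k in arch["warp_configs"]:
    # Total warps (and threads) per workgroup for each configuration.
    warps = m * n * k
    print(f"{m}x{n}x{k} -> {warps} warps, {warps * arch['warp_size']} threads")
```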

preselected_kernels.py

Curated kernel sets for common use cases.

Adding New GPU Support

See ADDING_NEW_GPU.md for complete guide.

Quick steps:

  1. Edit arch_specs.json
  2. Run python generate_arch_specs.py
  3. Rebuild

Troubleshooting

Issue                       Solution
"Arguments not supported"   Check tile config validity
Missing element-wise op     Check elementwise_ops.hpp
Compilation errors          Verify C++17 and include paths

More info: See ../README.md for full documentation.