# CK Tile GEMM Unified Code Generator
Single source of truth for all GEMM kernel generation.
> **See also:** [Main Dispatcher README](../README.md) for installation and core concepts.
## Quick Start
```bash
cd dispatcher/codegen

# Generate standard FP16 kernels
python3 unified_gemm_codegen.py \
    --output-dir ../build/generated_kernels \
    --datatype fp16 \
    --layout rcr \
    --variants standard

# Generate all variants
python3 unified_gemm_codegen.py \
    --output-dir ../build/generated_kernels \
    --variants standard preshuffle multi_d
```
## Using from Python
```python
from ctypes_utils import CodegenRunner, KernelConfig

# Generate kernels from a specific tile configuration
config = KernelConfig(tile_m=256, tile_n=256, tile_k=64)
codegen = CodegenRunner()
result = codegen.generate_from_config(config)

# Generate a single variant
result = codegen.generate("preshuffle")

# Generate all variants
results = codegen.generate_all()
```
## Command Line Options
| Option | Values | Description |
|--------|--------|-------------|
| `--output-dir` | path | Output directory |
| `--datatype` | `fp16`, `bf16`, `fp32`, `int8` | Data type |
| `--layout` | `rcr`, `rrr`, `crr`, `ccr` | Matrix layouts |
| `--gpu-target` | `gfx942`, `gfx90a`, `gfx950` | Target GPU |
| `--variants` | `standard`, `preshuffle`, `multi_d` | Kernel variants |
| `--preselected` | `fp16_rcr_essential`, etc. | Predefined kernel set |
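
As a rough illustration of how these options combine, the sketch below assembles a command line for `unified_gemm_codegen.py` from Python. The helper `build_codegen_command` and the `ALLOWED` table are hypothetical and only mirror the values listed above; the script itself may accept other values or validate them differently.

```python
import shlex

# Values copied from the options table above; the real script may accept others.
ALLOWED = {
    "datatype": {"fp16", "bf16", "fp32", "int8"},
    "layout": {"rcr", "rrr", "crr", "ccr"},
    "gpu_target": {"gfx942", "gfx90a", "gfx950"},
    "variants": {"standard", "preshuffle", "multi_d"},
}


def build_codegen_command(output_dir, datatype="fp16", layout="rcr",
                          gpu_target=None, variants=("standard",)):
    """Return an argv list for unified_gemm_codegen.py (hypothetical helper)."""
    if datatype not in ALLOWED["datatype"]:
        raise ValueError(f"unsupported datatype: {datatype}")
    if layout not in ALLOWED["layout"]:
        raise ValueError(f"unsupported layout: {layout}")
    if gpu_target is not None and gpu_target not in ALLOWED["gpu_target"]:
        raise ValueError(f"unsupported GPU target: {gpu_target}")
    unknown = [v for v in variants if v not in ALLOWED["variants"]]
    if unknown:
        raise ValueError(f"unsupported variants: {unknown}")

    cmd = ["python3", "unified_gemm_codegen.py",
           "--output-dir", str(output_dir),
           "--datatype", datatype,
           "--layout", layout]
    if gpu_target is not None:
        cmd += ["--gpu-target", gpu_target]
    cmd += ["--variants", *variants]
    return cmd


cmd = build_codegen_command("../build/generated_kernels",
                            variants=("standard", "preshuffle"))
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to execute it
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting.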
### Layout Notation
- `r` = row-major, `c` = column-major
- Order: A, B, C (e.g., `rcr` = A row-major, B column-major, C row-major)
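
The notation can be spelled out with a small decoder. This is a minimal sketch; `decode_layout` is a hypothetical helper, not part of the code generator.

```python
LAYOUT_NAMES = {"r": "row-major", "c": "column-major"}


def decode_layout(layout):
    """Map a three-letter layout code to the layouts of A, B, and C."""
    code = layout.lower()
    if len(code) != 3 or any(ch not in LAYOUT_NAMES for ch in code):
        raise ValueError(f"expected three letters from 'r'/'c', got {layout!r}")
    # Letters are positional: first is A, second is B, third is C.
    return dict(zip("ABC", (LAYOUT_NAMES[ch] for ch in code)))


print(decode_layout("rcr"))
# {'A': 'row-major', 'B': 'column-major', 'C': 'row-major'}
```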
## Variants
### Standard
Basic GEMM: `C = A × B`
### PreShuffle
Optimized weight access with LDS pre-shuffling. Best for large matrices.
### Multi-D
Element-wise fusion: `C = op(A × B + D0 + D1 + ...)`

Supported ops: `PassThrough`, `MultiDAdd`, `Relu`, `Gelu`, `Sigmoid`, `Tanh`
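
To make the fused formula concrete, here is a pure-Python reference of `C = op(A × B + D0 + D1 + ...)` on small nested lists. It is only a sketch of the arithmetic, not of the generated kernels: `multi_d_gemm`, `matmul`, and `relu` are illustrative names, and `op` defaults to the identity.

```python
def matmul(a, b):
    """Plain matrix product of two row-major lists of lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(x):
    return x if x > 0 else 0


def multi_d_gemm(a, b, ds=(), op=lambda x: x):
    """Reference for C = op(A x B + D0 + D1 + ...), applied element-wise."""
    c = matmul(a, b)
    for d in ds:  # add each D tensor element-wise to the product
        c = [[c_ij + d_ij for c_ij, d_ij in zip(c_row, d_row)]
             for c_row, d_row in zip(c, d)]
    return [[op(v) for v in row] for row in c]


a = [[1, 2], [3, 4]]
b = [[1, 0], [0, -1]]
d0 = [[0, 1], [-5, 1]]
print(multi_d_gemm(a, b, ds=[d0], op=relu))  # [[1, 0], [0, 0]]
```

A reference like this can be handy for checking a fused kernel's output on tiny inputs before moving to GPU verification.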
## Output Structure
```
generated_kernels/
├── gemm_fp16_rcr_compv4_..._128x128x32_....hpp
├── gemm_fp16_rcr_compv4_..._preshuffle.hpp
├── gemm_fp16_rcr_compv4_..._multid_Relu_d1.hpp
└── ...
```
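
A quick way to see what was generated is to group the headers by variant. The sketch below assumes the naming pattern shown above (a `_preshuffle` suffix and a `_multid_` infix). The file names in the demo are placeholders, not real generator output, and `classify_header` is a hypothetical helper.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def classify_header(name):
    """Guess the variant from a generated header name (naming is an assumption)."""
    stem = Path(name).stem
    if "_multid_" in stem:
        return "multi_d"
    if stem.endswith("_preshuffle") or "_preshuffle_" in stem:
        return "preshuffle"
    return "standard"


def group_generated_kernels(directory):
    """Return {variant: [header names]} for gemm_*.hpp files in a directory."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob("gemm_*.hpp")):
        groups[classify_header(path.name)].append(path.name)
    return dict(groups)


# Demo on placeholder files; point this at your real output directory instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("gemm_fp16_rcr_demo.hpp",
                 "gemm_fp16_rcr_demo_preshuffle.hpp",
                 "gemm_fp16_rcr_demo_multid_Relu_d1.hpp"):
        (Path(tmp) / name).touch()
    for variant, names in group_generated_kernels(tmp).items():
        print(variant, names)
```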
## Configuration Files
### arch_specs.json
GPU architecture specifications (single source of truth):
```json
{
  "architectures": {
    "gfx942": {
      "family": "cdna3",
      "warp_size": 64,
      "warp_configs": [[2, 2, 1], [4, 4, 1]],
      ...
    }
  }
}
```
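
The sketch below shows one way such a file could be queried from Python. It uses an inline sample containing only the fields shown above (the elided fields are left out), and it treats each `warp_configs` entry as an opaque triple because its meaning is not spelled out here. `load_arch` and `is_supported_warp_config` are hypothetical helpers.

```python
import json

# Inline stand-in for arch_specs.json, limited to the fields shown above.
SAMPLE = """
{
  "architectures": {
    "gfx942": {
      "family": "cdna3",
      "warp_size": 64,
      "warp_configs": [[2, 2, 1], [4, 4, 1]]
    }
  }
}
"""


def load_arch(specs, gpu_target):
    """Return the spec entry for one GPU target, with a readable error."""
    try:
        return specs["architectures"][gpu_target]
    except KeyError:
        known = ", ".join(sorted(specs.get("architectures", {})))
        raise KeyError(f"unknown GPU target {gpu_target!r}; known: {known}") from None


def is_supported_warp_config(arch, warp_config):
    """Check whether a warp configuration triple is listed for this arch."""
    return list(warp_config) in arch["warp_configs"]


specs = json.loads(SAMPLE)
arch = load_arch(specs, "gfx942")
print(arch["family"], arch["warp_size"])          # cdna3 64
print(is_supported_warp_config(arch, (2, 2, 1)))  # True
print(is_supported_warp_config(arch, (1, 4, 1)))  # False
```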
### preselected_kernels.py
Curated kernel sets for common use cases.
## Adding New GPU Support
See [ADDING_NEW_GPU.md](ADDING_NEW_GPU.md) for the complete guide.
Quick steps:
1. Edit `arch_specs.json`
2. Run `python3 generate_arch_specs.py`
3. Rebuild
## Troubleshooting
| Issue | Solution |
|-------|----------|
| "Arguments not supported" | Check tile config validity |
| Missing element-wise op | Check `elementwise_ops.hpp` |
| Compilation errors | Verify that C++17 is enabled and the include paths are correct |
---
> **More info:** See [../README.md](../README.md) for full documentation.