Files
composable_kernel/dispatcher/codegen/ADDING_NEW_GPU.md
Vidyasagar Ananthan 9e049a32a1 Adding dispatcher architecture (#3300)
* WIP POC of dispatcher

* Dispatcher python workflow setup.

* Dispatcher cleanup and updates.

Further dispatcher cleanup and updates.

Build fixes

Improvements and python to CK example

Improvements to readme

* Fixes to python paths

* Cleaning up code

* Improving dispatcher support for different arch

Fixing typos

* Fix formatting errors

* Cleaning up examples

* Improving codegeneration

* Improving and fixing C++ examples

* Adding conv functionality (fwd,bwd,bwdw) and examples.

* Fixes based on feedback.

* Further fixes based on feedback.

* Adding stress test for autogeneration and autocorrection, and fixing preshuffle bug.

* Another round of improvements  based on feedback.

* Trimming out unnecessary code.

* Fixing the multi-D implementation.

* Using gpu verification for gemms and fixing convolutions tflops calculation.

* Fix counter usage issue and arch filtering per ops.

* Adding changelog and other fixes.

* Improve examples and resolve critical bugs.

* Reduce build time for python examples.

* Fixing minor bug.

* Fix compilation error.

* Improve installation instructions for dispatcher.

* Add docker based  installation instructions for dispatcher.

* Fixing arch-based filtering to match tile engine.

* Remove dead code and fix arch filtering.

* Minor bugfix.

* Updates after rebase.

* Trimming code.

* Fix copyright headers.

* Consolidate examples, cut down code.

* Minor fixes.

* Improving python examples.

* Update readmes.

* Remove conv functionality.

* Cleanup following conv removable.
2026-01-22 09:34:33 -08:00

198 lines
4.7 KiB
Markdown

# Adding New GPU Architecture Support
Guide for adding support for a new AMD GPU architecture to the CK Tile Dispatcher.
> **See also:** [Main Dispatcher README](../README.md) | [Codegen README](README.md)
## Overview
The dispatcher uses `arch_specs.json` as the **single source of truth** for GPU specifications:
```
arch_specs.json → generate_arch_specs.py → arch_specs_generated.py (Python)
→ arch_specs_generated.hpp (C++)
```
## Quick Start
```bash
# 1. Edit arch_specs.json
# 2. Run generator
python generate_arch_specs.py
# 3. Rebuild
cd ../build && cmake --build . -j8
# 4. Test
ctest
```
## Step-by-Step Guide
### Step 1: Edit arch_specs.json
Add new architecture under `"architectures"`:
```json
{
"architectures": {
"gfx1100": {
"family": "rdna3",
"description": "AMD Radeon RX 7000 series (RDNA3)",
"warp_size": 32,
"lds_capacity_kb": 64,
"warp_configs": [
[2, 4, 1],
[4, 2, 1]
],
"warp_tile_combos": {
"fp16_fp16_fp16": [[16, 16, 16], [32, 32, 16]],
"bf16_bf16_bf16": [[16, 16, 16], [32, 32, 16]]
}
}
}
}
```
### Step 2: Configuration Fields
| Field | Description | Example |
|-------|-------------|---------|
| `family` | GPU family | `"cdna3"`, `"rdna4"` |
| `description` | Human-readable name | `"AMD Instinct MI300"` |
| `warp_size` | Wave/warp size | `64` (CDNA), `32` (RDNA) |
| `lds_capacity_kb` | LDS memory in KB | `64` |
| `warp_configs` | Valid `[warp_m, warp_n, warp_k]` | `[[2,2,1], [4,4,1]]` |
| `warp_tile_combos` | Warp tiles per dtype | See below |
### Step 3: Warp Tile Combinations
Map data type combinations to valid warp tile sizes:
```json
"warp_tile_combos": {
"fp16_fp16_fp16": [[32, 32, 8], [16, 16, 16], [32, 32, 16]],
"bf16_bf16_bf16": [[32, 32, 8], [16, 16, 16]],
"fp8_fp8_fp16": [[32, 32, 16], [32, 32, 32]],
"int8_int8_int32": [[16, 16, 32], [32, 32, 16]]
}
```
Key format: `{A_dtype}_{B_dtype}_{C_dtype}`
### Step 4: Run Generator
```bash
cd dispatcher/codegen
python generate_arch_specs.py
```
This generates:
- `arch_specs_generated.py` (Python module)
- `../include/ck_tile/dispatcher/arch_specs_generated.hpp` (C++ header)
### Step 5: Rebuild and Test
```bash
cd ../build
cmake --build . -j8
ctest --output-on-failure
```
### Step 6: Verify
```python
from arch_filter import ArchFilter
filter = ArchFilter("gfx1100")
is_valid = filter.is_kernel_valid(
datatype_a="fp16", datatype_b="fp16", datatype_c="fp16",
tile_m=128, tile_n=128, tile_k=32,
warp_m=2, warp_n=2, warp_k=1,
warp_tile_m=16, warp_tile_n=16, warp_tile_k=16
)
print(f"Valid: {is_valid}")
```
## Reference
### Supported Data Types
| Key | Description |
|-----|-------------|
| `fp16` | Half precision (16-bit) |
| `bf16` | Brain float 16 |
| `fp32` | Single precision (32-bit) |
| `fp64` | Double precision (64-bit) |
| `fp8` | 8-bit float (E4M3) |
| `bf8` | 8-bit brain float (E5M2) |
| `int8` | 8-bit integer |
| `int4` | 4-bit integer |
### GPU Families
| Family | Description |
|--------|-------------|
| `cdna2` | MI200 series (gfx90a) |
| `cdna3` | MI300 series (gfx942) |
| `cdna4` | MI350 series (gfx950) |
| `rdna3` | RX 7000 series (gfx1100) |
| `rdna4` | RX 9000 series (gfx1201) |
### Pipeline LDS Limits
| Pipeline | LDS Limit |
|----------|-----------|
| `compv4` | 32 KB |
| `preshufflev2` | 32 KB |
| `default` | 64 KB |
## Troubleshooting
### "Unknown GPU architecture"
1. Check architecture key matches exactly (e.g., `"gfx942"` not `"GFX942"`)
2. Verify you ran `generate_arch_specs.py`
3. Rebuild C++ code
### Kernels being rejected
```python
from arch_filter import ArchFilter, KernelConfig
filter = ArchFilter("gfx942")
result = filter.validate_kernel(config)
print(f"Valid: {result.valid}")
for error in result.errors:
print(f" Error: {error}")
```
### Missing warp tile combination
1. Check `warp_tile_combos` in `arch_specs.json`
2. Ensure `[warp_tile_m, warp_tile_n, warp_tile_k]` is in the list
3. Verify data type key format
## File Structure
```
codegen/
├── arch_specs.json # Single source of truth (EDIT THIS)
├── generate_arch_specs.py # Generator script
├── arch_specs_generated.py # Generated Python module
└── ADDING_NEW_GPU.md # This file
include/ck_tile/dispatcher/
├── arch_specs_generated.hpp # Generated C++ header
└── arch_filter.hpp # C++ filter
```
## Best Practices
1. **Test thoroughly** - Run all tests after adding a new GPU
2. **Start minimal** - Add only validated configurations
3. **Document sources** - Note where warp tile combinations came from
4. **Keep in sync** - If using tile_engine, keep both updated
---
> **More info:** See [../README.md](../README.md) for full documentation.