mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-03 13:11:25 +00:00
* WIP POC of dispatcher * Dispatcher python workflow setup. * Dispatcher cleanup and updates. Further dispatcher cleanup and updates. Build fixes Improvements and python to CK example Improvements to readme * Fixes to python paths * Cleaning up code * Improving dispatcher support for different arch Fixing typos * Fix formatting errors * Cleaning up examples * Improving codegeneration * Improving and fixing C++ examples * Adding conv functionality (fwd,bwd,bwdw) and examples. * Fixes based on feedback. * Further fixes based on feedback. * Adding stress test for autogeneration and autocorrection, and fixing preshuffle bug. * Another round of improvements based on feedback. * Trimming out unnecessary code. * Fixing the multi-D implementation. * Using gpu verification for gemms and fixing convolutions tflops calculation. * Fix counter usage issue and arch filtering per ops. * Adding changelog and other fixes. * Improve examples and resolve critical bugs. * Reduce build time for python examples. * Fixing minor bug. * Fix compilation error. * Improve installation instructions for dispatcher. * Add docker based installation instructions for dispatcher. * Fixing arch-based filtering to match tile engine. * Remove dead code and fix arch filtering. * Minor bugfix. * Updates after rebase. * Trimming code. * Fix copyright headers. * Consolidate examples, cut down code. * Minor fixes. * Improving python examples. * Update readmes. * Remove conv functionality. * Cleanup following conv removable.
198 lines
4.7 KiB
Markdown
198 lines
4.7 KiB
Markdown
# Adding New GPU Architecture Support
|
|
|
|
Guide for adding support for a new AMD GPU architecture to the CK Tile Dispatcher.
|
|
|
|
> **See also:** [Main Dispatcher README](../README.md) | [Codegen README](README.md)
|
|
|
|
## Overview
|
|
|
|
The dispatcher uses `arch_specs.json` as the **single source of truth** for GPU specifications:
|
|
|
|
```
|
|
arch_specs.json → generate_arch_specs.py → arch_specs_generated.py (Python)
|
|
→ arch_specs_generated.hpp (C++)
|
|
```
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
# 1. Edit arch_specs.json
|
|
# 2. Run generator
|
|
python generate_arch_specs.py
|
|
# 3. Rebuild
|
|
cd ../build && cmake --build . -j8
|
|
# 4. Test
|
|
ctest
|
|
```
|
|
|
|
## Step-by-Step Guide
|
|
|
|
### Step 1: Edit arch_specs.json
|
|
|
|
Add new architecture under `"architectures"`:
|
|
|
|
```json
|
|
{
|
|
"architectures": {
|
|
"gfx1100": {
|
|
"family": "rdna3",
|
|
"description": "AMD Radeon RX 7000 series (RDNA3)",
|
|
"warp_size": 32,
|
|
"lds_capacity_kb": 64,
|
|
"warp_configs": [
|
|
[2, 4, 1],
|
|
[4, 2, 1]
|
|
],
|
|
"warp_tile_combos": {
|
|
"fp16_fp16_fp16": [[16, 16, 16], [32, 32, 16]],
|
|
"bf16_bf16_bf16": [[16, 16, 16], [32, 32, 16]]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### Step 2: Configuration Fields
|
|
|
|
| Field | Description | Example |
|
|
|-------|-------------|---------|
|
|
| `family` | GPU family | `"cdna3"`, `"rdna4"` |
|
|
| `description` | Human-readable name | `"AMD Instinct MI300"` |
|
|
| `warp_size` | Wave/warp size | `64` (CDNA), `32` (RDNA) |
|
|
| `lds_capacity_kb` | LDS memory in KB | `64` |
|
|
| `warp_configs` | Valid `[warp_m, warp_n, warp_k]` | `[[2,2,1], [4,4,1]]` |
|
|
| `warp_tile_combos` | Warp tiles per dtype | See below |
|
|
|
|
### Step 3: Warp Tile Combinations
|
|
|
|
Map data type combinations to valid warp tile sizes:
|
|
|
|
```json
|
|
"warp_tile_combos": {
|
|
"fp16_fp16_fp16": [[32, 32, 8], [16, 16, 16], [32, 32, 16]],
|
|
"bf16_bf16_bf16": [[32, 32, 8], [16, 16, 16]],
|
|
"fp8_fp8_fp16": [[32, 32, 16], [32, 32, 32]],
|
|
"int8_int8_int32": [[16, 16, 32], [32, 32, 16]]
|
|
}
|
|
```
|
|
|
|
Key format: `{A_dtype}_{B_dtype}_{C_dtype}`
|
|
|
|
### Step 4: Run Generator
|
|
|
|
```bash
|
|
cd dispatcher/codegen
|
|
python generate_arch_specs.py
|
|
```
|
|
|
|
This generates:
|
|
- `arch_specs_generated.py` (Python module)
|
|
- `../include/ck_tile/dispatcher/arch_specs_generated.hpp` (C++ header)
|
|
|
|
### Step 5: Rebuild and Test
|
|
|
|
```bash
|
|
cd ../build
|
|
cmake --build . -j8
|
|
ctest --output-on-failure
|
|
```
|
|
|
|
### Step 6: Verify
|
|
|
|
```python
|
|
from arch_filter import ArchFilter
|
|
|
|
filter = ArchFilter("gfx1100")
|
|
is_valid = filter.is_kernel_valid(
|
|
datatype_a="fp16", datatype_b="fp16", datatype_c="fp16",
|
|
tile_m=128, tile_n=128, tile_k=32,
|
|
warp_m=2, warp_n=2, warp_k=1,
|
|
warp_tile_m=16, warp_tile_n=16, warp_tile_k=16
|
|
)
|
|
print(f"Valid: {is_valid}")
|
|
```
|
|
|
|
## Reference
|
|
|
|
### Supported Data Types
|
|
|
|
| Key | Description |
|
|
|-----|-------------|
|
|
| `fp16` | Half precision (16-bit) |
|
|
| `bf16` | Brain float 16 |
|
|
| `fp32` | Single precision (32-bit) |
|
|
| `fp64` | Double precision (64-bit) |
|
|
| `fp8` | 8-bit float (E4M3) |
|
|
| `bf8` | 8-bit brain float (E5M2) |
|
|
| `int8` | 8-bit integer |
|
|
| `int4` | 4-bit integer |
|
|
|
|
### GPU Families
|
|
|
|
| Family | Description |
|
|
|--------|-------------|
|
|
| `cdna2` | MI200 series (gfx90a) |
|
|
| `cdna3` | MI300 series (gfx942) |
|
|
| `cdna4` | MI350 series (gfx950) |
|
|
| `rdna3` | RX 7000 series (gfx1100) |
|
|
| `rdna4` | RX 9000 series (gfx1201) |
|
|
|
|
### Pipeline LDS Limits
|
|
|
|
| Pipeline | LDS Limit |
|
|
|----------|-----------|
|
|
| `compv4` | 32 KB |
|
|
| `preshufflev2` | 32 KB |
|
|
| `default` | 64 KB |
|
|
|
|
## Troubleshooting
|
|
|
|
### "Unknown GPU architecture"
|
|
|
|
1. Check architecture key matches exactly (e.g., `"gfx942"` not `"GFX942"`)
|
|
2. Verify you ran `generate_arch_specs.py`
|
|
3. Rebuild C++ code
|
|
|
|
### Kernels being rejected
|
|
|
|
```python
|
|
from arch_filter import ArchFilter, KernelConfig
|
|
|
|
filter = ArchFilter("gfx942")
|
|
result = filter.validate_kernel(config)
|
|
print(f"Valid: {result.valid}")
|
|
for error in result.errors:
|
|
print(f" Error: {error}")
|
|
```
|
|
|
|
### Missing warp tile combination
|
|
|
|
1. Check `warp_tile_combos` in `arch_specs.json`
|
|
2. Ensure `[warp_tile_m, warp_tile_n, warp_tile_k]` is in the list
|
|
3. Verify data type key format
|
|
|
|
## File Structure
|
|
|
|
```
|
|
codegen/
|
|
├── arch_specs.json # Single source of truth (EDIT THIS)
|
|
├── generate_arch_specs.py # Generator script
|
|
├── arch_specs_generated.py # Generated Python module
|
|
└── ADDING_NEW_GPU.md # This file
|
|
|
|
include/ck_tile/dispatcher/
|
|
├── arch_specs_generated.hpp # Generated C++ header
|
|
└── arch_filter.hpp # C++ filter
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
1. **Test thoroughly** - Run all tests after adding a new GPU
|
|
2. **Start minimal** - Add only validated configurations
|
|
3. **Document sources** - Note where warp tile combinations came from
|
|
4. **Keep in sync** - If using tile_engine, keep both updated
|
|
|
|
---
|
|
|
|
> **More info:** See [../README.md](../README.md) for full documentation.
|