[CK_TILE] Stream-K Tile Engine Test Config File Generation (#3662)

* Stream-K smoke test config file generation

This change converts the Stream-K smoke tests to use tile engine. Since
the m, n, and k values depend on the CU count of a device, the
configs are generated during the Configuration Phase.

* Compute GEMM reference on GPU

* Remove redundant Stream-K tests

Removing redundant tests that are now run via tile engine.

* Fix relative and absolute tolerance calculation

This change updates the Stream-K tile engine interface to ensure that
num_wgs_per_tile is propagated and passed into the compare_results
function to calculate the rel and abs tolerance. Previously, the split-k
value was used, which is incorrect for Stream-K since the split-k value
is always 1.
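
As a rough illustration of why the split-k value gives the wrong bound, the sketch below uses a simplified first-order model in which rounding error grows linearly with the number of accumulations. The function name `streamk_rtol`, its parameters, and the model itself are assumptions made for illustration; they do not reproduce the actual threshold computation behind compare_results.

```python
import math


def streamk_rtol(k: int, num_wgs_per_tile: int, acc_eps: float, reduce_eps: float) -> float:
    """Illustrative relative tolerance for one output tile (hypothetical model).

    acc_eps:    unit roundoff assumed for the per-workgroup accumulation.
    reduce_eps: unit roundoff assumed for combining the partial results.
    """
    if k <= 0 or num_wgs_per_tile <= 0:
        raise ValueError("k and num_wgs_per_tile must be positive")
    # Each contributing workgroup accumulates roughly K / num_wgs_per_tile products.
    depth_per_wg = math.ceil(k / num_wgs_per_tile)
    partial_term = acc_eps * depth_per_wg
    # The partial results of all contributing workgroups are then summed.
    reduction_term = reduce_eps * num_wgs_per_tile
    # The looser of the two bounds decides the tolerance.
    return max(partial_term, reduction_term)
```

Under this model, with `k=4096`, `acc_eps=2**-24`, and `reduce_eps=2**-11`, passing 1 (the split-k value) yields `2**-11`, while passing a `num_wgs_per_tile` of 4 yields `2**-9`. A tolerance derived from split-k would therefore be tighter than the reduction across workgroups justifies.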

* Cleanup imports, types, and other misc items

This commit makes the following changes:
- Uses the typing module for nested type hints
- Uses quotes around cu_count_arg argument in generate_configs.cmake in
  if statements
- Adds explicit include for tuple in test_gemm_streamk_simple.cpp
- Adds a type for the tiles argument in argparser to check argument
  validity
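
A minimal sketch of how an argparse `type` callable can validate an argument like this one. The `--tiles` flag name, the `MxNxK` string format, and the helper names `tile_shape` and `build_parser` are hypothetical; the actual script may define the argument differently.

```python
import argparse
from typing import Tuple


def tile_shape(text: str) -> Tuple[int, int, int]:
    """Parse a tile given as 'MxNxK' (assumed format) into three positive ints."""
    parts = text.lower().split("x")
    if len(parts) != 3:
        raise argparse.ArgumentTypeError(f"expected MxNxK, got {text!r}")
    try:
        m, n, k = (int(part) for part in parts)
    except ValueError:
        raise argparse.ArgumentTypeError(f"non-integer dimension in {text!r}")
    if min(m, n, k) <= 0:
        raise argparse.ArgumentTypeError(f"dimensions must be positive in {text!r}")
    return (m, n, k)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --tiles values are checked by tile_shape."""
    parser = argparse.ArgumentParser(description="Config generation (sketch)")
    # argparse calls tile_shape on every value and reports a usage error
    # if it raises ArgumentTypeError.
    parser.add_argument("--tiles", type=tile_shape, nargs="+", required=True)
    return parser
```

With this approach, malformed values are rejected while the command line is parsed, before any config is written.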

* Use CU count as return value for better parsing

* Add reduction tests for bf16, fp8, and bf8
Emily Martins
2026-02-03 09:12:15 -07:00
committed by GitHub
parent 3f04d27b68
commit 8cbd09c84a
22 changed files with 522 additions and 406 deletions


@@ -34,17 +34,25 @@ Each test configuration can specify optimized problem sizes in its JSON file:
The key idea: **Unit tests that use tile_engine's exact kernel generation and verification methodology** instead of creating separate test infrastructure.
## Test Configurations
Test configs are generated during the Generation Phase and stored under the build directory at test/ck_tile/gemm_streamk_tile_engine/configs. Generating the configs requires the Compute Unit (CU) count of the device. If the Generation Phase runs on a machine without a GPU, or on a machine whose GPU architecture differs from the one on which you will run the tests, you can set the CU count manually with the `CU_COUNT` option:
```bash
# Assuming you are at the root of the repo
cd build
../script/cmake-ck-dev.sh .. gfx90a -G Ninja -DCU_COUNT=100
```
You can reference the public whitepaper for your specific GPU to get the appropriate CU count.
If no `CU_COUNT` option is given and no HIP device is found, then the default value of 100 CUs will be used to determine the problem sizes tested.
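The selection order described above can be sketched as follows. This is an illustration only: the function `resolve_cu_count` and its parameters are hypothetical names, and the actual logic lives in the build scripts.

```python
from typing import Optional

DEFAULT_CU_COUNT = 100  # default stated in the text above


def resolve_cu_count(cu_count_option: Optional[int],
                     detected_cu_count: Optional[int]) -> int:
    """Choose the CU count used to size the test problems.

    cu_count_option:   value of the CU_COUNT option, or None if not given.
    detected_cu_count: CU count queried from a HIP device, or None if no
                       device was found.
    """
    # An explicit option wins, so configs can target a GPU other than the
    # one (if any) present on the build machine.
    if cu_count_option is not None:
        return cu_count_option
    # Otherwise use the device found on this machine.
    if detected_cu_count is not None:
        return detected_cu_count
    # No option and no device: fall back to the default.
    return DEFAULT_CU_COUNT
```

For example, `resolve_cu_count(None, None)` falls back to 100, while an explicit option takes precedence over any detected device.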
### 1. **Simple Test** (`simple_test_config.json`)
- **Purpose**: Basic functionality validation for fp16/bf16 data types
- **Config**: 128x128x32, warp 2x2x1, warp_tile 32x32x16
### 1. **Smoke Tests**
- **Purpose**: Basic functionality validation for fp16/bf16/fp8/bf8 data types
- **Config**: 256x256x32 (for bf16/fp16) or 128x128x32 (for bf8/fp8), warp 2x2x1, warp_tile 32x32x16
- **Traits**: compv3 pipeline only
- **Coverage**: All 4 layouts (rcr, rrr, ccr, crr) for fp16, bf16
- **Coverage**: All 4 layouts (rcr, rrr, ccr, crr)
## Data Type Support
- **fp16, bf16**: Fully supported - all layouts (rcr, rrr, ccr, crr)
- **fp16, bf16, fp8, bf8**: Fully supported - all layouts (rcr, rrr, ccr, crr)
- **fp64**: Not supported (hardware MFMA limitation)
- **fp32, bf8, pk-int4-t**: Not yet supported by gemm_instance_builder (will be added later)
- **fp32, pk-int4-t**: Not yet supported by gemm_instance_builder (will be added later)
## Test Result Behavior