[CK_TILE] Stream-K Gemm Example for fp8 and bf8 (#3041)

* Addition of streamk fp8 example for CK Tile

* Adding in bf8 streamk example in CK Tile

* Refactoring fp8/bf8 unit tests

Refactored the unit tests for fp8/bf8 to utilize the test harness.
Implemented smoke tests with layouts: CCR, CRR, RCR, RRR for fp8/bf8.
The tests are using 128x128x32 for the tile configuration, as other
configurations revealed implementation gaps that are currently being
documented.
This commit is contained in:
arai713
2025-10-27 19:29:03 -07:00
committed by GitHub
parent 7fc0a38e90
commit 715395bc86
17 changed files with 314 additions and 13 deletions

View File

@@ -28,10 +28,10 @@ args:
-stride_b tensor B stride (default:0)
-stride_c tensor C stride (default:0)
-v validation strategy. 0. No validation, 1. Validation on CPU, 2. Validation on GPU (default:1)
-prec data type. fp16/bf16 (default:fp16)
-prec data type. fp16/bf16/fp8/bf8 (default:fp16)
-warmup number of iterations before benchmarking the kernel (default:50)
-repeat number of iterations to benchmark the kernel (default:100)
-timer timing mode. gpu:gpu timer, cpu:cpu timer (default:gpu)
-init data initialization strategy. 0:random, 1:linear, 2:constant(1) (default:0)
-flush_cache flush the cache before running the kernel (default:true)
```
```