[CK_TILE] Stream-K Gemm Example for fp8 and bf8 (#3041)

* Addition of streamk fp8 example for CK Tile

* Adding in bf8 streamk example in CK Tile

* Refactoring fp8/bf8 unit tests

Refactored the unit tests for fp8/bf8 to utilize the test harness.
Implemented smoke tests with layouts: CCR, CRR, RCR, RRR for fp8/bf8.
The tests are using 128x128x32 for the tile configuration, as other
configurations revealed implementation gaps that are currently being
documented.
This commit is contained in:
arai713
2025-10-27 19:29:03 -07:00
committed by GitHub
parent 7fc0a38e90
commit 715395bc86
17 changed files with 314 additions and 13 deletions

View File

@@ -75,6 +75,18 @@ struct DataTypeTraits<ck_tile::bf16_t>
static constexpr const char* name = "bf16";
};
template <>
struct DataTypeTraits<ck_tile::fp8_t>
{
static constexpr const char* name = "fp8";
};
template <>
struct DataTypeTraits<ck_tile::bf8_t>
{
static constexpr const char* name = "bf8";
};
auto create_args(int argc, char* argv[])
{
ck_tile::ArgParser arg_parser;
@@ -94,7 +106,7 @@ auto create_args(int argc, char* argv[])
.insert("stride_b", "0", "Tensor B stride")
.insert("stride_c", "0", "Tensor C stride")
.insert("v", "2", "0. No validation, 1. Validation on CPU, 2. Validation on GPU")
.insert("prec", "fp16", "data type. fp16/bf16")
.insert("prec", "fp16", "data type. fp16/bf16/fp8/bf8")
.insert("warmup", "50", "number of iterations before benchmarking the kernel")
.insert("repeat", "100", "number of iterations to benchmark the kernel")
.insert("timer", "gpu", "gpu:gpu timer, cpu:cpu timer")