I've only verified that the kernel compiles.
Some of my earlier choices, like float32 types and having the epilogue set the member, are not valid template parameters. I've now made this identical to
a default GEMM universal kernel.
I also fixed some other small logical mistakes I made.
The code currently outputs the `GetName` results for some of the classes:
```
Kernel name: gemm_bf16_pipeline_AgBgCrCompV3_16x64x128_256_1x4_0x0x0
Shape: tile_gemm_shape_16x64x128x4_1x4x1_16x16x32
Problem: gemm_problem_256_0x0x0_Intrawave
Pipeline: pipeline_AgBgCrCompV3_16x64x128_256_1x4_0x0x0
```