Without this DataType = unknown -
``` sh
Run Flatmm kernel with DataType = unknown M =1280 N =16384 K =1024 StrideA =1024 StrideB =1024 StrideC =16384 : 0.228837 ms, 187.687 TFlops, 341.374 GB/s,
```
after this change
```sh
Run Flatmm kernel with DataType = bf16 M =1280 N =16384 K =1024 StrideA =1024 StrideB =1024 StrideC =16384 : 0.227029 ms, 189.181 TFlops, 344.092 GB/s,
```