mirror of https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
* tf32:bf16x3: use bf16x3 to emulate tf32 gemm
* change blockwiseGemm to demo bf16x3
* temp push
* self review
* self review
* fix multi-device compile error
* bug fix
* code refactor
* limit to gfx950
* enhance gemm gfx942 threshold
* move the change down from blockwise to warpwise
* refactor code
* refactor code
* error fix
* change threshold
* bug fix
* fix threshold error
* change host reference implementation to match the device implementation
* bug fix
* bug fix
* code refactor
* fix clang-format failure
* code refine
[ROCm/composable_kernel commit: 2a73eb3bc0]
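
For readers unfamiliar with the idea behind the first item, here is a minimal host-side sketch of one common way to emulate a higher-precision product with three bf16 products. It is an illustration under stated assumptions, not the code from this commit: the names `truncate_to_bf16`, `split_bf16`, `dot_bf16x3` and `dot_bf16x1` are hypothetical, the split uses truncation rather than whatever rounding the device kernels apply, and bf16 values are stored in `float` so the sketch runs on any host. Each fp32 operand is split into a bf16 `hi` part and a bf16 `lo` residual, and the product is approximated as `hi*hi + hi*lo + lo*hi`, dropping the small `lo*lo` term.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Truncate an fp32 value to bf16 precision by keeping its upper 16 bits.
// Assumption: truncation is used for simplicity; a real kernel may round.
inline float truncate_to_bf16(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFF0000u;
    float y;
    std::memcpy(&y, &bits, sizeof(y));
    return y;
}

// Hypothetical helper type: x is approximated by hi + lo, both bf16-representable.
struct Bf16Pair
{
    float hi;
    float lo;
};

inline Bf16Pair split_bf16(float x)
{
    assert(std::isfinite(x)); // inf - inf would produce NaN in the residual
    const float hi = truncate_to_bf16(x);
    const float lo = truncate_to_bf16(x - hi); // residual, again cut to bf16
    return {hi, lo};
}

// Dot product using three bf16 products per element, accumulated in fp32.
inline float dot_bf16x3(const float* a, const float* b, std::size_t n)
{
    float acc = 0.0f;
    for(std::size_t i = 0; i < n; ++i)
    {
        const Bf16Pair pa = split_bf16(a[i]);
        const Bf16Pair pb = split_bf16(b[i]);
        acc += pa.hi * pb.hi;
        acc += pa.hi * pb.lo;
        acc += pa.lo * pb.hi;
    }
    return acc;
}

// Plain single-bf16 dot product, for comparison only.
inline float dot_bf16x1(const float* a, const float* b, std::size_t n)
{
    float acc = 0.0f;
    for(std::size_t i = 0; i < n; ++i)
    {
        acc += truncate_to_bf16(a[i]) * truncate_to_bf16(b[i]);
    }
    return acc;
}
```

Comparing `dot_bf16x3` and `dot_bf16x1` against a double-precision reference shows why the three-product form is attractive: it recovers most of the accuracy lost by a single bf16 product while still using only bf16 multiplies.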