* add bf16 gemms for gfx11/gfx12
* reduce the input values in test_gemm
* add int8 wmma gemm instances for gfx11/gfx12
* add example gemm_wmma_int8
* fix bug in gemm_wmma_int8 test
* increase bf16 gemm test tolerance
* update the dates and clean-up commented-out instances
[ROCm/composable_kernel commit: 8aba2724cc]
* add support for navi2x and navi3x models
* fix syntax
* use common macro for different mi300 architectures
[ROCm/composable_kernel commit: 180f16f9ac]
* wmma_op + unit test
* add arch limitation to wmma test
* change arch limitation
* Refactor + Add all type unit test(int4 compile failed)
* Add f32_16x16x16_bf16 unit test
* tempsave
* tempsave
* tempsave
* runtime bug, cannot find symbol
* workaround for incorrect HIP warpSize return value
* debugging
* tempsave
* Correctness OK, waiting for optimization
* Tidy up + format
* temp save
* temp save, reproduce the v_bfi_b32 issue
* add inline asm for wmmaop test
* tidy up
* clean some debug purpose code
* discard some codes
* clang format
* clang format
* compiler issue fixed + increase tile size
[ROCm/composable_kernel commit: 919aeb1f52]
* wmma_op + unit test
* add arch limitation to wmma test
* change arch limitation
* Refactor + Add all type unit test(int4 compile failed)
* Add f32_16x16x16_bf16 unit test
* Remote int4 related
* delete deprecated test
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: abf9cc6c5c]