mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-18 12:00:07 +00:00

Haocong WANG 789c15d703 [Navi3x] Add Device Operations (#567)

* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + Add all type unit test(int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* tempsave

* tempsave

* tempsave

* runtime bug, cannot find symbol

* workaround for incorrect HIP warpSize return value

* debugging

* tempsave

* Correctness OK, waiting for optimization

* Tidy up + format

* temp save

* temp save, reproduce the v_bfi_b32 issue

* add inline asm for wmmaop test

* tidy up

* clean some debug purpose code

* discard some codes

* clang format

* clang format

* compiler issue fixed + increase tile size

* navi3x_multipleD+example

* temp save

* workable

* batchedgemm[OK], groupconv[debug]

* groupconv: Sanity check[OK], Performance[Bad]

* navi3x_groupconv_need_optimization

* format

* Add arch limitation to all wmma examples

* fix bug: example30 input conv args

[ROCm/composable_kernel commit: 0cfda84d05]

2023-02-15 11:50:51 -06:00

| File | Last commit | Date |
| --- | --- | --- |
| CMakeLists.txt | [Navi3x] Add Device Operations (#567) | 2023-02-15 11:50:51 -06:00 |
| gemm_bilinear_wmma_fp16.cpp | [Navi3x] Add Device Operations (#567) | 2023-02-15 11:50:51 -06:00 |
| gemm_bilinear_xdl_fp16.cpp | Rangify constructor of HostTensorDescriptor & Tensor<> (#445) | 2022-11-11 11:36:01 -06:00 |
| README.md | Gemm+Bilinear (#316) | 2022-07-02 09:15:38 -05:00 |

README.md

Instructions for `example_gemm_bilinear_xdl_fp16`

Run `example_gemm_bilinear_xdl_fp16`

#arg1: verification (0=no, 1=yes)
#arg2: initialization (0=no init, 1=integer value, 2=decimal value)
#arg3: time kernel (0=no, 1=yes)
#arg4 to 10: M (multiple of 256), N (multiple of 128), K (multiple of 32), StrideA, StrideB, StrideD, StrideE
#arg11 to 12: alpha, beta
./bin/example_gemm_bilinear_xdl_fp16 1 1 1 3840 4096 4096 4096 4096 4096 4096 0.5 0.5
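
The argument names (`alpha`, `beta`, and separate `D` and `E` strides) suggest the example computes `E = alpha * (A x B) + beta * D`, with `D` and `E` both of size M x N. The README does not spell this out, so the pure-Python reference below is only a sketch under that assumption. `check_sizes` and `gemm_bilinear` are hypothetical helper names, not part of the example binary, and the multiple-of checks reflect one reading of the `256x`, `128x`, `32x` hints.

```python
def check_sizes(m, n, k, m_mult=256, n_mult=128, k_mult=32):
    """True when M, N, K are multiples of the hinted sizes (assumed meaning of '256x' etc.)."""
    return m % m_mult == 0 and n % n_mult == 0 and k % k_mult == 0


def gemm_bilinear(a, b, d, alpha, beta):
    """Reference E = alpha * (A @ B) + beta * D on nested lists (assumed semantics)."""
    m, k, n = len(a), len(b), len(b[0])
    e = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            e[i][j] = alpha * acc + beta * d[i][j]
    return e


# Sizes taken from the command line above.
sizes_ok = check_sizes(3840, 4096, 4096)

# Tiny worked example with alpha = beta = 0.5, as in the command above.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
D = [[2.0, 2.0], [2.0, 2.0]]
E = gemm_bilinear(A, B, D, alpha=0.5, beta=0.5)
print(sizes_ok, E)
```

A host-side reference of this kind is what the `verification` flag would typically compare the device result against; the actual verification code lives in the example source and may differ.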

Result (MI100 @ 1502 MHz, 184.6 TFlops peak FP16)

a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
c0_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
arg.a_grid_desc_k0_m_k1_{512, 3840, 8}
arg.b_grid_desc_k0_n_k1_{512, 4096, 8}
arg.c0_grid_desc_m_n_{ 3840, 4096}
arg.c_grid_desc_m_n_{ 3840, 4096}
launch_and_time_kernel: grid_dim {480, 1, 1}, block_dim {256, 1, 1}
Warm up
Start running 1 times...
Perf: 0.936965 ms, 137.517 TFlops, 102.959 GB/s
error: 0
max_diff: 0, 558.5, 558.5
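
The `Perf` line can be cross-checked from the problem size. The sketch below assumes the usual GEMM accounting of 2·M·N·K floating-point operations, 2-byte FP16 elements, and memory traffic counted for A, B and E only (D is left out). It also assumes a 256 x 128 output tile per workgroup and K1 = 8, which are inferences from the size hints and the `k0_m_k1` descriptors rather than something the log states outright.

```python
M, N, K = 3840, 4096, 4096
time_ms = 0.936965       # from the Perf line
peak_tflops = 184.6      # MI100 peak FP16, from the Result heading
bytes_per_elem = 2       # FP16

flops = 2 * M * N * K                               # one multiply + one add per MAC
traffic = bytes_per_elem * (M * K + K * N + M * N)  # A, B, E (assumed; D not counted)

tflops = flops / 1e9 / time_ms       # GFLOP per ms equals TFLOP/s
gb_per_s = traffic / 1e6 / time_ms   # MB per ms equals GB/s
efficiency = tflops / peak_tflops

# Assumed 256 x 128 tile per workgroup and K1 = 8 elements along K.
grid_dim = (M // 256) * (N // 128)
k0 = K // 8

print(f"{tflops:.3f} TFlops, {gb_per_s:.3f} GB/s, {efficiency:.1%} of peak")
print(f"grid_dim = {grid_dim}, K0 = {k0}")
```

Under these assumptions the computed throughput and bandwidth agree with the 137.517 TFlops and 102.959 GB/s in the log, and the tile arithmetic reproduces `grid_dim {480, 1, 1}` and the leading `512` in the `k0_m_k1` descriptors. The run reaches roughly three quarters of the quoted FP16 peak.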
