mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-12 09:16:52 +00:00

Files

Anthony Chang c20a75b07d Fused GEMM+GEMM (#351)

* initial stub for gemm_gemm_xdl_cshuffle

* set up example code

* compiles

* prevent integer overflow

* harmonize interface between ref_gemm and ref_batched_gemm

* batched_gemm_gemm

* fix example

* host tensor gen: diagonal pattern in lowest two-dimensions only

* make c descriptors containing only integral constants

* clean up

* add BlockwiseGemmXdlops_v2 while exploring a unified approach

* implement proper interface

* tidy up example

* fix compilation warnings

* coarsely controlled 2nd gemm padding

* remove rocm-cmake's hard requirement for certain revision

* clang-format

* resolve merge conflict

* fix compilation error on gfx10

* adds acc0 elementwise op to interface

* add gemm_gemm instances and tests

* avoid LDS data hazard

* fix build

Co-authored-by: Chao Liu <chao.liu2@amd.com>

2022-08-13 09:18:58 -05:00

CMakeLists.txt

Skip lds of b matrix (#326)

2022-08-13 01:35:49 -05:00

gemm_dl_fp16.cpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

gemm_dl_fp32.cpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

gemm_dl_int8.cpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

gemm_xdl_bf16.cpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

gemm_xdl_fp16.cpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

gemm_xdl_fp64.cpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

gemm_xdl_int8.cpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

gemm_xdl_skip_b_lds_fp16.cpp

Fused GEMM+GEMM (#351)

2022-08-13 09:18:58 -05:00

README.md

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

README.md

Instructions for `example_gemm_xdl`

Run `example_gemm_xdl`

#arg1: verification (0=no, 1=yes)
#arg2: initialization (0=no init, 1=integer value, 2=decimal value)
#arg3: run kernel # of times (>1)
./bin/example_gemm_xdl 0 1 5
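
The three positional arguments above could be validated along the following lines. This is only a sketch: `ExampleArgs` and `parse_example_args` are hypothetical names introduced here for illustration and are not taken from the example's source. The "(>1)" note on arg3 is read loosely as "at least one run".

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the three positional arguments (not part of the library).
struct ExampleArgs
{
    bool do_verification; // arg1: 0=no, 1=yes
    int init_method;      // arg2: 0=no init, 1=integer value, 2=decimal value
    int nrepeat;          // arg3: number of timed kernel runs
};

// Parse {arg1, arg2, arg3}. Throws std::invalid_argument on a wrong argument
// count, a non-numeric value (via std::stoi), or a value outside the documented range.
inline ExampleArgs parse_example_args(const std::vector<std::string>& args)
{
    if(args.size() != 3)
        throw std::invalid_argument("expected: <verification> <initialization> <repeat>");

    const int verify  = std::stoi(args[0]);
    const int init    = std::stoi(args[1]);
    const int nrepeat = std::stoi(args[2]);

    if(verify != 0 && verify != 1)
        throw std::invalid_argument("arg1 must be 0 or 1");
    if(init < 0 || init > 2)
        throw std::invalid_argument("arg2 must be 0, 1 or 2");
    if(nrepeat < 1)
        throw std::invalid_argument("arg3 must be at least 1");

    ExampleArgs parsed;
    parsed.do_verification = (verify == 1);
    parsed.init_method     = init;
    parsed.nrepeat         = nrepeat;
    return parsed;
}
```

With the command line shown above, `parse_example_args({"0", "1", "5"})` would yield no verification, integer initialization, and five timed runs.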

Result (MI100 @ 1087 MHz, 133.5 TFlops peak FP16)

a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
arg.a_grid_desc_k0_m_k1_{512, 3840, 8}
arg.b_grid_desc_k0_n_k1_{512, 4096, 8}
arg.c_grid_desc_m_n_{ 3840, 4096}
launch_and_time_kernel: grid_dim {480, 1, 1}, block_dim {256, 1, 1}
Warm up
Start running 5 times...
Perf: 1.19685 ms, 107.657 TFlops, 78.8501 GB/s
