mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-19 20:40:07 +00:00

Files

rocking5566 4a7d8df85a Gemm_c_shuffle (4 layouts) X (fp32 bf16 int8) (#131)

* [What] Separate fixed-point gemm from the gemm example
[Why] Let the gemm_int8 example be pure gemm.
[What]
1. Add gemm_requant_relu_requant.
2. Let CDataType be int32 in pure gemm, because no one uses int8 CDataType. It is also part of gemm_requant_relu_requant.

* Fix path

* Revise cmakelist due to merge develop

* Add gemm fp16 test

* Extract PrepareGemmTensor

* Extract TestGemm

* Add test for different layout

* Add 4 layouts of shuffle version of fp32

* Add 4 layouts of shuffle version of int8

* Add 4 layouts of shuffle version of bf16

* Replace all DeviceGemmPtr_ with DeviceGemmNoOpPtr to fit the naming convention

* Add test for non-shuffle version of gemm

* Fix typo

* Print kernel information

* Add rest of the fp32 kernel to the test

* 1. Add rest of the fp16 device iop.
2. Mark the invalid device operation

Co-authored-by: rocking <chunylai@amd.com>

[ROCm/composable_kernel commit: 485ea46a40]

2022-03-21 15:59:51 -05:00

CMakeLists.txt

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

gemm_xdl_bias_relu.cpp

Gemm_c_shuffle (4 layouts) X (fp32 bf16 int8) (#131)

2022-03-21 15:59:51 -05:00

README.md

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

README.md

Instructions for `gemm_xdl_bias_relu_add` Example

Docker script

docker run                                                                   \
-it                                                                          \
--rm                                                                         \
--privileged                                                                 \
--group-add sudo                                                             \
-w /root/workspace                                                           \
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace                                \
rocm/tensorflow:rocm4.3.1-tf2.6-dev                                          \
/bin/bash

Build `gemm_xdl_bias_relu_add`

mkdir build && cd build

# Need to specify target ID, example below is gfx908
cmake                                                                  \
-D BUILD_DEV=OFF                                                       \
-D CMAKE_BUILD_TYPE=Release                                            \
-D CMAKE_CXX_FLAGS="-DCK_AMD_GPU_GFX908 --amdgpu-target=gfx908 -O3 "   \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc                              \
-D CMAKE_PREFIX_PATH=/opt/rocm                                         \
..

make -j gemm_xdl_bias_relu_add

Run `gemm_xdl_bias_relu_add`

#arg1: verification (0=no, 1=yes)
#arg2: initialization (0=no init, 1=integer value, 2=decimal value)
#arg3: number of times to run the kernel (>1)
#arg4 to 9: M (multiple of 256), N (multiple of 128), K (multiple of 32), StrideA, StrideB, StrideC
./example/gemm_xdl_bias_relu_add 0 1 5 3840 4096 4096 4096 4096 4096
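Before launching a run, it can help to check the problem sizes against the size hints in the argument list above. The following is a minimal Python sketch, not part of the example itself. It assumes the hints on M, N and K mean "must be a multiple of 256, 128 and 32", and the helper names `check_gemm_sizes` and `build_command` are hypothetical. It only assembles the command line; it does not run the binary.

```python
# Assumption: the size hints on M, N and K mean "must be a multiple of".
SIZE_MULTIPLES = {"M": 256, "N": 128, "K": 32}


def check_gemm_sizes(m, n, k):
    """Return a list of problems; an empty list means the sizes pass the check."""
    problems = []
    for name, value in (("M", m), ("N", n), ("K", k)):
        multiple = SIZE_MULTIPLES[name]
        if value <= 0 or value % multiple != 0:
            problems.append(f"{name}={value} is not a positive multiple of {multiple}")
    return problems


def build_command(verify, init, repeats, m, n, k, stride_a, stride_b, stride_c):
    """Assemble the nine positional arguments in the order listed above."""
    problems = check_gemm_sizes(m, n, k)
    if problems:
        raise ValueError("; ".join(problems))
    args = (verify, init, repeats, m, n, k, stride_a, stride_b, stride_c)
    return ["./example/gemm_xdl_bias_relu_add"] + [str(a) for a in args]


# Rebuilds the sample invocation shown above.
print(" ".join(build_command(0, 1, 5, 3840, 4096, 4096, 4096, 4096, 4096)))
```

The returned list can be passed to `subprocess.run` unchanged if you want to launch the example from a script.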

Result (MI100 @ 1087 MHz, 133.5 TFlops peak FP16)

a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
c0_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
c1_m_n: dim 2, lengths {3840, 4096}, strides {1, 0}
arg.a_grid_desc_k0_m_k1_{512, 3840, 8}
arg.b_grid_desc_k0_n_k1_{512, 4096, 8}
arg.c_grid_desc_m_n_{ 3840, 4096}
arg.c0_grid_desc_m_n_{ 3840, 4096}
arg.c1_grid_desc_m_n_{ 3840, 4096}
launch_and_time_kernel: grid_dim {480, 1, 1}, block_dim {256, 1, 1}
Warm up
Start running 5 times...
Perf: 1.27583 ms, 100.992 TFlops, 73.9688 GB/s
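
The numbers in this log can be cross-checked with a few lines of arithmetic. The sketch below is a Python illustration under stated assumptions, not code from the example: it assumes the TFlops figure counts 2*M*N*K floating-point operations (the bias, ReLU and add steps are not counted), that each workgroup covers a 256 x 128 tile of C, and that K1 is 8 as printed in the grid descriptors. The function names are hypothetical. The GB/s figure is not reproduced here.

```python
def gemm_tflops(m, n, k, time_ms):
    """TFlops for one GEMM, counting one multiply and one add per M*N*K step."""
    flops = 2 * m * n * k
    return flops / (time_ms * 1e-3) / 1e12


def grid_blocks(m, n, tile_m=256, tile_n=128):
    """Number of workgroups if each one covers a tile_m x tile_n tile of C (assumed)."""
    return (m // tile_m) * (n // tile_n)


m, n, k = 3840, 4096, 4096
rate = gemm_tflops(m, n, k, 1.27583)

print(f"{rate:.3f} TFlops")                 # compare with the Perf line
print(f"{rate / 133.5:.1%} of FP16 peak")   # peak taken from the heading above
print(grid_blocks(m, n))                    # compare with grid_dim {480, 1, 1}
print(k // 8)                               # compare with K0 = 512 in the descriptors
```

Under these assumptions the computed rate, workgroup count and K0 agree with the `Perf`, `grid_dim` and `k0_m_k1` values printed above.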
