mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 18:17:44 +00:00
b39049283ed79de84766a6d2a65b7002b47541f2
* add some instance to develop
* avoid bank conflicts for wrw for all instance
* add small K1 test
* delete some unused instance
* binding gemm k1 to conv n
* try using half_4 to do ds_read
* reset buffer load oob and ds memcpy to default option
* remove useless instances
* remove redundant space
* remove printf code
* clang-format-10 change
* use fastest config
* fix clang format for the other files
* remove gemmk0 pad for output
* add gemmk padding macro
* add bank length computation
* add template to distinguish the instance that need lds padding for wrw
* use rocm5.1 as docker
* use integer value for GEMM test
* add Right padding macro
* add 2 test asm code
* using 256x256x32 tile size
* 1. move dedicated transform into gridwisegemm's header file. 2. make lds tensor params a struct template. 3. remove useless code
* using small vec
* 256*128 kernel size for example
* remove asm files
* use a new gridwise gemm header for bwd-weight
* revert gridwise gemm v2r4r2
* change format
* reset gridwise gemm v2r4r2
* remove unused code
* revert instance file
* revert example instance
* format file
* remove macros
* resolve compile error
* rename wrw kernel invoker
* use gridwisegemm pipeline struct instead of implementing the run function in the same header
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 070619fbf1]
Docker script
docker run \
-it \
--privileged \
--group-add sudo \
-w /root/workspace \
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
rocm/tensorflow:rocm4.3.1-tf2.6-dev \
/bin/bash
Build
mkdir build && cd build
# Specify the GPU target ID(s); the example below targets gfx908 and gfx90a
cmake \
-D BUILD_DEV=OFF \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_PREFIX_PATH=/opt/rocm \
..
Build and Run Examples
make -j examples
Instructions for running each individual example are under example/
Tests
make -j tests
make test
Build ckProfiler
make -j ckProfiler
Instructions for running ckProfiler are under profiler/
Caveat
Kernel Timing and Verification
CK's own kernel timer warms up the kernel once and then runs it multiple times to compute the average kernel time. For kernels that use atomic add, this means the output buffer is accumulated multiple times, which causes verification to fail. To work around this, do not enable CK's own timer and verification at the same time. Both the timer and verification can be enabled or disabled from the command line in each example and in ckProfiler.
Languages
C++ 93.1%, Python 4.5%, CMake 1.5%, Shell 0.5%, Pawn 0.2%