diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 0000000000..d35fe9e587
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,67 @@
+cff-version: 1.2.0
+title: Composable Kernel
+message: If you use this software, please cite using the following metadata.
+type: software
+authors:
+  - given-names: Chao
+    family-names: Liu
+    email: chao.liu2@amd.com
+    affiliation: AMD
+  - given-names: Jing
+    family-names: Zhang
+    email: jing.zhang3@amd.com
+    affiliation: AMD
+  - given-names: Letao
+    family-names: Qin
+    email: letao.qin@amd.com
+    affiliation: AMD
+  - given-names: Qianfeng
+    family-names: Zhang
+    email: qianfeng.zhang@amd.com
+    affiliation: AMD
+  - given-names: Liang
+    family-names: Huang
+    email: carlus.huang@amd.com
+    affiliation: AMD
+  - given-names: Shaojie
+    family-names: Wang
+    email: shaojie.wang@amd.com
+    affiliation: AMD
+  - given-names: Anthony
+    family-names: Chang
+    email: antc@amd.com
+    affiliation: AMD
+  - given-names: Chunyu
+    family-names: Lai
+    email: chunyu.lai@amd.com
+    affiliation: AMD
+  - given-names: Illia
+    family-names: Silin
+    email: illia.silin@amd.com
+    affiliation: AMD
+  - given-names: Adam
+    family-names: Osewski
+    email: adam.osewski@amd.com
+    affiliation: AMD
+  - given-names: Poyen
+    family-names: Chen
+    email: poyen.chen@amd.com
+    affiliation: AMD
+  - given-names: Rosty
+    family-names: Geyyer
+    email: rosty.geyyer@amd.com
+    affiliation: AMD
+  - given-names: Hanwen
+    family-names: Chen
+  - given-names: Tejash
+    family-names: Shah
+  - given-names: Xiaoyan
+    family-names: Zhou
+  - given-names: Jianfeng
+    family-names: Yan
+repository-code: 'https://github.com/ROCmSoftwarePlatform/composable_kernel'
+abstract: Composable Kernel (CK) library aims to provide a programming model for writing performance-critical kernels for Machine Learning workloads across multiple architectures including GPUs, CPUs, etc., through general-purpose kernel programming languages, like HIP C++.
+keywords:
+  - 'CK, Composable Kernel, Tensor Coordinate Transformation'
+license: MIT
+license-url: https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/7fc3ed761aa35709d87c8fbbe41dd368648b3541/LICENSE
diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
new file mode 100644
index 0000000000..fc5f856be9
--- /dev/null
+++ b/CONTRIBUTORS.md
@@ -0,0 +1,26 @@
+
+# Developers
+[Chao Liu](https://github.com/asroy), [Jing Zhang](https://github.com/zjing14), 2018-2022
+
+[Letao Qin](https://github.com/ltqin), [Qianfeng Zhang](https://github.com/qianfengz), [Liang Huang](https://github.com/carlushuang), [Shaojie Wang](https://github.com/shaojiewang), 2019-2022
+
+[Anthony Chang](https://github.com/rosenrodt), [Chunyu Lai](https://github.com/rocking5566), [Illia Silin](https://github.com/illsilin), [Adam Osewski](https://github.com/aosewski), [Poyen Chen](https://github.com/poyenc), [Rosty Geyyer](https://github.com/geyyer), 2022
+
+Hanwen Chang, 2019-2021
+
+Tejash Shah, 2019-2020
+
+Xiaoyan Zhou, 2020
+
+[Jianfeng Yan](https://github.com/j4yan), 2021-2022
+
+
+# Product Manager
+[Jun Liu](https://github.com/junliume)
+
+# Contributors
+[Dan Yao](https://github.com/danyao12), [Guangzhao Lu](https://github.com/guangzlu), [Raman Jana](https://github.com/ramjana), [Jehandad Khan](https://github.com/JehandadKhan)
+
+# Acknowledgement
+The CK team works closely with the Meta [AITemplate](???to.be.added???) team ([Bing Xu](https://github.com/antinucleon), Ying Zhang, etc.). Most of the lucrative graph optimization opportunities in ML models were identified by the AITemplate team, and we also co-designed many high-performance fused kernels for AMD GPUs. Without this collaboration, CK would not have reached its current potential.
+
diff --git a/README.md b/README.md
index bbc4d2bc30..f8009f55c1 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,43 @@
-## Docker script
+# Composable Kernel
+
+## Methodology
+The Composable Kernel (CK) library aims to provide a programming model for writing performance-critical kernels for Machine Learning workloads across multiple architectures including GPUs, CPUs, etc., through general-purpose kernel languages, like HIP C++.
+
+CK utilizes two concepts to achieve performance portability and code maintainability:
+* A tile-based programming model
+* Algorithm complexity reduction for complex ML operators, using an innovative technique we call "Tensor Coordinate Transformation".
+
+![ALT](/doc/image/ck_component.png "CK Components")
+
+## Code Structure
+The current CK library is structured into 4 layers:
+* "Templated Tile Operators" layer
+* "Templated Kernel and Invoker" layer
+* "Instantiated Kernel and Invoker" layer
+* "Client API" layer
+
+![ALT](/doc/image/ck_layer.png "CK Layers")
+
+## Contributors
+The list of developers and contributors is here: [Contributors](/CONTRIBUTORS.md)
+
+## Citation
+If you use CK, please use the following citations:
+* The CK paper will be freely available on arXiv soon: [Realizing Tensor Operators Using Coordinate Transformations and Tile Based Programming](???)
+* [CITATION.cff](/CITATION.cff)
+
+## License
+CK is released under the MIT license. [License File](/LICENSE)
+
+
+# Build CK
+
+## Build docker image
+```bash
+DOCKER_BUILDKIT=1 docker build -t ck:latest -f Dockerfile .
+```
+
+## Launch docker
 ```bash
 docker run \
 -it \
@@ -6,47 +45,38 @@ docker run \
 --group-add sudo \
 -w /root/workspace \
 -v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
-rocm/tensorflow:rocm5.1-tf2.6-dev \
+ck:latest \
 /bin/bash
 ```
 
-# Install newer version of rocm-cmake
-https://github.com/RadeonOpenCompute/rocm-cmake
-
-## Build
+## Build CK
 ```bash
 mkdir build && cd build
-```
 
-```bash
-# Need to specify target ID, example below is gfx908 and gfx90a
-cmake \
--D BUILD_DEV=OFF \
--D CMAKE_BUILD_TYPE=Release \
--D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \
--D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
--D CMAKE_PREFIX_PATH=/opt/rocm \
--D CMAKE_INSTALL_PREFIX=${PATH_TO_CK_INSTALL_DIRECTORY} \
+# Need to specify target ID, example below is for gfx908 and gfx90a
+cmake \
+-D CMAKE_PREFIX_PATH=/opt/rocm \
+-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
+-D CMAKE_CXX_FLAGS="-O3" \
+-D CMAKE_BUILD_TYPE=Release \
+-D GPU_TARGETS="gfx908;gfx90a" \
 ..
 ```
 
-### Build and Run Examples
-```bash
- make -j examples
-```
-Instructions for running each individual examples are under ```example/```
-
-## Tests
+### Build examples and tests
 ```bash
 make -j examples tests
 make test
 ```
 
+Instructions for running each individual example are under [example](/example)
+
+
 ## Build ckProfiler
 ```bash
 make -j ckProfiler
 ```
-Instructions for running ckProfiler are under ```profiler/```
+Instructions for running ckProfiler are under [profiler](/profiler)
 
 ## Install CK
 ```bash
@@ -54,7 +84,7 @@ make install
 ```
 
 ## Using CK as pre-built kernel library
-Instructions for using CK as a pre-built kernel library are under ```client_example/```
+Instructions for using CK as a pre-built kernel library are under [client_example](/client_example)
 
 ## Caveat
 ### Kernel Timing and Verification
diff --git a/doc/image/ck_component.png b/doc/image/ck_component.png
new file mode 100644
index 0000000000..db892331d7
Binary files /dev/null and b/doc/image/ck_component.png differ
diff --git a/doc/image/ck_layer.png b/doc/image/ck_layer.png
new file mode 100644
index 0000000000..117a1b3a0e
Binary files /dev/null and b/doc/image/ck_layer.png differ
diff --git a/script/cmake-ck-dev.sh b/script/cmake-ck-dev.sh
new file mode 100755
index 0000000000..f5a08204c4
--- /dev/null
+++ b/script/cmake-ck-dev.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+rm -f CMakeCache.txt
+rm -f *.cmake
+rm -rf CMakeFiles
+
+MY_PROJECT_SOURCE=$1
+
+cmake \
+-D CMAKE_PREFIX_PATH=/opt/rocm \
+-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
+-D CMAKE_CXX_FLAGS="-O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
+-D CMAKE_BUILD_TYPE=Release \
+-D BUILD_DEV=ON \
+-D GPU_TARGETS="gfx908;gfx90a" \
+-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
+-D USE_BITINT_EXTENSION_INT4=OFF \
+${MY_PROJECT_SOURCE}
+
+#-D AMDGPU_TARGETS=gfx90a;gfx908
diff --git a/script/cmake-ck-release.sh b/script/cmake-ck-release.sh
new file mode 100755
index 0000000000..a583cc35ed
--- /dev/null
+++ b/script/cmake-ck-release.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+rm -f CMakeCache.txt
+rm -f *.cmake
+rm -rf CMakeFiles
+
+MY_PROJECT_SOURCE=$1
+
+cmake \
+-D CMAKE_PREFIX_PATH=/opt/rocm \
+-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
+-D CMAKE_CXX_FLAGS="-O3" \
+-D CMAKE_BUILD_TYPE=Release \
+-D BUILD_DEV=OFF \
+-D GPU_TARGETS="gfx908;gfx90a" \
+-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
+-D USE_BITINT_EXTENSION_INT4=OFF \
+${MY_PROJECT_SOURCE}
+
+#-D AMDGPU_TARGETS=gfx90a;gfx908
diff --git a/script/cmake-rocm.sh b/script/cmake-rocm.sh
deleted file mode 100755
index 86b6236896..0000000000
--- a/script/cmake-rocm.sh
+++ /dev/null
@@ -1,20 +0,0 @@
-#!/bin/bash
-rm -f CMakeCache.txt
-rm -f *.cmake
-rm -rf CMakeFiles
-
-MY_PROJECT_SOURCE=../
-MY_PROJECT_INSTALL=../install.dir
-
-cmake \
--D CMAKE_INSTALL_PREFIX=${MY_PROJECT_INSTALL} \
--D BUILD_DEV=OFF \
--D CMAKE_BUILD_TYPE=Release \
--D CMAKE_CXX_FLAGS=" -O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
--D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
--D CMAKE_PREFIX_PATH=/opt/rocm \
--D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
-${MY_PROJECT_SOURCE}
-
-#-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3 -ftemplate-backtrace-limit=0 -mllvm --amdgpu-spill-vgpr-to-agpr=0 -gline-tables-only -save-temps=$PWD" \
-#-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
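
Review note on `GPU_TARGETS`: in bash, an unquoted `;` ends the current command, so a semicolon-separated target list only reaches `cmake` as a single argument when it is quoted. The sketch below illustrates the difference without invoking `cmake`. `show_args` is a hypothetical stand-in (not part of CK or CMake) that prints each argument it receives, and the unquoted case is run through `eval` only so that the resulting "command not found" error can be discarded.

```bash
#!/bin/bash
# Hypothetical stand-in for cmake: prints each received argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Quoted: the full list "gfx908;gfx90a" arrives as one argument.
quoted=$(show_args -D "GPU_TARGETS=gfx908;gfx90a")

# Unquoted: the shell stops the command at ';' and then tries to run `gfx90a`
# as a separate command, so only -D and GPU_TARGETS=gfx908 arrive.
# `|| true` keeps the script going after the failed stray command.
unquoted=$(eval 'show_args -D GPU_TARGETS=gfx908;gfx90a' 2>/dev/null) || true

echo "quoted:"
echo "$quoted"
echo "unquoted:"
echo "$unquoted"
```

In the unquoted case, any arguments on the following continuation lines (for example the source directory) would be passed to the stray `gfx90a` command rather than to `cmake`.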