mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 14:29:05 +00:00
update document: Readme, contributors, citation, (#463)
* update cmake script
* update readme
* Update README.md
* add citation
* add images
* Update README.md
* update
* Update README.md
* Update CONTRIBUTORS.md
* Update README.md
* Update CITATION.cff
* Update README.md
* Update CITATION.cff
67
CITATION.cff
Normal file
@@ -0,0 +1,67 @@
cff-version: 1.2.0
title: Composable Kernel
message: If you use this software, please cite using the following metadata.
type: software
authors:
  - given-names: Chao
    family-names: Liu
    email: chao.liu2@amd.com
    affiliation: AMD
  - given-names: Jing
    family-names: Zhang
    email: jing.zhang3@amd.com
    affiliation: AMD
  - given-names: Letao
    family-names: Qin
    email: letao.qin@amd.com
    affiliation: AMD
  - given-names: Qianfeng
    family-names: Zhang
    email: qianfeng.zhang@amd.com
    affiliation: AMD
  - given-names: Liang
    family-names: Huang
    email: carlus.huang@amd.com
    affiliation: AMD
  - given-names: Shaojie
    family-names: Wang
    email: shaojie.wang@amd.com
    affiliation: AMD
  - given-names: Anthony
    family-names: Chang
    email: antc@amd.com
    affiliation: AMD
  - given-names: Chunyu
    family-names: Lai
    email: chunyu.lai@amd.com
    affiliation: AMD
  - given-names: Illia
    family-names: Silin
    email: illia.silin@amd.com
    affiliation: AMD
  - given-names: Adam
    family-names: Osewski
    email: adam.osewski@amd.com
    affiliation: AMD
  - given-names: Poyen
    family-names: Chen
    email: poyen.chen@amd.com
    affiliation: AMD
  - given-names: Rosty
    family-names: Geyyer
    email: rosty.geyyer@amd.com
    affiliation: AMD
  - given-names: Hanwen
    family-names: Chen
  - given-names: Tejash
    family-names: Shah
  - given-names: Xiaoyan
    family-names: Zhou
  - given-names: Jianfeng
    family-names: Yan
repository-code: 'https://github.com/ROCmSoftwarePlatform/composable_kernel'
abstract: Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for Machine Learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel programming languages, like HIP C++.
keywords:
  - 'CK, Composable Kernel, Tensor Coordinate Transformation'
license: MIT
license-url: https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/7fc3ed761aa35709d87c8fbbe41dd368648b3541/LICENSE
26
CONTRIBUTORS.md
Normal file
@@ -0,0 +1,26 @@

# Developers
[Chao Liu](https://github.com/asroy), [Jing Zhang](https://github.com/zjing14), 2018-2022

[Letao Qin](https://github.com/ltqin), [Qianfeng Zhang](https://github.com/qianfengz), [Liang Huang](https://github.com/carlushuang), [Shaojie Wang](https://github.com/shaojiewang), 2019-2022

[Anthony Chang](https://github.com/rosenrodt), [Chunyu Lai](https://github.com/rocking5566), [Illia Silin](https://github.com/illsilin), [Adam Osewski](https://github.com/aosewski), [Poyen Chen](https://github.com/poyenc), [Rosty Geyyer](https://github.com/geyyer), 2022

Hanwen Chang, 2019-2021

Tejash Shah, 2019-2020

Xiaoyan Zhou, 2020

[Jianfeng Yan](https://github.com/j4yan), 2021-2022


# Product Manager
[Jun Liu](https://github.com/junliume)

# Contributors
[Dan Yao](https://github.com/danyao12), [Guangzhao Lu](https://github.com/guangzlu), [Raman Jana](https://github.com/ramjana), [Jehandad Khan](https://github.com/JehandadKhan)

# Acknowledgement
The CK team works closely with the Meta [AITemplate](???to.be.added???) team ([Bing Xu](https://github.com/antinucleon), Ying Zhang, etc.). Most of the lucrative graph optimization opportunities in ML models were identified by the AITemplate team, and we also co-designed many high-performance fused kernels for AMD GPUs. Without this collaboration, CK would not have reached its current potential.
80
README.md
@@ -1,4 +1,43 @@
## Docker script
# Composable Kernel

## Methodology
The Composable Kernel (CK) library aims to provide a programming model for writing performance-critical kernels for machine learning workloads across multiple architectures, including GPUs, CPUs, etc., through general-purpose kernel languages such as HIP C++.

CK utilizes two concepts to achieve performance portability and code maintainability:
* A tile-based programming model
* Algorithm complexity reduction for complex ML operators, using an innovative technique we call "Tensor Coordinate Transformation".

![ALT](/doc/image/ck_component.png "CK Components")

## Code Structure
The current CK library is structured into 4 layers:
* "Templated Tile Operators" layer
* "Templated Kernel and Invoker" layer
* "Instantiated Kernel and Invoker" layer
* "Client API" layer

![ALT](/doc/image/ck_layer.png "CK Layers")

## Contributors
The list of developers and contributors is here: [Contributors](/CONTRIBUTORS.md)

## Citation
If you use CK, please use the following citations:
* The CK paper will be freely available on arXiv soon: [Realizing Tensor Operators Using Coordinate Transformations and Tile Based Programming](???)
* [CITATION.cff](/CITATION.cff)

## License
CK is released under the MIT license. [License File](/LICENSE)


# Build CK

## Build docker image
```bash
DOCKER_BUILDKIT=1 docker build -t ck:latest -f Dockerfile .
```

## Launch docker
```bash
docker run \
-it \
@@ -6,47 +45,38 @@ docker run \
--group-add sudo \
-w /root/workspace \
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
rocm/tensorflow:rocm5.1-tf2.6-dev \
ck:latest \
/bin/bash
```

# Install newer version of rocm-cmake
https://github.com/RadeonOpenCompute/rocm-cmake

## Build
## Build CK
```bash
mkdir build && cd build
```

```bash
# Need to specify target ID, example below is gfx908 and gfx90a
cmake \
-D BUILD_DEV=OFF \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3" \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_INSTALL_PREFIX=${PATH_TO_CK_INSTALL_DIRECTORY} \
# Need to specify target ID, example below is for gfx908 and gfx90a
cmake \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_CXX_FLAGS="-O3" \
-D CMAKE_BUILD_TYPE=Release \
-D GPU_TARGETS="gfx908;gfx90a" \
..
```
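
`GPU_TARGETS` takes a semicolon-separated list, and in bash an unquoted `;` ends the command, so the value needs to be quoted when it is passed to `cmake`. The sketch below only illustrates the quoting; `count_args` and the `targets` array are hypothetical names used for the demonstration and are not part of CK.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Quoted: the semicolon stays inside one argument, so the command would
# receive exactly two arguments, "-D" and "GPU_TARGETS=gfx908;gfx90a".
n_quoted=$(count_args -D GPU_TARGETS="gfx908;gfx90a")
echo "arguments when quoted: ${n_quoted}"

# Building the list from an array avoids typing the separator by hand.
targets=(gfx908 gfx90a)
gpu_targets=$(IFS=';'; echo "${targets[*]}")
echo "-D GPU_TARGETS=\"${gpu_targets}\""
```

The joined string can then be passed as `-D GPU_TARGETS="${gpu_targets}"`, keeping the double quotes.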

### Build and Run Examples
```bash
make -j examples
```
Instructions for running each individual example are under ```example/```

## Tests
### Build examples and tests
```bash
make -j examples tests
make test
```
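
With GNU Make, `-j` without a number places no limit on the number of parallel jobs, which may be more than a machine can handle for a large build. A minimal sketch of capping the job count, assuming the coreutils `nproc` command is available; the variable names and the fallback value of 4 are arbitrary illustrative choices:

```bash
#!/bin/bash
# Use the number of available cores as the job count; fall back to 4
# (an arbitrary value) if nproc is not installed.
njobs=$(nproc 2>/dev/null || echo 4)

# Assemble the build command; printed here rather than executed.
build_cmd="make -j${njobs} examples tests"
echo "${build_cmd}"
```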

Instructions for running each individual example are under [example](/example)


## Build ckProfiler
```bash
make -j ckProfiler
```
Instructions for running ckProfiler are under ```profiler/```
Instructions for running ckProfiler are under [profiler](/profiler)

## Install CK
```bash
@@ -54,7 +84,7 @@ make install
```

## Using CK as pre-built kernel library
Instructions for using CK as a pre-built kernel library are under ```client_example/```
Instructions for using CK as a pre-built kernel library are under [client_example](/client_example)

## Caveat
### Kernel Timing and Verification

BIN
doc/image/ck_component.png
Normal file
Binary file not shown. Size: 552 KiB
BIN
doc/image/ck_layer.png
Normal file
Binary file not shown. Size: 536 KiB
19
script/cmake-ck-dev.sh
Executable file
@@ -0,0 +1,19 @@
#!/bin/bash
# Remove stale CMake state from the current (build) directory
rm -f CMakeCache.txt
rm -f *.cmake
rm -rf CMakeFiles

# Path to the CK source tree, passed as the first argument
MY_PROJECT_SOURCE=$1

cmake \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_CXX_FLAGS="-O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
-D CMAKE_BUILD_TYPE=Release \
-D BUILD_DEV=ON \
-D GPU_TARGETS="gfx908;gfx90a" \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
-D USE_BITINT_EXTENSION_INT4=OFF \
"${MY_PROJECT_SOURCE}"

#-D AMDGPU_TARGETS=gfx90a;gfx908
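
The script above takes the source directory from `$1` and deletes `CMakeCache.txt`, `*.cmake`, and `CMakeFiles` in the current directory, so it appears intended to be run from a separate build directory. A guard along the following lines could run before the cleanup; `check_source_dir` is a hypothetical helper for illustration and is not part of this commit.

```bash
#!/bin/bash
# check_source_dir (hypothetical): succeed only if the given path is
# non-empty and contains a top-level CMakeLists.txt.
check_source_dir() {
    local src="${1:-}"
    if [ -z "$src" ]; then
        echo "usage: $0 <path-to-ck-source>" >&2
        return 1
    fi
    if [ ! -f "$src/CMakeLists.txt" ]; then
        echo "error: no CMakeLists.txt found in '$src'" >&2
        return 1
    fi
    return 0
}

# Possible use at the top of a configure script:
#   check_source_dir "$1" || exit 1
```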
19
script/cmake-ck-release.sh
Executable file
@@ -0,0 +1,19 @@
#!/bin/bash
# Remove stale CMake state from the current (build) directory
rm -f CMakeCache.txt
rm -f *.cmake
rm -rf CMakeFiles

# Path to the CK source tree, passed as the first argument
MY_PROJECT_SOURCE=$1

cmake \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_CXX_FLAGS="-O3" \
-D CMAKE_BUILD_TYPE=Release \
-D BUILD_DEV=OFF \
-D GPU_TARGETS="gfx908;gfx90a" \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
-D USE_BITINT_EXTENSION_INT4=OFF \
"${MY_PROJECT_SOURCE}"

#-D AMDGPU_TARGETS=gfx90a;gfx908
@@ -1,20 +0,0 @@
#!/bin/bash
rm -f CMakeCache.txt
rm -f *.cmake
rm -rf CMakeFiles

MY_PROJECT_SOURCE=../
MY_PROJECT_INSTALL=../install.dir

cmake \
-D CMAKE_INSTALL_PREFIX=${MY_PROJECT_INSTALL} \
-D BUILD_DEV=OFF \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_CXX_FLAGS=" -O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
-D CMAKE_PREFIX_PATH=/opt/rocm \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=ON \
${MY_PROJECT_SOURCE}

#-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3 -ftemplate-backtrace-limit=0 -mllvm --amdgpu-spill-vgpr-to-agpr=0 -gline-tables-only -save-temps=$PWD" \
#-D CMAKE_CXX_FLAGS=" --offload-arch=gfx908 --offload-arch=gfx90a -O3 -ftemplate-backtrace-limit=0 -gline-tables-only -save-temps=$PWD" \