mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 14:29:05 +00:00
Update doc (#464)
* update cmake script
* update readme
* Update README.md
* add citation
* add images
* Update README.md
* update
* Update README.md
* Update CONTRIBUTORS.md
* Update README.md
* Update CITATION.cff
* Update README.md
* Update CITATION.cff
* update doc
* Update CONTRIBUTORS.md
* Update LICENSE
@@ -1,5 +1,9 @@
# Composable Kernel Developers and Contributors

# Developers
This is the list of developers and contributors to Composable Kernel library


## Developers
[Chao Liu](https://github.com/asroy), [Jing Zhang](https://github.com/zjing14), 2018-2022

[Letao Qin](https://github.com/ltqin), [Qianfeng Zhang](https://github.com/qianfengz), [Liang Huang](https://github.com/carlushuang), [Shaojie Wang](https://github.com/shaojiewang), 2019-2022
@@ -15,12 +19,13 @@ Xiaoyan Zhou, 2020
[Jianfeng Yan](https://github.com/j4yan), 2021-2022


# Product Manager
## Product Manager
[Jun Liu](https://github.com/junliume)

# Contributors
[Dan Yao](https://github.com/danyao12), [Guangzhao Lu](https://github.com/guangzlu), [Raman Jana](https://github.com/ramjana), [Jehandad Khan](https://github.com/JehandadKhan)

# Acknowledgement
CK team works closely with Meta [AITemplate](???to.be.added???) team ([Bing Xu](https://github.com/antinucleon), Ying Zhang, etc). Most of the lucrative graph optimization opportunities in ML models were identified by AITemplate team, and we also co-designed many high performance fused kernels for AMD GPUs. Without this collaboration, CK would not reach its current potential.
## Contributors
[Dan Yao](https://github.com/danyao12), [Guangzhao Lu](https://github.com/guangzlu), [Raman Jana](https://github.com/ramjana), [Jehandad Khan](https://github.com/JehandadKhan), [Wen-Heng (Jack) Chung](https://github.com/whchung)


## Acknowledgement
CK team works closely with Meta [AITemplate](https://github.com/facebookincubator/AITemplate) team ([Bing Xu](https://github.com/antinucleon), [Hao Lu](https://github.com/hlu1), [Ying Zhang](https://github.com/ipiszy), etc). Most of the lucrative graph optimization opportunities in ML models were identified by AITemplate team, and we also co-designed many high performance fused kernels for AMD GPUs. Without this collaboration, CK would not reach its current potential.
8
LICENSE
@@ -1,11 +1,3 @@
Copyright (c) 2018- , Advanced Micro Devices, Inc. (Chao Liu, Jing Zhang)
Copyright (c) 2019- , Advanced Micro Devices, Inc. (Letao Qin, Qianfeng Zhang, Liang Huang, Shaojie Wang)
Copyright (c) 2022- , Advanced Micro Devices, Inc. (Anthony Chang, Chunyu Lai, Illia Silin, Adam Osewski, Poyen Chen, Jehandad Khan)
Copyright (c) 2019-2021, Advanced Micro Devices, Inc. (Hanwen Chang)
Copyright (c) 2019-2020, Advanced Micro Devices, Inc. (Tejash Shah)
Copyright (c) 2020 , Advanced Micro Devices, Inc. (Xiaoyan Zhou)
Copyright (c) 2021-2022, Advanced Micro Devices, Inc. (Jianfeng Yan)

SPDX-License-Identifier: MIT
Copyright (c) 2018-2022, Advanced Micro Devices, Inc. All rights reserved.
@@ -1,9 +1,9 @@
# Composable Kernel

## Methodology
Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for Machine Learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel languages, like HIP C++.
Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel languages, like HIP C++.

CK utilizes two concepts to achieve performance portabilatity and code maintainbility:
CK utilizes two concepts to achieve performance portability and code maintainability:
* A tile-based programming model
* Algorithm complexity reduction for complex ML operators, using innovative technique we call "Tensor Coordinate Transformation".
@@ -11,7 +11,7 @@ CK utilizes two concepts to achieve performance portabilatity and code maintainb

## Code Structure
Current CK library are structured into 4 layers:
* "Templated Tile Operators"
* "Templated Tile Operators" layer
* "Templated Kernel and Invoker" layer
* "Instantiated Kernel and Invoker" layer
* "Client API" layer
@@ -90,7 +90,7 @@ Instructions for using CK as a pre-built kernel library are under [client_exampl
### Kernel Timing and Verification
CK's own kernel timer will warn up kernel once, and then run it multiple times
to get average kernel time. For some kernels that use atomic add, this will cause
output buffer to be accumulated multiple times, causing verfication failure.
output buffer to be accumulated multiple times, causing verification failure.
To work around it, do not use CK's own timer and do verification at the same time.
CK's own timer and verification in each example and ckProfiler can be enabled or
disabled from command line.
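The accumulation problem described in the Kernel Timing and Verification hunk can be reproduced with a small CPU-only sketch. Nothing here is CK's API: `accumulate_kernel`, `time_kernel`, `output_after_single_launch`, and `output_after_timing` are hypothetical names, and a plain `+=` loop stands in for a GPU kernel that uses atomic add. The sketch assumes only what the README states: the timer launches the kernel once to warm up and then several more times to average.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a kernel that accumulates into its output,
// the CPU analogue of a GPU kernel that writes results with atomic add.
void accumulate_kernel(const std::vector<float>& in, std::vector<float>& out)
{
    for(std::size_t i = 0; i < in.size(); ++i)
        out[i] += in[i];
}

// Hypothetical timer (not CK's): one warm-up launch, then nrepeat timed
// launches. Returns the average time per timed launch in milliseconds.
double time_kernel(const std::function<void()>& launch, int nrepeat)
{
    launch(); // warm-up launch, not timed
    const auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < nrepeat; ++i)
        launch();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / nrepeat;
}

// Output value after exactly one launch: what verification expects.
float output_after_single_launch()
{
    std::vector<float> in(4, 1.0f), out(4, 0.0f);
    accumulate_kernel(in, out);
    return out[0];
}

// Output value after timing: the kernel ran 1 + nrepeat times on the same
// buffer, so each element was accumulated 1 + nrepeat times.
float output_after_timing(int nrepeat)
{
    std::vector<float> in(4, 1.0f), out(4, 0.0f);
    time_kernel([&] { accumulate_kernel(in, out); }, nrepeat);
    return out[0];
}
```

With an input of all ones, a single launch leaves each output element at 1, while timing with 10 repeats leaves it at 11, so a reference check against the single-launch result fails. This is why the README advises against enabling the timer and verification in the same run.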