mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 02:02:46 +00:00
Add basic documentation structure (#1715)
* Add basic documentation structure
* Add terminology placeholder
* Add codegen placeholder
* Create template for each page
[ROCm/composable_kernel commit: 5affda819d]
@@ -1,3 +1,4 @@
[Back to the main page](./README.md)
# Composable Kernel Developers and Contributors

This is the list of developers and contributors to the Composable Kernel library.
34
README.md
@@ -26,23 +26,15 @@ The current CK library is structured into four layers:

## General information

To build our documentation locally, use the following code:

``` bash
cd docs
pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```

You can find a list of our developers and contributors on our [Contributors](/CONTRIBUTORS.md) page.

```note
If you use CK, cite us as follows:

* [Realizing Tensor Operators Using Coordinate Transformations and Tile Based Programming](???):
This paper will be available on arXiv soon.
* [CITATION.cff](/CITATION.cff)
```
* [CK supported operations](include/ck/README.md)
* [CK Tile supported operations](include/ck_tile/README.md)
* [CK wrapper](client_example/25_wrapper/README.md)
* [CK codegen](codegen/README.md)
* [CK profiler](profiler/README.md)
* [Examples (Custom use of CK supported operations)](example/README.md)
* [Client examples (Use of CK supported operations with instance factory)](client_example/README.md)
* [Terminology](/TERMINOLOGY.md)
* [Contributors](/CONTRIBUTORS.md)

CK is released under the **[MIT license](/LICENSE)**.
@@ -137,6 +129,14 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa

You can find instructions for running ckProfiler in [profiler](/profiler).

* Build our documentation locally:

``` bash
cd docs
pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```

Note the `-j` option for building with multiple threads in parallel, which speeds up the build significantly.
However, `-j` launches an unlimited number of threads, which can cause the build to run out of memory and
crash. On average, you should expect each thread to use ~2 GB of RAM.
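
As a rough illustration of the memory note above, the sketch below picks a job count from the number of cores and the amount of RAM, assuming the ~2 GB-per-thread figure holds. `safe_jobs` is a hypothetical helper, not part of CK; the caller supplies both arguments.

```bash
# Hypothetical helper (not part of CK): choose a value for -j.
# Usage: safe_jobs <cores> <ram_in_gb>
# Assumes roughly 2 GB of RAM per build thread, as stated above.
safe_jobs() (
    cores=$1
    mem_gb=$2
    per_job_gb=2

    # Largest number of threads that fits in memory.
    by_mem=$(( mem_gb / per_job_gb ))

    # Take the smaller of the two limits.
    if [ "$cores" -lt "$by_mem" ]; then
        jobs=$cores
    else
        jobs=$by_mem
    fi

    # Always allow at least one job.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi

    echo "$jobs"
)
```

On Linux, one might pass `"$(nproc)"` as the first argument and give the result to `-j` instead of leaving it unlimited. The 2 GB figure is an average, so leaving some headroom is reasonable.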
2
TERMINOLOGY.md
Normal file
@@ -0,0 +1,2 @@
[Back to the main page](./README.md)
# Composable Kernel terminology
@@ -1,14 +1,9 @@
[Back to the main page](../../README.md)
# Composable Kernel wrapper GEMM tutorial

This tutorial demonstrates how to implement matrix multiplication using Composable Kernel (CK)
wrapper. We present the base version of GEMM without most of the available optimizations; however,
it's worth noting that CK has kernels with different optimizations.
This tutorial demonstrates how to implement matrix multiplication using Composable Kernel (CK) wrapper. We present the base version of GEMM without most of the available optimizations; however, it's worth noting that CK has kernels with different optimizations.

To implement these optimizations, you can use the CK wrapper or directly use available instances in
CK. You can also refer to the
[optimized GEMM example](https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_optimized_gemm.cpp),
that uses CK wrapper based on the
[`gridwise_gemm_xdlops_v2r3`](https://github.com/ROCm/composable_kernel/blob/develop/include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdlops_v2r3.hpp) implementation.
To implement these optimizations, you can use the CK wrapper or directly use available instances in CK. You can also refer to the [optimized GEMM example](https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_optimized_gemm.cpp), that uses CK wrapper based on the [`gridwise_gemm_xdlops_v2r3`](https://github.com/ROCm/composable_kernel/blob/develop/include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdlops_v2r3.hpp) implementation.

The kernel definition should look similar to:
@@ -1,3 +1,5 @@
[Back to the main page](../README.md)
# Composable Kernel client examples
##
Client applications link to the CK library, so the CK library must be installed before building client applications.
2
codegen/README.md
Normal file
@@ -0,0 +1,2 @@
[Back to the main page](../README.md)
# Composable Kernel codegen
2
example/README.md
Normal file
@@ -0,0 +1,2 @@
[Back to the main page](../README.md)
# Composable Kernel examples
19
include/ck/README.md
Normal file
@@ -0,0 +1,19 @@
[Back to the main page](../../README.md)
# Composable Kernel supported operations
## Supported device operations
* [Average pooling]()
* [Batched contraction]()
* [Batched gemm]()
* [Batchnorm]()
* [CGEMM]()
* [Contraction]()
* [Convolution]()
* [Image to Column and Column to Image]()
* [Elementwise]()
* [GEMM]()
* [Max pooling]()
* [Reduce]()
* [Normalization]()
* [Permute]()
* [Put]()
* [Softmax]()
@@ -1,4 +1,5 @@
# ck_tile
[Back to the main page](../../README.md)
# Composable Kernel Tile
## concept
`ck_tile` provides a programming model with templated abstractions that let users implement performance-critical kernels for machine learning workloads. It introduces the following basic concepts to help users build their own operators:
- tensor coordinate transformation: the core concept of the layout/index transform abstraction, at both compile time and run time.
@@ -1,3 +1,5 @@
[Back to the main page](../README.md)
# Composable Kernel profiler
## Profile GEMM kernels
```bash
#arg1: tensor operation (gemm=GEMM)
@@ -180,3 +182,13 @@ Note: Column to image kernel adds to the output memory, this will cause output b
################ op datatype verify init log time dim0 dim1 dim2 in_stride0 in_stride1 in_stride2 out_stride0 out_stride1 out_stride2
./bin/ckProfiler permute_scale 0 1 1 0 1 64 64 64 4096 64 1 1 64 4096
```
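
In the `permute_scale` example above, the input strides `4096 64 1` are the packed row-major strides of a 64x64x64 tensor, and the output strides `1 64 4096` are the same values in reverse order. The sketch below shows how such packed strides can be derived from the dimensions. `packed_strides` is a hypothetical helper written for illustration and is not part of CK or ckProfiler.

```bash
# Hypothetical helper (not part of CK): print packed row-major strides
# for the given dimensions, outermost dimension first.
packed_strides() (
    # Total number of elements in the tensor.
    total=1
    for d in "$@"; do
        total=$(( total * d ))
    done

    # The stride of a dimension is the number of elements spanned by
    # all dimensions to its right.
    out=""
    for d in "$@"; do
        total=$(( total / d ))
        out="${out:+$out }$total"
    done

    echo "$out"
)

packed_strides 64 64 64   # prints "4096 64 1"
```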

## Convert MIOpen driver command to CKProfiler

```bash
python3 ../script/convert_miopen_driver_to_profiler.py \
/opt/rocm/bin/MIOpenDriver conv -n 32 -c 64 -H 28 -W 28 -k 64 -y 3 -x 3 \
-p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 32 -F 1 -t 1
```

Only the convolution driver is supported.