diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
index cdce5a4630..8ef5c2b726 100644
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -1,3 +1,4 @@
+[Back to the main page](./README.md)
 # Composable Kernel Developers and Contributors
 
 This is the list of developers and contributors to Composable Kernel library
diff --git a/README.md b/README.md
index d8eb152ee9..c0872aa567 100644
--- a/README.md
+++ b/README.md
@@ -26,23 +26,15 @@ The current CK library is structured into four layers:
 
 ## General information
 
-To build our documentation locally, use the following code:
-
-``` bash
-cd docs
-pip3 install -r sphinx/requirements.txt
-python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
-```
-
-You can find a list of our developers and contributors on our [Contributors](/CONTRIBUTORS.md) page.
-
-```note
-If you use CK, cite us as follows:
-
-* [Realizing Tensor Operators Using Coordinate Transformations and Tile Based Programming](???):
-  This paper will be available on arXiv soon.
-* [CITATION.cff](/CITATION.cff)
-```
+* [CK supported operations](include/ck/README.md)
+* [CK Tile supported operations](include/ck_tile/README.md)
+* [CK wrapper](client_example/25_wrapper/README.md)
+* [CK codegen](codegen/README.md)
+* [CK profiler](profiler/README.md)
+* [Examples (Custom use of CK supported operations)](example/README.md)
+* [Client examples (Use of CK supported operations with instance factory)](client_example/README.md)
+* [Terminology](/TERMINOLOGY.md)
+* [Contributors](/CONTRIBUTORS.md)
 
 CK is released under the **[MIT license](/LICENSE)**.
 
@@ -137,6 +129,14 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
 
 You can find instructions for running ckProfiler in [profiler](/profiler).
 
+* Build our documentation locally:
+
+    ``` bash
+    cd docs
+    pip3 install -r sphinx/requirements.txt
+    python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
+    ```
+
 Note the `-j` option for building with multiple threads in parallel, which speeds up the build significantly. However, `-j` launches unlimited number of threads, which can cause the build to run out of memory and crash. On average, you should expect each thread to use ~2Gb of RAM.
diff --git a/TERMINOLOGY.md b/TERMINOLOGY.md
new file mode 100644
index 0000000000..e8833efb89
--- /dev/null
+++ b/TERMINOLOGY.md
@@ -0,0 +1,2 @@
+[Back to the main page](./README.md)
+# Composable Kernel terminology
\ No newline at end of file
diff --git a/client_example/25_wrapper/README.md b/client_example/25_wrapper/README.md
index eba3de017f..3db9a9af44 100644
--- a/client_example/25_wrapper/README.md
+++ b/client_example/25_wrapper/README.md
@@ -1,14 +1,9 @@
+[Back to the main page](../../README.md)
 # Composable Kernel wrapper GEMM tutorial
 
-This tutorial demonstrates how to implement matrix multiplication using Composable Kernel (CK)
-wrapper. We present the base version of GEMM without most of the available optimizations; however,
-it's worth noting that CK has kernels with different optimizations.
+This tutorial demonstrates how to implement matrix multiplication using Composable Kernel (CK) wrapper. We present the base version of GEMM without most of the available optimizations; however, it's worth noting that CK has kernels with different optimizations.
 
-To implement these optimizations, you can use the CK wrapper or directly use available instances in
-CK. You can also refer to the
-[optimized GEMM example](https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_optimized_gemm.cpp),
-that uses CK wrapper based on the
-[`gridwise_gemm_xdlops_v2r3`](https://github.com/ROCm/composable_kernel/blob/develop/include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdlops_v2r3.hpp) implementation.
+To implement these optimizations, you can use the CK wrapper or directly use available instances in CK. You can also refer to the [optimized GEMM example](https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_optimized_gemm.cpp), that uses CK wrapper based on the [`gridwise_gemm_xdlops_v2r3`](https://github.com/ROCm/composable_kernel/blob/develop/include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdlops_v2r3.hpp) implementation.
 
 The kernel definition should look similar to:
 
diff --git a/client_example/README.md b/client_example/README.md
index 64a7130d53..d9f793434d 100644
--- a/client_example/README.md
+++ b/client_example/README.md
@@ -1,3 +1,5 @@
+[Back to the main page](../README.md)
+# Composable Kernel client examples
 ##
 Client application links to CK library, and therefore CK library needs to be installed before building client applications.
 
diff --git a/codegen/README.md b/codegen/README.md
new file mode 100644
index 0000000000..deadf3221d
--- /dev/null
+++ b/codegen/README.md
@@ -0,0 +1,2 @@
+[Back to the main page](../README.md)
+# Composable Kernel codegen
\ No newline at end of file
diff --git a/example/README.md b/example/README.md
new file mode 100644
index 0000000000..43b3419f80
--- /dev/null
+++ b/example/README.md
@@ -0,0 +1,2 @@
+[Back to the main page](../README.md)
+# Composable Kernel examples
\ No newline at end of file
diff --git a/include/ck/README.md b/include/ck/README.md
new file mode 100644
index 0000000000..bff689f6b0
--- /dev/null
+++ b/include/ck/README.md
@@ -0,0 +1,19 @@
+[Back to the main page](../../README.md)
+# Composable Kernel supported operations
+## Supported device operations
+* [Average pooling]()
+* [Batched contraction]()
+* [Batched gemm]()
+* [Batchnorm]()
+* [CGEMM]()
+* [Contraction]()
+* [Convolution]()
+* [Image to Column and Column to Image]()
+* [Elementwise]()
+* [GEMM]()
+* [Max pooling]()
+* [Reduce]()
+* [Normalization]()
+* [Permute]()
+* [Put]()
+* [Softmax]()
diff --git a/include/ck_tile/README.md b/include/ck_tile/README.md
index 572e9c7e48..9f88af1ca1 100644
--- a/include/ck_tile/README.md
+++ b/include/ck_tile/README.md
@@ -1,4 +1,5 @@
-# ck_tile
+[Back to the main page](../../README.md)
+# Composable Kernel Tile
 ## concept
 `ck_tile` provides a programming model with templated abstractions to enable users to implement performance-critical kernels for machine learning workloads. introduces following basic concepts to help users building your own operator
 - tensor coordinate transformation, this is the core concept of layout/index transform abstraction in both compiler time and run time.
diff --git a/profiler/README.md b/profiler/README.md
index 10febcabdc..3f4837aada 100644
--- a/profiler/README.md
+++ b/profiler/README.md
@@ -1,3 +1,5 @@
+[Back to the main page](../README.md)
+# Composable Kernel profiler
 ## Profile GEMM kernels
 ```bash
 #arg1: tensor operation (gemm=GEMM)
@@ -180,3 +182,13 @@ Note: Column to image kernel adds to the output memory, this will cause output b
 ################ op datatype verify init log time dim0 dim1 dim2 in_stride0 in_stride1 in_stride2 out_stride0 out_stride1 out_stride2
 ./bin/ckProfiler permute_scale 0 1 1 0 1 64 64 64 4096 64 1 1 64 4096
 ```
+
+## Convert MIOpen driver command to CKProfiler
+
+```bash
+python3 ../script/convert_miopen_driver_to_profiler.py
+/opt/rocm/bin/MIOpenDriver conv -n 32 -c 64 -H 28 -W 28 -k 64 -y 3 -x 3
+-p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 32 -F 1 -t 1
+```
+
+Only convolution driver is supported.