mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 22:39:03 +00:00
standardize docs (#655)
11
.gitignore
vendored
@@ -48,6 +48,11 @@ build*
 .gdb_history
 install.dir*
 
-# directories containing generated documentation
-docs/source/_build/
-docs/docBin/
+# documentation artifacts
+build/
+_build/
+_images/
+_static/
+_templates/
+_toc.yml
+docBin/
18
.readthedocs.yaml
Normal file
@@ -0,0 +1,18 @@
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+version: 2
+
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.8"
+
+sphinx:
+  configuration: docs/conf.py
+
+formats: [htmlzip]
+
+python:
+  install:
+    - requirements: docs/.sphinx/requirements.txt
14
README.md
@@ -7,7 +7,7 @@ CK utilizes two concepts to achieve performance portability and code maintainabi
 * A tile-based programming model
 * Algorithm complexity reduction for complex ML operators, using innovative technique we call "Tensor Coordinate Transformation".
 
-
+
 
 ## Code Structure
 Current CK library are structured into 4 layers:
@@ -16,7 +16,17 @@ Current CK library are structured into 4 layers:
 * "Instantiated Kernel and Invoker" layer
 * "Client API" layer
 
-
+
 
+## Documentation
+
+Run the steps below to build documentation locally.
+
+```
+cd docs
+pip3 install -r .sphinx/requirements.txt
+python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
+```
+
 ## Contributors
 The list of developers and contributors is here: [Contributors](/CONTRIBUTORS.md)
@@ -1,93 +0,0 @@
|
||||
## CK docker hub
|
||||
|
||||
[Docker hub](https://hub.docker.com/r/rocm/composable_kernel)
|
||||
|
||||
## Why do I need this?
|
||||
|
||||
To make our lives easier and bring Composable Kernel dependencies together, we recommend using docker images.
|
||||
|
||||
## So what is Composable Kernel?
|
||||
|
||||
Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel languages, like HIP C++.
|
||||
|
||||
To get the CK library
|
||||
|
||||
```
|
||||
git clone https://github.com/ROCmSoftwarePlatform/composable_kernel.git
|
||||
```
|
||||
|
||||
run a docker container
|
||||
|
||||
```
|
||||
docker run \
|
||||
-it \
|
||||
--privileged \
|
||||
--group-add sudo \
|
||||
-w /root/workspace \
|
||||
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
|
||||
rocm/composable_kernel:ck_ub20.04_rocm5.3_release \
|
||||
/bin/bash
|
||||
```
|
||||
|
||||
and build the CK
|
||||
|
||||
```
|
||||
mkdir build && cd build
|
||||
|
||||
# Need to specify target ID, example below is for gfx908 and gfx90a
|
||||
cmake \
|
||||
-D CMAKE_PREFIX_PATH=/opt/rocm \
|
||||
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
|
||||
-D CMAKE_CXX_FLAGS="-O3" \
|
||||
-D CMAKE_BUILD_TYPE=Release \
|
||||
-D GPU_TARGETS="gfx908;gfx90a" \
|
||||
..
|
||||
```
|
||||
|
||||
and
|
||||
|
||||
```
|
||||
make -j examples tests
|
||||
```
|
||||
|
||||
To run all the test cases including tests and examples run
|
||||
|
||||
```
|
||||
make test
|
||||
```
|
||||
|
||||
We can also run specific examples or tests like
|
||||
|
||||
```
|
||||
./bin/example_gemm_xdl_fp16
|
||||
./bin/test_gemm_fp16
|
||||
```
|
||||
|
||||
For more details visit [CK github repo](https://github.com/ROCmSoftwarePlatform/composable_kernel), [CK examples](https://github.com/ROCmSoftwarePlatform/composable_kernel/tree/develop/example), [even more CK examples](https://github.com/ROCmSoftwarePlatform/composable_kernel/tree/develop/client_example).
|
||||
|
||||
## And what is inside?
|
||||
|
||||
The docker images have everything you need for running CK including:
|
||||
|
||||
* [ROCm](https://www.amd.com/en/graphics/servers-solutions-rocm)
|
||||
* [CMake](https://cmake.org/)
|
||||
* [Compiler](https://github.com/RadeonOpenCompute/llvm-project)
|
||||
|
||||
## Which image is right for me?
|
||||
|
||||
Let's take a look at the image naming, for example "ck_ub20.04_rocm5.4_release". The image specs are:
|
||||
|
||||
* "ck" - made for running Composable Kernel
|
||||
* "ub20.04" - based on Ubuntu 20.04
|
||||
* "rocm5.4" - ROCm platform version 5.4
|
||||
* "release" - compiler version is release
|
||||
|
||||
So just pick the right image for your project dependencies and you're all set.
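
The naming scheme above can be sketched as a small parser. This is only an illustration: the `ImageTag` type, the `parse_image_tag` helper and the regular expression are hypothetical, and the pattern assumes every tag follows the four-part layout of the example tag.

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    """Hypothetical container for the four parts of a CK image tag."""
    purpose: str   # "ck": made for running Composable Kernel
    ubuntu: str    # Ubuntu base version, e.g. "20.04"
    rocm: str      # ROCm platform version, e.g. "5.4"
    compiler: str  # compiler flavour, e.g. "release"


# Assumed layout: <purpose>_ub<ubuntu>_rocm<rocm>_<compiler>
_TAG_RE = re.compile(
    r"^(?P<purpose>[a-z]+)_ub(?P<ubuntu>[\d.]+)_rocm(?P<rocm>[\d.]+)_(?P<compiler>\w+)$"
)


def parse_image_tag(tag: str) -> ImageTag:
    """Split a tag such as 'ck_ub20.04_rocm5.4_release' into its parts."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"tag does not follow the assumed naming scheme: {tag!r}")
    return ImageTag(**match.groupdict())


print(parse_image_tag("ck_ub20.04_rocm5.4_release"))
```

Tags that deviate from this layout raise a `ValueError` instead of being guessed at.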
|
||||
|
||||
## DIY starts here
|
||||
|
||||
If you need to customize a docker image or just can't stop tinkering, feel free to adjust the [Dockerfile](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/Dockerfile) for your needs.
|
||||
|
||||
## License
|
||||
|
||||
CK is released under the MIT [license](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/LICENSE).
|
||||
@@ -1,191 +0,0 @@
|
||||
## CK Hello world
|
||||
|
||||
## Motivation
|
||||
|
||||
This tutorial is aimed at engineers working in artificial intelligence and machine learning who would like to optimize their pipelines and squeeze out every drop of performance by adding the Composable Kernel (CK) library to their projects. We would like to make the CK library approachable, so the tutorial is not based on the latest release and doesn't have all the bleeding edge features, but it will stay reproducible.
|
||||
|
||||
During this tutorial we will have an introduction to the CK library, we will build it and run some examples and tests, so to say we will run a "Hello world" example. In future tutorials we will go in depth and breadth and get familiar with other tools and ways to integrate CK into your project.
|
||||
|
||||
## Description
|
||||
|
||||
Modern AI technology solves more and more problems in all imaginable fields, but crafting fast and efficient workflows is still challenging. CK is one of the tools to make AI heavy lifting as fast and efficient as possible. CK is a collection of optimized AI operator kernels and tools to create new ones. The library has the components required for the majority of modern neural network architectures, including matrix multiplication, convolution, contraction, reduction, attention modules, a variety of activation functions, fused operators and many more.
|
||||
|
||||
So how do we (almost) reach the speed of light? CK acceleration abilities are based on:
|
||||
|
||||
* Layered structure.
|
||||
* Tile-based computation model.
|
||||
* Tensor coordinate transformation.
|
||||
* Hardware acceleration use.
|
||||
* Support of low precision data types including fp16, bf16, int8 and int4.
|
||||
|
||||
If you are excited and need more technical details and benchmarking results - read this awesome blog [post](https://community.amd.com/t5/instinct-accelerators/amd-composable-kernel-library-efficient-fused-kernels-for-ai/ba-p/553224).
|
||||
|
||||
For more details visit our [github repo](https://github.com/ROCmSoftwarePlatform/composable_kernel).
|
||||
|
||||
## Hardware targets
|
||||
|
||||
CK library fully supports "gfx908" and "gfx90a" GPU architectures and only some operators are supported for "gfx1030". Let's check the hardware you have at hand and decide on the target GPU architecture
|
||||
|
||||
| GPU Target | AMD GPU |
|------------|---------|
| gfx908 | Radeon Instinct MI100 |
| gfx90a | Radeon Instinct MI210, MI250, MI250X |
| gfx1030 | Radeon PRO V620, W6800, W6800X, W6800X Duo, W6900X, RX 6800, RX 6800 XT, RX 6900 XT, RX 6900 XTX, RX 6950 XT |
|
||||
|
||||
There are also [cloud options](https://aws.amazon.com/ec2/instance-types/g4/) you can find if you don't have an AMD GPU at hand.
|
||||
|
||||
## Build the library
|
||||
|
||||
First let's clone the library and check out the tested version:
|
||||
|
||||
```
|
||||
git clone https://github.com/ROCmSoftwarePlatform/composable_kernel.git
|
||||
cd composable_kernel/
|
||||
git checkout tutorial_hello_world
|
||||
```
|
||||
|
||||
To make our lives easier we prepared [docker images](https://hub.docker.com/r/rocm/composable_kernel) with all the necessary dependencies. Pick the right image and create a container. In this tutorial we use "rocm/composable_kernel:ck_ub20.04_rocm5.3_release" image, it is based on Ubuntu 20.04, ROCm v5.3, compiler release version.
|
||||
|
||||
If your current folder is ${HOME}, start the docker container with
|
||||
|
||||
```
|
||||
docker run \
|
||||
-it \
|
||||
--privileged \
|
||||
--group-add sudo \
|
||||
-w /root/workspace \
|
||||
-v ${HOME}:/root/workspace \
|
||||
rocm/composable_kernel:ck_ub20.04_rocm5.3_release \
|
||||
/bin/bash
|
||||
```
|
||||
|
||||
If your current folder is different from ${HOME}, adjust the line `-v ${HOME}:/root/workspace` to fit your folder structure.
|
||||
|
||||
Inside the docker container current folder is "~/workspace", library path is "~/workspace/composable_kernel", navigate to the library
|
||||
|
||||
```
|
||||
cd composable_kernel/
|
||||
```
|
||||
|
||||
Create and go to the "build" directory
|
||||
|
||||
```
|
||||
mkdir build && cd build
|
||||
```
|
||||
|
||||
In the previous section we talked about target GPU architecture. Once you decide which one is right for you, run cmake using the right GPU_TARGETS flag
|
||||
|
||||
```
|
||||
cmake \
|
||||
-D CMAKE_PREFIX_PATH=/opt/rocm \
|
||||
-D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
|
||||
-D CMAKE_CXX_FLAGS="-O3" \
|
||||
-D CMAKE_BUILD_TYPE=Release \
|
||||
-D BUILD_DEV=OFF \
|
||||
-D GPU_TARGETS="gfx908;gfx90a;gfx1030" ..
|
||||
```
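
As a sketch of how the `GPU_TARGETS` value is put together, the helper below joins the chosen targets with semicolons, which is the separator used in the cmake call above. The `SUPPORT_LEVEL` table and the `gpu_targets_flag` function are hypothetical names for illustration; the support levels are taken from the "Hardware targets" section.

```python
# Support levels as described in the "Hardware targets" section.
SUPPORT_LEVEL = {
    "gfx908": "full",
    "gfx90a": "full",
    "gfx1030": "partial",  # only some operators are supported
}


def gpu_targets_flag(targets):
    """Build the '-D GPU_TARGETS=...' argument for a list of target IDs."""
    unknown = [t for t in targets if t not in SUPPORT_LEVEL]
    if unknown:
        raise ValueError(f"targets not covered by this tutorial: {unknown}")
    # cmake expects a semicolon-separated list inside one quoted string.
    return '-D GPU_TARGETS="' + ";".join(targets) + '"'


print(gpu_targets_flag(["gfx908", "gfx90a"]))
```

Listing only the targets you actually own keeps the set of architectures the build has to compile for small.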
|
||||
|
||||
If everything went well the cmake run will end up with:
|
||||
|
||||
```
|
||||
-- Configuring done
|
||||
-- Generating done
|
||||
-- Build files have been written to: "/root/workspace/composable_kernel/build"
|
||||
```
|
||||
|
||||
Finally, we can build examples and tests
|
||||
|
||||
```
|
||||
make -j examples tests
|
||||
```
|
||||
|
||||
If everything is smooth, you'll see
|
||||
|
||||
```
|
||||
Scanning dependencies of target tests
|
||||
[100%] Built target tests
|
||||
```
|
||||
|
||||
## Run examples and tests
|
||||
|
||||
Examples are listed as test cases as well, so we can run all examples and tests with
|
||||
|
||||
```
|
||||
ctest
|
||||
```
|
||||
|
||||
You can check the list of all tests by running
|
||||
|
||||
```
|
||||
ctest -N
|
||||
```
|
||||
|
||||
We can also run them separately, here is a separate example execution.
|
||||
|
||||
```
|
||||
./bin/example_gemm_xdl_fp16 1 1 1
|
||||
```
|
||||
|
||||
The arguments "1 1 1" mean that we want to run this example in the mode: verify results with CPU, initialize matrices with integers and benchmark the kernel execution. You can play around with these parameters and see how output and execution results change.
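
The three positional arguments can be read as follows. This is a minimal sketch, not the example's actual argument handling: the `ExampleArgs` type and the field names are hypothetical, and only the meaning of the value `1` is taken from the text above.

```python
from typing import NamedTuple, Sequence


class ExampleArgs(NamedTuple):
    """Hypothetical view of the three positional arguments."""
    verify: bool       # 1: verify results with CPU
    init_method: int   # 1: initialize matrices with integers
    time_kernel: bool  # 1: benchmark the kernel execution


def parse_example_args(argv: Sequence[str]) -> ExampleArgs:
    """Interpret arguments such as ['1', '1', '1']."""
    if len(argv) != 3:
        raise ValueError("expected exactly three arguments")
    verify, init_method, time_kernel = (int(a) for a in argv)
    return ExampleArgs(bool(verify), init_method, bool(time_kernel))


print(parse_example_args(["1", "1", "1"]))
```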
|
||||
|
||||
If everything goes well and you have a device based on gfx908 or gfx90a architecture you should see something like
|
||||
|
||||
```
|
||||
a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
|
||||
b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
|
||||
c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
|
||||
launch_and_time_kernel: grid_dim {480, 1, 1}, block_dim {256, 1, 1}
|
||||
Warm up 1 time
|
||||
Start running 10 times...
|
||||
Perf: 1.10017 ms, 117.117 TFlops, 87.6854 GB/s, DeviceGemmXdl<256, 256, 128, 4, 8, 32, 32, 4, 2> NumPrefetch: 1, LoopScheduler: Default, PipelineVersion: v1
|
||||
```
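
The `Perf` line can be cross-checked by hand. The sketch below assumes the usual GEMM accounting: `2 * M * N * K` floating point operations, and each of the three fp16 matrices (2 bytes per element) moved through memory once. Under those assumptions it reproduces the reported TFlops and GB/s to within rounding of the printed time. The `gemm_perf` function is a hypothetical helper, not part of CK.

```python
def gemm_perf(m, n, k, time_ms, bytes_per_element=2):
    """Return (TFlops, GB/s) for an M x K by K x N GEMM that took time_ms."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)  # A, B and C
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12, bytes_moved / seconds / 1e9


# Sizes and time taken from the output above: M=3840, N=4096, K=4096.
tflops, gbps = gemm_perf(3840, 4096, 4096, 1.10017)
print(f"{tflops:.3f} TFlops, {gbps:.2f} GB/s")
```

The same function applied to the gfx1030 run further down (3.65695 ms) gives about 35.23 TFlops, which matches that output as well.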
|
||||
|
||||
Meanwhile, running it on a gfx1030 device should result in
|
||||
|
||||
```
|
||||
a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
|
||||
b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
|
||||
c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
|
||||
DeviceGemmXdl<256, 256, 128, 4, 8, 32, 32, 4, 2> NumPrefetch: 1, LoopScheduler: Default, PipelineVersion: v1 does not support this problem
|
||||
```
|
||||
|
||||
But don't panic, some of the operators are supported on gfx1030 architecture, so you can run a separate example like
|
||||
|
||||
```
|
||||
./bin/example_gemm_dl_fp16 1 1 1
|
||||
```
|
||||
|
||||
and it should result in something nice similar to
|
||||
|
||||
```
|
||||
a_m_k: dim 2, lengths {3840, 4096}, strides {1, 4096}
|
||||
b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
|
||||
c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
|
||||
arg.a_grid_desc_k0_m0_m1_k1_{2048, 3840, 2}
|
||||
arg.b_grid_desc_k0_n0_n1_k1_{2048, 4096, 2}
|
||||
arg.c_grid_desc_m_n_{ 3840, 4096}
|
||||
launch_and_time_kernel: grid_dim {960, 1, 1}, block_dim {256, 1, 1}
|
||||
Warm up 1 time
|
||||
Start running 10 times...
|
||||
Perf: 3.65695 ms, 35.234 TFlops, 26.3797 GB/s, DeviceGemmDl<256, 128, 128, 16, 2, 4, 4, 1>
|
||||
```
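
The `grid_dim` values in both outputs are consistent with one workgroup per output tile. The sketch below assumes that the second and third template arguments printed after the kernel name (`256, 128` for `DeviceGemmXdl` and `128, 128` for `DeviceGemmDl`) are the tile sizes along M and N; this reading is an assumption that happens to match the numbers, not something the output states. `gemm_grid_size` is a hypothetical helper.

```python
def gemm_grid_size(m, n, m_per_block, n_per_block):
    """Number of workgroups if each one computes a single output tile."""
    tiles_m = -(-m // m_per_block)  # ceiling division
    tiles_n = -(-n // n_per_block)
    return tiles_m * tiles_n


# c_m_n has lengths {3840, 4096} in both runs.
print(gemm_grid_size(3840, 4096, 256, 128))  # XDL run, assumed 256 x 128 tile
print(gemm_grid_size(3840, 4096, 128, 128))  # DL run, assumed 128 x 128 tile
```

With these assumed tile sizes the two calls give 480 and 960, the grid sizes shown in the `launch_and_time_kernel` lines.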
|
||||
|
||||
Or we can run a separate test
|
||||
|
||||
```
|
||||
ctest -R test_gemm_fp16
|
||||
```
|
||||
|
||||
If everything goes well you should see something like
|
||||
|
||||
```
|
||||
Start 121: test_gemm_fp16
|
||||
1/1 Test #121: test_gemm_fp16 ................... Passed 51.81 sec
|
||||
|
||||
100% tests passed, 0 tests failed out of 1
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
In this tutorial we took the first look at the Composable Kernel library, built it on your system and ran some examples and tests. Stay tuned, in the next tutorial we will run kernels with different configs to find out the best one for your hardware and task.
|
||||
|
||||
P.S.: Don't forget to shut down the cloud instance if you launched one; you can surely find better ways to spend your money!
|
||||
@@ -51,7 +51,7 @@ PROJECT_BRIEF = "prototype interfaces compatible with ROCm platform and
 # pixels and the maximum width should not exceed 200 pixels. Doxygen will copy
 # the logo to the output directory.
 
-PROJECT_LOGO = ./rocm.jpg
+PROJECT_LOGO =
 
 # The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path
 # into which the generated documentation will be written. If a relative path is
@@ -775,10 +775,10 @@ WARN_LOGFILE =
 # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
 # Note: If this tag is empty the current directory is searched.
 
-INPUT = ../include/ck/tensor_operation/gpu/grid \
-        ../include/ck/tensor_operation/gpu/block \
-        ../include/ck/tensor_operation/gpu/thread \
-        ../library/include/ck/library/utility
+INPUT = ../../include/ck/tensor_operation/gpu/grid \
+        ../../include/ck/tensor_operation/gpu/block \
+        ../../include/ck/tensor_operation/gpu/thread \
+        ../../library/include/ck/library/utility
 
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
1
docs/.sphinx/_toc.yml.in
Normal file
@@ -0,0 +1 @@
|
||||
root: index
|
||||
1
docs/.sphinx/requirements.in
Normal file
@@ -0,0 +1 @@
|
||||
git+https://github.com/RadeonOpenCompute/rocm-docs-core.git
|
||||
269
docs/.sphinx/requirements.txt
Normal file
@@ -0,0 +1,269 @@
|
||||
#
|
||||
# This file is autogenerated by pip-compile with Python 3.10
|
||||
# by the following command:
|
||||
#
|
||||
# pip-compile requirements.in
|
||||
#
|
||||
accessible-pygments==0.0.4
|
||||
# via pydata-sphinx-theme
|
||||
alabaster==0.7.13
|
||||
# via sphinx
|
||||
asttokens==2.2.1
|
||||
# via stack-data
|
||||
attrs==22.2.0
|
||||
# via
|
||||
# jsonschema
|
||||
# jupyter-cache
|
||||
babel==2.12.1
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
backcall==0.2.0
|
||||
# via ipython
|
||||
beautifulsoup4==4.12.0
|
||||
# via pydata-sphinx-theme
|
||||
breathe==4.34.0
|
||||
# via rocm-docs-core
|
||||
certifi==2022.12.7
|
||||
# via requests
|
||||
cffi==1.15.1
|
||||
# via pynacl
|
||||
charset-normalizer==3.1.0
|
||||
# via requests
|
||||
click==8.1.3
|
||||
# via
|
||||
# jupyter-cache
|
||||
# sphinx-external-toc
|
||||
comm==0.1.3
|
||||
# via ipykernel
|
||||
debugpy==1.6.6
|
||||
# via ipykernel
|
||||
decorator==5.1.1
|
||||
# via ipython
|
||||
deprecated==1.2.13
|
||||
# via pygithub
|
||||
docutils==0.16
|
||||
# via
|
||||
# breathe
|
||||
# myst-parser
|
||||
# pydata-sphinx-theme
|
||||
# rocm-docs-core
|
||||
# sphinx
|
||||
executing==1.2.0
|
||||
# via stack-data
|
||||
fastjsonschema==2.16.3
|
||||
# via nbformat
|
||||
gitdb==4.0.10
|
||||
# via gitpython
|
||||
gitpython==3.1.31
|
||||
# via rocm-docs-core
|
||||
greenlet==2.0.2
|
||||
# via sqlalchemy
|
||||
idna==3.4
|
||||
# via requests
|
||||
imagesize==1.4.1
|
||||
# via sphinx
|
||||
importlib-metadata==6.1.0
|
||||
# via
|
||||
# jupyter-cache
|
||||
# myst-nb
|
||||
importlib-resources==5.10.4
|
||||
# via rocm-docs-core
|
||||
ipykernel==6.22.0
|
||||
# via myst-nb
|
||||
ipython==8.11.0
|
||||
# via
|
||||
# ipykernel
|
||||
# myst-nb
|
||||
jedi==0.18.2
|
||||
# via ipython
|
||||
jinja2==3.1.2
|
||||
# via
|
||||
# myst-parser
|
||||
# sphinx
|
||||
jsonschema==4.17.3
|
||||
# via nbformat
|
||||
jupyter-cache==0.5.0
|
||||
# via myst-nb
|
||||
jupyter-client==8.1.0
|
||||
# via
|
||||
# ipykernel
|
||||
# nbclient
|
||||
jupyter-core==5.3.0
|
||||
# via
|
||||
# ipykernel
|
||||
# jupyter-client
|
||||
# nbformat
|
||||
linkify-it-py==1.0.3
|
||||
# via myst-parser
|
||||
markdown-it-py==2.2.0
|
||||
# via
|
||||
# mdit-py-plugins
|
||||
# myst-parser
|
||||
markupsafe==2.1.2
|
||||
# via jinja2
|
||||
matplotlib-inline==0.1.6
|
||||
# via
|
||||
# ipykernel
|
||||
# ipython
|
||||
mdit-py-plugins==0.3.5
|
||||
# via myst-parser
|
||||
mdurl==0.1.2
|
||||
# via markdown-it-py
|
||||
myst-nb==0.17.1
|
||||
# via rocm-docs-core
|
||||
myst-parser[linkify]==0.18.1
|
||||
# via
|
||||
# myst-nb
|
||||
# rocm-docs-core
|
||||
nbclient==0.5.13
|
||||
# via
|
||||
# jupyter-cache
|
||||
# myst-nb
|
||||
nbformat==5.8.0
|
||||
# via
|
||||
# jupyter-cache
|
||||
# myst-nb
|
||||
# nbclient
|
||||
nest-asyncio==1.5.6
|
||||
# via
|
||||
# ipykernel
|
||||
# nbclient
|
||||
packaging==23.0
|
||||
# via
|
||||
# ipykernel
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
parso==0.8.3
|
||||
# via jedi
|
||||
pexpect==4.8.0
|
||||
# via ipython
|
||||
pickleshare==0.7.5
|
||||
# via ipython
|
||||
platformdirs==3.1.1
|
||||
# via jupyter-core
|
||||
prompt-toolkit==3.0.38
|
||||
# via ipython
|
||||
psutil==5.9.4
|
||||
# via ipykernel
|
||||
ptyprocess==0.7.0
|
||||
# via pexpect
|
||||
pure-eval==0.2.2
|
||||
# via stack-data
|
||||
pycparser==2.21
|
||||
# via cffi
|
||||
pydata-sphinx-theme==0.13.1
|
||||
# via sphinx-book-theme
|
||||
pygithub==1.57
|
||||
# via rocm-docs-core
|
||||
pygments==2.14.0
|
||||
# via
|
||||
# accessible-pygments
|
||||
# ipython
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
pyjwt==2.6.0
|
||||
# via pygithub
|
||||
pynacl==1.5.0
|
||||
# via pygithub
|
||||
pyrsistent==0.19.3
|
||||
# via jsonschema
|
||||
python-dateutil==2.8.2
|
||||
# via jupyter-client
|
||||
pyyaml==6.0
|
||||
# via
|
||||
# jupyter-cache
|
||||
# myst-nb
|
||||
# myst-parser
|
||||
# sphinx-external-toc
|
||||
pyzmq==25.0.2
|
||||
# via
|
||||
# ipykernel
|
||||
# jupyter-client
|
||||
requests==2.28.2
|
||||
# via
|
||||
# pygithub
|
||||
# sphinx
|
||||
rocm-docs-core @ git+https://github.com/RadeonOpenCompute/rocm-docs-core.git
|
||||
# via -r requirements.in
|
||||
six==1.16.0
|
||||
# via
|
||||
# asttokens
|
||||
# python-dateutil
|
||||
smmap==5.0.0
|
||||
# via gitdb
|
||||
snowballstemmer==2.2.0
|
||||
# via sphinx
|
||||
soupsieve==2.4
|
||||
# via beautifulsoup4
|
||||
sphinx==4.3.1
|
||||
# via
|
||||
# breathe
|
||||
# myst-nb
|
||||
# myst-parser
|
||||
# pydata-sphinx-theme
|
||||
# rocm-docs-core
|
||||
# sphinx-book-theme
|
||||
# sphinx-copybutton
|
||||
# sphinx-design
|
||||
# sphinx-external-toc
|
||||
# sphinx-notfound-page
|
||||
sphinx-book-theme==1.0.0rc2
|
||||
# via rocm-docs-core
|
||||
sphinx-copybutton==0.5.1
|
||||
# via rocm-docs-core
|
||||
sphinx-design==0.3.0
|
||||
# via rocm-docs-core
|
||||
sphinx-external-toc==0.3.1
|
||||
# via rocm-docs-core
|
||||
sphinx-notfound-page==0.8.3
|
||||
# via rocm-docs-core
|
||||
sphinxcontrib-applehelp==1.0.4
|
||||
# via sphinx
|
||||
sphinxcontrib-devhelp==1.0.2
|
||||
# via sphinx
|
||||
sphinxcontrib-htmlhelp==2.0.1
|
||||
# via sphinx
|
||||
sphinxcontrib-jsmath==1.0.1
|
||||
# via sphinx
|
||||
sphinxcontrib-qthelp==1.0.3
|
||||
# via sphinx
|
||||
sphinxcontrib-serializinghtml==1.1.5
|
||||
# via sphinx
|
||||
sqlalchemy==1.4.47
|
||||
# via jupyter-cache
|
||||
stack-data==0.6.2
|
||||
# via ipython
|
||||
tabulate==0.9.0
|
||||
# via jupyter-cache
|
||||
tornado==6.2
|
||||
# via
|
||||
# ipykernel
|
||||
# jupyter-client
|
||||
traitlets==5.9.0
|
||||
# via
|
||||
# comm
|
||||
# ipykernel
|
||||
# ipython
|
||||
# jupyter-client
|
||||
# jupyter-core
|
||||
# matplotlib-inline
|
||||
# nbclient
|
||||
# nbformat
|
||||
typing-extensions==4.5.0
|
||||
# via
|
||||
# myst-nb
|
||||
# myst-parser
|
||||
uc-micro-py==1.0.1
|
||||
# via linkify-it-py
|
||||
urllib3==1.26.15
|
||||
# via requests
|
||||
wcwidth==0.2.6
|
||||
# via prompt-toolkit
|
||||
wrapt==1.15.0
|
||||
# via deprecated
|
||||
zipp==3.15.0
|
||||
# via importlib-metadata
|
||||
|
||||
# The following packages are considered to be unsafe in a requirements file:
|
||||
# setuptools
|
||||
@@ -49,4 +49,4 @@ used in the CK GPU implementation of Flashattention.
|
||||
|
||||
.. doxygenstruct:: ck::ThreadwiseTensorSliceTransfer_StaticToStatic
|
||||
|
||||
.. bibliography::
|
||||
.. bibliography::
|
||||
@@ -72,4 +72,4 @@ Else if :math:`j>1`,
|
||||
\tilde{Y}_{ij} &= \diag(z^{new}_{i})^{-1} \exp(\tilde{m}_{ij} - m^{new}_i ) \tilde{P}_{ij} \\
|
||||
z_i &= z^{new}_i \\
|
||||
m_i &= m^{new}_i \\
|
||||
\end{align}
|
||||
\end{align}
|
||||
24
docs/conf.py
Normal file
@@ -0,0 +1,24 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+from rocm_docs import ROCmDocs
+
+docs_core = ROCmDocs("Composable Kernel Documentation")
+docs_core.run_doxygen()
+docs_core.setup()
+
+mathjax3_config = {
+    'tex': {
+        'macros': {
+            'diag': '\\operatorname{diag}',
+        }
+    }
+}
+
+bibtex_bibfiles = ['refs.bib']
+
+for sphinx_var in ROCmDocs.SPHINX_VARS:
+    globals()[sphinx_var] = getattr(docs_core, sphinx_var)
|
||||
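The closing loop of `docs/conf.py` copies attributes from the `docs_core` object into module-level names, which is where Sphinx looks for its configuration values. The sketch below illustrates the pattern with a stand-in class. `FakeDocs`, its attributes and the contents of its `SPHINX_VARS` tuple are hypothetical and say nothing about what `rocm_docs.ROCmDocs` actually exposes, and a plain dictionary takes the place of `globals()` so the effect is easy to inspect.

```python
class FakeDocs:
    """Stand-in for a helper object that computes Sphinx settings."""

    # Names of the attributes that should become configuration values.
    SPHINX_VARS = ("project", "html_theme")

    def __init__(self, project):
        self.project = project
        self.html_theme = "example_theme"  # placeholder value


docs_core = FakeDocs("Composable Kernel Documentation")

# conf.py writes into globals(); a dict is used here for clarity.
settings = {}
for sphinx_var in FakeDocs.SPHINX_VARS:
    settings[sphinx_var] = getattr(docs_core, sphinx_var)

print(settings)
```

Writing into `globals()` instead of a dictionary has the same effect as spelling out `project = ...` and `html_theme = ...` by hand at the top level of `conf.py`.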
Binary image file: 552 KiB before and after (size unchanged).
Binary image file: 536 KiB before and after (size unchanged).
52
docs/index.rst
Normal file
@@ -0,0 +1,52 @@
+============================
+Composable Kernel User Guide
+============================
+
+------------
+Introduction
+------------
+
+This document contains instructions for installing, using, and contributing to Composable Kernel (CK).
+
+-----------
+Methodology
+-----------
+
+Composable Kernel (CK) library aims to provide a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs, CPUs, etc, through general purpose kernel languages, like HIP C++.
+
+CK utilizes two concepts to achieve performance portability and code maintainability:
+
+* A tile-based programming model
+* Algorithm complexity reduction for complex ML operators, using innovative technique we call "Tensor Coordinate Transformation".
+
+.. image:: data/ck_component.png
+   :alt: CK Components
+
+--------------
+Code Structure
+--------------
+
+Current CK library are structured into 4 layers:
+
+* "Templated Tile Operators" layer
+* "Templated Kernel and Invoker" layer
+* "Instantiated Kernel and Invoker" layer
+* "Client API" layer
+
+.. image:: data/ck_layer.png
+   :alt: CK Layers
+
+Documentation Roadmap
+^^^^^^^^^^^^^^^^^^^^^
+The following is a list of CK documents in the suggested reading order:
+
+.. toctree::
+   :maxdepth: 5
+   :caption: Contents:
+   :numbered:
+
+   tutorial_hello_world
+   dockerhub
+   Supported_Primitives_Guide
+   API_Reference_Guide
+   Contributors_Guide
@@ -1,15 +0,0 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -eu
|
||||
|
||||
# Make this directory the PWD
|
||||
cd "$(dirname "${BASH_SOURCE[0]}")"
|
||||
|
||||
# Build doxygen info
|
||||
bash run_doxygen.sh
|
||||
|
||||
# Build sphinx docs
|
||||
cd source
|
||||
make clean
|
||||
make -e SPHINXOPTS="-t html" html
|
||||
make latexpdf
|
||||
@@ -1,10 +0,0 @@
|
||||
#!/bin/bash
|
||||
|
||||
set -eu
|
||||
|
||||
# Make this directory the PWD
|
||||
cd "$(dirname "${BASH_SOURCE[0]}")"
|
||||
|
||||
# Build the doxygen info
|
||||
rm -rf docBin
|
||||
doxygen Doxyfile
|
||||
@@ -1,13 +0,0 @@
|
||||
************
|
||||
Disclaimer
|
||||
************
|
||||
-------------------------------
|
||||
AMD's standard legal Disclaimer
|
||||
-------------------------------
|
||||
|
||||
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED 'AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, Radeon, Ryzen, Epyc, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. Google(R) is a registered trademark of Google LLC. PCIe(R) is a registered trademark of PCI-SIG Corporation. Linux(R) is the registered trademark of Linus Torvalds in the U.S. and other countries. 
Ubuntu(R) and the Ubuntu logo are registered trademarks of Canonical Ltd. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. (C)2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
----------------------
|
||||
Third Party Disclaimer
|
||||
----------------------
|
||||
Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED "AS IS" WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
|
||||
@@ -1,15 +0,0 @@
|
||||
=====================
|
||||
Getting Started Guide
|
||||
=====================
|
||||
|
||||
------------
|
||||
Introduction
|
||||
------------
|
||||
|
||||
This document contains instructions for installing, using, and contributing to Composable Kernel (CK).
|
||||
|
||||
Documentation Roadmap
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
The following is a list of CK documents in the suggested reading order:
|
||||
|
||||
[TODO]
|
||||
@@ -1,20 +0,0 @@
|
||||
# Minimal makefile for Sphinx documentation
|
||||
#
|
||||
|
||||
# You can set these variables from the command line.
|
||||
SPHINXOPTS =
|
||||
SPHINXBUILD = sphinx-build
|
||||
SPHINXPROJ = CK
|
||||
SOURCEDIR = .
|
||||
BUILDDIR = _build
|
||||
|
||||
# Put it first so that "make" without argument is like "make help".
|
||||
help:
|
||||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
|
||||
|
||||
.PHONY: help Makefile
|
||||
|
||||
# Catch-all target: route all unknown targets to Sphinx using the new
|
||||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
|
||||
%: Makefile
|
||||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
|
||||
@@ -1,219 +0,0 @@
|
||||
"""Copyright (C) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell cop-
|
||||
ies of the Software, and to permit persons to whom the Software is furnished
|
||||
to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IM-
|
||||
PLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
|
||||
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
|
||||
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
|
||||
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNE-
|
||||
CTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||
"""
|
||||
|
||||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Composable Kernel (CK) documentation build configuration file, based on
|
||||
# rocBLAS documentation build configuration file, created by
|
||||
# sphinx-quickstart on Mon Jan 8 16:34:42 2018.
|
||||
#
|
||||
# This file is execfile()d with the current directory set to its
|
||||
# containing dir.
|
||||
#
|
||||
# Note that not all possible configuration values are present in this
|
||||
# autogenerated file.
|
||||
#
|
||||
# All configuration values have a default; values that are commented out
|
||||
# serve to show the default.
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
#
|
||||
# import os
|
||||
# import sys
|
||||
# sys.path.insert(0, os.path.abspath('.'))
|
||||
|
||||
import os
|
||||
import sys
|
||||
import subprocess
|
||||
|
||||
read_the_docs_build = os.environ.get('READTHEDOCS', None) == 'True'
|
||||
|
||||
if read_the_docs_build:
    subprocess.call('../run_doxygen.sh')
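The call above discards the script's exit status and resolves `../run_doxygen.sh` against the current working directory. A minimal sketch of a stricter variant follows. The helper names `should_run_doxygen` and `run_doxygen` are hypothetical and are not part of this repository; only the `READTHEDOCS` check and the script path come from the configuration itself.

```python
import os
import subprocess
from pathlib import Path


def should_run_doxygen(environ=None):
    """Mirror the check above: true only when READTHEDOCS is exactly 'True'."""
    environ = os.environ if environ is None else environ
    return environ.get("READTHEDOCS") == "True"


def run_doxygen(conf_dir, script="../run_doxygen.sh"):
    """Run the helper script relative to conf_dir and raise on a non-zero exit."""
    script_path = (Path(conf_dir) / script).resolve()
    # check=True raises CalledProcessError instead of silently continuing.
    subprocess.run([str(script_path)], cwd=str(conf_dir), check=True)
```

With this shape, a failing Doxygen run would stop the build early instead of leaving Breathe to report missing XML later.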
|
||||
|
||||
# -- General configuration ------------------------------------------------
|
||||
|
||||
# If your documentation needs a minimal Sphinx version, state it here.
|
||||
#
|
||||
# needs_sphinx = '1.0'
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be
|
||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||
# ones.
|
||||
extensions = ['sphinx.ext.mathjax', 'breathe', 'sphinxcontrib.bibtex']
|
||||
|
||||
breathe_projects = { "CK": "../docBin/xml" }
|
||||
breathe_default_project = "CK"
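`breathe_projects` maps a project name to a directory of Doxygen XML output. As a rough sketch, assuming that directory should contain Doxygen's `index.xml`, a pre-build sanity check could look like the following. `missing_breathe_projects` is a hypothetical helper, not a Breathe API.

```python
from pathlib import Path


def missing_breathe_projects(projects, base_dir="."):
    """Return project names whose XML directory lacks an index.xml file."""
    missing = []
    for name, xml_dir in projects.items():
        # Paths in breathe_projects are taken relative to base_dir here.
        index_file = Path(base_dir) / xml_dir / "index.xml"
        if not index_file.is_file():
            missing.append(name)
    return missing
```

Calling `missing_breathe_projects(breathe_projects)` from the directory that holds `conf.py` would list any project whose XML has not been generated yet.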
|
||||
|
||||
bibtex_bibfiles = ['refs.bib']
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
|
||||
# The suffix(es) of source filenames.
|
||||
# You can specify multiple suffixes as a list of strings:
|
||||
#
|
||||
# source_suffix = ['.rst', '.md']
|
||||
source_suffix = '.rst'
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = 'index'
|
||||
|
||||
# General information about the project.
|
||||
project = u'Composable Kernel (CK)'
|
||||
copyright = u'2018-2023, Advanced Micro Devices'
|
||||
author = u'Advanced Micro Devices'
|
||||
|
||||
# The version info for the project you're documenting, acts as replacement for
|
||||
# |version| and |release|, also used in various other places throughout the
|
||||
# built documents.
|
||||
#
|
||||
# The short X.Y version.
|
||||
#version = u'0.8'
|
||||
# The full version, including alpha/beta/rc tags.
|
||||
#release = u'0.8'
|
||||
|
||||
# The language for content autogenerated by Sphinx. Refer to documentation
|
||||
# for a list of supported languages.
|
||||
#
|
||||
# This is also used if you do content translation via gettext catalogs.
|
||||
# Usually you set "language" from the command line for these cases.
|
||||
language = 'en'
|
||||
|
||||
# List of patterns, relative to source directory, that match files and
|
||||
# directories to ignore when looking for source files.
|
||||
# These patterns also affect html_static_path and html_extra_path.
|
||||
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
|
||||
|
||||
# The name of the Pygments (syntax highlighting) style to use.
|
||||
pygments_style = 'sphinx'
|
||||
|
||||
# If true, `todo` and `todoList` produce output, else they produce nothing.
|
||||
todo_include_todos = False
|
||||
|
||||
|
||||
# -- Options for HTML output ----------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
#
|
||||
# html_theme = 'alabaster'
|
||||
|
||||
#if read_the_docs_build:
|
||||
# html_theme = 'default'
|
||||
#else:
|
||||
import sphinx_rtd_theme
|
||||
html_theme = "sphinx_rtd_theme"
|
||||
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
|
||||
html_logo = "rocm_logo.png"
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
# documentation.
|
||||
html_theme_options = {
    'logo_only': True,
    'display_version': True
}
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
#html_static_path = ['_static']
|
||||
|
||||
# Custom sidebar templates, must be a dictionary that maps document names
|
||||
# to template names.
|
||||
#
|
||||
# This is required for the alabaster theme
|
||||
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
|
||||
# html_sidebars = {
|
||||
# '**': [
|
||||
# 'relations.html', # needs 'show_related': True theme option to display
|
||||
# 'searchbox.html',
|
||||
# ]
|
||||
# }
|
||||
|
||||
mathjax3_config = {
    'tex': {
        'macros': {
            'diag': '\\operatorname{diag}',
        }
    }
}
|
||||
|
||||
# -- Options for HTMLHelp output ------------------------------------------
|
||||
|
||||
# Output file base name for HTML help builder.
|
||||
htmlhelp_basename = 'CKdoc'
|
||||
|
||||
|
||||
# -- Options for LaTeX output ---------------------------------------------
|
||||
|
||||
latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    #
    'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    #
    'preamble': r'''
\setcounter{tocdepth}{5}
\newcommand{\diag}{\operatorname{diag}}
''',

    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}
|
||||
|
||||
# Grouping the document tree into LaTeX files. List of tuples
|
||||
# (source start file, target name, title,
|
||||
# author, documentclass [howto, manual, or own class]).
|
||||
latex_documents = [
    (master_doc, 'CK.tex', u'Composable Kernel (CK) Documentation',
     u'Advanced Micro Devices', 'manual'),
]
|
||||
|
||||
|
||||
# -- Options for manual page output ---------------------------------------
|
||||
|
||||
# One entry per manual page. List of tuples
|
||||
# (source start file, name, description, authors, manual section).
|
||||
man_pages = [
    (master_doc, 'ck', u'Composable Kernel (CK) Documentation',
     [author], 1)
]
|
||||
|
||||
|
||||
# -- Options for Texinfo output -------------------------------------------
|
||||
|
||||
# Grouping the document tree into Texinfo files. List of tuples
|
||||
# (source start file, target name, title, author,
|
||||
# dir menu entry, description, category)
|
||||
texinfo_documents = [
    (master_doc, 'CK', u'Composable Kernel (CK) Documentation',
     author, 'CK', 'Composable Kernel for AMD ROCm',
     'Miscellaneous'),
]
|
||||
@@ -1,16 +0,0 @@
|
||||
============================
|
||||
Composable Kernel User Guide
|
||||
============================
|
||||
|
||||
.. toctree::
   :maxdepth: 5
   :caption: Contents:
   :numbered:

   Linux_Install_Guide
   tutorial_hello_world
   dockerhub
   Supported_Primitives_Guide
   API_Reference_Guide
   Contributors_Guide
   Disclaimer
|
||||
Binary file not shown.
|
Before Width: | Height: | Size: 347 KiB |