mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 22:39:03 +00:00
creation of install doc and refactor of doc in general (#1908)
* creation of install doc and refactor of doc in general
* updates based on review comments
* updated based on review comments
* updated readme and contributors markdown
* added extra note to not use -j on its own
* added note about smoke tests and regression tests
* made changes as per Illia's feedback

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
@@ -20,10 +20,11 @@ Tejash Shah, 2019-2020
Xiaoyan Zhou, 2020

[Jianfeng Yan](https://github.com/j4yan), 2021-2022

[Jun Liu](https://github.com/junliume), 2021-2024

## Product Manager
[Jun Liu](https://github.com/junliume)
[John Afaganis](https://github.com/afagaj)

## Contributors
@@ -104,6 +104,7 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
```bash
make -j install
```
**[See Note on -j](#notes)**

## Optional post-install steps
@@ -146,7 +147,8 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```

Note the `-j` option for building with multiple threads in parallel, which speeds up the build significantly.
### Notes
The `-j` option builds with multiple threads in parallel, which speeds up the build significantly.
However, `-j` on its own launches an unlimited number of threads, which can cause the build to run out of memory and
crash. On average, you should expect each thread to use ~2 GB of RAM.
Depending on the number of CPU cores and the amount of RAM on your system, you may want to
@@ -211,4 +213,4 @@ script/uninstall_precommit.sh
```

If you need to temporarily disable pre-commit hooks, you can add the `--no-verify` option to the
`git commit` command.
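
You can see the effect of `--no-verify` without touching a real checkout. The following sketch creates a throwaway repository with a pre-commit hook that always fails. It then commits once normally and once with `--no-verify`. The function name `demo_no_verify` and the always-failing hook are illustrative assumptions and are not part of Composable Kernel's pre-commit setup. The sketch needs only `git` and a POSIX shell.

```shell
# Illustrative only: demo_no_verify and its always-failing hook are hypothetical,
# not part of Composable Kernel's pre-commit scripts.
# The function body runs in a subshell, so the cd does not affect the caller.
demo_no_verify() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q .
  git config user.name "Demo User"
  git config user.email "demo@example.com"
  git config core.hooksPath .git/hooks   # ignore any global hooks path

  # Install a pre-commit hook that always rejects the commit.
  mkdir -p .git/hooks
  printf '#!/bin/sh\necho "pre-commit: rejected" >&2\nexit 1\n' > .git/hooks/pre-commit
  chmod +x .git/hooks/pre-commit

  echo "hello" > file.txt
  git add file.txt

  # A normal commit runs the hook, which blocks it.
  if git commit -q -m "blocked by hook" 2>/dev/null; then
    echo "without --no-verify: committed"
  else
    echo "without --no-verify: rejected"
  fi

  # --no-verify skips the pre-commit hook, so the same commit succeeds.
  if git commit -q --no-verify -m "hook skipped"; then
    echo "with --no-verify: committed"
  fi

  cd / && rm -rf "$repo"
)

demo_no_verify
# Prints:
# without --no-verify: rejected
# with --no-verify: committed
```

Because `--no-verify` also skips any checks the hooks would have run, use it only as a temporary measure.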
@@ -1,18 +1,15 @@
.. meta::
   :description: Composable Kernel documentation and API reference library
   :keywords: composable kernel, CK, ROCm, API, documentation
   :description: Composable Kernel mathematical basis
   :keywords: composable kernel, CK, ROCm, API, mathematics, algorithm

.. _supported-primitives:

********************************************************************
Supported Primitives Guide
Composable Kernel mathematical basis
********************************************************************

This document contains details of supported primitives in Composable Kernel (CK). In contrast to the API Reference Guide, the Supported Primitives Guide is an introduction to the math which underpins the algorithms implemented in CK.
This is an introduction to the math which underpins the algorithms implemented in Composable Kernel.

------------
Softmax
------------

For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B`, you can decompose the
softmax of the concatenated vector :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,
29
docs/conceptual/Composable-Kernel-structure.rst
Normal file
@@ -0,0 +1,29 @@
.. meta::
   :description: Composable Kernel structure
   :keywords: composable kernel, CK, ROCm, API, structure

.. _what-is-ck:

********************************************************************
Composable Kernel structure
********************************************************************

The Composable Kernel library uses a tile-based programming model and tensor coordinate transformation to achieve performance portability and code maintainability. Tensor coordinate transformation is a complexity reduction technique for complex machine learning operators.

.. image:: ../data/ck_component.png
   :alt: CK Components

The Composable Kernel library consists of four layers:

* a templated tile operator layer
* a templated kernel and invoker layer
* an instantiated kernel and invoker layer
* a client API layer

A wrapper component is included to simplify tensor transform operations.

.. image:: ../data/ck_layer.png
   :alt: CK Layers
@@ -1,41 +0,0 @@
.. meta::
   :description: Composable Kernel documentation and API reference library
   :keywords: composable kernel, CK, ROCm, API, documentation

.. _what-is-ck:

********************************************************************
What is the Composable Kernel library
********************************************************************

Methodology
===========

The Composable Kernel (CK) library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages like HIP C++.

CK utilizes two concepts to achieve performance portability and code maintainability:

* A tile-based programming model
* Algorithm complexity reduction for complex ML operators using an innovative technique called
  "Tensor Coordinate Transformation".

.. image:: ../data/ck_component.png
   :alt: CK Components

Code Structure
==============

The CK library is structured into 4 layers:

* "Templated Tile Operators" layer
* "Templated Kernel and Invoker" layer
* "Instantiated Kernel and Invoker" layer
* "Client API" layer

It also includes a simple wrapper component used to perform tensor transform operations more easily and with fewer lines of code.

.. image:: ../data/ck_layer.png
   :alt: CK Layers
@@ -8,30 +8,33 @@
Composable Kernel User Guide
********************************************************************

The Composable Kernel (CK) library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages like HIP C++. This document contains instructions for installing, using, and contributing to the Composable Kernel project. To learn more see :ref:`what-is-ck`.
The Composable Kernel library provides a programming model for writing performance-critical kernels for machine learning workloads across multiple architectures, including GPUs and CPUs, through general-purpose kernel languages such as `HIP C++ <https://rocm.docs.amd.com/projects/HIP/en/latest/index.html>`_.

The CK documentation is structured as follows:
The Composable Kernel repository is located at `https://github.com/ROCm/composable_kernel <https://github.com/ROCm/composable_kernel>`_.

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: Installation
   .. grid-item-card:: Install

      * :ref:`docker-hub`
      * :doc:`Composable Kernel prerequisites <./install/Composable-Kernel-prerequisites>`
      * :doc:`Build and install Composable Kernel <./install/Composable-Kernel-install>`
      * :doc:`Build and install Composable Kernel on a Docker image <./install/Composable-Kernel-Docker>`

   .. grid-item-card:: Conceptual

      * :ref:`what-is-ck`
      * :doc:`Composable Kernel structure <./conceptual/Composable-Kernel-structure>`
      * :doc:`Composable Kernel mathematical basis <./conceptual/Composable-Kernel-math>`

   .. grid-item-card:: API reference
   .. grid-item-card:: Tutorials

      * :doc:`Composable Kernel examples and tests <./tutorial/Composable-Kernel-examples>`

   .. grid-item-card:: Reference

      * :ref:`supported-primitives`
      * :ref:`api-reference`
      * :ref:`wrapper`

   .. grid-item-card:: Tutorial

      * :ref:`hello-world`

To contribute to the documentation, refer to `Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
16
docs/install/Composable-Kernel-Docker.rst
Normal file
@@ -0,0 +1,16 @@
.. meta::
   :description: Composable Kernel docker files
   :keywords: composable kernel, CK, ROCm, API, docker

.. _docker-hub:

********************************************************************
Composable Kernel Docker containers
********************************************************************

Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.

The images also contain `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_, `CMake <https://cmake.org/getting-started/>`_, and the `ROCm LLVM compiler infrastructure <https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html>`_.

Composable Kernel Docker images are named according to their operating system and ROCm version. For example, a Docker image named ``ck_ub22.04_rocm6.3`` would correspond to an Ubuntu 22.04 image with ROCm 6.3.
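
The naming scheme can be split mechanically. The sketch below assumes that every tag follows the ``ck_ub<Ubuntu version>_rocm<ROCm version>`` pattern shown in the example. ``parse_ck_tag`` is a hypothetical helper, not a script shipped with Composable Kernel.

```shell
# Hypothetical helper, not shipped with Composable Kernel: split an image tag
# that follows the ck_ub<Ubuntu version>_rocm<ROCm version> pattern.
parse_ck_tag() {
  tag=$1
  case $tag in
    ck_ub*_rocm*) ;;                       # expected pattern
    *) echo "unrecognized tag: $tag" >&2; return 1 ;;
  esac
  rest=${tag#ck_ub}          # e.g. 22.04_rocm6.3
  ubuntu=${rest%%_rocm*}     # e.g. 22.04
  rocm=${rest#*_rocm}        # e.g. 6.3
  echo "Ubuntu $ubuntu, ROCm $rocm"
}

parse_ck_tag ck_ub22.04_rocm6.3
# Prints: Ubuntu 22.04, ROCm 6.3
```

To pull an image, combine the Docker Hub repository name with the tag, as in ``docker pull rocm/composable_kernel:<tag>``. Check the Docker Hub tags page for the tags that actually exist.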
72
docs/install/Composable-Kernel-install.rst
Normal file
@@ -0,0 +1,72 @@
.. meta::
   :description: Composable Kernel build and install
   :keywords: composable kernel, CK, ROCm, API, documentation, install

******************************************************
Building and installing Composable Kernel with CMake
******************************************************

Before you begin, clone the `Composable Kernel GitHub repository <https://github.com/ROCm/composable_kernel.git>`_ and create a ``build`` directory in its root:

.. code:: shell

   git clone https://github.com/ROCm/composable_kernel.git
   cd composable_kernel
   mkdir build

Change directory to the ``build`` directory and generate the makefile using the ``cmake`` command. Two build options are required:

* ``CMAKE_PREFIX_PATH``: The ROCm installation path. ROCm is installed in ``/opt/rocm`` by default.
* ``CMAKE_CXX_COMPILER``: The path to the Clang compiler. Clang is found at ``/opt/rocm/llvm/bin/clang++`` by default.

.. code:: shell

   cd build
   cmake ../. -D CMAKE_PREFIX_PATH="/opt/rocm" -D CMAKE_CXX_COMPILER="/opt/rocm/llvm/bin/clang++" [-D<OPTION1=VALUE1> [-D<OPTION2=VALUE2>] ...]

Other build options are:

* ``DISABLE_DL_KERNELS``: Set this to ``ON`` to skip building the deep learning (DL) and data parallel primitive (DPP) instances.

  .. note::

     DL and DPP instances are useful on architectures that don't support XDL or WMMA.

* ``CK_USE_FP8_ON_UNSUPPORTED_ARCH``: Set to ``ON`` to build FP8 data type instances on gfx90a without native FP8 support.
* ``GPU_TARGETS``: Target architectures. Target architectures in this list must all be different versions of the same architecture. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_TARGETS="gfx908;gfx90a"``. This option is required to build tests and examples.
* ``GPU_ARCHS``: Target architectures. Target architectures in this list are not limited to different versions of the same architecture. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_ARCHS="gfx908;gfx1100"``.
* ``CMAKE_BUILD_TYPE``: The build type. Can be ``None``, ``Release``, ``Debug``, ``RelWithDebInfo``, or ``MinSizeRel``. CMake will use ``Release`` by default.

.. note::

   If neither ``GPU_TARGETS`` nor ``GPU_ARCHS`` is specified, Composable Kernel will be built for all targets supported by the compiler.
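
The quotation marks are needed because ``;`` is the shell's command separator. Without them, ``cmake -D GPU_TARGETS=gfx908;gfx90a`` passes only ``gfx908`` to ``cmake`` and then tries to run ``gfx90a`` as a separate command, which typically fails with a "command not found" error.

The sketch below uses only standard shell features to show two things:

* the quoted value reaches the command as a single argument
* a semicolon-separated value splits into individual targets, which mirrors how CMake treats semicolon-separated strings as lists

``list_gpu_targets`` is a hypothetical helper for illustration only.

```shell
# Hypothetical helper: print each entry of a semicolon-separated
# GPU_TARGETS or GPU_ARCHS value on its own line.
list_gpu_targets() {
  old_ifs=$IFS
  IFS=';'                 # split on semicolons only
  set -f                  # no filename globbing while splitting
  for target in $1; do
    if [ -n "$target" ]; then
      printf '%s\n' "$target"
    fi
  done
  set +f
  IFS=$old_ifs
}

# Quoted, "-D" and "GPU_TARGETS=gfx908;gfx90a" arrive as exactly two arguments.
set -- -D GPU_TARGETS="gfx908;gfx90a"
echo "$# arguments, second is: $2"
# Prints: 2 arguments, second is: GPU_TARGETS=gfx908;gfx90a

list_gpu_targets "gfx908;gfx90a"
# Prints:
# gfx908
# gfx90a
```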

Build Composable Kernel using the generated makefile. This will build the library, the examples, and the tests, and save them to ``bin``.

.. code:: shell

   make -j20

The ``-j`` option speeds up the build by using multiple threads in parallel. For example, ``-j20`` uses twenty threads in parallel. On average, each thread will use 2 GB of memory, so make sure that the number of threads multiplied by 2 GB doesn't exceed the memory available on your system.

Using ``-j`` alone will launch an unlimited number of threads and is not recommended.
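
One way to choose the ``-j`` value is to take the smaller of two numbers: the CPU core count, and the available memory divided by 2 GB. The sketch below encodes that rule of thumb. ``ck_suggest_jobs`` is a hypothetical helper. Reading ``MemAvailable`` from ``/proc/meminfo`` and calling ``nproc`` assume a Linux system.

```shell
# Hypothetical helper, not part of Composable Kernel: suggest a make -j value.
# Rule of thumb from this guide: ~2 GB of RAM per parallel job, and no more
# jobs than CPU cores.
ck_suggest_jobs() {
  mem_gb=$1   # memory available for the build, in GB
  cores=$2    # number of CPU cores
  jobs=$(( mem_gb / 2 ))          # jobs that fit in memory at ~2 GB each
  if [ "$jobs" -gt "$cores" ]; then
    jobs=$cores                   # more jobs than cores gives little extra speed
  fi
  if [ "$jobs" -lt 1 ]; then
    jobs=1                        # always run at least one job
  fi
  echo "$jobs"
}

# Linux only: MemAvailable in /proc/meminfo is reported in kB (converted to whole GiB).
mem_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo 2>/dev/null)
if [ -n "$mem_kb" ]; then
  echo "make -j$(ck_suggest_jobs $(( mem_kb / 1024 / 1024 )) "$(nproc)")"
fi
```

For example, ``ck_suggest_jobs 40 64`` prints ``20``. With 40 GB of memory, memory rather than the 64 cores limits the number of parallel jobs.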

Install the Composable Kernel library:

.. code:: shell

   make install

After running ``make install``, the Composable Kernel files will be saved to the following locations:

* Library files: ``/opt/rocm/lib/``
* Header files: ``/opt/rocm/include/ck/`` and ``/opt/rocm/include/ck_tile/``
* Examples, tests, and ckProfiler: ``/opt/rocm/bin/``

For information about ckProfiler, see `the ckProfiler readme file <https://github.com/ROCm/composable_kernel/blob/develop/profiler/README.md>`_.

For information about running the examples and tests, see :doc:`Composable Kernel examples and tests <../tutorial/Composable-Kernel-examples>`.
32
docs/install/Composable-Kernel-prerequisites.rst
Normal file
@@ -0,0 +1,32 @@
.. meta::
   :description: Composable Kernel prerequisites
   :keywords: composable kernel, CK, ROCm, API, documentation, prerequisites

******************************************************
Composable Kernel prerequisites
******************************************************

Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.

The following prerequisites are required to build and install Composable Kernel:

* cmake
* hip-rocclr
* iputils-ping
* jq
* libelf-dev
* libncurses5-dev
* libnuma-dev
* libpthread-stubs0-dev
* llvm-amdgpu
* mpich
* net-tools
* python3
* python3-dev
* python3-pip
* redis
* rocm-llvm-dev
* zlib1g-dev
* libzstd-dev
* openssh-server
* clang-format-12
@@ -1,101 +0,0 @@
.. meta::
   :description: Composable Kernel documentation and API reference library
   :keywords: composable kernel, CK, ROCm, API, documentation

.. _docker-hub:

********************************************************************
CK Docker Hub
********************************************************************

Why do I need this?
===================

To make things simpler, and bring Composable Kernel and its dependencies together,
docker images can be found on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_. Docker images provide a complete image of the OS, the Composable Kernel library, and its dependencies in a single downloadable file.

Refer to `Docker Overview <https://docs.docker.com/get-started/overview/>`_ for more information on Docker images and containers.

Which image is right for me?
============================

The image naming includes information related to the docker image.
For example, ``ck_ub20.04_rocm6.0`` indicates the following:

* ``ck`` - made for running Composable Kernel;
* ``ub20.04`` - based on Ubuntu 20.04;
* ``rocm6.0`` - ROCm platform version 6.0.

Download a docker image suitable for your OS and ROCm release, run or start the docker container, and then resume the tutorial from this point. Use the ``docker pull`` command to download the file::

   docker pull rocm/composable_kernel:ck_ub20.04_rocm6.0

What is inside the image?
-------------------------

The docker images have everything you need for running CK, including:

* `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_
* `CMake <https://cmake.org/getting-started/>`_
* `Compiler <https://github.com/ROCm/llvm-project>`_
* `Composable Kernel library <https://github.com/ROCm/composable_kernel>`_

Running the docker container
============================

After downloading the docker image, you can start the container using one of a number of commands. Start with the ``docker run`` command as shown below::

   docker run \
      -it \
      --privileged \
      --group-add sudo \
      -w /root/workspace \
      -v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
      rocm/composable_kernel:ck_ub20.04_rocm6.0 \
      /bin/bash

After starting the bash shell, the current folder in the docker container is ``~/workspace`` (``/root/workspace``). The library path is ``~/workspace/composable_kernel``. Navigate to the library to begin the tutorial as explained in :ref:`hello-world`.

.. note::

   Set ``PATH_TO_LOCAL_WORKSPACE`` to the local folder you want to share with the container, or adjust the line ``-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace`` in the ``docker run`` command to fit your folder structure.

Stop and restart the docker image
=================================
After finishing the tutorial, or when you have completed your work session, you can leave the container running and return to it later, or stop the container and start it again at another time. Stopping a container does not discard your work. A stopped container keeps the changes made to its file system, and these changes are lost only when the container is removed, for example with ``docker rm``.

Press ``Ctrl-P`` followed by ``Ctrl-Q`` to detach from the container while leaving it running. You can then return to it in its current state to resume the tutorial or pick up your project where you left off. Pressing ``Ctrl-D`` exits the ``bash`` shell instead. Because ``bash`` is the container's main process in the ``docker run`` command above, this also stops the container.

To open a new shell in a running container, use the ``docker exec`` command to specify the container name and options as follows::

   docker exec -it <container_name> bash

Where:

* ``exec`` is the docker command
* ``-it`` is the interactive option for ``exec``
* ``<container_name>`` specifies a running container on the system
* ``bash`` specifies the command to run in the interactive shell

.. note::

   You can use the ``docker container ls`` command to list the running containers on the system. Add the ``-a`` option to also list stopped containers.

To start a stopped container again, use the ``docker start`` command::

   docker start <container_name>

Then use the ``docker exec`` command as shown above to start the bash shell.

Use the ``docker stop`` command to stop a running container. The container keeps its file system changes until it is removed::

   docker stop <container_name>

Editing the docker image
========================

If you want to customize the docker image, edit the
`Dockerfile <https://github.com/ROCm/composable_kernel/blob/develop/Dockerfile>`_
from the GitHub repository to suit your needs.
@@ -5,26 +5,20 @@
.. _api-reference:

********************************************************************
API reference guide
Composable Kernel API reference guide
********************************************************************

This document contains details of the APIs for the Composable Kernel (CK) library and introduces
some of the key design principles that are used to write new classes that extend CK functionality.
This document contains details of the APIs for the Composable Kernel library and introduces some of the key design principles that are used to write new classes that extend the functionality of the Composable Kernel library.

=================
CK Datatypes
=================

-----------------
DeviceMem
-----------------
=================

.. doxygenstruct:: DeviceMem

---------------------------
=============================
Kernels For Flashattention
---------------------------
=============================

The Flashattention algorithm is defined in :cite:t:`dao2022flashattention`. This section lists
the classes that are used in the CK GPU implementation of Flashattention.
@@ -1,20 +1,15 @@
.. meta::
   :description: Composable Kernel documentation and API reference library
   :keywords: composable kernel, CK, ROCm, API, documentation
   :description: Composable Kernel wrapper
   :keywords: composable kernel, CK, ROCm, API, wrapper

.. _wrapper:

********************************************************************
Wrapper
Composable Kernel wrapper
********************************************************************

-------------------------------------
Description
-------------------------------------

The CK library provides a lightweight wrapper for more complex operations implemented in
the library.
The Composable Kernel library provides a lightweight wrapper that simplifies some of the more complex operations implemented in the library.

Example:
@@ -3,34 +3,38 @@ defaults:
root: index
subtrees:

- caption: Conceptual
  entries:
  - file: conceptual/what-is-ck.rst
    title: What is Composable Kernel?

- caption: Install
  entries:
  - file: install/dockerhub.rst
    title: Docker Hub

- caption: CK API Reference
  - file: install/Composable-Kernel-prerequisites.rst
    title: Composable Kernel prerequisites
  - file: install/Composable-Kernel-install.rst
    title: Build and install Composable Kernel
  - file: install/Composable-Kernel-Docker.rst
    title: Composable Kernel Docker images

- caption: Conceptual
  entries:
  - file: reference/Supported_Primitives_Guide.rst
    title: Supported Primitives
  - file: reference/API_Reference_Guide.rst
    title: API Reference
  - file: reference/wrapper.rst
    title: Wrapper
  - file: conceptual/Composable-Kernel-structure.rst
    title: Composable Kernel structure
  - file: conceptual/Composable-Kernel-math.rst
    title: Composable Kernel mathematical basis

- caption: Tutorial
  entries:
  - file: tutorial/tutorial_hello_world.rst
    title: Hello World Tutorial
  - file: tutorial/Composable-Kernel-examples.rst
    title: Composable Kernel examples

- caption: Reference
  entries:
  - file: reference/Composable-Kernel-API-reference.rst
    title: Composable Kernel API reference
  - file: reference/Composable-Kernel-wrapper.rst
    title: Composable Kernel Wrapper

- caption: About
  entries:
  - file: Contributors_Guide.rst
    title: Contributing to CK
    title: Contributing to Composable Kernel
  - file: license.rst
    title: License
40
docs/tutorial/Composable-Kernel-examples.rst
Normal file
@@ -0,0 +1,40 @@
.. meta::
   :description: Composable Kernel examples and tests
   :keywords: composable kernel, CK, ROCm, API, examples, tests

********************************************************************
Composable Kernel examples and tests
********************************************************************

After :doc:`building and installing Composable Kernel <../install/Composable-Kernel-install>`, the examples and tests will be installed to ``/opt/rocm/bin/``.

All tests have the prefix ``test`` and all examples have the prefix ``example``.

Use ``ctest`` with no arguments to run all examples and tests. To run only the tests whose names match a regular expression, use ``ctest -R`` followed by that expression. For example:

.. code:: shell

   ctest -R test_gemm_fp16
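
The argument to ``-R`` is a regular expression rather than an exact name, so an unanchored pattern can select more than one test. Anchoring it with ``^`` and ``$`` selects a single test, and ``ctest -N -R <regex>`` lists the matching tests without running them.

The sketch below imitates that selection with ``grep -E`` over a list of names, so you can see the effect of anchoring. Apart from ``test_gemm_fp16``, the test names are hypothetical. ``grep -E`` is only an approximation of CMake's own regular-expression syntax.

```shell
# Hypothetical test names; only test_gemm_fp16 is taken from this guide.
names="test_gemm_fp16
test_gemm_fp16_splitk
test_gemm_bf16
test_softmax_fp16"

# Approximate which names a "ctest -R <regex>" pattern would select.
select_tests() {
  printf '%s\n' "$names" | grep -E -- "$1"
}

select_tests 'test_gemm_fp16'      # unanchored: test_gemm_fp16 and test_gemm_fp16_splitk
select_tests '^test_gemm_fp16$'    # anchored: test_gemm_fp16 only
select_tests 'gemm'                # every GEMM test in the list
```

With a real build, the anchored form is ``ctest -R '^test_gemm_fp16$'``. The single quotes keep the shell from interpreting ``$``.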

Examples can be run individually as well. For example:

.. code:: shell

   ./bin/example_gemm_xdl_fp16 1 1 1

For instructions on how to run individual examples and tests, see their README files in the |example|_ and |test|_ GitHub folders.

To run smoke tests, use ``make smoke``.

To run regression tests, use ``make regression``.

In general, tests that run for under thirty seconds are included in the smoke tests and tests that run for over thirty seconds are included in the regression tests.

.. |example| replace:: ``example``
.. _example: https://github.com/ROCm/composable_kernel/tree/develop/example

.. |client_example| replace:: ``client_example``
.. _client_example: https://github.com/ROCm/composable_kernel/tree/develop/client_example

.. |test| replace:: ``test``
.. _test: https://github.com/ROCm/composable_kernel/tree/develop/test
@@ -1,165 +0,0 @@
.. meta::
   :description: Composable Kernel documentation and API reference library
   :keywords: composable kernel, CK, ROCm, API, documentation

.. _hello-world:

********************************************************************
Hello World Tutorial
********************************************************************

This tutorial is for engineers dealing with artificial intelligence and machine learning who
would like to optimize pipelines and improve performance using the Composable
Kernel (CK) library. This tutorial provides an introduction to the CK library. In this "Hello World" tutorial, you will build the library and run some examples and tests.

Description
===========

Modern AI technology solves more and more problems in a variety of fields, but crafting fast and
efficient workflows is still challenging. CK can make the AI workflow fast
and efficient. CK is a collection of optimized AI operator kernels with tools to create
new kernels. The library has components required for modern neural network architectures
including matrix multiplication, convolution, contraction, reduction, attention modules, a variety of activation functions, and fused operators.

CK library acceleration features are based on:

* Layered structure
* Tile-based computation model
* Tensor coordinate transformation
* Hardware acceleration use
* Support of low precision data types including fp16, bf16, int8 and int4

If you need more technical details and benchmarking results, read the following
`blog post <https://community.amd.com/t5/instinct-accelerators/amd-composable-kernel-library-efficient-fused-kernels-for-ai/ba-p/553224>`_.

To download the library, visit the `composable_kernel repository <https://github.com/ROCm/composable_kernel>`_.

Hardware targets
================

CK library fully supports ``gfx908`` and ``gfx90a`` GPU architectures, while only some operators are
supported for ``gfx1030`` devices. Check your hardware to determine the target GPU architecture.

========== =========
GPU Target AMD GPU
========== =========
gfx908     AMD Instinct MI100
gfx90a     AMD Instinct MI210, MI250, MI250X
gfx1030    Radeon PRO V620, W6800, W6800X, W6800X Duo, W6900X, RX 6800, RX 6800 XT, RX 6900 XT, RX 6900 XTX, RX 6950 XT
========== =========

There are also `cloud options <https://aws.amazon.com/ec2/instance-types/g4/>`_ you can find if
you don't have an AMD GPU at hand.

Build the library
=================

This tutorial is based on the use of docker images as explained in :ref:`docker-hub`. Download a docker image suitable for your OS and ROCm release, run or start the docker container, and then resume the tutorial from this point.

.. note::

   You can also `install ROCm <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/>`_ on your system, clone the `Composable Kernel repository <https://github.com/ROCm/composable_kernel.git>`_ on GitHub, and use that to build and run the examples using the commands described below.

Both the docker container and GitHub repository include the Composable Kernel library. Navigate to the library::

   cd composable_kernel/

Create and change to a ``build`` directory::

   mkdir build && cd build

The previous section discussed supported GPU architectures. Once you decide which hardware targets are needed, run CMake using the ``GPU_TARGETS`` flag::

   cmake \
      -D CMAKE_PREFIX_PATH=/opt/rocm \
      -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
      -D CMAKE_CXX_FLAGS="-O3" \
      -D CMAKE_BUILD_TYPE=Release \
      -D BUILD_DEV=OFF \
      -D GPU_TARGETS="gfx908;gfx90a;gfx1030" ..

If everything goes well, the CMake command will return::

   -- Configuring done
   -- Generating done
   -- Build files have been written to: "/root/workspace/composable_kernel/build"

Finally, you can build examples and tests::

   make -j examples tests

When complete you should see::

   Scanning dependencies of target tests
   [100%] Built target tests

Run examples and tests
======================

Examples are listed as test cases as well, so you can run all examples and tests with::

   ctest

You can check the list of all tests by running::

   ctest -N

You can also run examples separately as shown in the following example execution::

   ./bin/example_gemm_xdl_fp16 1 1 1

The arguments ``1 1 1`` tell the example to verify the results on the CPU, initialize the matrices with integers, and benchmark the kernel execution. You can play around with these parameters and see how the output and execution results change.

If you have a device based on the ``gfx908`` or ``gfx90a`` architecture, and if the example runs as expected, you should see something like::

   a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
   b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
   c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
   Perf: 1.08153 ms, 119.136 TFlops, 89.1972 GB/s, DeviceGemm_Xdl_CShuffle<Default, 256, 256, 128, 32, 8, 2, 32, 32, 4, 2, 8, 4, 1, 2> LoopScheduler: Interwave, PipelineVersion: v1
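
The ``Perf`` line can be cross-checked by hand. The sketch below makes three assumptions:

* the usual GEMM operation count of ``2 * M * N * K``
* fp16 elements of 2 bytes
* A, B, and C each moved through memory once

Under those assumptions, the sizes and the 1.08153 ms shown above give the reported 119.136 TFlops and about 89.2 GB/s. ``gemm_perf`` is a hypothetical helper. This is a consistency check, not a description of how the example computes its figures internally.

```shell
# Cross-check GEMM throughput and bandwidth from the problem size and time.
# Assumptions: 2*M*N*K floating-point operations, fp16 (2-byte) elements,
# and A (MxK), B (KxN), and C (MxN) each moved through memory once.
gemm_perf() {
  awk -v M="$1" -v N="$2" -v K="$3" -v t="$4" 'BEGIN {
    flops = 2 * M * N * K                # one multiply and one add per product term
    bytes = (M * K + K * N + M * N) * 2  # fp16 A, B, and C
    secs  = t / 1000                     # time is given in milliseconds
    printf "%.3f TFlops\n", flops / secs / 1e12
    printf "%.1f GB/s\n", bytes / secs / 1e9
  }'
}

# Sizes and time from the gfx908/gfx90a output above.
gemm_perf 3840 4096 4096 1.08153
# Prints:
# 119.136 TFlops
# 89.2 GB/s
```

Passing the 3.65695 ms reported by the ``example_gemm_dl_fp16`` run later in this tutorial gives 35.234 TFlops, which again matches the printed value.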

However, running it on a ``gfx1030`` device should result in the following::

   a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
   b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
   c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
   DeviceGemmXdl<256, 256, 128, 4, 8, 32, 32, 4, 2> NumPrefetch: 1, LoopScheduler: Default, PipelineVersion: v1 does not support this problem

Don't worry: some operators are supported on the ``gfx1030`` architecture, so you can run a
separate example like::

   ./bin/example_gemm_dl_fp16 1 1 1

and it should return something like::

   a_m_k: dim 2, lengths {3840, 4096}, strides {1, 4096}
   b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
   c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
   arg.a_grid_desc_k0_m0_m1_k1_{2048, 3840, 2}
   arg.b_grid_desc_k0_n0_n1_k1_{2048, 4096, 2}
   arg.c_grid_desc_m_n_{ 3840, 4096}
   launch_and_time_kernel: grid_dim {960, 1, 1}, block_dim {256, 1, 1}
   Warm up 1 time
   Start running 10 times...
   Perf: 3.65695 ms, 35.234 TFlops, 26.3797 GB/s, DeviceGemmDl<256, 128, 128, 16, 2, 4, 4, 1>

.. note::

   A new CMake flag ``DL_KERNELS`` has been added to the latest versions of CK. If you do not see the above results when running ``example_gemm_dl_fp16``, you might need to add ``-D DL_KERNELS=ON`` to your CMake command to build the operators supported on the ``gfx1030`` architecture.

You can also run a separate test::

   ctest -R test_gemm_fp16

If everything goes well, you should see something like::

   Start 121: test_gemm_fp16
   1/1 Test #121: test_gemm_fp16 ................... Passed 51.81 sec

   100% tests passed, 0 tests failed out of 1

Summary
=======

In this tutorial, you took a first look at the Composable Kernel library, built it on your system, and ran some examples and tests. In the next tutorial, you will run kernels with different configurations to find out the best one for your hardware and task.

P.S.: If you are running on a cloud instance, don't forget to switch off the cloud instance.