mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-29 19:28:33 +00:00
Merge pull request #2398 from spolifroni-amd/spolifroni-amd/updates-for-642
Spolifroni amd/updates for 642
2
.gitignore
vendored
@@ -55,6 +55,8 @@ _static/
|
||||
_templates/
|
||||
_toc.yml
|
||||
_doxygen/
|
||||
docs/doxygen/html
|
||||
docs/doxygen/xml
|
||||
|
||||
# JetBrains IDE
|
||||
.idea/
|
||||
|
||||
@@ -20,10 +20,11 @@ Tejash Shah, 2019-2020
|
||||
Xiaoyan Zhou, 2020
|
||||
|
||||
[Jianfeng Yan](https://github.com/j4yan), 2021-2022
|
||||
|
||||
[Jun Liu](https://github.com/junliume), 2021-2024
|
||||
|
||||
## Product Manager
|
||||
[Jun Liu](https://github.com/junliume)
|
||||
[John Afaganis](https://github.com/afagaj)
|
||||
|
||||
|
||||
|
||||
## Contributors
|
||||
|
||||
@@ -104,6 +104,7 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
|
||||
```bash
|
||||
make -j install
|
||||
```
|
||||
**[See Note on -j](#notes)**
|
||||
|
||||
## Optional post-install steps
|
||||
|
||||
@@ -146,7 +147,8 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
|
||||
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
|
||||
```
|
||||
|
||||
Note the `-j` option for building with multiple threads in parallel, which speeds up the build significantly.
|
||||
### Notes
|
||||
The `-j` option builds with multiple threads in parallel, which speeds up the build significantly.
|
||||
However, `-j` on its own launches an unlimited number of threads, which can cause the build to run out of memory and
|
||||
crash. On average, you should expect each thread to use ~2 GB of RAM.
|
||||
Depending on the number of CPU cores and the amount of RAM on your system, you may want to
|
||||
@@ -211,4 +213,4 @@ script/uninstall_precommit.sh
|
||||
```
|
||||
|
||||
If you need to temporarily disable pre-commit hooks, you can add the `--no-verify` option to the
|
||||
`git commit` command.
|
||||
`git commit` command.
|
||||
@@ -1,18 +1,15 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel documentation and API reference library
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation
|
||||
:description: Composable Kernel mathematical basis
|
||||
:keywords: composable kernel, CK, ROCm, API, mathematics, algorithm
|
||||
|
||||
.. _supported-primitives:
|
||||
|
||||
********************************************************************
|
||||
Supported Primitives Guide
|
||||
Composable Kernel mathematical basis
|
||||
********************************************************************
|
||||
|
||||
This document contains details of supported primitives in Composable Kernel (CK). In contrast to the API Reference Guide, the Supported Primitives Guide is an introduction to the math which underpins the algorithms implemented in CK.
|
||||
This is an introduction to the math which underpins the algorithms implemented in Composable Kernel.
|
||||
|
||||
------------
|
||||
Softmax
|
||||
------------
|
||||
|
||||
For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B` you can decompose the
|
||||
softmax of concatenated :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,
|
||||
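The hunk ends before the decomposition itself. As a hedged illustration only, the C++ sketch below checks one standard blockwise identity numerically: each block keeps its maximum ``m`` and the sum ``l`` of ``exp(x_i - m)``, and two blocks are merged with ``m = max(m_a, m_b)`` and ``l = exp(m_a - m) * l_a + exp(m_b - m) * l_b``. The names ``BlockStats``, ``block_stats``, ``merge``, ``blockwise_softmax``, and ``reference_softmax`` are hypothetical and are not Composable Kernel APIs. The sketch assumes every block is non-empty.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-block statistics (hypothetical helper type):
// m is the block maximum, l is the sum of exp(x_i - m) over the block.
struct BlockStats
{
    float m;
    float l;
};

// Computes the statistics of one non-empty block.
BlockStats block_stats(const std::vector<float>& x)
{
    const float m = *std::max_element(x.begin(), x.end());
    float l       = 0.0f;
    for(float v : x)
        l += std::exp(v - m);
    return {m, l};
}

// Merges the statistics of two blocks into the statistics of their concatenation.
BlockStats merge(BlockStats a, BlockStats b)
{
    const float m = std::max(a.m, b.m);
    return {m, std::exp(a.m - m) * a.l + std::exp(b.m - m) * b.l};
}

// Softmax of the concatenated blocks, using only merged per-block statistics.
std::vector<float> blockwise_softmax(const std::vector<std::vector<float>>& blocks)
{
    BlockStats total = block_stats(blocks.front());
    for(std::size_t t = 1; t < blocks.size(); ++t)
        total = merge(total, block_stats(blocks[t]));

    std::vector<float> out;
    for(const auto& block : blocks)
        for(float v : block)
            out.push_back(std::exp(v - total.m) / total.l);
    return out;
}

// Direct softmax over one vector, used as the reference result.
std::vector<float> reference_softmax(const std::vector<float>& x)
{
    const BlockStats s = block_stats(x);
    std::vector<float> out;
    for(float v : x)
        out.push_back(std::exp(v - s.m) / s.l);
    return out;
}
```

Calling ``blockwise_softmax`` on the blocks ``{1, 2}`` and ``{3, 4}`` should agree with ``reference_softmax`` on ``{1, 2, 3, 4}`` up to floating-point rounding. Subtracting the running maximum keeps every exponent non-positive, which avoids overflow.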
29
docs/conceptual/Composable-Kernel-structure.rst
Normal file
@@ -0,0 +1,29 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel structure
|
||||
:keywords: composable kernel, CK, ROCm, API, structure
|
||||
|
||||
.. _what-is-ck:
|
||||
|
||||
********************************************************************
|
||||
Composable Kernel structure
|
||||
********************************************************************
|
||||
|
||||
The Composable Kernel library uses a tile-based programming model and tensor coordinate transformation to achieve performance portability and code maintainability. Tensor coordinate transformation is a complexity reduction technique for complex machine learning operators.
|
||||
|
||||
|
||||
.. image:: ../data/ck_component.png
|
||||
:alt: CK Components
|
||||
|
||||
|
||||
The Composable Kernel library consists of four layers:
|
||||
|
||||
* a templated tile operator layer
* a templated kernel and invoker layer
* an instantiated kernel and invoker layer
* a client API layer
|
||||
|
||||
A wrapper component is included to simplify tensor transform operations.
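To make the layering concrete, here is a deliberately small CPU-only sketch of how four such layers can stack. It is not Composable Kernel code. ``TileAdd``, ``AddKernel``, ``AddKernelF32Tile4``, and ``add_f32`` are hypothetical names chosen for illustration, and an element-wise add stands in for a real operator.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Layer 1 (templated tile operator): processes one tile of at most TileSize elements.
template <typename T, std::size_t TileSize>
struct TileAdd
{
    static void run(const T* a, const T* b, T* c, std::size_t n)
    {
        for(std::size_t i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }
};

// Layer 2 (templated kernel and invoker): walks the whole problem tile by tile.
template <typename T, std::size_t TileSize>
struct AddKernel
{
    static void invoke(const std::vector<T>& a, const std::vector<T>& b, std::vector<T>& c)
    {
        const std::size_t total = std::min(a.size(), b.size());
        for(std::size_t offset = 0; offset < total; offset += TileSize)
        {
            const std::size_t n = std::min(TileSize, total - offset);
            TileAdd<T, TileSize>::run(a.data() + offset, b.data() + offset, c.data() + offset, n);
        }
    }
};

// Layer 3 (instantiated kernel and invoker): one concrete type and tile size.
using AddKernelF32Tile4 = AddKernel<float, 4>;

// Layer 4 (client API): a plain function that hides every template parameter.
std::vector<float> add_f32(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> c(std::min(a.size(), b.size()));
    AddKernelF32Tile4::invoke(a, b, c);
    return c;
}
```

A caller only sees ``add_f32``. Changing the tile size or data type means adding another instantiation in the third layer, without touching the tile operator or the client signature.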
|
||||
|
||||
.. image:: ../data/ck_layer.png
|
||||
:alt: CK Layers
|
||||
|
||||
@@ -1,41 +0,0 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel documentation and API reference library
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation
|
||||
|
||||
.. _what-is-ck:
|
||||
|
||||
********************************************************************
|
||||
What is the Composable Kernel library
|
||||
********************************************************************
|
||||
|
||||
|
||||
Methodology
|
||||
===========
|
||||
|
||||
The Composable Kernel (CK) library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages like HIP C++.
|
||||
|
||||
CK utilizes two concepts to achieve performance portability and code maintainability:
|
||||
|
||||
* A tile-based programming model
|
||||
* Algorithm complexity reduction for complex ML operators using an innovative technique called
|
||||
"Tensor Coordinate Transformation".
|
||||
|
||||
.. image:: ../data/ck_component.png
|
||||
:alt: CK Components
|
||||
|
||||
|
||||
Code Structure
|
||||
==============
|
||||
|
||||
The CK library is structured into 4 layers:
|
||||
|
||||
* "Templated Tile Operators" layer
|
||||
* "Templated Kernel and Invoker" layer
|
||||
* "Instantiated Kernel and Invoker" layer
|
||||
* "Client API" layer
|
||||
|
||||
It also includes a simple wrapper component used to perform tensor transform operations more easily and with fewer lines of code.
|
||||
|
||||
.. image:: ../data/ck_layer.png
|
||||
:alt: CK Layers
|
||||
|
||||
@@ -32,19 +32,19 @@ DOXYFILE_ENCODING = UTF-8
|
||||
# title of most generated pages and in a few other places.
|
||||
# The default value is: My Project.
|
||||
|
||||
PROJECT_NAME = "ck"
|
||||
PROJECT_NAME = "Composable Kernel"
|
||||
|
||||
# The PROJECT_NUMBER tag can be used to enter a project or revision number. This
|
||||
# could be handy for archiving the generated documentation or if some version
|
||||
# control system is used.
|
||||
|
||||
PROJECT_NUMBER = v3.0.1.0
|
||||
PROJECT_NUMBER =
|
||||
|
||||
# Using the PROJECT_BRIEF tag one can provide an optional one line description
|
||||
# for a project that appears at the top of each page and should give viewer a
|
||||
# quick idea about the purpose of the project. Keep the description short.
|
||||
|
||||
PROJECT_BRIEF = "prototype interfaces compatible with ROCm platform and HiP"
|
||||
PROJECT_BRIEF = "Prototype interfaces compatible with ROCm platform and HiP"
|
||||
|
||||
# With the PROJECT_LOGO tag one can specify a logo or an icon that is included
|
||||
# in the documentation. The maximum height of the logo should not exceed 55
|
||||
@@ -775,12 +775,10 @@ WARN_LOGFILE =
|
||||
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
|
||||
# Note: If this tag is empty the current directory is searched.
|
||||
|
||||
INPUT = ../../include/ck/tensor_operation/gpu/grid \
|
||||
../../include/ck/tensor_operation/gpu/block \
|
||||
../../include/ck/tensor_operation/gpu/thread \
|
||||
INPUT = ../../include \
|
||||
../../include/ck/ \
|
||||
../../library/include/ck/library/utility \
|
||||
../../include/ck/wrapper
|
||||
|
||||
../../include/ck_tile
|
||||
|
||||
# This tag can be used to specify the character encoding of the source files
|
||||
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
|
||||
@@ -959,7 +957,17 @@ FILTER_SOURCE_PATTERNS =
|
||||
# (index.html). This can be useful if you have a project on for instance GitHub
|
||||
# and want to reuse the introduction page also for the doxygen output.
|
||||
|
||||
USE_MDFILE_AS_MAINPAGE = ../README.md
|
||||
|
||||
USE_MDFILE_AS_MAINPAGE =
|
||||
|
||||
# The Fortran standard specifies that for fixed formatted Fortran code all
|
||||
# characters from position 72 are to be considered as comment. A common
|
||||
# extension is to allow longer lines before the automatic comment starts. The
|
||||
# setting FORTRAN_COMMENT_AFTER will also make it possible that longer lines can
|
||||
# be processed before the automatic comment starts.
|
||||
# Minimum value: 7, maximum value: 10000, default value: 72.
|
||||
|
||||
FORTRAN_COMMENT_AFTER = 72
|
||||
|
||||
#---------------------------------------------------------------------------
|
||||
# Configuration options related to source browsing
|
||||
@@ -1179,7 +1187,20 @@ HTML_EXTRA_STYLESHEET =
|
||||
# files will be copied as-is; there are no commands or markers available.
|
||||
# This tag requires that the tag GENERATE_HTML is set to YES.
|
||||
|
||||
HTML_EXTRA_FILES =
|
||||
HTML_EXTRA_FILES = ../_doxygen/extra_stylesheet.css
|
||||
|
||||
# The HTML_COLORSTYLE tag can be used to specify if the generated HTML output
|
||||
# should be rendered with a dark or light theme.
|
||||
# Possible values are: LIGHT always generate light mode output, DARK always
|
||||
# generate dark mode output, AUTO_LIGHT automatically set the mode according to
|
||||
# the user preference, use light mode if no preference is set (the default),
|
||||
# AUTO_DARK automatically set the mode according to the user preference, use
|
||||
# dark mode if no preference is set and TOGGLE allow to user to switch between
|
||||
# light and dark mode via a button.
|
||||
# The default value is: AUTO_LIGHT.
|
||||
# This tag requires that the tag GENERATE_HTML is set to YES.
|
||||
|
||||
HTML_COLORSTYLE = LIGHT
|
||||
|
||||
# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen
|
||||
# will adjust the colors in the style sheet and background images according to
|
||||
|
||||
@@ -8,31 +8,38 @@
|
||||
Composable Kernel User Guide
|
||||
********************************************************************
|
||||
|
||||
The Composable Kernel (CK) library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages like HIP C++. This document contains instructions for installing, using, and contributing to the Composable Kernel project. To learn more see :ref:`what-is-ck`.
|
||||
The Composable Kernel library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages such as `HIP C++ <https://rocm.docs.amd.com/projects/HIP/en/latest/index.html>`_.
|
||||
|
||||
The CK documentation is structured as follows:
|
||||
The Composable Kernel repository is located at `https://github.com/ROCm/composable_kernel <https://github.com/ROCm/composable_kernel>`_.
|
||||
|
||||
.. grid:: 2
|
||||
:gutter: 3
|
||||
|
||||
.. grid-item-card:: Installation
|
||||
.. grid-item-card:: Install
|
||||
|
||||
* :ref:`docker-hub`
|
||||
* :doc:`Composable Kernel prerequisites <./install/Composable-Kernel-prerequisites>`
|
||||
* :doc:`Build and install Composable Kernel <./install/Composable-Kernel-install>`
|
||||
* :doc:`Build and install Composable Kernel on a Docker image <./install/Composable-Kernel-Docker>`
|
||||
|
||||
.. grid-item-card:: Conceptual
|
||||
|
||||
* :ref:`what-is-ck`
|
||||
* :doc:`Composable Kernel structure <./conceptual/Composable-Kernel-structure>`
|
||||
* :doc:`Composable Kernel mathematical basis <./conceptual/Composable-Kernel-math>`
|
||||
|
||||
.. grid-item-card:: API reference
|
||||
.. grid-item-card:: Tutorials
|
||||
|
||||
* :ref:`supported-primitives`
|
||||
* :ref:`api-reference`
|
||||
* :ref:`wrapper`
|
||||
* :doc:`Composable Kernel examples and tests <./tutorial/Composable-Kernel-examples>`
|
||||
|
||||
.. grid-item-card:: Tutorial
|
||||
|
||||
* :ref:`hello-world`
|
||||
.. grid-item-card:: Reference
|
||||
|
||||
* :doc:`Composable Kernel supported scalar types <./reference/Composable_Kernel_supported_scalar_types>`
|
||||
* :doc:`Composable Kernel custom types <./reference/Composable_Kernel_custom_types>`
|
||||
* :doc:`Composable Kernel vector utilities <./reference/Composable_Kernel_vector_utilities>`
|
||||
* :ref:`wrapper`
|
||||
* :doc:`Composable Kernel API reference <./doxygen/html/namespace_c_k>`
|
||||
* :doc:`CK Tile API reference <./doxygen/html/namespaceck__tile>`
|
||||
* :doc:`Composable Kernel complete API class list <./doxygen/html/annotated>`
|
||||
|
||||
To contribute to the documentation refer to `Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
|
||||
|
||||
You can find licensing information on the `Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
|
||||
|
||||
16
docs/install/Composable-Kernel-Docker.rst
Normal file
@@ -0,0 +1,16 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel docker files
|
||||
:keywords: composable kernel, CK, ROCm, API, docker
|
||||
|
||||
.. _docker-hub:
|
||||
|
||||
********************************************************************
|
||||
Composable Kernel Docker containers
|
||||
********************************************************************
|
||||
|
||||
Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.
|
||||
|
||||
The images also contain `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_, `CMake <https://cmake.org/getting-started/>`_, and the `ROCm LLVM compiler infrastructure <https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html>`_.
|
||||
|
||||
Composable Kernel Docker images are named according to their operating system and ROCm version. For example, a Docker image named ``ck_ub22.04_rocm6.3`` would correspond to an Ubuntu 22.04 image with ROCm 6.3.
|
||||
|
||||
72
docs/install/Composable-Kernel-install.rst
Normal file
@@ -0,0 +1,72 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel build and install
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation, install
|
||||
|
||||
******************************************************
|
||||
Building and installing Composable Kernel with CMake
|
||||
******************************************************
|
||||
|
||||
Before you begin, clone the `Composable Kernel GitHub repository <https://github.com/ROCm/composable_kernel.git>`_ and create a ``build`` directory in its root:
|
||||
|
||||
.. code:: shell
|
||||
|
||||
git clone https://github.com/ROCm/composable_kernel.git
|
||||
cd composable_kernel
|
||||
mkdir build
|
||||
|
||||
Change directory to the ``build`` directory and generate the makefile using the ``cmake`` command. Two build options are required:
|
||||
|
||||
* ``CMAKE_PREFIX_PATH``: The ROCm installation path. ROCm is installed in ``/opt/rocm`` by default.
|
||||
* ``CMAKE_CXX_COMPILER``: The path to the Clang compiler. Clang is found at ``/opt/rocm/llvm/bin/clang++`` by default.
|
||||
|
||||
|
||||
.. code:: shell
|
||||
|
||||
cd build
|
||||
cmake ../. -D CMAKE_PREFIX_PATH="/opt/rocm" -D CMAKE_CXX_COMPILER="/opt/rocm/llvm/bin/clang++" [-D<OPTION1=VALUE1> [-D<OPTION2=VALUE2>] ...]
|
||||
|
||||
|
||||
Other build options are:
|
||||
|
||||
* ``DISABLE_DL_KERNELS``: Set to ``ON`` to skip building the deep learning (DL) and data parallel primitive (DPP) instances.
|
||||
|
||||
.. note::
|
||||
|
||||
DL and DPP instances are useful on architectures that don't support XDL or WMMA.
|
||||
|
||||
* ``CK_USE_FP8_ON_UNSUPPORTED_ARCH``: Set to ``ON`` to build FP8 data type instances on gfx90a without native FP8 support.
* ``GPU_TARGETS``: Target architectures. Target architectures in this list must all be different versions of the same architecture. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_TARGETS="gfx908;gfx90a"``. This option is required to build tests and examples.
* ``GPU_ARCHS``: Target architectures. Target architectures in this list are not limited to different versions of the same architecture. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_ARCHS="gfx908;gfx1100"``.
* ``CMAKE_BUILD_TYPE``: The build type. Can be ``None``, ``Release``, ``Debug``, ``RelWithDebInfo``, or ``MinSizeRel``. CMake will use ``Release`` by default.
|
||||
|
||||
.. note::
|
||||
|
||||
If neither ``GPU_TARGETS`` nor ``GPU_ARCHS`` is specified, Composable Kernel will be built for all targets supported by the compiler.
|
||||
|
||||
Build Composable Kernel using the generated makefile. This will build the library, the examples, and the tests, and save them to ``bin``.
|
||||
|
||||
.. code:: shell
|
||||
|
||||
make -j20
|
||||
|
||||
The ``-j`` option speeds up the build by using multiple threads in parallel. For example, ``-j20`` uses twenty threads in parallel. On average, each thread uses about 2 GB of memory, so choose a thread count whose total memory use fits within the memory available on your system.
|
||||
|
||||
Using ``-j`` alone will launch an unlimited number of threads and is not recommended.
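As a rough way to pick the number to pass to ``-j``, the sketch below caps the thread count by both the core count and the memory budget, using the 2 GB per thread average mentioned above. ``safe_job_count`` is a hypothetical helper, not part of Composable Kernel or of ``make``, and the per-thread figure is only an average.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: suggests a value for make's -j option.
// cpu_cores        : number of CPU cores available for the build
// available_ram_gb : free memory that the build may use, in GB
// gb_per_job       : assumed average memory per build thread (2 GB by default)
std::size_t safe_job_count(std::size_t cpu_cores, double available_ram_gb, double gb_per_job = 2.0)
{
    if(gb_per_job <= 0.0 || available_ram_gb <= 0.0)
        return 1;

    // Number of threads that fit in memory, rounded down.
    const std::size_t by_memory = static_cast<std::size_t>(available_ram_gb / gb_per_job);

    // Never exceed the core count, and always allow at least one thread.
    return std::max<std::size_t>(1, std::min(cpu_cores, by_memory));
}
```

For example, a machine with 32 cores and 16 GB of free memory gives ``safe_job_count(32, 16.0) == 8``, which corresponds to ``make -j8``.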
|
||||
|
||||
Install the Composable Kernel library:
|
||||
|
||||
.. code:: shell
|
||||
|
||||
make install
|
||||
|
||||
After running ``make install``, the Composable Kernel files will be saved to the following locations:
|
||||
|
||||
* Library files: ``/opt/rocm/lib/``
|
||||
* Header files: ``/opt/rocm/include/ck/`` and ``/opt/rocm/include/ck_tile/``
|
||||
* Examples, tests, and ckProfiler: ``/opt/rocm/bin/``
|
||||
|
||||
For information about ckProfiler, see `the ckProfiler readme file <https://github.com/ROCm/composable_kernel/blob/develop/profiler/README.md>`_.
|
||||
|
||||
For information about running the examples and tests, see :doc:`Composable Kernel examples and tests <../tutorial/Composable-Kernel-examples>`.
|
||||
|
||||
|
||||
32
docs/install/Composable-Kernel-prerequisites.rst
Normal file
@@ -0,0 +1,32 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel prerequisites
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation, prerequisites
|
||||
|
||||
******************************************************
|
||||
Composable Kernel prerequisites
|
||||
******************************************************
|
||||
|
||||
Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.
|
||||
|
||||
The following prerequisites are required to build and install Composable Kernel:
|
||||
|
||||
* cmake
* hip-rocclr
* iputils-ping
* jq
* libelf-dev
* libncurses5-dev
* libnuma-dev
* libpthread-stubs0-dev
* llvm-amdgpu
* mpich
* net-tools
* python3
* python3-dev
* python3-pip
* redis
* rocm-llvm-dev
* zlib1g-dev
* libzstd-dev
* openssh-server
* clang-format-12
|
||||
@@ -1,101 +0,0 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel documentation and API reference library
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation
|
||||
|
||||
.. _docker-hub:
|
||||
|
||||
********************************************************************
|
||||
CK Docker Hub
|
||||
********************************************************************
|
||||
|
||||
Why do I need this?
|
||||
===================
|
||||
|
||||
To make things simpler, and bring Composable Kernel and its dependencies together,
|
||||
docker images can be found on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_. Docker images provide a complete image of the OS, the Composable Kernel library, and its dependencies in a single downloadable file.
|
||||
|
||||
Refer to `Docker Overview <https://docs.docker.com/get-started/overview/>`_ for more information on Docker images and containers.
|
||||
|
||||
Which image is right for me?
|
||||
============================
|
||||
|
||||
The image naming includes information related to the docker image.
|
||||
For example ``ck_ub20.04_rocm6.0`` indicates the following:
|
||||
|
||||
* ``ck`` - made for running Composable Kernel;
|
||||
* ``ub20.04`` - based on Ubuntu 20.04;
|
||||
* ``rocm6.0`` - ROCm platform version 6.0.
|
||||
|
||||
Download a docker image suitable for your OS and ROCm release, run or start the docker container, and then resume the tutorial from this point. Use the ``docker pull`` command to download the file::
|
||||
|
||||
docker pull rocm/composable_kernel:ck_ub20.04_rocm6.0
|
||||
|
||||
|
||||
What is inside the image?
|
||||
-------------------------
|
||||
|
||||
The docker images have everything you need for running CK including:
|
||||
|
||||
* `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_
|
||||
* `CMake <https://cmake.org/getting-started/>`_
|
||||
* `Compiler <https://github.com/ROCm/llvm-project>`_
|
||||
* `Composable Kernel library <https://github.com/ROCm/composable_kernel>`_
|
||||
|
||||
Running the docker container
|
||||
============================
|
||||
|
||||
After downloading the docker image, you can start the container using one of a number of commands. Start with the ``docker run`` command as shown below::
|
||||
|
||||
docker run \
|
||||
-it \
|
||||
--privileged \
|
||||
--group-add sudo \
|
||||
-w /root/workspace \
|
||||
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
|
||||
rocm/composable_kernel:ck_ub20.04_rocm6.0 \
|
||||
/bin/bash
|
||||
|
||||
After starting the bash shell, the docker container current folder is `~/workspace`. The library path is ``~/workspace/composable_kernel``. Navigate to the library to begin the tutorial as explained in :ref:`hello-world`:
|
||||
|
||||
.. note::
|
||||
|
||||
If your current folder is different from `${HOME}`, adjust the line ``-v ${HOME}:/root/workspace`` in the ``docker run`` command to fit your folder structure.
|
||||
|
||||
Stop and restart the docker image
|
||||
=================================
|
||||
|
||||
After finishing the tutorial, or just when you have completed your work session, you can close the docker container, or stop the docker container to restart it at another time. Closing the docker container means that it is still in the active state, and can be resumed from where you left it. Stopping the container closes it, and returns the image to its initial state.
|
||||
|
||||
Use the ``Ctrl-D`` option to exit the container, while leaving it active, so you can return to the container in its current state to resume the tutorial, or pickup your project where you left off.
|
||||
|
||||
To restart the active container use the ``docker exec`` command to specify the container name and options as follows::
|
||||
|
||||
docker exec -it <container_name> bash
|
||||
|
||||
Where:
|
||||
|
||||
* `exec` is the docker command
|
||||
* `-it` is the interactive option for `exec`
|
||||
* `<container_name>` specifies an active container on the system
|
||||
* `bash` specifies the command to run in the interactive shell
|
||||
|
||||
.. note::
|
||||
|
||||
You can use the ``docker container ls`` command to list the active containers on the system.
|
||||
|
||||
To start a container from the image, use the ``docker start`` command::
|
||||
|
||||
docker start <container_name>
|
||||
|
||||
Then use the docker exec command as shown above to start the bash shell.
|
||||
|
||||
Use the ``docker stop`` command to stop the container and restore the image to its initial state::
|
||||
|
||||
docker stop <container_name>
|
||||
|
||||
Editing the docker image
|
||||
=======================
|
||||
|
||||
If you want to customize the docker image, edit the
|
||||
`Dockerfile <https://github.com/ROCm/composable_kernel/blob/develop/Dockerfile>`_
|
||||
from the GitHub repository to suit your needs.
|
||||
@@ -1,48 +0,0 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel documentation and API reference library
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation
|
||||
|
||||
.. _api-reference:
|
||||
|
||||
********************************************************************
|
||||
API reference guide
|
||||
********************************************************************
|
||||
|
||||
|
||||
This document contains details of the APIs for the Composable Kernel (CK) library and introduces
|
||||
some of the key design principles that are used to write new classes that extend CK functionality.
|
||||
|
||||
=================
|
||||
CK Datatypes
|
||||
=================
|
||||
|
||||
-----------------
|
||||
DeviceMem
|
||||
-----------------
|
||||
|
||||
.. doxygenstruct:: DeviceMem
|
||||
|
||||
---------------------------
|
||||
Kernels For Flashattention
|
||||
---------------------------
|
||||
|
||||
The Flashattention algorithm is defined in :cite:t:`dao2022flashattention`. This section lists
|
||||
the classes that are used in the CK GPU implementation of Flashattention.
|
||||
|
||||
**Gridwise classes**
|
||||
|
||||
.. doxygenstruct:: ck::GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffle
|
||||
|
||||
**Blockwise classes**
|
||||
|
||||
.. doxygenstruct:: ck::ThreadGroupTensorSliceTransfer_v4r1
|
||||
|
||||
.. doxygenstruct:: ck::BlockwiseGemmXdlops_v2
|
||||
|
||||
.. doxygenstruct:: ck::BlockwiseSoftmax
|
||||
|
||||
**Threadwise classes**
|
||||
|
||||
.. doxygenstruct:: ck::ThreadwiseTensorSliceTransfer_StaticToStatic
|
||||
|
||||
.. bibliography::
|
||||
@@ -1,20 +1,15 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel documentation and API reference library
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation
|
||||
:description: Composable Kernel wrapper
|
||||
:keywords: composable kernel, CK, ROCm, API, wrapper
|
||||
|
||||
.. _wrapper:
|
||||
|
||||
********************************************************************
|
||||
Wrapper
|
||||
Composable Kernel wrapper
|
||||
********************************************************************
|
||||
|
||||
-------------------------------------
|
||||
Description
|
||||
-------------------------------------
|
||||
|
||||
|
||||
The CK library provides a lightweight wrapper for more complex operations implemented in
|
||||
the library.
|
||||
The Composable Kernel library provides a lightweight wrapper to simplify the more complex operations.
|
||||
|
||||
Example:
|
||||
|
||||
39
docs/reference/Composable_Kernel_custom_types.rst
Normal file
@@ -0,0 +1,39 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel supported custom types
|
||||
:keywords: composable kernel, custom, data types, support, CK, ROCm
|
||||
|
||||
******************************************************
|
||||
Composable Kernel custom data types
|
||||
******************************************************
|
||||
|
||||
Composable Kernel supports the use of custom types that provide a way to implement specialized numerical formats.
|
||||
|
||||
To use a custom type, create a C++ type that implements the operations needed for tensor computations. These should include:
|
||||
|
||||
* Constructors and initialization methods
|
||||
* Arithmetic operators if the type will be used in computational operations
|
||||
* Any conversion functions needed to interface with other parts of an application
|
||||
|
||||
For example, to create a complex half-precision type:
|
||||
|
||||
.. code:: cpp
|
||||
|
||||
    // Minimal form: a plain aggregate of two half-precision members.
    struct complex_half_t
    {
        half_t real;
        half_t img;
    };

    // Extended form of the same type, with a member alias and constructors.
    // The two forms are alternatives: define only one of them per translation unit.
    struct complex_half_t
    {
        using type = half_t;
        type real;
        type img;

        complex_half_t() : real{type{}}, img{type{}} {}
        complex_half_t(type real_init, type img_init) : real{real_init}, img{img_init} {}
    };
|
||||
|
||||
Custom types can be particularly useful for specialized applications such as complex number arithmetic,
|
||||
custom quantization schemes, or domain-specific number representations.
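The list of requirements above mentions arithmetic operators. A minimal sketch of what those could look like for the complex type follows. To keep it compilable without the Composable Kernel headers, ``half_t`` is aliased to ``float`` here, which is an assumption made purely for illustration. The operators shown are not taken from the library.

```cpp
// Stand-in for the library's half-precision type (assumption: float is used
// so that the sketch builds without Composable Kernel headers).
using half_t = float;

struct complex_half_t
{
    using type = half_t;
    type real;
    type img;

    complex_half_t() : real{type{}}, img{type{}} {}
    complex_half_t(type real_init, type img_init) : real{real_init}, img{img_init} {}
};

// Component-wise complex addition.
inline complex_half_t operator+(const complex_half_t& a, const complex_half_t& b)
{
    return {a.real + b.real, a.img + b.img};
}

// Complex multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
inline complex_half_t operator*(const complex_half_t& a, const complex_half_t& b)
{
    return {a.real * b.real - a.img * b.img, a.real * b.img + a.img * b.real};
}
```

With these operators, ``complex_half_t{1, 2} * complex_half_t{3, 4}`` has real part ``-5`` and imaginary part ``10``.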
|
||||
|
||||
69
docs/reference/Composable_Kernel_supported_scalar_types.rst
Normal file
@@ -0,0 +1,69 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel supported scalar types
|
||||
:keywords: composable kernel, scalar, data types, support, CK, ROCm
|
||||
|
||||
***************************************************
|
||||
Composable Kernel supported scalar data types
|
||||
***************************************************
|
||||
|
||||
The Composable Kernel library provides support for the following scalar data types:
|
||||
|
||||
.. list-table::
    :header-rows: 1
    :widths: 25 15 60

    * - Type
      - Bit Width
      - Description

    * - ``double``
      - 64-bit
      - Standard IEEE 754 double precision floating point

    * - ``float``
      - 32-bit
      - Standard IEEE 754 single precision floating point

    * - ``int32_t``
      - 32-bit
      - Standard signed 32-bit integer

    * - ``int8_t``
      - 8-bit
      - Standard signed 8-bit integer

    * - ``uint8_t``
      - 8-bit
      - Standard unsigned 8-bit integer

    * - ``bool``
      - 1-bit
      - Boolean type

    * - ``ck::half_t``
      - 16-bit
      - IEEE 754 half precision floating point with 5 exponent bits, 10 mantissa bits, and 1 sign bit

    * - ``ck::bhalf_t``
      - 16-bit
      - Brain floating point with 8 exponent bits, 7 mantissa bits, and 1 sign bit

    * - ``ck::f8_t``
      - 8-bit
      - 8-bit floating point (E4M3 format) with 4 exponent bits, 3 mantissa bits, and 1 sign bit

    * - ``ck::bf8_t``
      - 8-bit
      - 8-bit brain floating point (E5M2 format) with 5 exponent bits, 2 mantissa bits, and 1 sign bit

    * - ``ck::f4_t``
      - 4-bit
      - 4-bit floating point format (E2M1 format) with 2 exponent bits, 1 mantissa bit, and 1 sign bit

    * - ``ck::f6_t``
      - 6-bit
      - 6-bit floating point format (E2M3 format) with 2 exponent bits, 3 mantissa bits, and 1 sign bit

    * - ``ck::bf6_t``
      - 6-bit
      - 6-bit brain floating point format (E3M2 format) with 3 exponent bits, 2 mantissa bits, and 1 sign bit
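The bit layouts in the table can be explored with plain integer operations. As one example, the brain floating point layout (1 sign bit, 8 exponent bits, 7 mantissa bits) matches the upper 16 bits of an IEEE 754 ``float``. The sketch below converts by truncation. The function names are hypothetical, and truncation is an assumption made for simplicity: the library's own conversion may round differently.

```cpp
#include <cstdint>
#include <cstring>

// Keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of a
// 32-bit float. The lower 16 mantissa bits are dropped (truncation).
std::uint16_t float_to_bf16_truncate(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Expands a 16-bit brain float pattern back to a 32-bit float by
// zero-filling the dropped mantissa bits.
float bf16_to_float(std::uint16_t pattern)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(pattern) << 16;
    float value              = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Values whose mantissa fits in 7 bits survive the round trip exactly. For instance, ``1.0f`` maps to the pattern ``0x3F80`` and back to ``1.0f``.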
|
||||
16
docs/reference/Composable_Kernel_vector_utilities.rst
Normal file
@@ -0,0 +1,16 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel vector template utilities
|
||||
:keywords: composable kernel, precision, data types, ROCm
|
||||
|
||||
******************************************************
|
||||
Composable Kernel vector template utilities
|
||||
******************************************************
|
||||
|
||||
Composable Kernel includes template utilities for creating vector types with customizable widths. These template utilities also flatten nested vector types into a single, wider vector, preventing the creation of vectors of vectors.
|
||||
|
||||
Vectors composed of supported scalar and custom types can be created with the ``ck::vector_type`` template.
|
||||
|
||||
For example, ``ck::vector_type<float, 4>`` creates a vector composed of four floats and ``ck::vector_type<ck::half_t, 8>`` creates a vector composed of eight half-precision scalars.
|
||||
|
||||
For vector operations to be valid, the underlying types must be either a :doc:`supported scalar type <Composable_Kernel_supported_scalar_types>` or :doc:`a custom type <Composable_Kernel_custom_types>` that implements the required operations.
|
||||
|
||||
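The flattening behavior described above can be illustrated with a small standalone template. This is a sketch under stated assumptions, not the Composable Kernel implementation: ``sketch_vector`` is a hypothetical stand-in for ``ck::vector_type``, and the partial specialization shows one possible way a vector of vectors can collapse into a single, wider vector.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for illustration only; this is not ck::vector_type.
template <typename T, std::size_t N>
struct sketch_vector
{
    using scalar_type = T;
    static constexpr std::size_t size = N;
    T data[N];
};

// A vector of N vectors, each M wide, collapses into one vector of M * N scalars.
// The rule applies recursively, so deeper nesting flattens as well.
template <typename T, std::size_t M, std::size_t N>
struct sketch_vector<sketch_vector<T, M>, N> : sketch_vector<T, M * N>
{
};

using nested = sketch_vector<sketch_vector<float, 4>, 2>;

static_assert(nested::size == 8, "two 4-wide vectors flatten to 8 scalars");
static_assert(std::is_same<nested::scalar_type, float>::value, "scalar type is preserved");
static_assert(sizeof(nested) == 8 * sizeof(float), "storage is one flat array");
```

The ``static_assert`` lines check at compile time that the nested type has the same width, scalar type, and storage size as a flat eight-element vector.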
@@ -3,34 +3,43 @@ defaults:
root: index
subtrees:

- caption: Conceptual
  entries:
  - file: conceptual/what-is-ck.rst
    title: What is Composable Kernel?

- caption: Install
  entries:
  - file: install/dockerhub.rst
    title: Docker Hub
  - file: install/Composable-Kernel-prerequisites.rst
    title: Composable Kernel prerequisites
  - file: install/Composable-Kernel-install.rst
    title: Build and install Composable Kernel
  - file: install/Composable-Kernel-Docker.rst
    title: Composable Kernel Docker images

- caption: CK API Reference
- caption: Conceptual
  entries:
  - file: reference/Supported_Primitives_Guide.rst
    title: Supported Primitives
  - file: reference/API_Reference_Guide.rst
    title: API Reference
  - file: reference/wrapper.rst
    title: Wrapper
  - file: conceptual/Composable-Kernel-structure.rst
    title: Composable Kernel structure
  - file: conceptual/Composable-Kernel-math.rst
    title: Composable Kernel mathematical basis

- caption: Tutorial
  entries:
  - file: tutorial/tutorial_hello_world.rst
    title: Hello World Tutorial
  - file: tutorial/Composable-Kernel-examples.rst
    title: Composable Kernel examples

- caption: Reference
  entries:
  - file: reference/Composable_Kernel_supported_scalar_types.rst
    title: Composable Kernel scalar types
  - file: reference/Composable_Kernel_custom_types.rst
    title: Composable Kernel custom types
  - file: reference/Composable_Kernel_vector_utilities.rst
    title: Composable Kernel vector utilities
  - file: reference/Composable-Kernel-wrapper.rst
    title: Composable Kernel wrapper
  - file: doxygen/html/annotated.rst
    title: Composable Kernel class list

- caption: About
  entries:
  - file: Contributors_Guide.rst
    title: Contributing to CK
    title: Contributing to Composable Kernel
  - file: license.rst
    title: License

40
docs/tutorial/Composable-Kernel-examples.rst
Normal file
@@ -0,0 +1,40 @@
.. meta::
  :description: Composable Kernel examples and tests
  :keywords: composable kernel, CK, ROCm, API, examples, tests

********************************************************************
Composable Kernel examples and tests
********************************************************************

After :doc:`building and installing Composable Kernel <../install/Composable-Kernel-install>`, the examples and tests will be moved to ``/opt/rocm/bin/``.

All tests have the prefix ``test`` and all examples have the prefix ``example``.

Use ``ctest`` with no arguments to run all examples and tests, or use ``ctest -R`` with a regular expression to run only the tests whose names match it. For example:

.. code:: shell

  ctest -R test_gemm_fp16

Examples can be run individually as well. For example:

.. code:: shell

  ./bin/example_gemm_xdl_fp16 1 1 1

For instructions on how to run individual examples and tests, see their README files in the |example|_ and |test|_ GitHub folders.

To run smoke tests, use ``make smoke``.

To run regression tests, use ``make regression``.

In general, tests that run for under thirty seconds are included in the smoke tests and tests that run for over thirty seconds are included in the regression tests.

.. |example| replace:: ``example``
.. _example: https://github.com/ROCm/composable_kernel/tree/develop/example

.. |client_example| replace:: ``client_example``
.. _client_example: https://github.com/ROCm/composable_kernel/tree/develop/client_example

.. |test| replace:: ``test``
.. _test: https://github.com/ROCm/composable_kernel/tree/develop/test
@@ -1,165 +0,0 @@
.. meta::
  :description: Composable Kernel documentation and API reference library
  :keywords: composable kernel, CK, ROCm, API, documentation

.. _hello-world:

********************************************************************
Hello World Tutorial
********************************************************************

This tutorial is for engineers working in artificial intelligence and machine learning who
would like to optimize pipelines and improve performance using the Composable
Kernel (CK) library. It introduces the CK library: you will build the library and run some of its examples and tests.

Description
===========

Modern AI technology solves more and more problems in a variety of fields, but crafting fast and
efficient workflows is still challenging. CK can make the AI workflow fast
and efficient. CK is a collection of optimized AI operator kernels with tools to create
new kernels. The library has the components required for modern neural network architectures,
including matrix multiplication, convolution, contraction, reduction, attention modules, a variety of activation functions, and fused operators.

CK library acceleration features are based on:

* Layered structure
* Tile-based computation model
* Tensor coordinate transformation
* Hardware acceleration use
* Support of low precision data types including fp16, bf16, int8 and int4

If you need more technical details and benchmarking results, read the following
`blog post <https://community.amd.com/t5/instinct-accelerators/amd-composable-kernel-library-efficient-fused-kernels-for-ai/ba-p/553224>`_.

To download the library, visit the `composable_kernel repository <https://github.com/ROCm/composable_kernel>`_.

Hardware targets
================

The CK library fully supports the ``gfx908`` and ``gfx90a`` GPU architectures, while only some operators are
supported on ``gfx1030`` devices. Check your hardware to determine the target GPU architecture.

========== =========
GPU Target AMD GPU
========== =========
gfx908     Radeon Instinct MI100
gfx90a     Radeon Instinct MI210, MI250, MI250X
gfx1030    Radeon PRO V620, W6800, W6800X, W6800X Duo, W6900X, RX 6800, RX 6800 XT, RX 6900 XT, RX 6900 XTX, RX 6950 XT
========== =========

If you don't have an AMD GPU at hand, `cloud options <https://aws.amazon.com/ec2/instance-types/g4/>`_ are
also available.

Build the library
=================

This tutorial is based on the use of Docker images as explained in :ref:`docker-hub`. Download a Docker image suitable for your OS and ROCm release, run or start the Docker container, and then resume the tutorial from this point.

.. note::

   You can also `install ROCm <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/>`_ on your system, clone the `Composable Kernel repository <https://github.com/ROCm/composable_kernel.git>`_ on GitHub, and use that to build and run the examples using the commands described below.

Both the Docker container and GitHub repository include the Composable Kernel library. Navigate to the library::

    cd composable_kernel/

Create and change to a ``build`` directory::

    mkdir build && cd build

The previous section discussed the supported GPU architectures. Once you decide which hardware targets are needed, run CMake using the ``GPU_TARGETS`` flag::

    cmake \
    -D CMAKE_PREFIX_PATH=/opt/rocm \
    -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
    -D CMAKE_CXX_FLAGS="-O3" \
    -D CMAKE_BUILD_TYPE=Release \
    -D BUILD_DEV=OFF \
    -D GPU_TARGETS="gfx908;gfx90a;gfx1030" ..

If everything goes well, the CMake command will return::

    -- Configuring done
    -- Generating done
    -- Build files have been written to: "/root/workspace/composable_kernel/build"

Finally, you can build examples and tests::

    make -j examples tests

When complete, you should see::

    Scanning dependencies of target tests
    [100%] Built target tests

Run examples and tests
======================

Examples are listed as test cases as well, so you can run all examples and tests with::

    ctest

You can check the list of all tests by running::

    ctest -N

You can also run examples separately as shown in the following example execution::

    ./bin/example_gemm_xdl_fp16 1 1 1

The arguments ``1 1 1`` select the run mode: verify results against the CPU, initialize matrices with integers, and benchmark the kernel execution. You can experiment with these parameters and see how the output and execution results change.

If you have a device based on the ``gfx908`` or ``gfx90a`` architecture, and if the example runs as expected, you should see something like::

    a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
    b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
    c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
    Perf: 1.08153 ms, 119.136 TFlops, 89.1972 GB/s, DeviceGemm_Xdl_CShuffle<Default, 256, 256, 128, 32, 8, 2, 32, 32, 4, 2, 8, 4, 1, 2> LoopScheduler: Interwave, PipelineVersion: v1

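The throughput figures on the ``Perf`` line can be cross-checked from the matrix sizes and the measured time. The sketch below is not taken from the CK sources; the function names are hypothetical, and it assumes the usual GEMM accounting: ``2 * M * N * K`` floating-point operations, with A and B each read once and C written once at 2 bytes per fp16 element.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of Composable Kernel.

// FLOP count of C = A * B with A: M x K and B: K x N
// (one multiply and one add per accumulated term).
constexpr double gemm_flops(std::int64_t m, std::int64_t n, std::int64_t k)
{
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// Bytes moved, assuming A and B are read once and C is written once.
constexpr double gemm_bytes(std::int64_t m, std::int64_t n, std::int64_t k,
                            std::int64_t bytes_per_element)
{
    return static_cast<double>(bytes_per_element) * static_cast<double>(m * k + k * n + m * n);
}

// Convert a kernel time in milliseconds into TFLOPS.
constexpr double tflops(double flops, double time_ms)
{
    return flops / (time_ms * 1.0e-3) / 1.0e12;
}

// Convert a kernel time in milliseconds into GB/s.
constexpr double gb_per_s(double bytes, double time_ms)
{
    return bytes / (time_ms * 1.0e-3) / 1.0e9;
}
```

With the values from the output above (M = 3840, N = 4096, K = 4096, 1.08153 ms), ``tflops(gemm_flops(3840, 4096, 4096), 1.08153)`` comes to about 119.1 and ``gb_per_s(gemm_bytes(3840, 4096, 4096, 2), 1.08153)`` to about 89.2, which agrees with the reported figures up to rounding of the printed time.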
However, running it on a ``gfx1030`` device should result in the following::

    a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
    b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
    c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
    DeviceGemmXdl<256, 256, 128, 4, 8, 32, 32, 4, 2> NumPrefetch: 1, LoopScheduler: Default, PipelineVersion: v1 does not support this problem

Don't worry, some operators are supported on the ``gfx1030`` architecture, so you can run a
separate example like::

    ./bin/example_gemm_dl_fp16 1 1 1

and it should return something like::

    a_m_k: dim 2, lengths {3840, 4096}, strides {1, 4096}
    b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
    c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
    arg.a_grid_desc_k0_m0_m1_k1_{2048, 3840, 2}
    arg.b_grid_desc_k0_n0_n1_k1_{2048, 4096, 2}
    arg.c_grid_desc_m_n_{ 3840, 4096}
    launch_and_time_kernel: grid_dim {960, 1, 1}, block_dim {256, 1, 1}
    Warm up 1 time
    Start running 10 times...
    Perf: 3.65695 ms, 35.234 TFlops, 26.3797 GB/s, DeviceGemmDl<256, 128, 128, 16, 2, 4, 4, 1>

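The ``grid_dim {960, 1, 1}`` value in this output is consistent with the tile-based computation model: one workgroup per output tile of C. The sketch below is an illustration under assumptions, not CK code. It assumes that the second and third template arguments of ``DeviceGemmDl<256, 128, 128, ...>`` are the tile sizes along M and N, and that the first is the workgroup size (matching ``block_dim {256, 1, 1}``); the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of Composable Kernel.

// Integer division rounded up.
constexpr std::int64_t ceil_div(std::int64_t a, std::int64_t b)
{
    return (a + b - 1) / b;
}

// Number of workgroups when each one computes a single
// m_per_block x n_per_block tile of the M x N output matrix.
constexpr std::int64_t tile_grid_size(std::int64_t m, std::int64_t n,
                                      std::int64_t m_per_block, std::int64_t n_per_block)
{
    return ceil_div(m, m_per_block) * ceil_div(n, n_per_block);
}

// 3840 / 128 = 30 tiles along M and 4096 / 128 = 32 tiles along N.
static_assert(tile_grid_size(3840, 4096, 128, 128) == 960, "30 x 32 output tiles");
```

Rounding up in ``ceil_div`` covers problem sizes that are not exact multiples of the tile size; whether a given kernel supports such sizes is a separate question that this sketch does not address.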
.. note::

   A new CMake flag ``DL_KERNELS`` has been added to the latest versions of CK. If you do not see the above results when running ``example_gemm_dl_fp16``, you might need to add ``-D DL_KERNELS=ON`` to your CMake command to build the operators supported on the ``gfx1030`` architecture.

You can also run a separate test::

    ctest -R test_gemm_fp16

If everything goes well, you should see something like::

    Start 121: test_gemm_fp16
    1/1 Test #121: test_gemm_fp16 ................... Passed 51.81 sec

    100% tests passed, 0 tests failed out of 1

Summary
=======

In this tutorial you took a first look at the Composable Kernel library, built it on your system, and ran some examples and tests. In the next tutorial you will run kernels with different configurations to find the best one for your hardware and task.

P.S.: If you are running on a cloud instance, don't forget to switch it off when you are done.