Merge pull request #2398 from spolifroni-amd/spolifroni-amd/updates-for-642

Spolifroni amd/updates for 642
This commit is contained in:
spolifroni-amd
2025-06-25 10:25:05 -04:00
committed by GitHub
20 changed files with 407 additions and 415 deletions

.gitignore

@@ -55,6 +55,8 @@ _static/
_templates/
_toc.yml
_doxygen/
docs/doxygen/html
docs/doxygen/xml
# JetBrains IDE
.idea/


@@ -20,10 +20,11 @@ Tejash Shah, 2019-2020
Xiaoyan Zhou, 2020
[Jianfeng Yan](https://github.com/j4yan), 2021-2022
[Jun Liu](https://github.com/junliume), 2021-2024
## Product Manager
[Jun Liu](https://github.com/junliume)
[John Afaganis](https://github.com/afagaj)
## Contributors


@@ -104,6 +104,7 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
```bash
make -j install
```
**[See Note on -j](#notes)**
## Optional post-install steps
@@ -146,7 +147,8 @@ Docker images are available on [DockerHub](https://hub.docker.com/r/rocm/composa
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
Note the `-j` option for building with multiple threads in parallel, which speeds up the build significantly.
### Notes
The `-j` option builds with multiple threads in parallel, which speeds up the build significantly.
However, `-j` without a number launches an unlimited number of threads, which can cause the build to run out of memory and
crash. On average, you should expect each thread to use ~2 GB of RAM.
Depending on the number of CPU cores and the amount of RAM on your system, you may want to
@@ -211,4 +213,4 @@ script/uninstall_precommit.sh
```
If you need to temporarily disable pre-commit hooks, you can add the `--no-verify` option to the
`git commit` command.
`git commit` command.


@@ -1,18 +1,15 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
:description: Composable Kernel mathematical basis
:keywords: composable kernel, CK, ROCm, API, mathematics, algorithm
.. _supported-primitives:
********************************************************************
Supported Primitives Guide
Composable Kernel mathematical basis
********************************************************************
This document contains details of supported primitives in Composable Kernel (CK). In contrast to the API Reference Guide, the Supported Primitives Guide is an introduction to the math which underpins the algorithms implemented in CK.
This is an introduction to the mathematics that underpins the algorithms implemented in Composable Kernel.
------------
Softmax
------------
For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B` you can decompose the
softmax of concatenated :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,


@@ -0,0 +1,29 @@
.. meta::
:description: Composable Kernel structure
:keywords: composable kernel, CK, ROCm, API, structure
.. _what-is-ck:
********************************************************************
Composable Kernel structure
********************************************************************
The Composable Kernel library uses a tile-based programming model and tensor coordinate transformation to achieve performance portability and code maintainability. Tensor coordinate transformation is a complexity reduction technique for complex machine learning operators.
.. image:: ../data/ck_component.png
:alt: CK Components
The Composable Kernel library consists of four layers:
* a templated tile operator layer
* a templated kernel and invoker layer
* an instantiated kernel and invoker layer
* a client API layer.
A wrapper component is included to simplify tensor transform operations.
.. image:: ../data/ck_layer.png
:alt: CK Layers


@@ -1,41 +0,0 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _what-is-ck:
********************************************************************
What is the Composable Kernel library
********************************************************************
Methodology
===========
The Composable Kernel (CK) library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages like HIP C++.
CK utilizes two concepts to achieve performance portability and code maintainability:
* A tile-based programming model
* Algorithm complexity reduction for complex ML operators using an innovative technique called
"Tensor Coordinate Transformation".
.. image:: ../data/ck_component.png
:alt: CK Components
Code Structure
==============
The CK library is structured into 4 layers:
* "Templated Tile Operators" layer
* "Templated Kernel and Invoker" layer
* "Instantiated Kernel and Invoker" layer
* "Client API" layer
It also includes a simple wrapper component used to perform tensor transform operations more easily and with fewer lines of code.
.. image:: ../data/ck_layer.png
:alt: CK Layers


@@ -32,19 +32,19 @@ DOXYFILE_ENCODING = UTF-8
# title of most generated pages and in a few other places.
# The default value is: My Project.
PROJECT_NAME = "ck"
PROJECT_NAME = "Composable Kernel"
# The PROJECT_NUMBER tag can be used to enter a project or revision number. This
# could be handy for archiving the generated documentation or if some version
# control system is used.
PROJECT_NUMBER = v3.0.1.0
PROJECT_NUMBER =
# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer a
# quick idea about the purpose of the project. Keep the description short.
PROJECT_BRIEF = "prototype interfaces compatible with ROCm platform and HiP"
PROJECT_BRIEF = "Prototype interfaces compatible with the ROCm platform and HIP"
# With the PROJECT_LOGO tag one can specify a logo or an icon that is included
# in the documentation. The maximum height of the logo should not exceed 55
@@ -775,12 +775,10 @@ WARN_LOGFILE =
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched.
INPUT = ../../include/ck/tensor_operation/gpu/grid \
../../include/ck/tensor_operation/gpu/block \
../../include/ck/tensor_operation/gpu/thread \
INPUT = ../../include \
../../include/ck/ \
../../library/include/ck/library/utility \
../../include/ck/wrapper
../../include/ck_tile
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -959,7 +957,17 @@ FILTER_SOURCE_PATTERNS =
# (index.html). This can be useful if you have a project on for instance GitHub
# and want to reuse the introduction page also for the doxygen output.
USE_MDFILE_AS_MAINPAGE = ../README.md
USE_MDFILE_AS_MAINPAGE =
# The Fortran standard specifies that for fixed formatted Fortran code all
# characters from position 72 are to be considered as comment. A common
# extension is to allow longer lines before the automatic comment starts. The
# setting FORTRAN_COMMENT_AFTER will also make it possible that longer lines can
# be processed before the automatic comment starts.
# Minimum value: 7, maximum value: 10000, default value: 72.
FORTRAN_COMMENT_AFTER = 72
#---------------------------------------------------------------------------
# Configuration options related to source browsing
@@ -1179,7 +1187,20 @@ HTML_EXTRA_STYLESHEET =
# files will be copied as-is; there are no commands or markers available.
# This tag requires that the tag GENERATE_HTML is set to YES.
HTML_EXTRA_FILES =
HTML_EXTRA_FILES = ../_doxygen/extra_stylesheet.css
# The HTML_COLORSTYLE tag can be used to specify if the generated HTML output
# should be rendered with a dark or light theme.
# Possible values are: LIGHT always generate light mode output, DARK always
# generate dark mode output, AUTO_LIGHT automatically set the mode according to
# the user preference, use light mode if no preference is set (the default),
# AUTO_DARK automatically set the mode according to the user preference, use
# dark mode if no preference is set and TOGGLE allow to user to switch between
# light and dark mode via a button.
# The default value is: AUTO_LIGHT.
# This tag requires that the tag GENERATE_HTML is set to YES.
HTML_COLORSTYLE = LIGHT
# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen
# will adjust the colors in the style sheet and background images according to


@@ -8,31 +8,38 @@
Composable Kernel User Guide
********************************************************************
The Composable Kernel (CK) library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages like HIP C++. This document contains instructions for installing, using, and contributing to the Composable Kernel project. To learn more see :ref:`what-is-ck`.
The Composable Kernel library provides a programming model for writing performance-critical kernels for machine learning workloads across multiple architectures, including GPUs and CPUs, through general-purpose kernel languages such as `HIP C++ <https://rocm.docs.amd.com/projects/HIP/en/latest/index.html>`_.
The CK documentation is structured as follows:
The Composable Kernel repository is located at `https://github.com/ROCm/composable_kernel <https://github.com/ROCm/composable_kernel>`_.
.. grid:: 2
:gutter: 3
.. grid-item-card:: Installation
.. grid-item-card:: Install
* :ref:`docker-hub`
* :doc:`Composable Kernel prerequisites <./install/Composable-Kernel-prerequisites>`
* :doc:`Build and install Composable Kernel <./install/Composable-Kernel-install>`
* :doc:`Build and install Composable Kernel on a Docker image <./install/Composable-Kernel-Docker>`
.. grid-item-card:: Conceptual
* :ref:`what-is-ck`
* :doc:`Composable Kernel structure <./conceptual/Composable-Kernel-structure>`
* :doc:`Composable Kernel mathematical basis <./conceptual/Composable-Kernel-math>`
.. grid-item-card:: API reference
.. grid-item-card:: Tutorials
* :ref:`supported-primitives`
* :ref:`api-reference`
* :ref:`wrapper`
* :doc:`Composable Kernel examples and tests <./tutorial/Composable-Kernel-examples>`
.. grid-item-card:: Tutorial
* :ref:`hello-world`
.. grid-item-card:: Reference
* :doc:`Composable Kernel supported scalar types <./reference/Composable_Kernel_supported_scalar_types>`
* :doc:`Composable Kernel custom types <./reference/Composable_Kernel_custom_types>`
* :doc:`Composable Kernel vector utilities <./reference/Composable_Kernel_vector_utilities>`
* :ref:`wrapper`
* :doc:`Composable Kernel API reference <./doxygen/html/namespace_c_k>`
* :doc:`CK Tile API reference <./doxygen/html/namespaceck__tile>`
* :doc:`Composable Kernel complete API class list <./doxygen/html/annotated>`
To contribute to the documentation, refer to `Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
You can find licensing information on the `Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.


@@ -0,0 +1,16 @@
.. meta::
:description: Composable Kernel docker files
:keywords: composable kernel, CK, ROCm, API, docker
.. _docker-hub:
********************************************************************
Composable Kernel Docker containers
********************************************************************
Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.
The images also contain `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_, `CMake <https://cmake.org/getting-started/>`_, and the `ROCm LLVM compiler infrastructure <https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html>`_.
Composable Kernel Docker images are named according to their operating system and ROCm version. For example, a Docker image named ``ck_ub22.04_rocm6.3`` would correspond to an Ubuntu 22.04 image with ROCm 6.3.
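The naming scheme can be unpacked mechanically. The following is a minimal sketch, assuming every tag follows the ``ck_<os>_rocm<version>`` pattern described above. The ``parse_ck_tag`` helper is hypothetical and is not part of Composable Kernel or Docker.

```shell
# Hypothetical helper: split a Composable Kernel image tag into its parts.
# Assumes the tag looks like ck_<os>_rocm<version>, as in the example above.
parse_ck_tag() {
    tag=$1
    rest=${tag#ck_}            # drop the leading "ck_"
    os=${rest%%_rocm*}         # text before "_rocm" is the OS, for example ub22.04
    rocm=${rest##*_rocm}       # text after "_rocm" is the ROCm version
    printf 'os=%s rocm=%s\n' "$os" "$rocm"
}

parse_ck_tag ck_ub22.04_rocm6.3
```

An image is pulled by its tag, for example ``docker pull rocm/composable_kernel:ck_ub22.04_rocm6.3``, provided that the tag is actually listed on Docker Hub.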


@@ -0,0 +1,72 @@
.. meta::
:description: Composable Kernel build and install
:keywords: composable kernel, CK, ROCm, API, documentation, install
******************************************************
Building and installing Composable Kernel with CMake
******************************************************
Before you begin, clone the `Composable Kernel GitHub repository <https://github.com/ROCm/composable_kernel.git>`_ and create a ``build`` directory in its root:
.. code:: shell
git clone https://github.com/ROCm/composable_kernel.git
cd composable_kernel
mkdir build
Change directory to the ``build`` directory and generate the makefile using the ``cmake`` command. Two build options are required:
* ``CMAKE_PREFIX_PATH``: The ROCm installation path. ROCm is installed in ``/opt/rocm`` by default.
* ``CMAKE_CXX_COMPILER``: The path to the Clang compiler. Clang is found at ``/opt/rocm/llvm/bin/clang++`` by default.
.. code:: shell
cd build
cmake ../. -D CMAKE_PREFIX_PATH="/opt/rocm" -D CMAKE_CXX_COMPILER="/opt/rocm/llvm/bin/clang++" [-D<OPTION1=VALUE1> [-D<OPTION2=VALUE2>] ...]
Other build options are:
* ``DISABLE_DL_KERNELS``: Set to ``ON`` to skip building deep learning (DL) and data parallel primitive (DPP) instances.
.. note::
DL and DPP instances are useful on architectures that don't support XDL or WMMA.
* ``CK_USE_FP8_ON_UNSUPPORTED_ARCH``: Set to ``ON`` to build FP8 data type instances on gfx90a without native FP8 support.
* ``GPU_TARGETS``: Target architectures. Target architectures in this list must all be different versions of the same architectures. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_TARGETS="gfx908;gfx90a"``. This option is required to build tests and examples.
* ``GPU_ARCHS``: Target architectures. Target architectures in this list are not limited to different versions of the same architecture. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_ARCHS="gfx908;gfx1100"``.
* ``CMAKE_BUILD_TYPE``: The build type. Can be ``None``, ``Release``, ``Debug``, ``RelWithDebInfo``, or ``MinSizeRel``. CMake will use ``Release`` by default.
.. note::
If neither ``GPU_TARGETS`` nor ``GPU_ARCHS`` is specified, Composable Kernel will be built for all targets supported by the compiler.
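The target list has to reach CMake as a single quoted, semicolon-separated string, because an unquoted semicolon ends the command in most shells. The following is a minimal sketch of assembling that string. The ``join_targets`` helper is hypothetical, and the command is only printed so that it can be reviewed before it is run.

```shell
# Hypothetical helper: join architecture names with semicolons.
# The function body runs in a subshell, so the IFS change does not leak out.
join_targets() (
    IFS=';'
    printf '%s\n' "$*"
)

targets=$(join_targets gfx908 gfx90a)

# Print the configure command instead of running it.
echo "cmake .. -D CMAKE_PREFIX_PATH=\"/opt/rocm\" -D CMAKE_CXX_COMPILER=\"/opt/rocm/llvm/bin/clang++\" -D GPU_TARGETS=\"$targets\""
```

The same string can be passed to ``GPU_ARCHS`` when the targets are not versions of the same architecture.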
Build Composable Kernel using the generated makefile. This will build the library, the examples, and the tests, and save them to ``bin``.
.. code:: shell
make -j20
The ``-j`` option speeds up the build by using multiple threads in parallel. For example, ``-j20`` uses twenty threads in parallel. On average, each thread will use 2 GB of memory. Make sure that the number of threads multiplied by 2 GB doesn't exceed the available memory in your system.
Using ``-j`` alone will launch an unlimited number of threads and is not recommended.
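One way to pick the number is to take the smaller of the CPU core count and the number of 2 GB slots that fit in memory. The following is a minimal sketch under that assumption. The ``safe_jobs`` helper is hypothetical, and the 2 GB figure is the average quoted above rather than a guarantee.

```shell
# Hypothetical helper: choose a job count for make from CPU cores and memory.
# Assumes roughly 2 GB of memory per job.
safe_jobs() {
    cores=$1
    mem_gb=$2
    by_mem=$(( mem_gb / 2 ))          # jobs that fit in memory
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1                      # always allow at least one job
    fi
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# Example: 32 cores but only 24 GB of memory gives 12 jobs.
safe_jobs 32 24
```

On Linux the result can be passed straight to make, for example ``make -j"$(safe_jobs "$(nproc)" 24)"``, with the memory figure replaced by the amount available on your system.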
Install the Composable Kernel library:
.. code:: shell
make install
After running ``make install``, the Composable Kernel files will be saved to the following locations:
* Library files: ``/opt/rocm/lib/``
* Header files: ``/opt/rocm/include/ck/`` and ``/opt/rocm/include/ck_tile/``
* Examples, tests, and ckProfiler: ``/opt/rocm/bin/``
For information about ckProfiler, see `the ckProfiler readme file <https://github.com/ROCm/composable_kernel/blob/develop/profiler/README.md>`_.
For information about running the examples and tests, see :doc:`Composable Kernel examples and tests <../tutorial/Composable-Kernel-examples>`.


@@ -0,0 +1,32 @@
.. meta::
:description: Composable Kernel prerequisites
:keywords: composable kernel, CK, ROCm, API, documentation, prerequisites
******************************************************
Composable Kernel prerequisites
******************************************************
Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.
The following prerequisites are required to build and install Composable Kernel:
* cmake
* hip-rocclr
* iputils-ping
* jq
* libelf-dev
* libncurses5-dev
* libnuma-dev
* libpthread-stubs0-dev
* llvm-amdgpu
* mpich
* net-tools
* python3
* python3-dev
* python3-pip
* redis
* rocm-llvm-dev
* zlib1g-dev
* libzstd-dev
* openssh-server
* clang-format-12


@@ -1,101 +0,0 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _docker-hub:
********************************************************************
CK Docker Hub
********************************************************************
Why do I need this?
===================
To make things simpler, and bring Composable Kernel and its dependencies together,
docker images can be found on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_. Docker images provide a complete image of the OS, the Composable Kernel library, and its dependencies in a single downloadable file.
Refer to `Docker Overview <https://docs.docker.com/get-started/overview/>`_ for more information on Docker images and containers.
Which image is right for me?
============================
The image naming includes information related to the docker image.
For example ``ck_ub20.04_rocm6.0`` indicates the following:
* ``ck`` - made for running Composable Kernel;
* ``ub20.04`` - based on Ubuntu 20.04;
* ``rocm6.0`` - ROCm platform version 6.0.
Download a docker image suitable for your OS and ROCm release, run or start the docker container, and then resume the tutorial from this point. Use the ``docker pull`` command to download the file::
docker pull rocm/composable_kernel:ck_ub20.04_rocm6.0
What is inside the image?
-------------------------
The docker images have everything you need for running CK including:
* `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_
* `CMake <https://cmake.org/getting-started/>`_
* `Compiler <https://github.com/ROCm/llvm-project>`_
* `Composable Kernel library <https://github.com/ROCm/composable_kernel>`_
Running the docker container
============================
After downloading the docker image, you can start the container using one of a number of commands. Start with the ``docker run`` command as shown below::
docker run \
-it \
--privileged \
--group-add sudo \
-w /root/workspace \
-v ${PATH_TO_LOCAL_WORKSPACE}:/root/workspace \
rocm/composable_kernel:ck_ub20.04_rocm6.0 \
/bin/bash
After starting the bash shell, the current folder in the docker container is ``~/workspace``. The library path is ``~/workspace/composable_kernel``. Navigate to the library to begin the tutorial as explained in :ref:`hello-world`.
.. note::
Set ``PATH_TO_LOCAL_WORKSPACE`` to the local folder you want mounted in the container, or replace ``${PATH_TO_LOCAL_WORKSPACE}`` in the ``-v`` option of the ``docker run`` command with the path that fits your folder structure.
Stop and restart the docker image
=================================
After finishing the tutorial, or just when you have completed your work session, you can close the docker container, or stop the docker container to restart it at another time. Closing the docker container means that it is still in the active state, and can be resumed from where you left it. Stopping the container closes it, and returns the image to its initial state.
Use the ``Ctrl-D`` option to exit the container, while leaving it active, so you can return to the container in its current state to resume the tutorial, or pickup your project where you left off.
To restart the active container use the ``docker exec`` command to specify the container name and options as follows::
docker exec -it <container_name> bash
Where:
* `exec` is the docker command
* `-it` is the interactive option for `exec`
* `<container_name>` specifies an active container on the system
* `bash` specifies the command to run in the interactive shell
.. note::
You can use the ``docker container ls`` command to list the active containers on the system.
To start a container from the image, use the ``docker start`` command::
docker start <container_name>
Then use the docker exec command as shown above to start the bash shell.
Use the ``docker stop`` command to stop the container and restore the image to its initial state::
docker stop <container_name>
Editing the docker image
========================
If you want to customize the docker image, edit the
`Dockerfile <https://github.com/ROCm/composable_kernel/blob/develop/Dockerfile>`_
from the GitHub repository to suit your needs.


@@ -1,48 +0,0 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _api-reference:
********************************************************************
API reference guide
********************************************************************
This document contains details of the APIs for the Composable Kernel (CK) library and introduces
some of the key design principles that are used to write new classes that extend CK functionality.
=================
CK Datatypes
=================
-----------------
DeviceMem
-----------------
.. doxygenstruct:: DeviceMem
---------------------------
Kernels For Flashattention
---------------------------
The Flashattention algorithm is defined in :cite:t:`dao2022flashattention`. This section lists
the classes that are used in the CK GPU implementation of Flashattention.
**Gridwise classes**
.. doxygenstruct:: ck::GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffle
**Blockwise classes**
.. doxygenstruct:: ck::ThreadGroupTensorSliceTransfer_v4r1
.. doxygenstruct:: ck::BlockwiseGemmXdlops_v2
.. doxygenstruct:: ck::BlockwiseSoftmax
**Threadwise classes**
.. doxygenstruct:: ck::ThreadwiseTensorSliceTransfer_StaticToStatic
.. bibliography::


@@ -1,20 +1,15 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
:description: Composable Kernel wrapper
:keywords: composable kernel, CK, ROCm, API, wrapper
.. _wrapper:
********************************************************************
Wrapper
Composable Kernel wrapper
********************************************************************
-------------------------------------
Description
-------------------------------------
The CK library provides a lightweight wrapper for more complex operations implemented in
the library.
The Composable Kernel library provides a lightweight wrapper that simplifies the library's more complex operations.
Example:


@@ -0,0 +1,39 @@
.. meta::
:description: Composable Kernel supported custom types
:keywords: composable kernel, custom, data types, support, CK, ROCm
******************************************************
Composable Kernel custom data types
******************************************************
Composable Kernel supports the use of custom types that provide a way to implement specialized numerical formats.
To use a custom type, create a C++ type that implements the operations needed for tensor computations. These should include:
* Constructors and initialization methods
* Arithmetic operators if the type will be used in computational operations
* Any conversion functions needed to interface with other parts of an application
For example, to create a complex half-precision type:
.. code:: cpp
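// Minimal form: a plain aggregate with a real and an imaginary part.
struct complex_half_t
{
    half_t real;
    half_t img;
};
// Fuller form of the same type, with a member type alias and constructors.
// The two definitions are alternatives; use only one of them in a program.
struct complex_half_t
{
    using type = half_t;
    type real;
    type img;
    // Default constructor: zero-initializes both parts.
    complex_half_t() : real{type{}}, img{type{}} {}
    // Constructs the value from explicit real and imaginary parts.
    complex_half_t(type real_init, type img_init) : real{real_init}, img{img_init} {}
};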
Custom types can be particularly useful for specialized applications such as complex number arithmetic,
custom quantization schemes, or domain-specific number representations.


@@ -0,0 +1,69 @@
.. meta::
:description: Composable Kernel supported scalar types
:keywords: composable kernel, scalar, data types, support, CK, ROCm
***************************************************
Composable Kernel supported scalar data types
***************************************************
The Composable Kernel library provides support for the following scalar data types:
.. list-table::
:header-rows: 1
:widths: 25 15 60
* - Type
- Bit Width
- Description
* - ``double``
- 64-bit
- Standard IEEE 754 double precision floating point
* - ``float``
- 32-bit
- Standard IEEE 754 single precision floating point
* - ``int32_t``
- 32-bit
- Standard signed 32-bit integer
* - ``int8_t``
- 8-bit
- Standard signed 8-bit integer
* - ``uint8_t``
- 8-bit
- Standard unsigned 8-bit integer
* - ``bool``
- 1-bit
- Boolean type
* - ``ck::half_t``
- 16-bit
- IEEE 754 half precision floating point with 5 exponent bits, 10 mantissa bits, and 1 sign bit
* - ``ck::bhalf_t``
- 16-bit
- Brain floating point with 8 exponent bits, 7 mantissa bits, and 1 sign bit
* - ``ck::f8_t``
- 8-bit
- 8-bit floating point (E4M3 format) with 4 exponent bits, 3 mantissa bits, and 1 sign bit
* - ``ck::bf8_t``
- 8-bit
- 8-bit brain floating point (E5M2 format) with 5 exponent bits, 2 mantissa bits, and 1 sign bit
* - ``ck::f4_t``
- 4-bit
- 4-bit floating point format (E2M1 format) with 2 exponent bits, 1 mantissa bit, and 1 sign bit
* - ``ck::f6_t``
- 6-bit
- 6-bit floating point format (E2M3 format) with 2 exponent bits, 3 mantissa bits, and 1 sign bit
* - ``ck::bf6_t``
- 6-bit
- 6-bit brain floating point format (E3M2 format) with 3 exponent bits, 2 mantissa bits, and 1 sign bit


@@ -0,0 +1,16 @@
.. meta::
:description: Composable Kernel vector template utilities
:keywords: composable kernel, precision, data types, ROCm
******************************************************
Composable Kernel vector template utilities
******************************************************
Composable Kernel includes template utilities for creating vector types with customizable widths. These template utilities also flatten nested vector types into a single, wider vector, preventing the creation of vectors of vectors.
Vectors composed of supported scalar and custom types can be created with the ``ck::vector_type`` template.
For example, ``ck::vector_type<float, 4>`` creates a vector composed of four floats and ``ck::vector_type<ck::half_t, 8>`` creates a vector composed of eight half-precision scalars.
For vector operations to be valid, the underlying types must be either a :doc:`supported scalar type <Composable_Kernel_supported_scalar_types>` or :doc:`a custom type <Composable_Kernel_custom_types>` that implements the required operations.


@@ -3,34 +3,43 @@ defaults:
root: index
subtrees:
- caption: Conceptual
entries:
- file: conceptual/what-is-ck.rst
title: What is Composable Kernel?
- caption: Install
entries:
- file: install/dockerhub.rst
title: Docker Hub
- file: install/Composable-Kernel-prerequisites.rst
title: Composable Kernel prerequisites
- file: install/Composable-Kernel-install.rst
title: Build and install Composable Kernel
- file: install/Composable-Kernel-Docker.rst
title: Composable Kernel Docker images
- caption: CK API Reference
- caption: Conceptual
entries:
- file: reference/Supported_Primitives_Guide.rst
title: Supported Primitives
- file: reference/API_Reference_Guide.rst
title: API Reference
- file: reference/wrapper.rst
title: Wrapper
- file: conceptual/Composable-Kernel-structure.rst
title: Composable Kernel structure
- file: conceptual/Composable-Kernel-math.rst
title: Composable Kernel mathematical basis
- caption: Tutorial
entries:
- file: tutorial/tutorial_hello_world.rst
title: Hello World Tutorial
- file: tutorial/Composable-Kernel-examples.rst
title: Composable Kernel examples
- caption: Reference
entries:
- file: reference/Composable_Kernel_supported_scalar_types.rst
title: Composable Kernel scalar types
- file: reference/Composable_Kernel_custom_types.rst
title: Composable Kernel custom types
- file: reference/Composable_Kernel_vector_utilities.rst
title: Composable Kernel vector utilities
- file: reference/Composable-Kernel-wrapper.rst
title: Composable Kernel wrapper
- file: doxygen/html/annotated.rst
title: Composable Kernel class list
- caption: About
entries:
- file: Contributors_Guide.rst
title: Contributing to CK
title: Contributing to Composable Kernel
- file: license.rst
title: License


@@ -0,0 +1,40 @@
.. meta::
:description: Composable Kernel examples and tests
:keywords: composable kernel, CK, ROCm, API, examples, tests
********************************************************************
Composable Kernel examples and tests
********************************************************************
After :doc:`building and installing Composable Kernel <../install/Composable-Kernel-install>`, the examples and tests will be moved to ``/opt/rocm/bin/``.
All tests have the prefix ``test`` and all examples have the prefix ``example``.
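Because the prefixes are consistent, the installed binaries can be sorted by name alone. The following is a minimal sketch, assuming the binaries sit in a single directory. The ``list_by_prefix`` helper is hypothetical and is not shipped with Composable Kernel.

```shell
# Hypothetical helper: list binaries in a directory by their name prefix.
# kind is "test" or "example"; dir defaults to the install location above.
list_by_prefix() {
    kind=$1
    dir=${2:-/opt/rocm/bin}
    for f in "$dir"/"$kind"*; do
        [ -e "$f" ] || continue      # skip the unexpanded pattern when nothing matches
        basename "$f"
    done
}

list_by_prefix example
list_by_prefix test
```

The helper prints nothing when the directory is missing or holds no matching files.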
Use ``ctest`` with no arguments to run all examples and tests, or use ``ctest -R <regex>`` to run only the tests whose names match a regular expression. For example:
.. code:: shell
ctest -R test_gemm_fp16
Examples can be run individually as well. For example:
.. code:: shell
./bin/example_gemm_xdl_fp16 1 1 1
For instructions on how to run individual examples and tests, see their README files in the |example|_ and |test|_ GitHub folders.
To run smoke tests, use ``make smoke``.
To run regression tests, use ``make regression``.
In general, tests that run for under thirty seconds are included in the smoke tests and tests that run for over thirty seconds are included in the regression tests.
.. |example| replace:: ``example``
.. _example: https://github.com/ROCm/composable_kernel/tree/develop/example
.. |client_example| replace:: ``client_example``
.. _client_example: https://github.com/ROCm/composable_kernel/tree/develop/client_example
.. |test| replace:: ``test``
.. _test: https://github.com/ROCm/composable_kernel/tree/develop/test


@@ -1,165 +0,0 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _hello-world:
********************************************************************
Hello World Tutorial
********************************************************************
This tutorial is for engineers dealing with artificial intelligence and machine learning who
would like to optimize pipelines and improve performance using the Composable
Kernel (CK) library. This tutorial provides an introduction to the CK library. You will build the library and run some examples using a "Hello World" example.
Description
===========
Modern AI technology solves more and more problems in a variety of fields, but crafting fast and
efficient workflows is still challenging. CK can make the AI workflow fast
and efficient. CK is a collection of optimized AI operator kernels with tools to create
new kernels. The library has components required for modern neural network architectures
including matrix multiplication, convolution, contraction, reduction, attention modules, a variety of activation functions, and fused operators.
CK library acceleration features are based on:
* Layered structure
* Tile-based computation model
* Tensor coordinate transformation
* Hardware acceleration use
* Support of low precision data types including fp16, bf16, int8 and int4
If you need more technical details and benchmarking results read the following
`blog post <https://community.amd.com/t5/instinct-accelerators/amd-composable-kernel-library-efficient-fused-kernels-for-ai/ba-p/553224>`_.
To download the library visit the `composable_kernel repository <https://github.com/ROCm/composable_kernel>`_.
Hardware targets
================
The CK library fully supports the `gfx908` and `gfx90a` GPU architectures, while only some operators are
supported on `gfx1030` devices. Check your hardware to determine the target GPU architecture.

========== =========
GPU Target AMD GPU
========== =========
gfx908     Radeon Instinct MI100
gfx90a     Radeon Instinct MI210, MI250, MI250X
gfx1030    Radeon PRO V620, W6800, W6800X, W6800X Duo, W6900X, RX 6800, RX 6800 XT, RX 6900 XT, RX 6900 XTX, RX 6950 XT
========== =========

If you don't have an AMD GPU at hand, `cloud options <https://aws.amazon.com/ec2/instance-types/g4/>`_
are also available.
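
The support levels in the table above can be turned into a small lookup. The following is a minimal sketch, not part of CK: it assumes you already have some text that mentions the architecture name (for example, the output of a device query tool such as ``rocminfo``), and the function names and the sample text are hypothetical.

```python
import re

# Support levels copied from the table in this tutorial.
SUPPORT_LEVEL = {
    "gfx908": "full",
    "gfx90a": "full",
    "gfx1030": "partial",
}


def find_gfx_targets(text):
    """Return the distinct gfx architecture names in text, in order of appearance."""
    seen = []
    for name in re.findall(r"\bgfx[0-9a-f]+\b", text):
        if name not in seen:
            seen.append(name)
    return seen


def support_level(target):
    """Look up the support level; targets missing from the table are 'not listed'."""
    return SUPPORT_LEVEL.get(target, "not listed")


# Hypothetical excerpt of a device query output, not a real log.
sample = "Name: gfx90a\nName: gfx1030\nName: gfx90a\n"
for target in find_gfx_targets(sample):
    print(target, support_level(target))
```

A target reported as "not listed" only means it does not appear in the table above; it says nothing about whether newer CK releases support it.
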
Build the library
=================
This tutorial is based on the use of Docker images as explained in :ref:`docker-hub`. Download a Docker image suitable for your OS and ROCm release, run or start the Docker container, and then resume the tutorial from this point.

.. note::

    You can also `install ROCm <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/>`_ on your system, clone the `Composable Kernel repository <https://github.com/ROCm/composable_kernel.git>`_ on GitHub, and use that to build and run the examples using the commands described below.

Both the docker container and GitHub repository include the Composable Kernel library. Navigate to the library::

    cd composable_kernel/

Create and change to a ``build`` directory::

    mkdir build && cd build

The previous section discussed the supported GPU architectures. Once you decide which hardware targets you need, run CMake with the ``GPU_TARGETS`` flag::

    cmake                                                                 \
    -D CMAKE_PREFIX_PATH=/opt/rocm                                        \
    -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc                             \
    -D CMAKE_CXX_FLAGS="-O3"                                              \
    -D CMAKE_BUILD_TYPE=Release                                           \
    -D BUILD_DEV=OFF                                                      \
    -D GPU_TARGETS="gfx908;gfx90a;gfx1030" ..

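
If you script the configure step, the target list can be assembled programmatically. The sketch below is an illustration only: the helper name ``cmake_configure_args`` is hypothetical, and the flags are simply the ones from the command above, with the targets joined by semicolons.

```python
import shlex


def cmake_configure_args(gpu_targets, source_dir=".."):
    """Build the argument list for the configure command shown in this tutorial."""
    if not gpu_targets:
        raise ValueError("at least one GPU target is required")
    return [
        "cmake",
        "-D", "CMAKE_PREFIX_PATH=/opt/rocm",
        "-D", "CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc",
        "-D", "CMAKE_CXX_FLAGS=-O3",
        "-D", "CMAKE_BUILD_TYPE=Release",
        "-D", "BUILD_DEV=OFF",
        # CMake lists are semicolon-separated.
        "-D", "GPU_TARGETS=" + ";".join(gpu_targets),
        source_dir,
    ]


args = cmake_configure_args(["gfx908", "gfx90a"])
# shlex.join quotes the semicolon so the printed line is safe to paste into a shell.
print(shlex.join(args))
```

Passing the list to ``subprocess.run(args, check=True)`` from inside the ``build`` directory would run the configure step without going through a shell, so the semicolons need no quoting.
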
If everything goes well, the CMake command prints::

    -- Configuring done
    -- Generating done
    -- Build files have been written to: "/root/workspace/composable_kernel/build"

Finally, build the examples and tests::

    make -j examples tests

When the build completes, you should see::

    Scanning dependencies of target tests
    [100%] Built target tests

Run examples and tests
======================
Examples are listed as test cases as well, so you can run all examples and tests with::

    ctest

You can check the list of all tests by running::

    ctest -N

You can also run examples separately as shown in the following example execution::

    ./bin/example_gemm_xdl_fp16 1 1 1

The arguments ``1 1 1`` select the run mode: verify the results against the CPU, initialize the matrices with integers, and benchmark the kernel execution. Change these arguments to see how the output and execution results change.
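
As a reading aid, the three positional arguments can be modeled as a small record. This is a sketch, not CK code: the class and field names are hypothetical, and treating ``0`` as "disabled" for the first and third arguments is an assumption that this tutorial does not confirm.

```python
from dataclasses import dataclass


@dataclass
class ExampleRunMode:
    verify: bool       # first argument: compare results with the CPU
    init_method: int   # second argument: 1 selects integer initialization
    time_kernel: bool  # third argument: benchmark the kernel execution


def parse_run_mode(argv):
    """Decode the three positional arguments passed to an example binary."""
    if len(argv) != 3:
        raise ValueError("expected exactly three arguments")
    verify, init_method, time_kernel = (int(a) for a in argv)
    # Assumption: any non-zero value enables verification or timing.
    return ExampleRunMode(bool(verify), init_method, bool(time_kernel))


mode = parse_run_mode("1 1 1".split())
print(mode)
```
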
If you have a device based on the `gfx908` or `gfx90a` architecture, and the example runs as expected, you should see something like::

    a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
    b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
    c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
    Perf: 1.08153 ms, 119.136 TFlops, 89.1972 GB/s, DeviceGemm_Xdl_CShuffle<Default, 256, 256, 128, 32, 8, 2, 32, 32, 4, 2, 8, 4, 1, 2> LoopScheduler: Interwave, PipelineVersion: v1

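
The throughput figures in the ``Perf`` line can be checked by hand from the matrix sizes and the measured time. The sketch below assumes the usual GEMM accounting, which this tutorial does not spell out: ``2 * M * N * K`` floating-point operations, and one read of A, one read of B, and one write of C at 2 bytes per fp16 element. The function name is hypothetical.

```python
def gemm_perf(m, n, k, time_ms, bytes_per_element=2):
    """Return (TFlops, GB/s) for an M x K by K x N GEMM that took time_ms milliseconds."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)  # read A, read B, write C
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12, bytes_moved / seconds / 1e9


# Sizes and time taken from the log above: M=3840, N=4096, K=4096, 1.08153 ms.
tflops, gbps = gemm_perf(3840, 4096, 4096, 1.08153)
print(f"{tflops:.3f} TFlops, {gbps:.3f} GB/s")
```

With these assumptions the results agree with the logged 119.136 TFlops and 89.1972 GB/s to within about 0.001, the small difference coming from the rounded time in the log.
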
However, running it on a `gfx1030` device should result in the following::

    a_m_k: dim 2, lengths {3840, 4096}, strides {4096, 1}
    b_k_n: dim 2, lengths {4096, 4096}, strides {1, 4096}
    c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
    DeviceGemmXdl<256, 256, 128, 4, 8, 32, 32, 4, 2> NumPrefetch: 1, LoopScheduler: Default, PipelineVersion: v1 does not support this problem

Don't worry: some operators are supported on the `gfx1030` architecture, so you can run a
different example, such as::

    ./bin/example_gemm_dl_fp16 1 1 1

It should return something like::

    a_m_k: dim 2, lengths {3840, 4096}, strides {1, 4096}
    b_k_n: dim 2, lengths {4096, 4096}, strides {4096, 1}
    c_m_n: dim 2, lengths {3840, 4096}, strides {4096, 1}
    arg.a_grid_desc_k0_m0_m1_k1_{2048, 3840, 2}
    arg.b_grid_desc_k0_n0_n1_k1_{2048, 4096, 2}
    arg.c_grid_desc_m_n_{ 3840, 4096}
    launch_and_time_kernel: grid_dim {960, 1, 1}, block_dim {256, 1, 1}
    Warm up 1 time
    Start running 10 times...
    Perf: 3.65695 ms, 35.234 TFlops, 26.3797 GB/s, DeviceGemmDl<256, 128, 128, 16, 2, 4, 4, 1>

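
The ``grid_dim {960, 1, 1}`` value in this log is consistent with launching one workgroup per output tile. The sketch below rests on an assumption that the tutorial does not state: that the second and third template parameters of ``DeviceGemmDl<256, 128, 128, ...>`` are the per-workgroup tile sizes along M and N. The function name is hypothetical.

```python
def gemm_grid_size(m, n, m_per_block, n_per_block):
    """Number of workgroups when each one computes an m_per_block x n_per_block tile of C."""
    tiles_m = -(-m // m_per_block)  # ceiling division
    tiles_n = -(-n // n_per_block)
    return tiles_m * tiles_n


# C is 3840 x 4096; the assumed tile size is 128 x 128.
print(gemm_grid_size(3840, 4096, 128, 128))
```

Under that assumption, 30 tiles along M times 32 tiles along N gives the 960 workgroups reported in the log.
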
.. note::

    A new CMake flag ``DL_KERNELS`` has been added to the latest versions of CK. If you do not see the above results when running ``example_gemm_dl_fp16``, you might need to add ``-D DL_KERNELS=ON`` to your CMake command to build the operators supported on the `gfx1030` architecture.

You can also run a separate test::

    ctest -R test_gemm_fp16

If everything goes well, you should see something like::

    Start 121: test_gemm_fp16
    1/1 Test #121: test_gemm_fp16 ................... Passed 51.81 sec
    100% tests passed, 0 tests failed out of 1

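
If you run the tests from a script, the summary line is the part worth checking. The following is a minimal sketch that parses a summary line of the form shown above; the function name is hypothetical, and it assumes the wording of the summary matches this log exactly.

```python
import re


def parse_ctest_summary(output):
    """Extract pass percentage, failure count, and total from a ctest summary line."""
    match = re.search(r"(\d+)% tests passed, (\d+) tests failed out of (\d+)", output)
    if match is None:
        raise ValueError("no ctest summary line found")
    percent, failed, total = (int(group) for group in match.groups())
    return {"percent_passed": percent, "failed": failed, "total": total}


summary = parse_ctest_summary("100% tests passed, 0 tests failed out of 1")
print(summary)
```
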
Summary
=======
In this tutorial, you took a first look at the Composable Kernel library, built it on your system, and ran some examples and tests. In the next tutorial, you will run kernels with different configurations to find the best one for your hardware and task.

P.S.: If you are running on a cloud instance, don't forget to switch it off when you are done.