mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-17 19:40:04 +00:00
Reorganize project folders (#6)
105
docs/Contributors_Guide.rst
Normal file
@@ -0,0 +1,105 @@
.. meta::
  :description: Composable Kernel documentation and API reference library
  :keywords: composable kernel, CK, ROCm, API, documentation

.. _contributing-to:

********************************************************************
Contributor's guide
********************************************************************

This chapter explains the rules for contributing to the Composable Kernel project and describes how to contribute.

Getting started
===============

#. **Documentation:** Before contributing to the library, familiarize yourself with the
   `Composable Kernel User Guide <https://rocm.docs.amd.com/projects/composable_kernel/en/latest/>`_.
   It provides insight into the core concepts, environment configuration, and steps to obtain or
   build the library. You can also find some of this information in the
   `README file <https://github.com/ROCm/composable_kernel/blob/develop/README.md>`_
   on the project's GitHub page.
#. **Additional reading:** The blog post `AMD Composable Kernel library: efficient fused kernels for AI apps with just a few lines of code <https://community.amd.com/t5/instinct-accelerators/amd-composable-kernel-library-efficient-fused-kernels-for-ai/ba-p/553224>`_
   on the AMD Community portal provides a deeper understanding of the CK library's objectives and showcases its performance capabilities.
#. **General information:** For broader information about AMD products, consider exploring the
   `AMD Developer Central portal <https://www.amd.com/en/developer.html>`_.

How to contribute
=================

You can make an impact by reporting issues or proposing code enhancements through pull requests.

Reporting issues
----------------

Use `GitHub issues <https://github.com/ROCm/composable_kernel/issues>`_
to track public bugs and enhancement requests.

If you encounter an issue with the library, first check whether the problem has already been
reported by searching existing issues on GitHub. If your issue seems unique, submit a new
issue. All reported issues must include:

* A comprehensive description of the problem, including:

  * What did you observe?
  * Why do you think it is a bug (if it seems like one)?
  * What did you expect to happen? What would indicate the resolution of the problem?
  * Are there any known workarounds?

* Your configuration details, including:

  * Which GPU are you using?
  * Which OS version are you on?
  * Which ROCm version are you using?
  * Are you using a Docker image? If so, which one?

* Steps to reproduce the issue, including:

  * What actions trigger the issue? What are the reproduction steps?

  * If you build the library from scratch, what CMake command did you use?

  * How frequently does this issue happen? Does it reproduce every time, or is it sporadic?

Before submitting any issue, ensure you have addressed all relevant questions from the checklist.

Creating Pull Requests
----------------------

You can submit `Pull Requests (PR) on GitHub
<https://github.com/ROCm/composable_kernel/pulls>`_.

All contributors are required to develop their changes on a separate branch and then create a
pull request to merge their changes into the ``develop`` branch, which is the default
development branch in the Composable Kernel project. All external contributors must use their own
forks of the project to develop their changes.

When submitting a Pull Request you should:

* Describe the change, providing information about the motivation for the change and a general
  description of all code modifications.

* Verify and test the change:

  * Run any relevant existing tests.
  * Write new tests if added functionality is not covered by current tests.

* Ensure your changes align with the coding style defined in the ``.clang-format`` file located in
  the project's root directory. We use ``pre-commit`` to run ``clang-format`` automatically. We
  highly recommend that contributors use this method to maintain consistent code formatting.
  Instructions on setting up ``pre-commit`` can be found in the project's
  `README file <https://github.com/ROCm/composable_kernel/blob/develop/README.md>`_.

* Link your PR to any related issues:

  * If there is an issue that is resolved by your change, provide a link to the issue in
    the description of your pull request.

* For larger contributions, structure your change into a sequence of smaller, focused commits, each
  addressing a particular aspect or fix.

Following the above guidelines ensures a seamless review process and faster assistance from our
end.

Thank you for your commitment to enhancing the Composable Kernel project!
77
docs/conceptual/Composable-Kernel-math.rst
Normal file
@@ -0,0 +1,77 @@
.. meta::
  :description: Composable Kernel mathematical basis
  :keywords: composable kernel, CK, ROCm, API, mathematics, algorithm

.. _supported-primitives:

********************************************************************
Composable Kernel mathematical basis
********************************************************************

This is an introduction to the math that underpins the algorithms implemented in Composable Kernel.


For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B`, you can decompose the
softmax of the concatenated vector :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as

.. math::
  :nowrap:

  \begin{align}
     m(x) & = m( [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ] ) = \max( m(x^{(1)}),\ldots, m(x^{(T)}) ) \\
     f(x) & = [\exp( m(x^{(1)}) - m(x) ) f( x^{(1)} )\ | \ \ldots \ | \ \exp( m(x^{(T)}) - m(x) ) f( x^{(T)} )] \\
     z(x) & = \exp( m(x^{(1)}) - m(x) )\ z(x^{(1)}) + \ldots + \exp( m(x^{(T)}) - m(x) )\ z(x^{(T)}) \\
     \operatorname{softmax}(x) &= f(x)\ / \ z(x)
  \end{align}

where :math:`m(\cdot)` denotes the maximum entry of its argument,
:math:`f(x^{(j)}) = \exp( x^{(j)} - m(x^{(j)}) )` is of size :math:`B`, and
:math:`z(x^{(j)}) = f(x^{(j)})_1 + \ldots + f(x^{(j)})_B` is a scalar.

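The decomposition above can be checked numerically. The following is a minimal sketch in plain Python, not Composable Kernel code; the helper names ``block_stats`` and ``softmax_from_blocks`` and the sample values are assumptions made for illustration. It computes the per-block statistics :math:`m`, :math:`f`, and :math:`z`, combines them with the scaling factors :math:`\exp(m(x^{(j)}) - m(x))`, and compares the result with a direct softmax of the concatenated vector.

```python
import math


def softmax(x):
    """Reference softmax of a list of floats."""
    m = max(x)
    f = [math.exp(v - m) for v in x]
    z = sum(f)
    return [v / z for v in f]


def block_stats(block):
    """Return (m, f, z) for one block: max, shifted exponentials, and their sum."""
    m = max(block)
    f = [math.exp(v - m) for v in block]
    return m, f, sum(f)


def softmax_from_blocks(blocks):
    """Combine per-block statistics into the softmax of the concatenation."""
    stats = [block_stats(b) for b in blocks]
    m = max(s[0] for s in stats)                      # m(x)
    scales = [math.exp(s[0] - m) for s in stats]      # exp(m(x^(j)) - m(x))
    z = sum(c * s[2] for c, s in zip(scales, stats))  # z(x)
    out = []
    for c, (_, f, _) in zip(scales, stats):
        out.extend(c * v / z for v in f)              # f(x) / z(x), block by block
    return out


# Illustrative data: T = 3 blocks of size B = 3.
blocks = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0], [2.5, 2.5, 0.0]]
flat = [v for b in blocks for v in b]
combined = softmax_from_blocks(blocks)
reference = softmax(flat)
max_err = max(abs(a - b) for a, b in zip(combined, reference))
```

Because each block only contributes its own maximum, exponentials, and sum, the blocks can be processed independently and merged afterwards, which is the property the tiled algorithm relies on.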

For a matrix :math:`X` composed of :math:`T_r \times T_c` tiles, :math:`X_{ij}`, of size
:math:`B_r \times B_c`, you can compute the row-wise softmax as follows.

For :math:`j` from :math:`1` to :math:`T_c`, and :math:`i` from :math:`1` to :math:`T_r`, calculate

.. math::
  :nowrap:

  \begin{align}
     \tilde{m}_{ij} &= \operatorname{rowmax}( X_{ij} ) \\
     \tilde{P}_{ij} &= \exp(X_{ij} - \tilde{m}_{ij} ) \\
     \tilde{z}_{ij} &= \operatorname{rowsum}( \tilde{P}_{ij} )
  \end{align}

If :math:`j=1`, initialize the running max, the running sum, and the first column block of the output:

.. math::
  :nowrap:

  \begin{align}
     m_i &= \tilde{m}_{i1} \\
     z_i &= \tilde{z}_{i1} \\
     Y_{i1} &= \diag(\tilde{z}_{i1})^{-1} \tilde{P}_{i1}
  \end{align}

Else if :math:`j>1`,

1. Update the running max, the running sum, and column blocks :math:`k=1` to :math:`k=j-1`:

   .. math::
     :nowrap:

     \begin{align}
        m^{new}_i &= \max(m_i, \tilde{m}_{ij} ) \\
        z^{new}_i &= \exp(m_i - m^{new}_i)\ z_i + \exp( \tilde{m}_{ij} - m^{new}_i )\ \tilde{z}_{ij} \\
        Y_{ik} &= \diag(z^{new}_{i})^{-1} \diag(z_{i}) \exp(m_i - m^{new}_i)\ Y_{ik}
     \end{align}

2. Initialize column block :math:`j` of the output and update the running max and running sum variables:

   .. math::
     :nowrap:

     \begin{align}
        Y_{ij} &= \diag(z^{new}_{i})^{-1} \exp(\tilde{m}_{ij} - m^{new}_i ) \tilde{P}_{ij} \\
        z_i &= z^{new}_i \\
        m_i &= m^{new}_i
     \end{align}
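The tiled procedure can be sketched in a few lines of plain Python. This is an illustrative reference only, not the Composable Kernel implementation: the function name ``tiled_row_softmax``, the tile width parameter ``bc``, and the test matrix are assumptions. Rows are independent in a row-wise softmax, so the sketch handles one row at a time (effectively :math:`B_r = 1`) and walks over the column tiles from left to right, rescaling the blocks that were already written whenever the running max or running sum changes.

```python
import math


def reference_row_softmax(row):
    """Direct softmax of a single row, used only for comparison."""
    m = max(row)
    f = [math.exp(v - m) for v in row]
    z = sum(f)
    return [v / z for v in f]


def tiled_row_softmax(X, bc):
    """Row-wise softmax of X, visiting column tiles of width bc left to right."""
    rows, cols = len(X), len(X[0])
    Y = [[0.0] * cols for _ in range(rows)]
    m = [0.0] * rows  # running row max, m_i
    z = [0.0] * rows  # running row sum, z_i
    for start in range(0, cols, bc):
        stop = min(start + bc, cols)
        for r in range(rows):
            tile = X[r][start:stop]
            m_t = max(tile)                          # rowmax of the tile
            p_t = [math.exp(v - m_t) for v in tile]  # shifted exponentials
            z_t = sum(p_t)                           # rowsum of the tile
            if start == 0:
                # First column block: initialize running statistics and output.
                m[r], z[r] = m_t, z_t
                Y[r][start:stop] = [p / z_t for p in p_t]
            else:
                m_new = max(m[r], m_t)
                z_new = (math.exp(m[r] - m_new) * z[r]
                         + math.exp(m_t - m_new) * z_t)
                # Step 1: rescale the column blocks written so far.
                scale = z[r] * math.exp(m[r] - m_new) / z_new
                for c in range(start):
                    Y[r][c] *= scale
                # Step 2: write the new column block, then update the statistics.
                weight = math.exp(m_t - m_new) / z_new
                Y[r][start:stop] = [weight * p for p in p_t]
                m[r], z[r] = m_new, z_new
    return Y


X = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, -2.0, 10.0, 3.0, 3.0]]
Y = tiled_row_softmax(X, bc=2)
worst = max(
    abs(a - b)
    for row_x, row_y in zip(X, Y)
    for a, b in zip(reference_row_softmax(row_x), row_y)
)
```

The sketch stores the normalized output after every tile, as the equations do. An implementation could instead defer the division by the running sum until the last tile; that variant is not shown here.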
29
docs/conceptual/Composable-Kernel-structure.rst
Normal file
@@ -0,0 +1,29 @@
.. meta::
  :description: Composable Kernel structure
  :keywords: composable kernel, CK, ROCm, API, structure

.. _what-is-ck:

********************************************************************
Composable Kernel structure
********************************************************************

The Composable Kernel library uses a tile-based programming model and tensor coordinate transformation to achieve performance portability and code maintainability. Tensor coordinate transformation is a complexity reduction technique for complex machine learning operators.


.. image:: ../data/ck_component.png
  :alt: CK Components


The Composable Kernel library consists of four layers:

* a templated tile operator layer
* a templated kernel and invoker layer
* an instantiated kernel and invoker layer
* a client API layer

A wrapper component is included to simplify tensor transform operations.

.. image:: ../data/ck_layer.png
  :alt: CK Layers
50
docs/conf.py
Normal file
@@ -0,0 +1,50 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

import re

from rocm_docs import ROCmDocs

html_theme_options = {"flavor": "list"}

with open('../CMakeLists.txt', encoding='utf-8') as f:
    match = re.search(r'.*set\(version ([0-9.]+)[^0-9.]+', f.read())
    if not match:
        raise ValueError("VERSION not found!")
    version_number = match[1]
left_nav_title = f"Composable Kernel {version_number} Documentation"

# for PDF output on Read the Docs
project = "Composable Kernel Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
release = version_number

external_toc_path = "./sphinx/_toc.yml"

docs_core = ROCmDocs(left_nav_title)
docs_core.run_doxygen(doxygen_root="doxygen", doxygen_path="doxygen/xml")
docs_core.enable_api_reference()
docs_core.setup()

external_projects_current_project = "composable_kernel"

mathjax3_config = {
    'tex': {
        'macros': {
            'diag': '\\operatorname{diag}',
        }
    }
}

for sphinx_var in ROCmDocs.SPHINX_VARS:
    globals()[sphinx_var] = getattr(docs_core, sphinx_var)

extensions += ['sphinxcontrib.bibtex']
bibtex_bibfiles = ['refs.bib']

cpp_id_attributes = ["__global__", "__device__", "__host__"]
BIN
docs/data/ck_component.png
Normal file
Binary file not shown. Size: 552 KiB
BIN
docs/data/ck_layer.png
Normal file
Binary file not shown. Size: 536 KiB
2779
docs/doxygen/Doxyfile
Normal file
File diff suppressed because it is too large
43
docs/index.rst
Normal file
@@ -0,0 +1,43 @@
.. meta::
  :description: Composable Kernel documentation and API reference library
  :keywords: composable kernel, CK, ROCm, API, documentation

.. _composable-kernel:

********************************************************************
Composable Kernel User Guide
********************************************************************

The Composable Kernel library provides a programming model for writing performance-critical kernels for machine learning workloads across multiple architectures, including GPUs and CPUs, through general-purpose kernel languages such as `HIP C++ <https://rocm.docs.amd.com/projects/HIP/en/latest/index.html>`_.

The Composable Kernel repository is located at `https://github.com/ROCm/composable_kernel <https://github.com/ROCm/composable_kernel>`_.

.. grid:: 2
  :gutter: 3

  .. grid-item-card:: Install

    * :doc:`Composable Kernel prerequisites <./install/Composable-Kernel-prerequisites>`
    * :doc:`Build and install Composable Kernel <./install/Composable-Kernel-install>`
    * :doc:`Build and install Composable Kernel on a Docker image <./install/Composable-Kernel-Docker>`

  .. grid-item-card:: Conceptual

    * :doc:`Composable Kernel structure <./conceptual/Composable-Kernel-structure>`
    * :doc:`Composable Kernel mathematical basis <./conceptual/Composable-Kernel-math>`

  .. grid-item-card:: Tutorials

    * :doc:`Composable Kernel examples and tests <./tutorial/Composable-Kernel-examples>`

  .. grid-item-card:: Reference

    * :doc:`Composable Kernel supported scalar types <./reference/Composable_Kernel_supported_scalar_types>`
    * :doc:`Composable Kernel custom types <./reference/Composable_Kernel_custom_types>`
    * :doc:`Composable Kernel vector utilities <./reference/Composable_Kernel_vector_utilities>`
    * :ref:`api-reference`
    * :ref:`wrapper`

To contribute to the documentation, refer to `Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.

You can find licensing information on the `Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
16
docs/install/Composable-Kernel-Docker.rst
Normal file
@@ -0,0 +1,16 @@
.. meta::
  :description: Composable Kernel docker files
  :keywords: composable kernel, CK, ROCm, API, docker

.. _docker-hub:

********************************************************************
Composable Kernel Docker containers
********************************************************************

Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.

The images also contain `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_, `CMake <https://cmake.org/getting-started/>`_, and the `ROCm LLVM compiler infrastructure <https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html>`_.

Composable Kernel Docker images are named according to their operating system and ROCm version. For example, a Docker image named ``ck_ub22.04_rocm6.3`` would correspond to an Ubuntu 22.04 image with ROCm 6.3.

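If you script against these image names, the naming convention can be parsed mechanically. The following Python sketch is illustrative only: the regular expression is inferred from the single example name ``ck_ub22.04_rocm6.3`` and is an assumption, so names on Docker Hub that follow a different pattern will simply not match. The function name ``parse_ck_tag`` is hypothetical.

```python
import re

# Assumed pattern, inferred from the one example name in the text above:
# "ck_ub<Ubuntu version>_rocm<ROCm version>".
TAG_PATTERN = re.compile(
    r"^ck_ub(?P<ubuntu>\d+\.\d+)_rocm(?P<rocm>\d+(?:\.\d+)*)$"
)


def parse_ck_tag(tag):
    """Return (ubuntu_version, rocm_version), or None if the name does not match."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return match.group("ubuntu"), match.group("rocm")


parsed = parse_ck_tag("ck_ub22.04_rocm6.3")
unmatched = parse_ck_tag("latest")
```

Returning ``None`` for unmatched names keeps the helper safe to run over a full list of names without assuming they all follow the convention.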
72
docs/install/Composable-Kernel-install.rst
Normal file
@@ -0,0 +1,72 @@
.. meta::
  :description: Composable Kernel build and install
  :keywords: composable kernel, CK, ROCm, API, documentation, install

******************************************************
Building and installing Composable Kernel with CMake
******************************************************

Before you begin, clone the `Composable Kernel GitHub repository <https://github.com/ROCm/composable_kernel.git>`_ and create a ``build`` directory in its root:

.. code:: shell

    git clone https://github.com/ROCm/composable_kernel.git
    cd composable_kernel
    mkdir build

Change directory to the ``build`` directory and generate the makefile using the ``cmake`` command. Two build options are required:

* ``CMAKE_PREFIX_PATH``: The ROCm installation path. ROCm is installed in ``/opt/rocm`` by default.
* ``CMAKE_CXX_COMPILER``: The path to the Clang compiler. Clang is found at ``/opt/rocm/llvm/bin/clang++`` by default.


.. code:: shell

    cd build
    cmake ../. -D CMAKE_PREFIX_PATH="/opt/rocm" -D CMAKE_CXX_COMPILER="/opt/rocm/llvm/bin/clang++" [-D<OPTION1=VALUE1> [-D<OPTION2=VALUE2>] ...]


Other build options are:

* ``DISABLE_DL_KERNELS``: Set this to ``ON`` to skip building deep learning (DL) and data parallel primitive (DPP) instances.

  .. note::

    DL and DPP instances are useful on architectures that don't support XDL or WMMA.

* ``CK_USE_FP8_ON_UNSUPPORTED_ARCH``: Set to ``ON`` to build FP8 data type instances on gfx90a without native FP8 support.
* ``GPU_TARGETS``: Target architectures. Target architectures in this list must all be different versions of the same architecture. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_TARGETS="gfx908;gfx90a"``. This option is required to build tests and examples.
* ``GPU_ARCHS``: Target architectures. Target architectures in this list are not limited to different versions of the same architecture. Enclose the list of targets in quotation marks. Separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_ARCHS="gfx908;gfx1100"``.
* ``CMAKE_BUILD_TYPE``: The build type. Can be ``None``, ``Release``, ``Debug``, ``RelWithDebInfo``, or ``MinSizeRel``. CMake will use ``Release`` by default.

.. note::

  If neither ``GPU_TARGETS`` nor ``GPU_ARCHS`` is specified, Composable Kernel will be built for all targets supported by the compiler.

Build Composable Kernel using the generated makefile. This will build the library, the examples, and the tests, and save them to ``bin``.

.. code:: shell

    make -j20

The ``-j`` option speeds up the build by running multiple jobs in parallel. For example, ``-j20`` runs up to twenty jobs in parallel. On average, each job uses about 2 GB of memory. Make sure that the number of jobs multiplied by 2 GB doesn't exceed the available memory in your system.

Using ``-j`` without a number places no limit on the number of parallel jobs and is not recommended.

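The memory rule of thumb above can be turned into a small calculation. The following Python sketch is an illustration, not part of the Composable Kernel tooling: the function name ``max_parallel_jobs`` and the sample machine sizes are assumptions, and the 2 GB per job figure is the average quoted above rather than a guaranteed bound.

```python
def max_parallel_jobs(available_mem_gb, cpu_count, mem_per_job_gb=2.0):
    """Largest job count that fits in memory, capped by the CPU count, at least 1."""
    by_memory = int(available_mem_gb // mem_per_job_gb)
    return max(1, min(cpu_count, by_memory))


# A 32-core machine with 16 GB free is limited by memory.
jobs_small = max_parallel_jobs(available_mem_gb=16, cpu_count=32)

# A 20-core machine with 256 GB free is limited by the number of cores.
jobs_large = max_parallel_jobs(available_mem_gb=256, cpu_count=20)
```

The result is the value to pass to ``make -j``. On a real system, ``os.cpu_count()`` from the Python standard library is one way to obtain the core count.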

Install the Composable Kernel library:

.. code:: shell

    make install

After running ``make install``, the Composable Kernel files will be saved to the following locations:

* Library files: ``/opt/rocm/lib/``
* Header files: ``/opt/rocm/include/ck/`` and ``/opt/rocm/include/ck_tile/``
* Examples, tests, and ckProfiler: ``/opt/rocm/bin/``

For information about ckProfiler, see `the ckProfiler readme file <https://github.com/ROCm/composable_kernel/blob/develop/profiler/README.md>`_.

For information about running the examples and tests, see :doc:`Composable Kernel examples and tests <../tutorial/Composable-Kernel-examples>`.
32
docs/install/Composable-Kernel-prerequisites.rst
Normal file
@@ -0,0 +1,32 @@
.. meta::
  :description: Composable Kernel prerequisites
  :keywords: composable kernel, CK, ROCm, API, documentation, prerequisites

******************************************************
Composable Kernel prerequisites
******************************************************

Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.

The following prerequisites are required to build and install Composable Kernel:

* cmake
* hip-rocclr
* iputils-ping
* jq
* libelf-dev
* libncurses5-dev
* libnuma-dev
* libpthread-stubs0-dev
* llvm-amdgpu
* mpich
* net-tools
* python3
* python3-dev
* python3-pip
* redis
* rocm-llvm-dev
* zlib1g-dev
* libzstd-dev
* openssh-server
* clang-format-12
11
docs/license.rst
Normal file
@@ -0,0 +1,11 @@
.. meta::
  :description: Composable Kernel documentation and API reference library
  :keywords: composable kernel, CK, ROCm, API, documentation

.. _license:

********************************************************************
License
********************************************************************

.. include:: ../LICENSE
42
docs/reference/Composable-Kernel-API-reference.rst
Normal file
@@ -0,0 +1,42 @@
.. meta::
  :description: Composable Kernel documentation and API reference library
  :keywords: composable kernel, CK, ROCm, API, documentation

.. _api-reference:

********************************************************************
Composable Kernel API reference guide
********************************************************************

This document contains details of the APIs for the Composable Kernel library and introduces some of the key design principles that are used to write new classes that extend the functionality of the Composable Kernel library.

=================
DeviceMem
=================

.. doxygenstruct:: DeviceMem

=============================
Kernels for FlashAttention
=============================

The FlashAttention algorithm is defined in :cite:t:`dao2022flashattention`. This section lists
the classes that are used in the CK GPU implementation of FlashAttention.

**Gridwise classes**

.. doxygenstruct:: ck::GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffle

**Blockwise classes**

.. doxygenstruct:: ck::ThreadGroupTensorSliceTransfer_v4r1

.. doxygenstruct:: ck::BlockwiseGemmXdlops_v2

.. doxygenstruct:: ck::BlockwiseSoftmax

**Threadwise classes**

.. doxygenstruct:: ck::ThreadwiseTensorSliceTransfer_StaticToStatic

.. bibliography::
89
docs/reference/Composable-Kernel-wrapper.rst
Normal file
@@ -0,0 +1,89 @@
.. meta::
  :description: Composable Kernel wrapper
  :keywords: composable kernel, CK, ROCm, API, wrapper

.. _wrapper:

********************************************************************
Composable Kernel wrapper
********************************************************************


The Composable Kernel library provides a lightweight wrapper to simplify the more complex operations.

Example:

.. code-block:: cpp

    // Nested shape 4 x (2 x 4) with matching nested strides 2, (1, 8).
    const auto shape_4x2x4    = ck::make_tuple(4, ck::make_tuple(2, 4));
    const auto strides_s2x1x8 = ck::make_tuple(2, ck::make_tuple(1, 8));
    const auto layout         = ck::wrapper::make_layout(shape_4x2x4, strides_s2x1x8);

    // Wrap a plain 32-element buffer in a tensor that uses the layout.
    std::array<ck::index_t, 32> data;
    auto tensor = ck::wrapper::make_tensor<ck::wrapper::MemoryTypeEnum::Generic>(&data[0], layout);

    // Store each 1D index at the position the layout maps it to.
    for(ck::index_t w = 0; w < size(tensor); w++)
    {
        tensor(w) = w;
    }

    // slice() == slice(0, -1) (whole dimension)
    auto tensor_slice = tensor(ck::wrapper::slice(1, 3), ck::make_tuple(ck::wrapper::slice(), ck::wrapper::slice()));
    std::cout << "dims:2,(2,4) strides:2,(1,8)" << std::endl;
    for(ck::index_t h = 0; h < ck::wrapper::size<0>(tensor_slice); h++)
    {
        for(ck::index_t w = 0; w < ck::wrapper::size<1>(tensor_slice); w++)
        {
            std::cout << tensor_slice(h, w) << " ";
        }
        std::cout << std::endl;
    }

Output::

    dims:2,(2,4) strides:2,(1,8)
    1 5 9 13 17 21 25 29
    2 6 10 14 18 22 26 30

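To see where the printed values come from, the index arithmetic can be reproduced outside of Composable Kernel. The Python sketch below is a model of the example, not the wrapper's implementation. It assumes that a 1D index is split into coordinates with the leftmost dimension varying fastest; that convention is an assumption chosen because it reproduces the output shown above. The helper names ``unflatten`` and ``coord_to_offset`` are hypothetical.

```python
def coord_to_offset(i, j, k, strides=(2, (1, 8))):
    """Memory offset of coordinate (i, (j, k)) under the strides 2, (1, 8)."""
    return strides[0] * i + strides[1][0] * j + strides[1][1] * k


def unflatten(w, shape=(4, 2, 4)):
    """Split a 1D index into (i, j, k), leftmost dimension varying fastest (assumed)."""
    i = w % shape[0]
    j = (w // shape[0]) % shape[1]
    k = w // (shape[0] * shape[1])
    return i, j, k


# Model of the fill loop: tensor(w) = w for every 1D index w.
data = [None] * 32
for w in range(32):
    data[coord_to_offset(*unflatten(w))] = w

# Model of the slice: rows 1 and 2, read through the flattened (2, 4) dimension.
rows = []
for i in (1, 2):
    row = []
    for w in range(8):
        j, k = w % 2, w // 2
        row.append(data[coord_to_offset(i, j, k)])
    rows.append(row)

for row in rows:
    print(" ".join(str(v) for v in row))
```

Under this convention, moving one step along the flattened nested dimension advances the 1D index by 4, which is why consecutive values in each printed row differ by 4.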

Tutorials:

* `GEMM tutorial <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/README.md>`_

Advanced examples:

* `Image to column <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_img2col.cpp>`_
* `Basic GEMM <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_basic_gemm.cpp>`_
* `Optimized GEMM <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_optimized_gemm.cpp>`_

-------------------------------------
Layout
-------------------------------------

.. doxygenstruct:: Layout

-------------------------------------
Layout helpers
-------------------------------------

.. doxygenfile:: include/ck/wrapper/utils/layout_utils.hpp

-------------------------------------
Tensor
-------------------------------------

.. doxygenstruct:: Tensor

-------------------------------------
Tensor helpers
-------------------------------------

.. doxygenfile:: include/ck/wrapper/utils/tensor_utils.hpp

.. doxygenfile:: include/ck/wrapper/utils/tensor_partition.hpp

-------------------------------------
Operations
-------------------------------------

.. doxygenfile:: include/ck/wrapper/operations/copy.hpp
.. doxygenfile:: include/ck/wrapper/operations/gemm.hpp
39
docs/reference/Composable_Kernel_custom_types.rst
Normal file
@@ -0,0 +1,39 @@
.. meta::
  :description: Composable Kernel supported custom types
  :keywords: composable kernel, custom, data types, support, CK, ROCm

******************************************************
Composable Kernel custom data types
******************************************************

Composable Kernel supports the use of custom types that provide a way to implement specialized numerical formats.

To use a custom type, create a C++ type that implements the operations needed for tensor computations. These should include:

* Constructors and initialization methods
* Arithmetic operators if the type will be used in computational operations
* Any conversion functions needed to interface with other parts of an application

For example, to create a complex half-precision type:

.. code:: cpp

    // Minimal form: a plain aggregate with two half-precision components.
    struct complex_half_t
    {
        half_t real;
        half_t img;
    };

    // Alternative form of the same type with a member alias and constructors.
    // Use one definition or the other; both cannot appear in the same scope.
    struct complex_half_t
    {
        using type = half_t;
        type real;
        type img;

        complex_half_t() : real{type{}}, img{type{}} {}
        complex_half_t(type real_init, type img_init) : real{real_init}, img{img_init} {}
    };

Custom types can be particularly useful for specialized applications such as complex number arithmetic,
custom quantization schemes, or domain-specific number representations.
69
docs/reference/Composable_Kernel_supported_scalar_types.rst
Normal file
@@ -0,0 +1,69 @@
.. meta::
  :description: Composable Kernel supported scalar types
  :keywords: composable kernel, scalar, data types, support, CK, ROCm

***************************************************
Composable Kernel supported scalar data types
***************************************************

The Composable Kernel library provides support for the following scalar data types:

.. list-table::
    :header-rows: 1
    :widths: 25 15 60

    * - Type
      - Bit Width
      - Description

    * - ``double``
      - 64-bit
      - Standard IEEE 754 double precision floating point

    * - ``float``
      - 32-bit
      - Standard IEEE 754 single precision floating point

    * - ``int32_t``
      - 32-bit
      - Standard signed 32-bit integer

    * - ``int8_t``
      - 8-bit
      - Standard signed 8-bit integer

    * - ``uint8_t``
      - 8-bit
      - Standard unsigned 8-bit integer

    * - ``bool``
      - 1-bit
      - Boolean type

    * - ``ck::half_t``
      - 16-bit
      - IEEE 754 half precision floating point with 5 exponent bits, 10 mantissa bits, and 1 sign bit

    * - ``ck::bhalf_t``
      - 16-bit
      - Brain floating point with 8 exponent bits, 7 mantissa bits, and 1 sign bit

    * - ``ck::f8_t``
      - 8-bit
      - 8-bit floating point (E4M3 format) with 4 exponent bits, 3 mantissa bits, and 1 sign bit

    * - ``ck::bf8_t``
      - 8-bit
      - 8-bit brain floating point (E5M2 format) with 5 exponent bits, 2 mantissa bits, and 1 sign bit

    * - ``ck::f4_t``
      - 4-bit
      - 4-bit floating point format (E2M1 format) with 2 exponent bits, 1 mantissa bit, and 1 sign bit

    * - ``ck::f6_t``
      - 6-bit
      - 6-bit floating point format (E2M3 format) with 2 exponent bits, 3 mantissa bits, and 1 sign bit

    * - ``ck::bf6_t``
      - 6-bit
      - 6-bit brain floating point format (E3M2 format) with 3 exponent bits, 2 mantissa bits, and 1 sign bit

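The field widths in the table can be made concrete with the brain floating point row. The Python sketch below is an illustration and not Composable Kernel's conversion code: it relies on the general property that a bfloat16 value has the same sign and exponent layout as the upper 16 bits of an IEEE 754 single precision value, and it truncates the lower bits instead of rounding, which is an assumption made to keep the example short. The function names are hypothetical.

```python
import struct


def float_to_bf16_bits(value):
    """Truncate a float to bfloat16 by keeping the top 16 bits of its float32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32 >> 16


def split_bf16(bits16):
    """Return the (sign, exponent, mantissa) fields: 1, 8, and 7 bits wide."""
    sign = bits16 >> 15
    exponent = (bits16 >> 7) & 0xFF
    mantissa = bits16 & 0x7F
    return sign, exponent, mantissa


# 1.0 has a biased exponent of 127 and an all-zero mantissa.
fields_one = split_bf16(float_to_bf16_bits(1.0))

# -2.5 = -1.25 * 2**1: sign bit set, biased exponent 128, mantissa 0.25 * 128 = 32.
fields_neg = split_bf16(float_to_bf16_bits(-2.5))
```

The three field widths add up to the 16-bit width listed in the table. The same sign, exponent, and mantissa split applies to the narrower formats with the widths given in their rows.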
16
docs/reference/Composable_Kernel_vector_utilities.rst
Normal file
@@ -0,0 +1,16 @@
.. meta::
  :description: Composable Kernel supported precision types and custom type support
  :keywords: composable kernel, precision, data types, ROCm

******************************************************
Composable Kernel vector template utilities
******************************************************

Composable Kernel includes template utilities for creating vector types with customizable widths. These template utilities also flatten nested vector types into a single, wider vector, preventing the creation of vectors of vectors.

Vectors composed of supported scalar and custom types can be created with the ``ck::vector_type`` template.

For example, ``ck::vector_type<float, 4>`` creates a vector composed of four floats and ``ck::vector_type<ck::half_t, 8>`` creates a vector composed of eight half-precision scalars.

For vector operations to be valid, the underlying types must be either a :doc:`supported scalar type <Composable_Kernel_supported_scalar_types>` or :doc:`a custom type <Composable_Kernel_custom_types>` that implements the required operations.
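
The idea of flattening nested vector types can be sketched with a small standalone template. This is not the ``ck::vector_type`` implementation; the names ``vec``, ``make_vec``, and ``make_vec_t`` are hypothetical and only show how a partial specialization can turn a vector of vectors into one wider vector.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for illustration only; this is not ck::vector_type.
template <typename T, std::size_t N>
struct vec
{
    using scalar_type = T;
    static constexpr std::size_t size = N;
    std::array<T, N> data{};
};

// Primary template: a vector of N scalars of type T.
template <typename T, std::size_t N>
struct make_vec
{
    using type = vec<T, N>;
};

// Partial specialization: N vectors of M scalars flatten to N * M scalars.
template <typename T, std::size_t M, std::size_t N>
struct make_vec<vec<T, M>, N>
{
    using type = typename make_vec<T, N * M>::type;
};

template <typename T, std::size_t N>
using make_vec_t = typename make_vec<T, N>::type;

// Two vectors of four floats become a single vector of eight floats.
static_assert(std::is_same<make_vec_t<make_vec_t<float, 4>, 2>, vec<float, 8>>::value,
              "nested vectors are flattened");
```

In this sketch, asking for a vector whose element type is itself a vector never produces a nested type, which mirrors the behavior described above.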
7
docs/refs.bib
Normal file
@@ -0,0 +1,7 @@
@article{dao2022flashattention,
  title={Flashattention: Fast and memory-efficient exact attention with io-awareness},
  author={Dao, Tri and Fu, Daniel Y and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2205.14135},
  year={2022}
}
45
docs/sphinx/_toc.yml.in
Normal file
@@ -0,0 +1,45 @@
defaults:
  numbered: False
root: index
subtrees:

- caption: Install
  entries:
  - file: install/Composable-Kernel-prerequisites.rst
    title: Composable Kernel prerequisites
  - file: install/Composable-Kernel-install.rst
    title: Build and install Composable Kernel
  - file: install/Composable-Kernel-Docker.rst
    title: Composable Kernel Docker images

- caption: Conceptual
  entries:
  - file: conceptual/Composable-Kernel-structure.rst
    title: Composable Kernel structure
  - file: conceptual/Composable-Kernel-math.rst
    title: Composable Kernel mathematical basis

- caption: Tutorial
  entries:
  - file: tutorial/Composable-Kernel-examples.rst
    title: Composable Kernel examples

- caption: Reference
  entries:
  - file: reference/Composable_Kernel_supported_scalar_types.rst
    title: Composable Kernel scalar types
  - file: reference/Composable_Kernel_custom_types.rst
    title: Composable Kernel custom types
  - file: reference/Composable_Kernel_vector_utilities.rst
    title: Composable Kernel vector utilities
  - file: reference/Composable-Kernel-API-reference.rst
    title: Composable Kernel API reference
  - file: reference/Composable-Kernel-wrapper.rst
    title: Composable Kernel Wrapper

- caption: About
  entries:
  - file: Contributors_Guide.rst
    title: Contributing to Composable Kernel
  - file: license.rst
    title: License
2
docs/sphinx/requirements.in
Normal file
@@ -0,0 +1,2 @@
rocm-docs-core[api_reference]==1.18.2
sphinxcontrib-bibtex==2.6.3
335
docs/sphinx/requirements.txt
Normal file
@@ -0,0 +1,335 @@
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile requirements.in
#
accessible-pygments==0.0.5
    # via pydata-sphinx-theme
alabaster==1.0.0
    # via sphinx
asttokens==3.0.0
    # via stack-data
attrs==25.3.0
    # via
    #   jsonschema
    #   jupyter-cache
    #   referencing
babel==2.17.0
    # via
    #   pydata-sphinx-theme
    #   sphinx
beautifulsoup4==4.13.4
    # via pydata-sphinx-theme
breathe==4.36.0
    # via rocm-docs-core
certifi==2025.1.31
    # via requests
cffi==1.17.1
    # via
    #   cryptography
    #   pynacl
charset-normalizer==3.4.1
    # via requests
click==8.1.8
    # via
    #   click-log
    #   doxysphinx
    #   jupyter-cache
    #   sphinx-external-toc
click-log==0.4.0
    # via doxysphinx
comm==0.2.2
    # via ipykernel
contourpy==1.3.2
    # via matplotlib
cryptography==44.0.2
    # via pyjwt
cycler==0.12.1
    # via matplotlib
debugpy==1.8.14
    # via ipykernel
decorator==5.2.1
    # via ipython
deprecated==1.2.18
    # via pygithub
docutils==0.21.2
    # via
    #   myst-parser
    #   pybtex-docutils
    #   pydata-sphinx-theme
    #   sphinx
    #   sphinxcontrib-bibtex
doxysphinx==3.3.12
    # via rocm-docs-core
exceptiongroup==1.2.2
    # via ipython
executing==2.2.0
    # via stack-data
fastjsonschema==2.21.1
    # via
    #   nbformat
    #   rocm-docs-core
fonttools==4.57.0
    # via matplotlib
gitdb==4.0.12
    # via gitpython
gitpython==3.1.44
    # via rocm-docs-core
greenlet==3.2.1
    # via sqlalchemy
idna==3.10
    # via requests
imagesize==1.4.1
    # via sphinx
importlib-metadata==8.6.1
    # via
    #   jupyter-cache
    #   myst-nb
ipykernel==6.29.5
    # via myst-nb
ipython==8.35.0
    # via
    #   ipykernel
    #   myst-nb
jedi==0.19.2
    # via ipython
jinja2==3.1.6
    # via
    #   myst-parser
    #   sphinx
jsonschema==4.23.0
    # via nbformat
jsonschema-specifications==2024.10.1
    # via jsonschema
jupyter-cache==1.0.1
    # via myst-nb
jupyter-client==8.6.3
    # via
    #   ipykernel
    #   nbclient
jupyter-core==5.7.2
    # via
    #   ipykernel
    #   jupyter-client
    #   nbclient
    #   nbformat
kiwisolver==1.4.8
    # via matplotlib
latexcodec==3.0.0
    # via pybtex
libsass==0.22.0
    # via doxysphinx
lxml==5.2.1
    # via doxysphinx
markdown-it-py==3.0.0
    # via
    #   mdit-py-plugins
    #   myst-parser
markupsafe==3.0.2
    # via jinja2
matplotlib==3.10.1
    # via doxysphinx
matplotlib-inline==0.1.7
    # via
    #   ipykernel
    #   ipython
mdit-py-plugins==0.4.2
    # via myst-parser
mdurl==0.1.2
    # via markdown-it-py
mpire==2.10.2
    # via doxysphinx
myst-nb==1.2.0
    # via rocm-docs-core
myst-parser==4.0.1
    # via myst-nb
nbclient==0.10.2
    # via
    #   jupyter-cache
    #   myst-nb
nbformat==5.10.4
    # via
    #   jupyter-cache
    #   myst-nb
    #   nbclient
nest-asyncio==1.6.0
    # via ipykernel
numpy==1.26.4
    # via
    #   contourpy
    #   doxysphinx
    #   matplotlib
packaging==25.0
    # via
    #   ipykernel
    #   matplotlib
    #   pydata-sphinx-theme
    #   sphinx
parso==0.8.4
    # via jedi
pexpect==4.9.0
    # via ipython
pillow==11.2.1
    # via matplotlib
platformdirs==4.3.7
    # via jupyter-core
prompt-toolkit==3.0.51
    # via ipython
psutil==7.0.0
    # via ipykernel
ptyprocess==0.7.0
    # via pexpect
pure-eval==0.2.3
    # via stack-data
pybtex==0.24.0
    # via
    #   pybtex-docutils
    #   sphinxcontrib-bibtex
pybtex-docutils==1.0.3
    # via sphinxcontrib-bibtex
pycparser==2.22
    # via cffi
pydata-sphinx-theme==0.15.4
    # via
    #   rocm-docs-core
    #   sphinx-book-theme
pygithub==2.6.1
    # via rocm-docs-core
pygments==2.19.1
    # via
    #   accessible-pygments
    #   ipython
    #   mpire
    #   pydata-sphinx-theme
    #   sphinx
pyjson5==1.6.8
    # via doxysphinx
pyjwt[crypto]==2.10.1
    # via pygithub
pynacl==1.5.0
    # via pygithub
pyparsing==3.2.3
    # via
    #   doxysphinx
    #   matplotlib
python-dateutil==2.9.0.post0
    # via
    #   jupyter-client
    #   matplotlib
pyyaml==6.0.2
    # via
    #   jupyter-cache
    #   myst-nb
    #   myst-parser
    #   pybtex
    #   rocm-docs-core
    #   sphinx-external-toc
pyzmq==26.4.0
    # via
    #   ipykernel
    #   jupyter-client
referencing==0.36.2
    # via
    #   jsonschema
    #   jsonschema-specifications
requests==2.32.3
    # via
    #   pygithub
    #   sphinx
rocm-docs-core[api-reference]==1.18.2
    # via -r requirements.in
rpds-py==0.24.0
    # via
    #   jsonschema
    #   referencing
six==1.17.0
    # via
    #   pybtex
    #   python-dateutil
smmap==5.0.2
    # via gitdb
snowballstemmer==2.2.0
    # via sphinx
soupsieve==2.7
    # via beautifulsoup4
sphinx==8.1.3
    # via
    #   breathe
    #   myst-nb
    #   myst-parser
    #   pydata-sphinx-theme
    #   rocm-docs-core
    #   sphinx-book-theme
    #   sphinx-copybutton
    #   sphinx-design
    #   sphinx-external-toc
    #   sphinx-notfound-page
    #   sphinxcontrib-bibtex
sphinx-book-theme==1.1.4
    # via rocm-docs-core
sphinx-copybutton==0.5.2
    # via rocm-docs-core
sphinx-design==0.6.1
    # via rocm-docs-core
sphinx-external-toc==1.0.1
    # via rocm-docs-core
sphinx-notfound-page==1.1.0
    # via rocm-docs-core
sphinxcontrib-applehelp==2.0.0
    # via sphinx
sphinxcontrib-bibtex==2.6.3
    # via -r requirements.in
sphinxcontrib-devhelp==2.0.0
    # via sphinx
sphinxcontrib-htmlhelp==2.1.0
    # via sphinx
sphinxcontrib-jsmath==1.0.1
    # via sphinx
sphinxcontrib-qthelp==2.0.0
    # via sphinx
sphinxcontrib-serializinghtml==2.0.0
    # via sphinx
sqlalchemy==2.0.40
    # via jupyter-cache
stack-data==0.6.3
    # via ipython
tabulate==0.9.0
    # via jupyter-cache
tomli==2.2.1
    # via sphinx
tornado==6.4.2
    # via
    #   ipykernel
    #   jupyter-client
tqdm==4.67.1
    # via mpire
traitlets==5.14.3
    # via
    #   comm
    #   ipykernel
    #   ipython
    #   jupyter-client
    #   jupyter-core
    #   matplotlib-inline
    #   nbclient
    #   nbformat
typing-extensions==4.13.2
    # via
    #   beautifulsoup4
    #   ipython
    #   myst-nb
    #   pydata-sphinx-theme
    #   pygithub
    #   referencing
    #   sqlalchemy
urllib3==2.4.0
    # via
    #   pygithub
    #   requests
wcwidth==0.2.13
    # via prompt-toolkit
wrapt==1.17.2
    # via deprecated
zipp==3.21.0
    # via importlib-metadata
40
docs/tutorial/Composable-Kernel-examples.rst
Normal file
@@ -0,0 +1,40 @@
.. meta::
  :description: Composable Kernel examples and tests
  :keywords: composable kernel, CK, ROCm, API, examples, tests

********************************************************************
Composable Kernel examples and tests
********************************************************************

After :doc:`building and installing Composable Kernel <../install/Composable-Kernel-install>`, the examples and tests will be moved to ``/opt/rocm/bin/``.

All tests have the prefix ``test`` and all examples have the prefix ``example``.

Use ``ctest`` with no arguments to run all examples and tests, or use ``ctest -R`` followed by a regular expression to run only the tests whose names match it. For example:

.. code:: shell

    ctest -R test_gemm_fp16

Examples can be run individually as well. For example:

.. code:: shell

    ./bin/example_gemm_xdl_fp16 1 1 1

For instructions on how to run individual examples and tests, see their README files in the |example|_ and |test|_ GitHub folders.

To run smoke tests, use ``make smoke``.

To run regression tests, use ``make regression``.

In general, tests that run for under thirty seconds are included in the smoke tests and tests that run for over thirty seconds are included in the regression tests.

.. |example| replace:: ``example``
.. _example: https://github.com/ROCm/composable_kernel/tree/develop/example

.. |client_example| replace:: ``client_example``
.. _client_example: https://github.com/ROCm/composable_kernel/tree/develop/client_example

.. |test| replace:: ``test``
.. _test: https://github.com/ROCm/composable_kernel/tree/develop/test