Reorganize project folders (#6)

This commit is contained in:
Joseph Macaranas
2025-04-30 13:46:39 -04:00
committed by GitHub
commit 1eb2e57380
3952 changed files with 654944 additions and 0 deletions

105
docs/Contributors_Guide.rst Normal file

@@ -0,0 +1,105 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _contributing-to:
********************************************************************
Contributor's guide
********************************************************************
This chapter explains the rules for contributing to the Composable Kernel project, and how to contribute.
Getting started
===============
#. **Documentation:** Before contributing to the library, familiarize yourself with the
`Composable Kernel User Guide <https://rocm.docs.amd.com/projects/composable_kernel/en/latest/>`_.
It provides insight into the core concepts, environment configuration, and steps to obtain or
build the library. You can also find some of this information in the
`README file <https://github.com/ROCm/composable_kernel/blob/develop/README.md>`_
on the project's GitHub page.
#. **Additional reading:** The blog post `AMD Composable Kernel library: efficient fused kernels for AI apps with just a few lines of code <https://community.amd.com/t5/instinct-accelerators/amd-composable-kernel-library-efficient-fused-kernels-for-ai/ba-p/553224>`_
from the AMD Community portal offers a deeper understanding of the library's objectives and showcases its performance capabilities.
#. **General information:** For broader information about AMD products, consider exploring the
`AMD Developer Central portal <https://www.amd.com/en/developer.html>`_.
How to contribute
===================
You can make an impact by reporting issues or proposing code enhancements through pull requests.
Reporting issues
----------------
Use `GitHub issues <https://github.com/ROCm/composable_kernel/issues>`_
to track public bugs and enhancement requests.
If you encounter an issue with the library, please check if the problem has already been
reported by searching existing issues on GitHub. If your issue seems unique, please submit a new
issue. All reported issues must include:
* A comprehensive description of the problem, including:
* What did you observe?
* Why do you think it is a bug (if it seems like one)?
* What did you expect to happen? What would indicate the resolution of the problem?
* Are there any known workarounds?
* Your configuration details, including:
* Which GPU are you using?
* Which OS version are you on?
* Which ROCm version are you using?
* Are you using a Docker image? If so, which one?
* Steps to reproduce the issue, including:
* What actions trigger the issue? What are the reproduction steps?
* If you build the library from scratch, what CMake command did you use?
* How frequently does this issue happen? Does it reproduce every time? Or is it a sporadic issue?
Before submitting any issue, ensure you have addressed all relevant questions from the checklist.
Creating Pull Requests
----------------------
You can submit `Pull Requests (PR) on GitHub
<https://github.com/ROCm/composable_kernel/pulls>`_.
All contributors are required to develop their changes on a separate branch and then create a
pull request to merge their changes into the `develop` branch, which is the default
development branch in the Composable Kernel project. All external contributors must use their own
forks of the project to develop their changes.
When submitting a Pull Request you should:
* Describe the change providing information about the motivation for the change and a general
description of all code modifications.
* Verify and test the change:
* Run any relevant existing tests.
* Write new tests if added functionality is not covered by current tests.
* Ensure your changes align with the coding style defined in the ``.clang-format`` file located in
the project's root directory. We leverage `pre-commit` to run `clang-format` automatically. We
highly recommend contributors utilize this method to maintain consistent code formatting.
Instructions on setting up `pre-commit` can be found in the project's
`README file <https://github.com/ROCm/composable_kernel/blob/develop/README.md>`_.
* Link your PR to any related issues:
* If there is an issue that is resolved by your change, please provide a link to the issue in
the description of your pull request.
* For larger contributions, structure your change into a sequence of smaller, focused commits, each
addressing a particular aspect or fix.
Following the above guidelines ensures a seamless review process and faster assistance from our
end.
Thank you for your commitment to enhancing the Composable Kernel project!

@@ -0,0 +1,77 @@
.. meta::
:description: Composable Kernel mathematical basis
:keywords: composable kernel, CK, ROCm, API, mathematics, algorithm
.. _supported-primitives:
********************************************************************
Composable Kernel mathematical basis
********************************************************************
This is an introduction to the math which underpins the algorithms implemented in Composable Kernel.
For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B` you can decompose the
softmax of concatenated :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,
.. math::
:nowrap:
\begin{align}
m(x) & = m( [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ] ) = \max( m(x^{(1)}),\ldots, m(x^{(T)}) ) \\
f(x) & = [\exp( m(x^{(1)}) - m(x) ) f( x^{(1)} )\ | \ \ldots \ | \ \exp( m(x^{(T)}) - m(x) ) f( x^{(T)} )] \\
z(x) & = \exp( m(x^{(1)}) - m(x) )\ z(x^{(1)}) + \ldots + \exp( m(x^{(T)}) - m(x) )\ z(x^{(T)}) \\
\operatorname{softmax}(x) &= f(x)\ / \ z(x)
\end{align}
where :math:`f(x^{(j)}) = \exp( x^{(j)} - m(x^{(j)}) )` is of size :math:`B` and
:math:`z(x^{(j)}) = f(x_1^{(j)})+ \ldots+ f(x_B^{(j)})` is a scalar.
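The decomposition above can be checked numerically. The following is a minimal sketch in plain Python, not Composable Kernel code; the function names ``softmax`` and ``softmax_from_chunks`` and the sample data are illustrative assumptions. It computes the per-chunk quantities :math:`m(x^{(j)})`, :math:`f(x^{(j)})`, and :math:`z(x^{(j)})`, combines them as in the equations, and compares the result with a direct softmax of the concatenated vector.

```python
import math

def softmax(x):
    """Reference softmax over a flat list, using the usual max shift."""
    m = max(x)
    f = [math.exp(v - m) for v in x]
    z = sum(f)
    return [v / z for v in f]

def softmax_from_chunks(chunks):
    """Softmax of the concatenation of `chunks`, built from per-chunk m, f, z."""
    m_j = [max(c) for c in chunks]                                     # m(x^(j))
    f_j = [[math.exp(v - m) for v in c] for c, m in zip(chunks, m_j)]  # f(x^(j))
    z_j = [sum(f) for f in f_j]                                        # z(x^(j))
    m = max(m_j)                                                       # m(x)
    scale = [math.exp(mj - m) for mj in m_j]                           # exp(m(x^(j)) - m(x))
    z = sum(s * zj for s, zj in zip(scale, z_j))                       # z(x)
    return [s * v / z for s, f in zip(scale, f_j) for v in f]          # f(x) / z(x)

# Illustrative data: T = 3 chunks of size B = 3.
chunks = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0], [2.5, 2.5, 0.0]]
flat = [v for c in chunks for v in c]

direct = softmax(flat)
chunked = softmax_from_chunks(chunks)
max_abs_diff = max(abs(a - b) for a, b in zip(direct, chunked))
print(max_abs_diff)
```

The two results should agree up to floating-point rounding, which is what allows each chunk to be processed independently before the final rescaling.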
For a matrix :math:`X` composed of :math:`T_r \times T_c` tiles, :math:`X_{ij}`, of size
:math:`B_r \times B_c` you can compute the row-wise softmax as follows.
For :math:`j` from :math:`1` to :math:`T_c`, and :math:`i` from :math:`1` to :math:`T_r` calculate,
.. math::
:nowrap:
\begin{align}
\tilde{m}_{ij} &= \operatorname{rowmax}( X_{ij} ) \\
\tilde{P}_{ij} &= \exp(X_{ij} - \tilde{m}_{ij} ) \\
\tilde{z}_{ij} &= \operatorname{rowsum}( \tilde{P}_{ij} )
\end{align}
If :math:`j=1`, initialize running max, running sum, and the first column block of the output,
.. math::
:nowrap:
\begin{align}
m_i &= \tilde{m}_{i1} \\
z_i &= \tilde{z}_{i1} \\
Y_{i1} &= \diag(\tilde{z}_{i1})^{-1} \tilde{P}_{i1}
\end{align}
Else if :math:`j>1`,
1. Update running max, running sum and column blocks :math:`k=1` to :math:`k=j-1`
.. math::
:nowrap:
\begin{align}
m^{new}_i &= \max(m_i, \tilde{m}_{ij} ) \\
z^{new}_i &= \exp(m_i - m^{new}_i)\ z_i + \exp( \tilde{m}_{ij} - m^{new}_i )\ \tilde{z}_{ij} \\
Y_{ik} &= \diag(z^{new}_{i})^{-1} \diag(z_{i}) \exp(m_i - m^{new}_i)\ Y_{ik}
\end{align}
2. Initialize column block :math:`j` of output and reset running max and running sum variables:
.. math::
:nowrap:
\begin{align}
Y_{ij} &= \diag(z^{new}_{i})^{-1} \exp(\tilde{m}_{ij} - m^{new}_i ) \tilde{P}_{ij} \\
z_i &= z^{new}_i \\
m_i &= m^{new}_i
\end{align}
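The block-by-block procedure can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the Composable Kernel implementation: each row is handled on its own (a row block height of one), the function names ``tiled_row_softmax`` and ``direct_row_softmax`` are hypothetical, and the column block width ``Bc`` is a free parameter. The running max and running sum play the roles of :math:`m_i` and :math:`z_i`.

```python
import math

def tiled_row_softmax(X, Bc):
    """Row-wise softmax of X (a list of rows), visiting column blocks of width Bc."""
    n_cols = len(X[0])
    out = []
    for row in X:  # each row keeps its own running max m and running sum z
        m, z, y = None, None, []
        for start in range(0, n_cols, Bc):
            block = row[start:start + Bc]
            m_t = max(block)                          # block max
            p_t = [math.exp(v - m_t) for v in block]  # shifted exponentials
            z_t = sum(p_t)                            # block sum
            if m is None:                             # j = 1: initialize
                m, z = m_t, z_t
                y = [p / z for p in p_t]
            else:                                     # j > 1
                m_new = max(m, m_t)
                z_new = math.exp(m - m_new) * z + math.exp(m_t - m_new) * z_t
                rescale = z * math.exp(m - m_new) / z_new
                y = [v * rescale for v in y]          # update blocks 1..j-1
                y += [math.exp(m_t - m_new) * p / z_new for p in p_t]  # block j
                m, z = m_new, z_new
        out.append(y)
    return out

def direct_row_softmax(X):
    """Reference: softmax of each full row in one pass."""
    out = []
    for row in X:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

X = [[0.1, 2.0, -1.0, 3.5, 0.0, 1.2],
     [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]]
tiled = tiled_row_softmax(X, Bc=2)
direct = direct_row_softmax(X)
err = max(abs(a - b) for r1, r2 in zip(tiled, direct) for a, b in zip(r1, r2))
print(err)
```

Because earlier output blocks are rescaled each time the running max or sum changes, the full row never needs to be held as raw exponentials at once.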

@@ -0,0 +1,29 @@
.. meta::
:description: Composable Kernel structure
:keywords: composable kernel, CK, ROCm, API, structure
.. _what-is-ck:
********************************************************************
Composable Kernel structure
********************************************************************
The Composable Kernel library uses a tile-based programming model and tensor coordinate transformation to achieve performance portability and code maintainability. Tensor coordinate transformation is a complexity reduction technique for complex machine learning operators.
.. image:: ../data/ck_component.png
:alt: CK Components
The Composable Kernel library consists of four layers:
* a templated tile operator layer
* a templated kernel and invoker layer
* an instantiated kernel and invoker layer
* a client API layer
A wrapper component is included to simplify tensor transform operations.
.. image:: ../data/ck_layer.png
:alt: CK Layers

50
docs/conf.py Normal file

@@ -0,0 +1,50 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import re
from rocm_docs import ROCmDocs
html_theme_options = {"flavor": "list"}
with open('../CMakeLists.txt', encoding='utf-8') as f:
    match = re.search(r'.*set\(version ([0-9.]+)[^0-9.]+', f.read())
    if not match:
        raise ValueError("VERSION not found!")
    version_number = match[1]
left_nav_title = f"Composable Kernel {version_number} Documentation"
# for PDF output on Read the Docs
project = "Composable Kernel Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
release = version_number
external_toc_path = "./sphinx/_toc.yml"
docs_core = ROCmDocs(left_nav_title)
docs_core.run_doxygen(doxygen_root="doxygen", doxygen_path="doxygen/xml")
docs_core.enable_api_reference()
docs_core.setup()
external_projects_current_project = "composable_kernel"
mathjax3_config = {
    'tex': {
        'macros': {
            'diag': '\\operatorname{diag}',
        }
    }
}
for sphinx_var in ROCmDocs.SPHINX_VARS:
    globals()[sphinx_var] = getattr(docs_core, sphinx_var)
extensions += ['sphinxcontrib.bibtex']
bibtex_bibfiles = ['refs.bib']
cpp_id_attributes = ["__global__", "__device__", "__host__"]

BIN
docs/data/ck_component.png Normal file

Binary file not shown.

Size: 552 KiB

BIN
docs/data/ck_layer.png Normal file

Binary file not shown.

Size: 536 KiB

2779
docs/doxygen/Doxyfile Normal file

File diff suppressed because it is too large

43
docs/index.rst Normal file

@@ -0,0 +1,43 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _composable-kernel:
********************************************************************
Composable Kernel User Guide
********************************************************************
The Composable Kernel library provides a programming model for writing performance-critical kernels for machine learning workloads across multiple architectures, including GPUs and CPUs, through general-purpose kernel languages such as `HIP C++ <https://rocm.docs.amd.com/projects/HIP/en/latest/index.html>`_.
The Composable Kernel repository is located at `https://github.com/ROCm/composable_kernel <https://github.com/ROCm/composable_kernel>`_.
.. grid:: 2
:gutter: 3
.. grid-item-card:: Install
* :doc:`Composable Kernel prerequisites <./install/Composable-Kernel-prerequisites>`
* :doc:`Build and install Composable Kernel <./install/Composable-Kernel-install>`
* :doc:`Build and install Composable Kernel on a Docker image <./install/Composable-Kernel-Docker>`
.. grid-item-card:: Conceptual
* :doc:`Composable Kernel structure <./conceptual/Composable-Kernel-structure>`
* :doc:`Composable Kernel mathematical basis <./conceptual/Composable-Kernel-math>`
.. grid-item-card:: Tutorials
* :doc:`Composable Kernel examples and tests <./tutorial/Composable-Kernel-examples>`
.. grid-item-card:: Reference
* :doc:`Composable Kernel supported scalar types <./reference/Composable_Kernel_supported_scalar_types>`
* :doc:`Composable Kernel custom types <./reference/Composable_Kernel_custom_types>`
* :doc:`Composable Kernel vector utilities <./reference/Composable_Kernel_vector_utilities>`
* :ref:`api-reference`
* :ref:`wrapper`
To contribute to the documentation refer to `Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
You can find licensing information on the `Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.

@@ -0,0 +1,16 @@
.. meta::
:description: Composable Kernel docker files
:keywords: composable kernel, CK, ROCm, API, docker
.. _docker-hub:
********************************************************************
Composable Kernel Docker containers
********************************************************************
Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.
The images also contain `ROCm <https://rocm.docs.amd.com/en/latest/index.html>`_, `CMake <https://cmake.org/getting-started/>`_, and the `ROCm LLVM compiler infrastructure <https://rocm.docs.amd.com/projects/llvm-project/en/latest/index.html>`_.
Composable Kernel Docker images are named according to their operating system and ROCm version. For example, a Docker image named ``ck_ub22.04_rocm6.3`` would correspond to an Ubuntu 22.04 image with ROCm 6.3.
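As an illustration of the naming scheme, the sketch below splits such a tag into its operating system and ROCm versions. It is written in Python and assumes a pattern inferred only from the single example tag above; the names ``TAG_PATTERN`` and ``parse_ck_tag`` are hypothetical, and real tags on Docker Hub may carry other prefixes or suffixes.

```python
import re

# Hypothetical pattern inferred from the one example tag; real tags may differ.
TAG_PATTERN = re.compile(r"^ck_ub(?P<ubuntu>\d+\.\d+)_rocm(?P<rocm>\d+(?:\.\d+)+)$")

def parse_ck_tag(tag):
    """Return (ubuntu_version, rocm_version) for a tag like 'ck_ub22.04_rocm6.3'.

    Returns None when the tag does not follow the assumed pattern.
    """
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return match.group("ubuntu"), match.group("rocm")

print(parse_ck_tag("ck_ub22.04_rocm6.3"))  # ('22.04', '6.3')
```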

@@ -0,0 +1,72 @@
.. meta::
:description: Composable Kernel build and install
:keywords: composable kernel, CK, ROCm, API, documentation, install
******************************************************
Building and installing Composable Kernel with CMake
******************************************************
Before you begin, clone the `Composable Kernel GitHub repository <https://github.com/ROCm/composable_kernel.git>`_ and create a ``build`` directory in its root:
.. code:: shell
git clone https://github.com/ROCm/composable_kernel.git
cd composable_kernel
mkdir build
Change directory to the ``build`` directory and generate the makefile using the ``cmake`` command. Two build options are required:
* ``CMAKE_PREFIX_PATH``: The ROCm installation path. ROCm is installed in ``/opt/rocm`` by default.
* ``CMAKE_CXX_COMPILER``: The path to the Clang compiler. Clang is found at ``/opt/rocm/llvm/bin/clang++`` by default.
.. code:: shell
cd build
cmake ../. -D CMAKE_PREFIX_PATH="/opt/rocm" -D CMAKE_CXX_COMPILER="/opt/rocm/llvm/bin/clang++" [-D<OPTION1=VALUE1> [-D<OPTION2=VALUE2>] ...]
Other build options are:
* ``DISABLE_DL_KERNELS``: Set this to ``ON`` to not build deep learning (DL) and data parallel primitive (DPP) instances.
.. note::
DL and DPP instances are useful on architectures that don't support XDL or WMMA.
* ``CK_USE_FP8_ON_UNSUPPORTED_ARCH``: Set to ``ON`` to build FP8 data type instances on gfx90a without native FP8 support.
* ``GPU_TARGETS``: Target architectures. All targets in this list must be different versions of the same architecture. Enclose the list of targets in quotation marks and separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_TARGETS="gfx908;gfx90a"``. This option is required to build tests and examples.
* ``GPU_ARCHS``: Target architectures. Targets in this list are not limited to different versions of the same architecture. Enclose the list of targets in quotation marks and separate multiple targets with semicolons (``;``). For example, ``cmake -D GPU_ARCHS="gfx908;gfx1100"``.
* ``CMAKE_BUILD_TYPE``: The build type. Can be ``None``, ``Release``, ``Debug``, ``RelWithDebInfo``, or ``MinSizeRel``. CMake will use ``Release`` by default.
.. note::
If neither ``GPU_TARGETS`` nor ``GPU_ARCHS`` is specified, Composable Kernel will be built for all targets supported by the compiler.
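When the configure step is scripted, the options above can be assembled programmatically. The following Python sketch is one possible way to do that; the helper name ``cmake_configure_args`` and its parameters are assumptions made for illustration and are not part of Composable Kernel. Only the CMake option names and default paths come from this page.

```python
import shlex

def cmake_configure_args(source_dir="..",
                         prefix_path="/opt/rocm",
                         cxx_compiler="/opt/rocm/llvm/bin/clang++",
                         gpu_targets=None,
                         build_type=None):
    """Build the argument list for the configure step described above."""
    args = ["cmake", source_dir,
            "-D", f"CMAKE_PREFIX_PATH={prefix_path}",
            "-D", f"CMAKE_CXX_COMPILER={cxx_compiler}"]
    if gpu_targets:
        # Multiple targets are joined with semicolons into one list value.
        args += ["-D", "GPU_TARGETS=" + ";".join(gpu_targets)]
    if build_type:
        args += ["-D", f"CMAKE_BUILD_TYPE={build_type}"]
    return args

args = cmake_configure_args(gpu_targets=["gfx908", "gfx90a"], build_type="Release")
# shlex.join quotes the semicolon-separated value so a shell does not split it.
print(shlex.join(args))
```

Passing the arguments as a list (for example to ``subprocess.run``) avoids shell quoting issues with the semicolons altogether.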
Build Composable Kernel using the generated makefile. This will build the library, the examples, and the tests, and save them to ``bin``.
.. code:: shell
make -j20
The ``-j`` option speeds up the build by running multiple jobs in parallel. For example, ``-j20`` runs up to twenty jobs at once. On average, each job uses about 2 GB of memory. Make sure that the number of jobs you request doesn't exceed what the available memory in your system can support.
Using ``-j`` without a number places no limit on the number of parallel jobs and is not recommended.
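The memory guidance above amounts to a small calculation. The Python sketch below shows one way to pick a job count; the function name ``safe_job_count`` is hypothetical, and the 2 GB per job figure is the average quoted above, so treat the result as a starting point rather than a guarantee.

```python
def safe_job_count(available_mem_gb, cpu_count, gb_per_job=2.0):
    """Largest job count that fits in memory, capped by the number of CPUs."""
    by_memory = int(available_mem_gb // gb_per_job)
    return max(1, min(cpu_count, by_memory))

print(safe_job_count(available_mem_gb=64, cpu_count=20))  # 20
print(safe_job_count(available_mem_gb=16, cpu_count=20))  # 8
```

The returned value can then be passed to ``make`` as ``-j<N>``.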
Install the Composable Kernel library:
.. code:: shell
make install
After running ``make install``, the Composable Kernel files will be saved to the following locations:
* Library files: ``/opt/rocm/lib/``
* Header files: ``/opt/rocm/include/ck/`` and ``/opt/rocm/include/ck_tile/``
* Examples, tests, and ckProfiler: ``/opt/rocm/bin/``
For information about ckProfiler, see `the ckProfiler readme file <https://github.com/ROCm/composable_kernel/blob/develop/profiler/README.md>`_.
For information about running the examples and tests, see :doc:`Composable Kernel examples and tests <../tutorial/Composable-Kernel-examples>`.

@@ -0,0 +1,32 @@
.. meta::
:description: Composable Kernel prerequisites
:keywords: composable kernel, CK, ROCm, API, documentation, prerequisites
******************************************************
Composable Kernel prerequisites
******************************************************
Docker images that include all the required prerequisites for building Composable Kernel are available on `Docker Hub <https://hub.docker.com/r/rocm/composable_kernel/tags>`_.
The following prerequisites are required to build and install Composable Kernel:
* cmake
* hip-rocclr
* iputils-ping
* jq
* libelf-dev
* libncurses5-dev
* libnuma-dev
* libpthread-stubs0-dev
* llvm-amdgpu
* mpich
* net-tools
* python3
* python3-dev
* python3-pip
* redis
* rocm-llvm-dev
* zlib1g-dev
* libzstd-dev
* openssh-server
* clang-format-12

11
docs/license.rst Normal file

@@ -0,0 +1,11 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _license:
********************************************************************
License
********************************************************************
.. include:: ../LICENSE

@@ -0,0 +1,42 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _api-reference:
********************************************************************
Composable Kernel API reference guide
********************************************************************
This document contains details of the APIs for the Composable Kernel library and introduces some of the key design principles that are used to write new classes that extend the functionality of the Composable Kernel library.
=================
DeviceMem
=================
.. doxygenstruct:: DeviceMem
=============================
Kernels For Flashattention
=============================
The Flashattention algorithm is defined in :cite:t:`dao2022flashattention`. This section lists
the classes that are used in the CK GPU implementation of Flashattention.
**Gridwise classes**
.. doxygenstruct:: ck::GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffle
**Blockwise classes**
.. doxygenstruct:: ck::ThreadGroupTensorSliceTransfer_v4r1
.. doxygenstruct:: ck::BlockwiseGemmXdlops_v2
.. doxygenstruct:: ck::BlockwiseSoftmax
**Threadwise classes**
.. doxygenstruct:: ck::ThreadwiseTensorSliceTransfer_StaticToStatic
.. bibliography::

@@ -0,0 +1,89 @@
.. meta::
:description: Composable Kernel wrapper
:keywords: composable kernel, CK, ROCm, API, wrapper
.. _wrapper:
********************************************************************
Composable Kernel wrapper
********************************************************************
The Composable Kernel library provides a lightweight wrapper to simplify the more complex operations.
Example:
.. code-block:: cpp

    // Describe a 4 x (2 x 4) shape with strides 2 x (1 x 8) and build a layout from it.
    const auto shape_4x2x4    = ck::make_tuple(4, ck::make_tuple(2, 4));
    const auto strides_s2x1x8 = ck::make_tuple(2, ck::make_tuple(1, 8));
    const auto layout         = ck::wrapper::make_layout(shape_4x2x4, strides_s2x1x8);

    // Wrap a plain array in a tensor that uses the layout.
    std::array<ck::index_t, 32> data;
    auto tensor = ck::wrapper::make_tensor<ck::wrapper::MemoryTypeEnum::Generic>(&data[0], layout);

    // Fill the tensor through its one-dimensional index.
    for(ck::index_t w = 0; w < size(tensor); w++) {
        tensor(w) = w;
    }

    // slice() == slice(0, -1) (whole dimension)
    auto tensor_slice = tensor(ck::wrapper::slice(1, 3), ck::make_tuple(ck::wrapper::slice(), ck::wrapper::slice()));
    std::cout << "dims:2,(2,4) strides:2,(1,8)" << std::endl;
    for(ck::index_t h = 0; h < ck::wrapper::size<0>(tensor_slice); h++)
    {
        for(ck::index_t w = 0; w < ck::wrapper::size<1>(tensor_slice); w++)
        {
            std::cout << tensor_slice(h, w) << " ";
        }
        std::cout << std::endl;
    }
Output::
dims:2,(2,4) strides:2,(1,8)
1 5 9 13 17 21 25 29
2 6 10 14 18 22 26 30
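The index arithmetic behind this output can be reproduced without the library. The Python sketch below is an emulation written for illustration only; it does not use the wrapper API. It assumes that a one-dimensional index is split into coordinates with the first dimension varying fastest, an assumption chosen because it reproduces the output shown above. The helper names ``offset`` and ``unflatten`` are hypothetical.

```python
def offset(h, a, b):
    """Memory offset of coordinate (h, (a, b)) for strides 2, (1, 8)."""
    return h * 2 + a * 1 + b * 8

def unflatten(w):
    """Split a linear index over shape 4, (2, 4); first dimension fastest (assumed)."""
    return w % 4, (w // 4) % 2, w // 8

data = [None] * 32
for w in range(32):                  # mirrors `tensor(w) = w`
    data[offset(*unflatten(w))] = w

rows = []
for h in range(1, 3):                # slice(1, 3) on the first dimension
    # The nested (2, 4) dimension is walked as one flat index of size 8.
    rows.append([data[offset(h, c % 2, c // 2)] for c in range(8)])

for row in rows:
    print(*row)
```

Under these assumptions the two printed rows match the values listed in the output above.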
Tutorials:
* `GEMM tutorial <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/README.md>`_
Advanced examples:
* `Image to column <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_img2col.cpp>`_
* `Basic gemm <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_basic_gemm.cpp>`_
* `Optimized gemm <https://github.com/ROCm/composable_kernel/blob/develop/client_example/25_wrapper/wrapper_optimized_gemm.cpp>`_
-------------------------------------
Layout
-------------------------------------
.. doxygenstruct:: Layout
-------------------------------------
Layout helpers
-------------------------------------
.. doxygenfile:: include/ck/wrapper/utils/layout_utils.hpp
-------------------------------------
Tensor
-------------------------------------
.. doxygenstruct:: Tensor
-------------------------------------
Tensor helpers
-------------------------------------
.. doxygenfile:: include/ck/wrapper/utils/tensor_utils.hpp
.. doxygenfile:: include/ck/wrapper/utils/tensor_partition.hpp
-------------------------------------
Operations
-------------------------------------
.. doxygenfile:: include/ck/wrapper/operations/copy.hpp
.. doxygenfile:: include/ck/wrapper/operations/gemm.hpp

@@ -0,0 +1,39 @@
.. meta::
:description: Composable Kernel supported custom types
:keywords: composable kernel, custom, data types, support, CK, ROCm
******************************************************
Composable Kernel custom data types
******************************************************
Composable Kernel supports the use of custom types that provide a way to implement specialized numerical formats.
To use a custom type, create a C++ type that implements the operations needed for tensor computations. The type should include:
* Constructors and initialization methods
* Arithmetic operators if the type will be used in computational operations
* Any conversion functions needed to interface with other parts of an application
For example, to create a complex half-precision type:
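.. code:: cpp

    // Minimal form: a plain aggregate holding the real and imaginary parts.
    struct complex_half_t
    {
        half_t real;
        half_t img;
    };

    // Fuller form of the same type, with a value-type alias and constructors.
    // Use one definition or the other; defining both in the same translation
    // unit is a redefinition error.
    struct complex_half_t
    {
        using type = half_t;
        type real;
        type img;

        // Zero-initialize both components.
        complex_half_t() : real{type{}}, img{type{}} {}
        // Initialize from explicit real and imaginary parts.
        complex_half_t(type real_init, type img_init) : real{real_init}, img{img_init} {}
    };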
Custom types can be particularly useful for specialized applications such as complex number arithmetic,
custom quantization schemes, or domain-specific number representations.

@@ -0,0 +1,69 @@
.. meta::
:description: Composable Kernel supported scalar types
:keywords: composable kernel, scalar, data types, support, CK, ROCm
***************************************************
Composable Kernel supported scalar data types
***************************************************
The Composable Kernel library provides support for the following scalar data types:
.. list-table::
:header-rows: 1
:widths: 25 15 60
* - Type
- Bit Width
- Description
* - ``double``
- 64-bit
- Standard IEEE 754 double precision floating point
* - ``float``
- 32-bit
- Standard IEEE 754 single precision floating point
* - ``int32_t``
- 32-bit
- Standard signed 32-bit integer
* - ``int8_t``
- 8-bit
- Standard signed 8-bit integer
* - ``uint8_t``
- 8-bit
- Standard unsigned 8-bit integer
* - ``bool``
- 1-bit
- Boolean type
* - ``ck::half_t``
- 16-bit
- IEEE 754 half precision floating point with 5 exponent bits, 10 mantissa bits, and 1 sign bit
* - ``ck::bhalf_t``
- 16-bit
- Brain floating point with 8 exponent bits, 7 mantissa bits, and 1 sign bit
* - ``ck::f8_t``
- 8-bit
- 8-bit floating point (E4M3 format) with 4 exponent bits, 3 mantissa bits, and 1 sign bit
* - ``ck::bf8_t``
- 8-bit
- 8-bit brain floating point (E5M2 format) with 5 exponent bits, 2 mantissa bits, and 1 sign bit
* - ``ck::f4_t``
- 4-bit
- 4-bit floating point format (E2M1 format) with 2 exponent bits, 1 mantissa bit, and 1 sign bit
* - ``ck::f6_t``
- 6-bit
- 6-bit floating point format (E2M3 format) with 2 exponent bits, 3 mantissa bits, and 1 sign bit
* - ``ck::bf6_t``
- 6-bit
- 6-bit brain floating point format (E3M2 format) with 3 exponent bits, 2 mantissa bits, and 1 sign bit
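To make the difference between the two 16-bit layouts in the table concrete, the following Python sketch prints the bit fields of the value 1.0 in IEEE 754 half precision and in brain floating point. It uses only the Python standard library, not Composable Kernel, and the helper names are hypothetical. The bfloat16 pattern is obtained by truncating the float32 encoding, which is a simplification; real conversions usually round.

```python
import struct

def half_bits(x):
    """16-bit pattern of x as IEEE 754 half precision (1 sign, 5 exponent, 10 mantissa)."""
    return struct.unpack(">H", struct.pack(">e", x))[0]

def bfloat16_bits(x):
    """16-bit pattern of x as bfloat16, taken as the top half of its float32 encoding.

    Truncation is used for simplicity; real conversions usually round.
    """
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

def split_fields(bits, exp_bits, man_bits):
    """Return (sign, exponent, mantissa) fields of a packed floating point pattern."""
    sign = bits >> (exp_bits + man_bits)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa

print(hex(half_bits(1.0)), split_fields(half_bits(1.0), 5, 10))         # 0x3c00 (0, 15, 0)
print(hex(bfloat16_bits(1.0)), split_fields(bfloat16_bits(1.0), 8, 7))  # 0x3f80 (0, 127, 0)
```

The wider exponent field of the brain floating point layout gives it the same dynamic range as ``float``, at the cost of mantissa precision.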

@@ -0,0 +1,16 @@
.. meta::
:description: Composable Kernel supported precision types and custom type support
:keywords: composable kernel, precision, data types, ROCm
******************************************************
Composable Kernel vector template utilities
******************************************************
Composable Kernel includes template utilities for creating vector types with customizable widths. These template utilities also flatten nested vector types into a single, wider vector, preventing the creation of vectors of vectors.
Vectors composed of supported scalar and custom types can be created with the ``ck::vector_type`` template.
For example, ``ck::vector_type<float, 4>`` creates a vector composed of four floats and ``ck::vector_type<ck::half_t, 8>`` creates a vector composed of eight half-precision scalars.
For vector operations to be valid, the underlying types must be either a :doc:`supported scalar type <Composable_Kernel_supported_scalar_types>` or :doc:`a custom type <Composable_Kernel_custom_types>` that implements the required operations.

7
docs/refs.bib Normal file

@@ -0,0 +1,7 @@
@article{dao2022flashattention,
title={Flashattention: Fast and memory-efficient exact attention with io-awareness},
author={Dao, Tri and Fu, Daniel Y and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
journal={arXiv preprint arXiv:2205.14135},
year={2022}
}

45
docs/sphinx/_toc.yml.in Normal file

@@ -0,0 +1,45 @@
defaults:
numbered: False
root: index
subtrees:
- caption: Install
entries:
- file: install/Composable-Kernel-prerequisites.rst
title: Composable Kernel prerequisites
- file: install/Composable-Kernel-install.rst
title: Build and install Composable Kernel
- file: install/Composable-Kernel-Docker.rst
title: Composable Kernel Docker images
- caption: Conceptual
entries:
- file: conceptual/Composable-Kernel-structure.rst
title: Composable Kernel structure
- file: conceptual/Composable-Kernel-math.rst
title: Composable Kernel mathematical basis
- caption: Tutorial
entries:
- file: tutorial/Composable-Kernel-examples.rst
title: Composable Kernel examples
- caption: Reference
entries:
- file: reference/Composable_Kernel_supported_scalar_types.rst
title: Composable Kernel scalar types
- file: reference/Composable_Kernel_custom_types.rst
title: Composable Kernel custom types
- file: reference/Composable_Kernel_vector_utilities.rst
title: Composable Kernel vector utilities
- file: reference/Composable-Kernel-API-reference.rst
title: Composable Kernel API reference
- file: reference/Composable-Kernel-wrapper.rst
title: Composable Kernel Wrapper
- caption: About
entries:
- file: Contributors_Guide.rst
title: Contributing to Composable Kernel
- file: license.rst
title: License

@@ -0,0 +1,2 @@
rocm-docs-core[api_reference]==1.18.2
sphinxcontrib-bibtex==2.6.3

@@ -0,0 +1,335 @@
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile requirements.in
#
accessible-pygments==0.0.5
# via pydata-sphinx-theme
alabaster==1.0.0
# via sphinx
asttokens==3.0.0
# via stack-data
attrs==25.3.0
# via
# jsonschema
# jupyter-cache
# referencing
babel==2.17.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.13.4
# via pydata-sphinx-theme
breathe==4.36.0
# via rocm-docs-core
certifi==2025.1.31
# via requests
cffi==1.17.1
# via
# cryptography
# pynacl
charset-normalizer==3.4.1
# via requests
click==8.1.8
# via
# click-log
# doxysphinx
# jupyter-cache
# sphinx-external-toc
click-log==0.4.0
# via doxysphinx
comm==0.2.2
# via ipykernel
contourpy==1.3.2
# via matplotlib
cryptography==44.0.2
# via pyjwt
cycler==0.12.1
# via matplotlib
debugpy==1.8.14
# via ipykernel
decorator==5.2.1
# via ipython
deprecated==1.2.18
# via pygithub
docutils==0.21.2
# via
# myst-parser
# pybtex-docutils
# pydata-sphinx-theme
# sphinx
# sphinxcontrib-bibtex
doxysphinx==3.3.12
# via rocm-docs-core
exceptiongroup==1.2.2
# via ipython
executing==2.2.0
# via stack-data
fastjsonschema==2.21.1
# via
# nbformat
# rocm-docs-core
fonttools==4.57.0
# via matplotlib
gitdb==4.0.12
# via gitpython
gitpython==3.1.44
# via rocm-docs-core
greenlet==3.2.1
# via sqlalchemy
idna==3.10
# via requests
imagesize==1.4.1
# via sphinx
importlib-metadata==8.6.1
# via
# jupyter-cache
# myst-nb
ipykernel==6.29.5
# via myst-nb
ipython==8.35.0
# via
# ipykernel
# myst-nb
jedi==0.19.2
# via ipython
jinja2==3.1.6
# via
# myst-parser
# sphinx
jsonschema==4.23.0
# via nbformat
jsonschema-specifications==2024.10.1
# via jsonschema
jupyter-cache==1.0.1
# via myst-nb
jupyter-client==8.6.3
# via
# ipykernel
# nbclient
jupyter-core==5.7.2
# via
# ipykernel
# jupyter-client
# nbclient
# nbformat
kiwisolver==1.4.8
# via matplotlib
latexcodec==3.0.0
# via pybtex
libsass==0.22.0
# via doxysphinx
lxml==5.2.1
# via doxysphinx
markdown-it-py==3.0.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==3.0.2
# via jinja2
matplotlib==3.10.1
# via doxysphinx
matplotlib-inline==0.1.7
# via
# ipykernel
# ipython
mdit-py-plugins==0.4.2
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
mpire==2.10.2
# via doxysphinx
myst-nb==1.2.0
# via rocm-docs-core
myst-parser==4.0.1
# via myst-nb
nbclient==0.10.2
# via
# jupyter-cache
# myst-nb
nbformat==5.10.4
# via
# jupyter-cache
# myst-nb
# nbclient
nest-asyncio==1.6.0
# via ipykernel
numpy==1.26.4
# via
# contourpy
# doxysphinx
# matplotlib
packaging==25.0
# via
# ipykernel
# matplotlib
# pydata-sphinx-theme
# sphinx
parso==0.8.4
# via jedi
pexpect==4.9.0
# via ipython
pillow==11.2.1
# via matplotlib
platformdirs==4.3.7
# via jupyter-core
prompt-toolkit==3.0.51
# via ipython
psutil==7.0.0
# via ipykernel
ptyprocess==0.7.0
# via pexpect
pure-eval==0.2.3
# via stack-data
pybtex==0.24.0
# via
# pybtex-docutils
# sphinxcontrib-bibtex
pybtex-docutils==1.0.3
# via sphinxcontrib-bibtex
pycparser==2.22
# via cffi
pydata-sphinx-theme==0.15.4
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==2.6.1
# via rocm-docs-core
pygments==2.19.1
# via
# accessible-pygments
# ipython
# mpire
# pydata-sphinx-theme
# sphinx
pyjson5==1.6.8
# via doxysphinx
pyjwt[crypto]==2.10.1
# via pygithub
pynacl==1.5.0
# via pygithub
pyparsing==3.2.3
# via
# doxysphinx
# matplotlib
python-dateutil==2.9.0.post0
# via
# jupyter-client
# matplotlib
pyyaml==6.0.2
# via
# jupyter-cache
# myst-nb
# myst-parser
# pybtex
# rocm-docs-core
# sphinx-external-toc
pyzmq==26.4.0
# via
# ipykernel
# jupyter-client
referencing==0.36.2
# via
# jsonschema
# jsonschema-specifications
requests==2.32.3
# via
# pygithub
# sphinx
rocm-docs-core[api-reference]==1.18.2
# via -r requirements.in
rpds-py==0.24.0
# via
# jsonschema
# referencing
six==1.17.0
# via
# pybtex
# python-dateutil
smmap==5.0.2
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.7
# via beautifulsoup4
sphinx==8.1.3
# via
# breathe
# myst-nb
# myst-parser
# pydata-sphinx-theme
# rocm-docs-core
# sphinx-book-theme
# sphinx-copybutton
# sphinx-design
# sphinx-external-toc
# sphinx-notfound-page
# sphinxcontrib-bibtex
sphinx-book-theme==1.1.4
# via rocm-docs-core
sphinx-copybutton==0.5.2
# via rocm-docs-core
sphinx-design==0.6.1
# via rocm-docs-core
sphinx-external-toc==1.0.1
# via rocm-docs-core
sphinx-notfound-page==1.1.0
# via rocm-docs-core
sphinxcontrib-applehelp==2.0.0
# via sphinx
sphinxcontrib-bibtex==2.6.3
# via -r requirements.in
sphinxcontrib-devhelp==2.0.0
# via sphinx
sphinxcontrib-htmlhelp==2.1.0
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==2.0.0
# via sphinx
sphinxcontrib-serializinghtml==2.0.0
# via sphinx
sqlalchemy==2.0.40
# via jupyter-cache
stack-data==0.6.3
# via ipython
tabulate==0.9.0
# via jupyter-cache
tomli==2.2.1
# via sphinx
tornado==6.4.2
# via
# ipykernel
# jupyter-client
tqdm==4.67.1
# via mpire
traitlets==5.14.3
# via
# comm
# ipykernel
# ipython
# jupyter-client
# jupyter-core
# matplotlib-inline
# nbclient
# nbformat
typing-extensions==4.13.2
# via
# beautifulsoup4
# ipython
# myst-nb
# pydata-sphinx-theme
# pygithub
# referencing
# sqlalchemy
urllib3==2.4.0
# via
# pygithub
# requests
wcwidth==0.2.13
# via prompt-toolkit
wrapt==1.17.2
# via deprecated
zipp==3.21.0
# via importlib-metadata

@@ -0,0 +1,40 @@
.. meta::
:description: Composable Kernel examples and tests
:keywords: composable kernel, CK, ROCm, API, examples, tests
********************************************************************
Composable Kernel examples and tests
********************************************************************
After :doc:`building and installing Composable Kernel <../install/Composable-Kernel-install>`, the examples and tests are installed in ``/opt/rocm/bin/``.
All tests have the prefix ``test`` and all examples have the prefix ``example``.
Use ``ctest`` with no arguments to run all examples and tests, or use ``ctest -R <regex>`` to run only the tests whose names match a regular expression. For example:
.. code:: shell
ctest -R test_gemm_fp16
Examples can be run individually as well. For example:
.. code:: shell
./bin/example_gemm_xdl_fp16 1 1 1
For instructions on how to run individual examples and tests, see their README files in the |example|_ and |test|_ GitHub folders.
To run smoke tests, use ``make smoke``.
To run regression tests, use ``make regression``.
In general, tests that run for under thirty seconds are included in the smoke tests and tests that run for over thirty seconds are included in the regression tests.
.. |example| replace:: ``example``
.. _example: https://github.com/ROCm/composable_kernel/tree/develop/example
.. |client_example| replace:: ``client_example``
.. _client_example: https://github.com/ROCm/composable_kernel/tree/develop/client_example
.. |test| replace:: ``test``
.. _test: https://github.com/ROCm/composable_kernel/tree/develop/test