creation of install doc and refactor of doc in general (#1908)

* creation of install doc and refactor of doc in general

* updates based on review comments

* updated based on review comments

* updated readme and contributors markdown

* added extra note to not use -j on its own

* added note about smoke tests and regression tests

* made changes as per Illia's feedback

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>

[ROCm/composable_kernel commit: a426f67301]
spolifroni-amd
2025-03-27 17:13:18 -04:00
committed by GitHub
parent 00342e69a2
commit 98cc377db2
15 changed files with 244 additions and 366 deletions

@@ -5,26 +5,20 @@
.. _api-reference:
********************************************************************
API reference guide
Composable Kernel API reference guide
********************************************************************
This document contains details of the APIs for the Composable Kernel (CK) library and introduces
some of the key design principles that are used to write new classes that extend CK functionality.
This document contains details of the APIs for the Composable Kernel library and introduces some of the key design principles that are used to write new classes that extend the functionality of the Composable Kernel library.
=================
CK Datatypes
=================
-----------------
DeviceMem
-----------------
=================
.. doxygenstruct:: DeviceMem
---------------------------
=============================
Kernels For Flashattention
---------------------------
=============================
The Flashattention algorithm is defined in :cite:t:`dao2022flashattention`. This section lists
the classes that are used in the CK GPU implementation of Flashattention.

@@ -1,20 +1,15 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
:description: Composable Kernel wrapper
:keywords: composable kernel, CK, ROCm, API, wrapper
.. _wrapper:
********************************************************************
Wrapper
Composable Kernel wrapper
********************************************************************
-------------------------------------
Description
-------------------------------------
The CK library provides a lightweight wrapper for more complex operations implemented in
the library.
The Composable Kernel library provides a lightweight wrapper to simplify the more complex operations.
Example:

@@ -1,80 +0,0 @@
.. meta::
:description: Composable Kernel documentation and API reference library
:keywords: composable kernel, CK, ROCm, API, documentation
.. _supported-primitives:
********************************************************************
Supported Primitives Guide
********************************************************************
This document contains details of supported primitives in Composable Kernel (CK). In contrast to the API Reference Guide, the Supported Primitives Guide is an introduction to the math which underpins the algorithms implemented in CK.
------------
Softmax
------------
For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B` you can decompose the
softmax of concatenated :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,
.. math::
:nowrap:
\begin{align}
m(x) & = m( [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ] ) = \max( m(x^{(1)}),\ldots, m(x^{(T)}) ) \\
f(x) & = [\exp( m(x^{(1)}) - m(x) ) f( x^{(1)} )\ | \ \ldots \ | \ \exp( m(x^{(T)}) - m(x) ) f( x^{(T)} )] \\
z(x) & = \exp( m(x^{(1)}) - m(x) )\ z(x^{(1)}) + \ldots + \exp( m(x^{(T)}) - m(x) )\ z(x^{(T)}) \\
\operatorname{softmax}(x) &= f(x)\ / \ z(x)
\end{align}
where :math:`f(x^{(j)}) = \exp( x^{(j)} - m(x^{(j)}) )` is of size :math:`B` and
:math:`z(x^{(j)}) = f(x_1^{(j)})+ \ldots+ f(x_B^{(j)})` is a scalar.
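The decomposition above can be checked numerically. The following is a minimal sketch in plain Python, not CK code: the helper names ``block_stats`` and ``softmax_from_blocks`` are hypothetical and chosen only for illustration. Each block contributes its own maximum :math:`m(x^{(j)})`, shifted exponentials :math:`f(x^{(j)})`, and sum :math:`z(x^{(j)})`, which are then rescaled by :math:`\exp( m(x^{(j)}) - m(x) )` and merged.

```python
import math


def block_stats(block):
    """Return (m, f, z) for one block: its max, shifted exponentials, and their sum."""
    m = max(block)
    f = [math.exp(v - m) for v in block]
    return m, f, sum(f)


def softmax_from_blocks(blocks):
    """Softmax of the concatenated blocks, built only from per-block statistics."""
    stats = [block_stats(b) for b in blocks]
    m_all = max(m for m, _, _ in stats)      # m(x): max over the block maxima
    f_all = []
    z_all = 0.0
    for m, f, z in stats:
        scale = math.exp(m - m_all)          # exp(m(x^(j)) - m(x))
        f_all.extend(scale * v for v in f)   # rescaled block of f(x)
        z_all += scale * z                   # rescaled contribution to z(x)
    return [v / z_all for v in f_all]


def softmax_reference(x):
    """Direct softmax of a single vector, used only for comparison."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]
```

Calling ``softmax_from_blocks([[1.0, 2.0], [3.0, 0.5]])`` should agree with ``softmax_reference([1.0, 2.0, 3.0, 0.5])`` up to floating-point rounding, because the rescaling factors cancel the per-block shifts.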
For a matrix :math:`X` composed of :math:`T_r \times T_c` tiles, :math:`X_{ij}`, of size
:math:`B_r \times B_c` you can compute the row-wise softmax as follows.
For :math:`j` from :math:`1` to :math:`T_c`, and :math:`i` from :math:`1` to :math:`T_r` calculate,
.. math::
:nowrap:
\begin{align}
\tilde{m}_{ij} &= \operatorname{rowmax}( X_{ij} ) \\
\tilde{P}_{ij} &= \exp(X_{ij} - \tilde{m}_{ij} ) \\
\tilde{z}_{ij} &= \operatorname{rowsum}( \tilde{P}_{ij} )
\end{align}
If :math:`j=1`, initialize running max, running sum, and the first column block of the output,
.. math::
:nowrap:
\begin{align}
m_i &= \tilde{m}_{i1} \\
z_i &= \tilde{z}_{i1} \\
\tilde{Y}_{i1} &= \operatorname{diag}(\tilde{z}_{i1})^{-1} \tilde{P}_{i1}
\end{align}
Else if :math:`j>1`,
1. Update running max, running sum and column blocks :math:`k=1` to :math:`k=j-1`
.. math::
:nowrap:
\begin{align}
m^{new}_i &= \max(m_i, \tilde{m}_{ij} ) \\
z^{new}_i &= \exp(m_i - m^{new}_i)\ z_i + \exp( \tilde{m}_{ij} - m^{new}_i )\ \tilde{z}_{ij} \\
Y_{ik} &= \operatorname{diag}(z^{new}_{i})^{-1} \operatorname{diag}(z_{i}) \exp(m_i - m^{new}_i)\ Y_{ik}
\end{align}
2. Initialize column block :math:`j` of the output and update the running max and running sum variables:
.. math::
:nowrap:
\begin{align}
\tilde{Y}_{ij} &= \operatorname{diag}(z^{new}_{i})^{-1} \exp(\tilde{m}_{ij} - m^{new}_i ) \tilde{P}_{ij} \\
z_i &= z^{new}_i \\
m_i &= m^{new}_i
\end{align}
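The tiled procedure above can be sketched for a single row block :math:`i` as follows. This is an illustrative plain-Python sketch under stated assumptions, not the CK implementation: the function name ``tiled_row_softmax`` and the nested-list tile layout (a list of column tiles, each a list of :math:`B_r` rows holding :math:`B_c` values) are hypothetical. The variables ``m`` and ``z`` hold the running max and running sum per row, and earlier output blocks are rescaled each time a new column tile arrives.

```python
import math


def tiled_row_softmax(tiles):
    """Row-wise softmax over column tiles X_{i1}, ..., X_{iTc} of one row block."""
    n_rows = len(tiles[0])
    m = z = None          # running max and running sum, one entry per row
    out = []              # output column blocks Y_{i1}, ..., Y_{ij}
    for j, tile in enumerate(tiles):
        # Per-tile statistics: row max, shifted exponentials, row sum.
        m_t = [max(row) for row in tile]
        p_t = [[math.exp(v - m_t[r]) for v in row] for r, row in enumerate(tile)]
        z_t = [sum(row) for row in p_t]
        if j == 0:
            # First column tile: initialize running values and first output block.
            m, z = m_t, z_t
            out.append([[v / z[r] for v in row] for r, row in enumerate(p_t)])
        else:
            m_new = [max(a, b) for a, b in zip(m, m_t)]
            z_new = [
                math.exp(m[r] - m_new[r]) * z[r]
                + math.exp(m_t[r] - m_new[r]) * z_t[r]
                for r in range(n_rows)
            ]
            # Step 1: rescale the already-computed column blocks k = 1 .. j-1.
            for block in out:
                for r, row in enumerate(block):
                    s = z[r] * math.exp(m[r] - m_new[r]) / z_new[r]
                    row[:] = [s * v for v in row]
            # Step 2: initialize column block j, then update running values.
            out.append(
                [[math.exp(m_t[r] - m_new[r]) * v / z_new[r] for v in row]
                 for r, row in enumerate(p_t)]
            )
            m, z = m_new, z_new
    return out
```

After the last column tile, concatenating the rows of the returned blocks should match a direct softmax of each full row, up to floating-point rounding. Keeping the output normalized at every step mirrors the equations above; an implementation could instead defer the division by the running sum to the end.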