mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-20 06:49:15 +00:00
creation of install doc and refactor of doc in general (#1908)
* creation of install doc and refactor of doc in general * updates based on review comments * updated based on review comments * updated readme and contributors markdown * added extra note to not use -j on its own * added note about smoke tests and regression tests * made changes as per Illia's feedback --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com>
This commit is contained in:
77
docs/conceptual/Composable-Kernel-math.rst
Normal file
77
docs/conceptual/Composable-Kernel-math.rst
Normal file
@@ -0,0 +1,77 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel mathematical basis
|
||||
:keywords: composable kernel, CK, ROCm, API, mathematics, algorithm
|
||||
|
||||
.. _supported-primitives:
|
||||
|
||||
********************************************************************
|
||||
Composable Kernel mathematical basis
|
||||
********************************************************************
|
||||
|
||||
This is an introduction to the math which underpins the algorithms implemented in Composable Kernel.
|
||||
|
||||
|
||||
For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B` you can decompose the
|
||||
softmax of concatenated :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,
|
||||
|
||||
.. math::
|
||||
:nowrap:
|
||||
|
||||
\begin{align}
|
||||
m(x) & = m( [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ] ) = \max( m(x^{(1)}),\ldots, m(x^{(T)}) ) \\
|
||||
f(x) & = [\exp( m(x^{(1)}) - m(x) ) f( x^{(1)} )\ | \ \ldots \ | \ \exp( m(x^{(T)}) - m(x) ) f( x^{(T)} )] \\
|
||||
z(x) & = \exp( m(x^{(1)}) - m(x) )\ z(x^{(1)}) + \ldots + \exp( m(x^{(T)}) - m(x) )\ z(x^{(1)}) \\
|
||||
\operatorname{softmax}(x) &= f(x)\ / \ z(x)
|
||||
\end{align}
|
||||
|
||||
where :math:`f(x^{(j)}) = \exp( x^{(j)} - m(x^{(j)}) )` is of size :math:`B` and
|
||||
:math:`z(x^{(j)}) = f(x_1^{(j)})+ \ldots+ f(x_B^{(j)})` is a scalar.
|
||||
|
||||
For a matrix :math:`X` composed of :math:`T_r \times T_c` tiles, :math:`X_{ij}`, of size
|
||||
:math:`B_r \times B_c` you can compute the row-wise softmax as follows.
|
||||
|
||||
For :math:`j` from :math:`1` to :math:`T_c`, and :math:`i` from :math:`1` to :math:`T_r` calculate,
|
||||
|
||||
.. math::
|
||||
:nowrap:
|
||||
|
||||
\begin{align}
|
||||
\tilde{m}_{ij} &= \operatorname{rowmax}( X_{ij} ) \\
|
||||
\tilde{P}_{ij} &= \exp(X_{ij} - \tilde{m}_{ij} ) \\
|
||||
\tilde{z}_{ij} &= \operatorname{rowsum}( P_{ij} ) \\
|
||||
\end{align}
|
||||
|
||||
If :math:`j=1`, initialize running max, running sum, and the first column block of the output,
|
||||
|
||||
.. math::
|
||||
:nowrap:
|
||||
|
||||
\begin{align}
|
||||
m_i &= \tilde{m}_{i1} \\
|
||||
z_i &= \tilde{z}_{i1} \\
|
||||
\tilde{Y}_{i1} &= \diag(\tilde{z}_{ij})^{-1} \tilde{P}_{i1}
|
||||
\end{align}
|
||||
|
||||
Else if :math:`j>1`,
|
||||
|
||||
1. Update running max, running sum and column blocks :math:`k=1` to :math:`k=j-1`
|
||||
|
||||
.. math::
|
||||
:nowrap:
|
||||
|
||||
\begin{align}
|
||||
m^{new}_i &= \max(m_i, \tilde{m}_{ij} ) \\
|
||||
z^{new}_i &= \exp(m_i - m^{new}_i)\ z_i + \exp( \tilde{m}_{ij} - m^{new}_i )\ \tilde{z}_{ij} \\
|
||||
Y_{ik} &= \diag(z^{new}_{i})^{-1} \diag(z_{i}) \exp(m_i - m^{new}_i)\ Y_{ik}
|
||||
\end{align}
|
||||
|
||||
2. Initialize column block :math:`j` of output and reset running max and running sum variables:
|
||||
|
||||
.. math::
|
||||
:nowrap:
|
||||
|
||||
\begin{align}
|
||||
\tilde{Y}_{ij} &= \diag(z^{new}_{i})^{-1} \exp(\tilde{m}_{ij} - m^{new}_i ) \tilde{P}_{ij} \\
|
||||
z_i &= z^{new}_i \\
|
||||
m_i &= m^{new}_i \\
|
||||
\end{align}
|
||||
29
docs/conceptual/Composable-Kernel-structure.rst
Normal file
29
docs/conceptual/Composable-Kernel-structure.rst
Normal file
@@ -0,0 +1,29 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel structure
|
||||
:keywords: composable kernel, CK, ROCm, API, structure
|
||||
|
||||
.. _what-is-ck:
|
||||
|
||||
********************************************************************
|
||||
Composable Kernel structure
|
||||
********************************************************************
|
||||
|
||||
The Composable Kernel library uses a tile-based programming model and tensor coordinate transformation to achieve performance portability and code maintainability. Tensor coordinate transformation is a complexity reduction technique for complex machine learning operators.
|
||||
|
||||
|
||||
.. image:: ../data/ck_component.png
|
||||
:alt: CK Components
|
||||
|
||||
|
||||
The Composable Kernel library consists of four layers:
|
||||
|
||||
* a templated tile operator layer
|
||||
* a templated kernel and invoker layer
|
||||
* an instantiated kernel and invoker layer
|
||||
* a client API layer.
|
||||
|
||||
A wrapper component is included to simplify tensor transform operations.
|
||||
|
||||
.. image:: ../data/ck_layer.png
|
||||
:alt: CK Layers
|
||||
|
||||
@@ -1,41 +0,0 @@
|
||||
.. meta::
|
||||
:description: Composable Kernel documentation and API reference library
|
||||
:keywords: composable kernel, CK, ROCm, API, documentation
|
||||
|
||||
.. _what-is-ck:
|
||||
|
||||
********************************************************************
|
||||
What is the Composable Kernel library
|
||||
********************************************************************
|
||||
|
||||
|
||||
Methodology
|
||||
===========
|
||||
|
||||
The Composable Kernel (CK) library provides a programming model for writing performance critical kernels for machine learning workloads across multiple architectures including GPUs and CPUs, through general purpose kernel languages like HIP C++.
|
||||
|
||||
CK utilizes two concepts to achieve performance portability and code maintainability:
|
||||
|
||||
* A tile-based programming model
|
||||
* Algorithm complexity reduction for complex ML operators using an innovative technique called
|
||||
"Tensor Coordinate Transformation".
|
||||
|
||||
.. image:: ../data/ck_component.png
|
||||
:alt: CK Components
|
||||
|
||||
|
||||
Code Structure
|
||||
==============
|
||||
|
||||
The CK library is structured into 4 layers:
|
||||
|
||||
* "Templated Tile Operators" layer
|
||||
* "Templated Kernel and Invoker" layer
|
||||
* "Instantiated Kernel and Invoker" layer
|
||||
* "Client API" layer
|
||||
|
||||
It also includes a simple wrapper component used to perform tensor transform operations more easily and with fewer lines of code.
|
||||
|
||||
.. image:: ../data/ck_layer.png
|
||||
:alt: CK Layers
|
||||
|
||||
Reference in New Issue
Block a user