mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-11 17:00:18 +00:00
* doc reorg and edits * Update wrapper.rst with changes from PR #1098 * Update docs/dockerhub.rst Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> * Update docs/index.rst Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> * Update docs/what-is-ck.rst Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> * Update docs/what-is-ck.rst Restored to 4 bullets, with additional text for wrapper. Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> * Update docs/Contributors_Guide.rst Co-authored-by: Lisa <lisajdelaney@gmail.com> * Update API_Reference_Guide.rst using sentence case for title * updated index structure per Lisa * separate docker hub and tutorial --------- Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> Co-authored-by: Lisa <lisajdelaney@gmail.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
81 lines
2.8 KiB
ReStructuredText
81 lines
2.8 KiB
ReStructuredText
.. meta::
|
|
:description: Composable Kernel documentation and API reference library
|
|
:keywords: composable kernel, CK, ROCm, API, documentation
|
|
|
|
.. _supported-primitives:
|
|
|
|
********************************************************************
|
|
Supported Primitives Guide
|
|
********************************************************************
|
|
|
|
This document contains details of supported primitives in Composable Kernel (CK). In contrast to the API Reference Guide, the Supported Primitives Guide is an introduction to the math which underpins the algorithms implemented in CK.
|
|
|
|
------------
|
|
Softmax
|
|
------------
|
|
|
|
For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B` you can decompose the
|
|
softmax of concatenated :math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,
|
|
|
|
.. math::
|
|
:nowrap:
|
|
|
|
\begin{align}
|
|
m(x) & = m( [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ] ) = \max( m(x^{(1)}),\ldots, m(x^{(T)}) ) \\
|
|
f(x) & = [\exp( m(x^{(1)}) - m(x) ) f( x^{(1)} )\ | \ \ldots \ | \ \exp( m(x^{(T)}) - m(x) ) f( x^{(T)} )] \\
|
|
z(x) & = \exp( m(x^{(1)}) - m(x) )\ z(x^{(1)}) + \ldots + \exp( m(x^{(T)}) - m(x) )\ z(x^{(1)}) \\
|
|
\operatorname{softmax}(x) &= f(x)\ / \ z(x)
|
|
\end{align}
|
|
|
|
where :math:`f(x^{(j)}) = \exp( x^{(j)} - m(x^{(j)}) )` is of size :math:`B` and
|
|
:math:`z(x^{(j)}) = f(x_1^{(j)})+ \ldots+ f(x_B^{(j)})` is a scalar.
|
|
|
|
For a matrix :math:`X` composed of :math:`T_r \times T_c` tiles, :math:`X_{ij}`, of size
|
|
:math:`B_r \times B_c` you can compute the row-wise softmax as follows.
|
|
|
|
For :math:`j` from :math:`1` to :math:`T_c`, and :math:`i` from :math:`1` to :math:`T_r` calculate,
|
|
|
|
.. math::
|
|
:nowrap:
|
|
|
|
\begin{align}
|
|
\tilde{m}_{ij} &= \operatorname{rowmax}( X_{ij} ) \\
|
|
\tilde{P}_{ij} &= \exp(X_{ij} - \tilde{m}_{ij} ) \\
|
|
\tilde{z}_{ij} &= \operatorname{rowsum}( P_{ij} ) \\
|
|
\end{align}
|
|
|
|
If :math:`j=1`, initialize running max, running sum, and the first column block of the output,
|
|
|
|
.. math::
|
|
:nowrap:
|
|
|
|
\begin{align}
|
|
m_i &= \tilde{m}_{i1} \\
|
|
z_i &= \tilde{z}_{i1} \\
|
|
\tilde{Y}_{i1} &= \diag(\tilde{z}_{ij})^{-1} \tilde{P}_{i1}
|
|
\end{align}
|
|
|
|
Else if :math:`j>1`,
|
|
|
|
1. Update running max, running sum and column blocks :math:`k=1` to :math:`k=j-1`
|
|
|
|
.. math::
|
|
:nowrap:
|
|
|
|
\begin{align}
|
|
m^{new}_i &= \max(m_i, \tilde{m}_{ij} ) \\
|
|
z^{new}_i &= \exp(m_i - m^{new}_i)\ z_i + \exp( \tilde{m}_{ij} - m^{new}_i )\ \tilde{z}_{ij} \\
|
|
Y_{ik} &= \diag(z^{new}_{i})^{-1} \diag(z_{i}) \exp(m_i - m^{new}_i)\ Y_{ik}
|
|
\end{align}
|
|
|
|
2. Initialize column block :math:`j` of output and reset running max and running sum variables:
|
|
|
|
.. math::
|
|
:nowrap:
|
|
|
|
\begin{align}
|
|
\tilde{Y}_{ij} &= \diag(z^{new}_{i})^{-1} \exp(\tilde{m}_{ij} - m^{new}_i ) \tilde{P}_{ij} \\
|
|
z_i &= z^{new}_i \\
|
|
m_i &= m^{new}_i \\
|
|
\end{align}
|