Files
composable_kernel/docs/source/Supported_Primitives_Guide.rst
pmaybank cb3fac4d2a Sphinx doc (#581)
* New docs directory with minimal config

* Based on docs directory of rocBLAS

* Config for running Doxygen then Sphinx to generate HTML

* Add minimal content - intro to doc

* Add some boilerplate sections to doc

* content still needs to be done,
* e.g., need to generate API documentation using Doxygen
* need to write contributor guide

* Start Softmax section of Support Primitives doc

* Written as a test bed for typesetting math content

* Need to decide how much detail to go into

* add doc directories to git ignore file.

* Minor edits - new line at EOF, change year in copyright notices

* Port Markdown files to ReStructuredText

* Copy Markdown files from pre-existing doc directory to docs directory

* Convert to reStructured Text (rst) - section headings, links, tables
  have a different syntax in rst

* New rst files added to index - can generate HTML with same style as
  HTML generated from rst files in previous commits

* Intention is to make all the content in doc redundant and use rst
  throughout rather than mix of md and rst

* Extend Softmax section of Primitives Guide

* rename l to z

* add material on applying softmax row-wise to matrix

* define macro for diag operator (represents diagonal matrix)

---------

Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-02-15 17:17:46 -06:00

75 lines
2.6 KiB
ReStructuredText

==========================
Supported Primitives Guide
==========================
This document contains details of supported primitives in Composable Kernel (CK). In contrast to the API Reference
Guide, the Supported Primitives Guide is an introduction to the math which underpins the algorithms implemented in CK.
------------
Softmax
------------
For vectors :math:`x^{(1)}, x^{(2)}, \ldots, x^{(T)}` of size :math:`B` we can decompose the softmax of concatenated
:math:`x = [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ]` as,
.. math::
:nowrap:
\begin{align}
m(x) & = m( [ x^{(1)}\ | \ \ldots \ | \ x^{(T)} ] ) = \max( m(x^{(1)}),\ldots, m(x^{(T)}) ) \\
f(x) & = [\exp( m(x^{(1)}) - m(x) ) f( x^{(1)} )\ | \ \ldots \ | \ \exp( m(x^{(T)}) - m(x) ) f( x^{(T)} )] \\
z(x) & = \exp( m(x^{(1)}) - m(x) )\ z(x^{(1)}) + \ldots + \exp( m(x^{(T)}) - m(x) )\ z(x^{(1)}) \\
\operatorname{softmax}(x) &= f(x)\ / \ z(x)
\end{align}
where :math:`f(x^{(j)}) = \exp( x^{(j)} - m(x^{(j)}) )` is of size :math:`B` and
:math:`z(x^{(j)}) = f(x_1^{(j)})+ \ldots+ f(x_B^{(j)})` is a scalar.
For a matrix :math:`X` composed of :math:`T_r \times T_c` tiles, :math:`X_{ij}`, of size :math:`B_r \times B_c` we can
compute the row-wise softmax as follows.
For :math:`j` from :math:`1` to :math:`T_c`, and :math:`i` from :math:`1` to :math:`T_r` calculate,
.. math::
:nowrap:
\begin{align}
\tilde{m}_{ij} &= \operatorname{rowmax}( X_{ij} ) \\
\tilde{P}_{ij} &= \exp(X_{ij} - \tilde{m}_{ij} ) \\
\tilde{z}_{ij} &= \operatorname{rowsum}( P_{ij} ) \\
\end{align}
If :math:`j=1`, initialize running max, running sum, and the first column block of the output,
.. math::
:nowrap:
\begin{align}
m_i &= \tilde{m}_{i1} \\
z_i &= \tilde{z}_{i1} \\
\tilde{Y}_{i1} &= \diag(\tilde{z}_{ij})^{-1} \tilde{P}_{i1}
\end{align}
Else if :math:`j>1`,
1. Update running max, running sum and column blocks :math:`k=1` to :math:`k=j-1`
.. math::
:nowrap:
\begin{align}
m^{new}_i &= \max(m_i, \tilde{m}_{ij} ) \\
z^{new}_i &= \exp(m_i - m^{new}_i)\ z_i + \exp( \tilde{m}_{ij} - m^{new}_i )\ \tilde{z}_{ij} \\
Y_{ik} &= \diag(z^{new}_{i})^{-1} \diag(z_{i}) \exp(m_i - m^{new}_i)\ Y_{ik}
\end{align}
2. Initialize column block :math:`j` of output and reset running max and running sum variables:
.. math::
:nowrap:
\begin{align}
\tilde{Y}_{ij} &= \diag(z^{new}_{i})^{-1} \exp(\tilde{m}_{ij} - m^{new}_i ) \tilde{P}_{ij} \\
z_i &= z^{new}_i \\
m_i &= m^{new}_i \\
\end{align}