* New docs directory with minimal config
* Based on docs directory of rocBLAS
* Config for running Doxygen then Sphinx to generate HTML
* Add minimal content - intro to doc
* Add some boilerplate sections to doc
* content still needs to be done,
* e.g., need to generate API documentation using Doxygen
* need to write contributor guide
* Start Softmax section of Support Primitives doc
* Written as a test bed for typesetting math content
* Need to decide how much detail to go into
* add doc directories to git ignore file.
* Minor edits - new line at EOF, change year in copyright notices
* Port Markdown files to ReStructuredText
* Copy Markdown files from pre-existing doc directory to docs directory
* Convert to reStructured Text (rst) - section headings, links, tables
have a different syntax in rst
* New rst files added to index - can generate HTML with same style as
HTML generated from rst files in previous commits
* Intention is to make all the content in doc redundant and use rst
throughout rather than mix of md and rst
* Extend Softmax section of Primitives Guide
* rename l to z
* add material on applying softmax row-wise to matrix
* define macro for diag operator (represents diagonal matrix)
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
[ROCm/composable_kernel commit: cb3fac4d2a]
* Add layernorm client example
* [What] Add default make install dir to gitignore
[Why] client example need to make install
[ROCm/composable_kernel commit: e1a3fff675]
* add gitignore
* host tensor: allow generating sequentially increasing value in a given dimension
* gridwise gemm v3r1: allow distinct K0/K1 values for A/B block descriptor
- remove dangling header include
- modify example gemm_xdl accordingly
- infer KPack value from M/NPerXdl
- device conv2d fwd: update parameters accordingly for the underlying gridwise gemm v3r1
(API for conv2d fwd stays the same for now until we decide to expose individual K0s for activation and weight)
* add LDS data dump utility
* profiler: reflect API change for distinct K0/K1 for A/B matrices
* profiler: add conflict-free LDS write FP16 kernel instances
* fix accidental perf regression
* address feedback; cosmetic changes
* clang-format for new files
* format
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 6d4450ef15]