* update documentation dependencies
add version number to docs
rename doc config directories
enable more doc formats on rtd
add license section in docs
* New docs directory with minimal config
* Based on docs directory of rocBLAS
* Config for running Doxygen then Sphinx to generate HTML
* Add minimal content - intro to doc
* Add some boilerplate sections to doc
* Content still needs to be written,
* e.g., need to generate API documentation using Doxygen
* and need to write a contributor guide
* Start Softmax section of Support Primitives doc
* Written as a test bed for typesetting math content
* Need to decide how much detail to go into
* add doc directories to .gitignore
* Minor edits - new line at EOF, change year in copyright notices
* Port Markdown files to reStructuredText
* Copy Markdown files from pre-existing doc directory to docs directory
* Convert to reStructuredText (rst) - section headings, links, tables
have a different syntax in rst
* New rst files added to index - can generate HTML with same style as
HTML generated from rst files in previous commits
* Intention is to make all the content in doc redundant and use rst
throughout rather than a mix of md and rst
* Extend Softmax section of Primitives Guide
* rename l to z
* add material on applying softmax row-wise to matrix
* define macro for diag operator (represents diagonal matrix)
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
* Switch to standard ROCm packaging
* Revert .gitignore changes
* install new rocm-cmake version
* update readme
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
* add gitignore
* host tensor: allow generating sequentially increasing values in a given dimension
* gridwise gemm v3r1: allow distinct K0/K1 values for A/B block descriptor
- remove dangling header include
- modify example gemm_xdl accordingly
- infer KPack value from M/NPerXdl
- device conv2d fwd: update parameters accordingly for the underlying gridwise gemm v3r1
(API for conv2d fwd stays the same for now until we decide to expose individual K0s for activation and weight)
* add LDS data dump utility
* profiler: reflect API change for distinct K0/K1 for A/B matrices
* profiler: add conflict-free LDS write FP16 kernel instances
* fix accidental perf regression
* address feedback; cosmetic changes
* clang-format for new files
* format
Co-authored-by: Chao Liu <chao.liu2@amd.com>