Added Doxygen support for extension APIs.

Details:
- Added Doxyfile, a configuration file in the docs directory, for generating Doxygen documentation from source files.
- Currently only the CBLAS interface of the extension APIs (batched gemm and gemmt) is included.
- Support for the BLAS interface is yet to be added.
- To generate the Doxygen-based documentation for the extension APIs, use the following command:
  $ doxygen docs/Doxyfile

AMD-Internal: [CPUPL-3188]
Change-Id: I76e70b08f0114a528e86514bcb01d666acc591e8
2026-04-20 07:38:53 +00:00 · 2023-04-14 08:19:06 -05:00
parent b531022bac
commit b85b856950
7 changed files with 4909 additions and 4 deletions
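As context for the gemmt extension documented in cblas.h by this change, the operation can be sketched in plain C. This is only an illustrative reference under stated assumptions, not the BLIS implementation: the helper name `ref_sgemmt_colmajor` and its `upper` flag are hypothetical, storage is assumed column-major, and neither A nor B is transposed.

```c
#include <assert.h>

/* Hypothetical reference helper (not part of BLIS).
 * Assumes column-major storage and no transposition of A or B.
 * Computes C := alpha*A*B + beta*C for an n x k matrix A and a k x n
 * matrix B, but reads and writes only the selected triangle of the
 * n x n matrix C (upper != 0: upper triangle, otherwise lower). */
static void ref_sgemmt_colmajor(int upper, int n, int k, float alpha,
                                const float *a, int lda,
                                const float *b, int ldb,
                                float beta, float *c, int ldc)
{
    /* Leading dimensions must cover the stored column length. */
    assert(n >= 0 && k >= 0);
    assert(lda >= n && ldb >= k && ldc >= n);

    for (int j = 0; j < n; ++j) {
        /* Rows of column j that lie inside the selected triangle. */
        int i_begin = upper ? 0 : j;
        int i_end   = upper ? j + 1 : n;

        for (int i = i_begin; i < i_end; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

For example, with A = [1 2; 3 4], B = [5 6; 7 8], alpha = 1, beta = 0 and the upper triangle selected, only C(0,0), C(0,1) and C(1,1) are written (19, 22 and 50), while C(1,0) keeps its previous value.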
--- a/docs/Doxyfile
+++ b/docs/Doxyfile
--- a/docs/Main_Page.md
+++ b/docs/Main_Page.md
@@ -0,0 +1,138 @@
+@mainpage
+# Welcome to AOCL-BLIS
+
+---
+
+## Table of Contents
+    * [Introduction](#Introduction)
+    * [Build and Installation](#Build)
+    * [Examples](#Example)
+    * [Contact Us](#Contact)
+
+
+<div id="Introduction" name="Introduction"></div>
+
+## Introduction
+
+<b> AOCL BLIS </b> is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls. An object-based API unique to BLIS is also available.
+
+How to Download BLIS
+--------------------
+
+There are a few ways to download BLIS. The four most common ways are listed below.
+We **highly recommend** using either Option 1 or 2. Otherwise, we recommend
+Option 3 (over Option 4) so your compiler can perform optimizations specific
+to your hardware.
+
+1. **Download a source repository with `git clone`.**
+Generally speaking, we prefer using `git clone` to clone a `git` repository.
+Having a repository allows the user to periodically pull in the latest changes
+and quickly rebuild BLIS whenever they wish. Also, implicit in cloning a
+repository is that the repository defaults to using the `master` branch, which
+contains the latest "stable" commits since the most recent release. (This is
+in contrast to Option 3 in which the user is opting for code that may be
+slightly out of date.)
+
+   In order to clone a `git` repository of BLIS, please obtain a repository
+URL by clicking on the green button above the file/directory listing near the
+top of this page (as rendered by GitHub). Generally speaking, it will amount
+to executing the following command in your terminal shell:
+   ```
+   git clone https://github.com/amd/blis.git
+   ```
+
+2. **Download a source repository via a zip file.**
+If you are uncomfortable with using `git` but would still like the latest
+stable commits, we recommend that you download BLIS as a zip file.
+
+   In order to download a zip file of the BLIS source distribution, please
+click on the green button above the file listing near the top of this page.
+This should reveal a link for downloading the zip file.
+
+3. **Download a source release via a tarball/zip file.**
+Alternatively, if you would like to stick to the code that is included in
+official releases, you may download either a tarball or zip file of any of
+BLIS's previous [tagged releases](https://github.com/flame/blis/releases).
+We consider this option to be less than ideal for most people since it will
+likely mean you miss out on the latest bugfix or feature commits (in contrast
+to Options 1 or 2), and you also will not be able to update your code with a
+simple `git pull` command (in contrast to Option 1).
+
+4. **Download a binary package specific to your OS.**
+While we don't recommend this as the first choice for most users, we provide
+links to community members who generously maintain BLIS packages for various
+Linux distributions such as Debian Unstable and EPEL/Fedora. Please see the
+[External Packages](#external-packages) section below for more information.
+
+Getting Started
+---------------
+
+*NOTE: This section assumes you've either cloned a BLIS source code repository
+via `git`, downloaded the latest source code via a zip file, or downloaded the
+source code for a tagged version release---Options 1, 2, or 3, respectively,
+as discussed in [the previous section](#how-to-download-blis).*
+
+If you just want to build a sequential (not parallelized) version of BLIS
+in a hurry and come back and explore other topics later, you can configure
+and build BLIS as follows:
+```
+$ ./configure auto
+$ make [-j]
+```
+You can then verify your build by running BLAS- and BLIS-specific test
+drivers via `make check`:
+```
+$ make check [-j]
+```
+And if you would like to install BLIS to the directory specified to `configure`
+via the `--prefix` option, run the `install` target:
+```
+$ make install
+```
+Please read the output of `./configure --help` for a full list of configure-time
+options.
+If/when you have time, we *strongly* encourage you to read the detailed
+walkthrough of the build system found in our [Build System](docs/BuildSystem.md)
+guide.
+
+Example Code
+------------
+
+The BLIS source distribution provides example code in the `examples` directory.
+Example code focuses on using BLIS APIs (not BLAS or CBLAS), and resides in
+two subdirectories: [examples/oapi](examples/oapi) (which demonstrates the
+[object API](docs/BLISObjectAPI.md)) and [examples/tapi](examples/tapi) (which
+demonstrates the [typed API](docs/BLISTypedAPI.md)).
+
+Either directory contains several files, each containing various pieces of
+code that exercise core functionality of the BLIS API in question (object or
+typed). These example files should be thought of collectively like a tutorial,
+and therefore it is recommended to start from the beginning (the file that
+starts in `00`).
+
+You can build all of the examples by simply running `make` from either example
+subdirectory (`examples/oapi` or `examples/tapi`). (You can also run
+`make clean`.) The local `Makefile` assumes that you've already configured and
+built (but not necessarily installed) BLIS two directories up, in `../..`. If
+you have already installed BLIS to some permanent directory, you may refer to
+that installation by setting the environment variable `BLIS_INSTALL_PATH` prior
+to running make:
+```
+export BLIS_INSTALL_PATH=/usr/local; make
+```
+or by setting the same variable as part of the make command:
+```
+make BLIS_INSTALL_PATH=/usr/local
+```
+**Once the executable files have been built, we recommend reading the code and
+the corresponding executable output side by side. This will help you see the
+effects of each section of code.**
+
+This tutorial is not exhaustive or complete; several object API functions were
+omitted (mostly for brevity's sake) and thus more examples could be written.
+
+<div id = "Contact"></div>
+
+## CONTACTS
+
+AOCL BLIS is developed and maintained by AMD. You can contact us by email at <b>[aoclsupport@amd.com](mailto:aoclsupport@amd.com)</b>.
--- a/docs/styling/AMD_Logo.png
+++ b/docs/styling/AMD_Logo.png
--- a/docs/styling/doxygen-awesome.css
+++ b/docs/styling/doxygen-awesome.css
--- a/docs/styling/footer.html
+++ b/docs/styling/footer.html
@@ -0,0 +1,43 @@
+<!--
+ Copyright (C) 2023, Advanced Micro Devices. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+ 1. Redistributions of source code must retain the above copyright notice,
+    this list of conditions and the following disclaimer.
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+    this list of conditions and the following disclaimer in the documentation
+    and/or other materials provided with the distribution.
+ 3. Neither the name of the copyright holder nor the names of its contributors
+    may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ POSSIBILITY OF SUCH DAMAGE. -->
+
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+ <html xmlns="http://www.w3.org/1999/xhtml">
+    <style>
+        .footer {
+          position: relative;
+          left: 0;
+          bottom: 0;
+          width: 100%;
+          background-color: rgba(22, 22, 22, 0);
+          text-align: center;
+          padding: 50px 0px 25px 0px;
+        }
+        </style>
+ <body>
+    <div class = "footer"> &nbsp; Copyright (C) 2023, Advanced Micro Devices. All rights reserved. </div>
+ </body>
+ </html>
--- a/docs/styling/header.html
+++ b/docs/styling/header.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
+<meta http-equiv="X-UA-Compatible" content="IE=9"/>
+<meta name="generator" content="Doxygen $doxygenversion"/>
+<meta name="viewport" content="width=device-width, initial-scale=1"/>
+
+<!-- BEGIN opengraph metadata -->
+<meta property="og:title" content="Doxygen Awesome" />
+<meta property="og:image" content="https://repository-images.githubusercontent.com/348492097/4f16df80-88fb-11eb-9d31-4015ff22c452" />
+<meta property="og:description" content="Custom CSS theme for doxygen html-documentation with lots of customization parameters." />
+<meta property="og:url" content="https://jothepro.github.io/doxygen-awesome-css/" />
+<!-- END opengraph metadata -->
+
+<!-- BEGIN twitter metadata -->
+<meta name="twitter:image:src" content="https://repository-images.githubusercontent.com/348492097/4f16df80-88fb-11eb-9d31-4015ff22c452" />
+<meta name="twitter:title" content="Doxygen Awesome" />
+<meta name="twitter:description" content="Custom CSS theme for doxygen html-documentation with lots of customization parameters." />
+<!-- END twitter metadata -->
+
+<!--BEGIN PROJECT_NAME--><title>$projectname: $title</title><!--END PROJECT_NAME-->
+<!--BEGIN !PROJECT_NAME--><title>$title</title><!--END !PROJECT_NAME-->
+<link href="$relpath^tabs.css" rel="stylesheet" type="text/css"/>
+<link rel="icon" type="image/svg+xml" href="logo.drawio.svg"/>
+<script type="text/javascript" src="$relpath^jquery.js"></script>
+<script type="text/javascript" src="$relpath^dynsections.js"></script>
+<script type="text/javascript" src="$relpath^doxygen-darkmode-toggle.js"></script>
+<script type="text/javascript" src="$relpath^doxygen-fragment-copy-button.js"></script>
+<!-- <script type="text/javascript" src="$relpath^doxygen-awesome-paragraph-link.js"></script> -->
+<script type="text/javascript" src="$relpath^doxygen-interactive-toc.js"></script>
+<!-- <script type="text/javascript" src="$relpath^toggle-alternative-theme.js"></script> -->
+<script type="text/javascript">
+    DoxygenAwesomeFragmentCopyButton.init()
+    DoxygenAwesomeDarkModeToggle.init()
+    // DoxygenAwesomeParagraphLink.init() is disabled: its script include above is commented out
+    DoxygenAwesomeInteractiveToc.init()
+</script>
+$treeview
+$search
+$mathjax
+<link href="$relpath^$stylesheet" rel="stylesheet" type="text/css" />
+$extrastylesheet
+</head>
+<body>
+
+<!-- https://tholman.com/github-corners/ -->
+<a href="https://github.com/jothepro/doxygen-awesome-css" class="github-corner" title="View source on GitHub" target="_blank">
+    <path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style>
+
+
+<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
+
+<!--BEGIN TITLEAREA-->
+<div id="titlearea">
+<table cellspacing="0" cellpadding="0">
+ <tbody>
+ <tr style="height: 56px;">
+  <!--BEGIN PROJECT_LOGO-->
+  <td id="projectlogo"><img alt="Logo" src="$relpath^$projectlogo"/></td>
+  <!--END PROJECT_LOGO-->
+  <!--BEGIN PROJECT_NAME-->
+  <td id="projectalign" style="padding-left: 0.5em;">
+   <div id="projectname">$projectname
+   <!--BEGIN PROJECT_NUMBER-->&#160;<span id="projectnumber">$projectnumber</span><!--END PROJECT_NUMBER-->
+   </div>
+   <!--BEGIN PROJECT_BRIEF--><div id="projectbrief">$projectbrief</div><!--END PROJECT_BRIEF-->
+  </td>
+  <!--END PROJECT_NAME-->
+  <!--BEGIN !PROJECT_NAME-->
+   <!--BEGIN PROJECT_BRIEF-->
+    <td style="padding-left: 0.5em;">
+    <div id="projectbrief">$projectbrief</div>
+    </td>
+   <!--END PROJECT_BRIEF-->
+  <!--END !PROJECT_NAME-->
+  <!--BEGIN DISABLE_INDEX-->
+   <!--BEGIN SEARCHENGINE-->
+   <td>$searchbox</td>
+   <!--END SEARCHENGINE-->
+  <!--END DISABLE_INDEX-->
+ </tr>
+ </tbody>
+</table>
+</div>
+<!--END TITLEAREA-->
+<!-- end header part -->
--- a/frame/compat/cblas/src/cblas.h
+++ b/frame/compat/cblas/src/cblas.h
@@ -490,12 +490,57 @@ void BLIS_EXPORT_BLAS cblas_strsm(enum CBLAS_ORDER Order, enum CBLAS_SIDE Side,
                 enum CBLAS_DIAG Diag, f77_int M, f77_int N,
                 float alpha, const float *A, f77_int lda,
                 float *B, f77_int ldb);
+/** \addtogroup APIS BLIS Extension API
+ *  @{
+ */
+
+/** \addtogroup INTERFACE CBLAS INTERFACE
+ * \ingroup APIS
+ *  @{
+ */
+
+
+/**
+* cblas_sgemmt computes a scalar-matrix-matrix product with general matrices and adds the result to the upper or lower triangular part of a scalar-matrix product.
+* It accesses and updates a triangular part of the square result matrix.
+* The operation is defined as
+* C := alpha*Mat(A) * Mat(B) + beta*C,
+* where:
+* Mat(X) is one of Mat(X) = X, or Mat(X) = \f$X^T\f$, or Mat(X) = \f$X^H\f$,
+* alpha and beta are scalars,
+* A, B and C are matrices:
+* Mat(A) is an nxk matrix,
+* Mat(B) is a kxn matrix,
+* C is an nxn upper or lower triangular matrix.
+*
+* @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+* @param[in] Uplo Specifies whether the upper or lower triangular part of the matrix C is used. CblasUpper or CblasLower
+* @param[in] TransA Specifies the form of Mat(A) used in the matrix multiplication:
+* if TransA = CblasNoTrans, then Mat(A) = A;
+* if TransA = CblasTrans, then Mat(A) = \f$A^T\f$;
+* if TransA = CblasConjTrans, then Mat(A) = \f$A^H\f$.
+* @param[in] TransB Specifies the form of Mat(B) used in the matrix multiplication:
+* if TransB = CblasNoTrans, then Mat(B) = B;
+* if TransB = CblasTrans, then Mat(B) = \f$B^T\f$;
+* if TransB = CblasConjTrans, then Mat(B) = \f$B^H\f$.
+* @param[in] N Specifies the order of the matrix C.
+* @param[in] K Specifies the number of columns of the matrix Mat(A) and the number of rows of the matrix Mat(B).
+* @param[in] alpha Specifies the scalar alpha.
+* @param[in] A Array holding the float matrix A.
+* @param[in] lda Specifies the leading dimension of A.
+* @param[in] B Array holding the float matrix B.
+* @param[in] ldb Specifies the leading dimension of B.
+* @param[in] beta Specifies the scalar beta.
+* @param[in,out] C Array holding the float matrix C.
+* @param[in] ldc Specifies the leading dimension of C.
+* @return None
+*/
 void BLIS_EXPORT_BLAS cblas_sgemmt(enum CBLAS_ORDER Order, enum CBLAS_UPLO Uplo,
         enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB,
         f77_int N, f77_int K, float alpha, const float *A,
                 f77_int lda, const float *B, f77_int ldb,
                 float beta, float *C, f77_int ldc);
-
+/** @}*/
 void BLIS_EXPORT_BLAS cblas_dgemm(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA,
                 enum CBLAS_TRANSPOSE TransB, f77_int M, f77_int N,
                 f77_int K, double alpha, const double *A,
@@ -525,12 +570,51 @@ void BLIS_EXPORT_BLAS cblas_dtrsm(enum CBLAS_ORDER Order, enum CBLAS_SIDE Side,
                 enum CBLAS_DIAG Diag, f77_int M, f77_int N,
                 double alpha, const double *A, f77_int lda,
                 double *B, f77_int ldb);
+/** \addtogroup INTERFACE CBLAS INTERFACE
+ *  @{
+ */
+
+/**
+* cblas_dgemmt computes a scalar-matrix-matrix product with general matrices and adds the result to the upper or lower triangular part of a scalar-matrix product.
+* It accesses and updates a triangular part of the square result matrix.
+* The operation is defined as
+* C := alpha*Mat(A) * Mat(B) + beta*C,
+* where:
+* Mat(X) is one of Mat(X) = X, or Mat(X) = \f$X^T\f$, or Mat(X) = \f$X^H\f$,
+* alpha and beta are scalars,
+* A, B and C are matrices:
+* Mat(A) is an nxk matrix,
+* Mat(B) is a kxn matrix,
+* C is an nxn upper or lower triangular matrix.
+*
+* @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+* @param[in] Uplo Specifies whether the upper or lower triangular part of the matrix C is used. CblasUpper or CblasLower
+* @param[in] TransA Specifies the form of Mat(A) used in the matrix multiplication:
+* if TransA = CblasNoTrans, then Mat(A) = A;
+* if TransA = CblasTrans, then Mat(A) = \f$A^T\f$;
+* if TransA = CblasConjTrans, then Mat(A) = \f$A^H\f$.
+* @param[in] TransB Specifies the form of Mat(B) used in the matrix multiplication:
+* if TransB = CblasNoTrans, then Mat(B) = B;
+* if TransB = CblasTrans, then Mat(B) = \f$B^T\f$;
+* if TransB = CblasConjTrans, then Mat(B) = \f$B^H\f$.
+* @param[in] N Specifies the order of the matrix C.
+* @param[in] K Specifies the number of columns of the matrix Mat(A) and the number of rows of the matrix Mat(B).
+* @param[in] alpha Specifies the scalar alpha.
+* @param[in] A Array holding the double matrix A.
+* @param[in] lda Specifies the leading dimension of A.
+* @param[in] B Array holding the double matrix B.
+* @param[in] ldb Specifies the leading dimension of B.
+* @param[in] beta Specifies the scalar beta.
+* @param[in,out] C Array holding the double matrix C.
+* @param[in] ldc Specifies the leading dimension of C.
+* @return None
+*/
 void BLIS_EXPORT_BLAS cblas_dgemmt(enum CBLAS_ORDER Order, enum CBLAS_UPLO Uplo,
         enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB,
         f77_int N, f77_int K, double alpha, const double *A,
                 f77_int lda, const double *B, f77_int ldb,
                 double beta, double *C, f77_int ldc);
-
+/** @}*/
 void BLIS_EXPORT_BLAS cblas_cgemm(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA,
                 enum CBLAS_TRANSPOSE TransB, f77_int M, f77_int N,
                 f77_int K, const void *alpha, const void *A,
@@ -560,12 +644,51 @@ void BLIS_EXPORT_BLAS cblas_ctrsm(enum CBLAS_ORDER Order, enum CBLAS_SIDE Side,
                 enum CBLAS_DIAG Diag, f77_int M, f77_int N,
                 const void *alpha, const void *A, f77_int lda,
                 void *B, f77_int ldb);
+/** \addtogroup INTERFACE CBLAS INTERFACE
+ *  @{
+ */
+
+/**
+* cblas_cgemmt computes a scalar-matrix-matrix product with general matrices and adds the result to the upper or lower triangular part of a scalar-matrix product.
+* It accesses and updates a triangular part of the square result matrix.
+* The operation is defined as
+* C := alpha*Mat(A) * Mat(B) + beta*C,
+* where:
+* Mat(X) is one of Mat(X) = X, or Mat(X) = \f$X^T\f$, or Mat(X) = \f$X^H\f$,
+* alpha and beta are scalars,
+* A, B and C are matrices:
+* Mat(A) is an nxk matrix,
+* Mat(B) is a kxn matrix,
+* C is an nxn upper or lower triangular matrix.
+*
+* @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+* @param[in] Uplo Specifies whether the upper or lower triangular part of the matrix C is used. CblasUpper or CblasLower
+* @param[in] TransA Specifies the form of Mat(A) used in the matrix multiplication:
+* if TransA = CblasNoTrans, then Mat(A) = A;
+* if TransA = CblasTrans, then Mat(A) = \f$A^T\f$;
+* if TransA = CblasConjTrans, then Mat(A) = \f$A^H\f$.
+* @param[in] TransB Specifies the form of Mat(B) used in the matrix multiplication:
+* if TransB = CblasNoTrans, then Mat(B) = B;
+* if TransB = CblasTrans, then Mat(B) = \f$B^T\f$;
+* if TransB = CblasConjTrans, then Mat(B) = \f$B^H\f$.
+* @param[in] N Specifies the order of the matrix C.
+* @param[in] K Specifies the number of columns of the matrix Mat(A) and the number of rows of the matrix Mat(B).
+* @param[in] alpha Pointer to the scomplex scalar alpha.
+* @param[in] A Array holding the scomplex matrix A.
+* @param[in] lda Specifies the leading dimension of A.
+* @param[in] B Array holding the scomplex matrix B.
+* @param[in] ldb Specifies the leading dimension of B.
+* @param[in] beta Pointer to the scomplex scalar beta.
+* @param[in,out] C Array holding the scomplex matrix C.
+* @param[in] ldc Specifies the leading dimension of C.
+* @return None
+*/
 void BLIS_EXPORT_BLAS cblas_cgemmt(enum CBLAS_ORDER Order, enum CBLAS_UPLO Uplo,
         enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB,
         f77_int N, f77_int K, const void *alpha, const void *A,
                 f77_int lda, const void *B, f77_int ldb,
                 const void *beta, void *C, f77_int ldc);
-
+/** @}*/
 void BLIS_EXPORT_BLAS cblas_zgemm(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA,
                 enum CBLAS_TRANSPOSE TransB, f77_int M, f77_int N,
                 f77_int K, const void *alpha, const void *A,
@@ -595,12 +718,51 @@ void BLIS_EXPORT_BLAS cblas_ztrsm(enum CBLAS_ORDER Order, enum CBLAS_SIDE Side,
                 enum CBLAS_DIAG Diag, f77_int M, f77_int N,
                 const void *alpha, const void *A, f77_int lda,
                 void *B, f77_int ldb);
+/** \addtogroup INTERFACE CBLAS INTERFACE
+ *  @{
+ */
+
+/**
+* cblas_zgemmt computes a scalar-matrix-matrix product with general matrices and adds the result to the upper or lower triangular part of a scalar-matrix product.
+* It accesses and updates a triangular part of the square result matrix.
+* The operation is defined as
+* C := alpha*Mat(A) * Mat(B) + beta*C,
+* where:
+* Mat(X) is one of Mat(X) = X, or Mat(X) = \f$X^T\f$, or Mat(X) = \f$X^H\f$,
+* alpha and beta are scalars,
+* A, B and C are matrices:
+* Mat(A) is an nxk matrix,
+* Mat(B) is a kxn matrix,
+* C is an nxn upper or lower triangular matrix.
+*
+* @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+* @param[in] Uplo Specifies whether the upper or lower triangular part of the matrix C is used. CblasUpper or CblasLower
+* @param[in] TransA Specifies the form of Mat(A) used in the matrix multiplication:
+* if TransA = CblasNoTrans, then Mat(A) = A;
+* if TransA = CblasTrans, then Mat(A) = \f$A^T\f$;
+* if TransA = CblasConjTrans, then Mat(A) = \f$A^H\f$.
+* @param[in] TransB Specifies the form of Mat(B) used in the matrix multiplication:
+* if TransB = CblasNoTrans, then Mat(B) = B;
+* if TransB = CblasTrans, then Mat(B) = \f$B^T\f$;
+* if TransB = CblasConjTrans, then Mat(B) = \f$B^H\f$.
+* @param[in] N Specifies the order of the matrix C.
+* @param[in] K Specifies the number of columns of the matrix Mat(A) and the number of rows of the matrix Mat(B).
+* @param[in] alpha Pointer to the dcomplex scalar alpha.
+* @param[in] A Array holding the dcomplex matrix A.
+* @param[in] lda Specifies the leading dimension of A.
+* @param[in] B Array holding the dcomplex matrix B.
+* @param[in] ldb Specifies the leading dimension of B.
+* @param[in] beta Pointer to the dcomplex scalar beta.
+* @param[in,out] C Array holding the dcomplex matrix C.
+* @param[in] ldc Specifies the leading dimension of C.
+* @return None
+*/
 void BLIS_EXPORT_BLAS cblas_zgemmt(enum CBLAS_ORDER Order, enum CBLAS_UPLO Uplo,
         enum CBLAS_TRANSPOSE TransA, enum CBLAS_TRANSPOSE TransB,
         f77_int N, f77_int K, const void *alpha, const void *A,
                 f77_int lda, const void *B, f77_int ldb,
                 const void *beta, void *C, f77_int ldc);
-
+/** @}*/

 /*
 * Routines with prefixes C and Z only
@@ -654,6 +816,40 @@ BLIS_EXPORT_BLAS double  cblas_dcabs1( const void *z);
 */

 // -- Batch APIs -------
+/** \addtogroup INTERFACE CBLAS INTERFACE
+ *  @{
+ */
+
+/**
+ * The cblas_sgemm_batch interface resembles the GEMM interface.
+ * Its arguments are arrays of parameters and arrays of pointers to matrices.
+ * It batches multiple independent small GEMM operations of fixed or variable sizes into a group
+ * and then spawns multiple threads for different GEMM instances within the group.
+ *
+ * @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+ * @param[in] TransA_array Array of dimension (group_count); each entry specifies the form of Mat( A ) to be used in the matrix multiplication as follows:
+ *                     Mat( A ) = A
+ *                     Mat( A ) = \f$A^T\f$
+ *                     Mat( A ) = \f$A^H\f$
+ * @param[in] TransB_array Array of dimension (group_count); each entry specifies the form of Mat( B ) to be used in the matrix multiplication as follows:
+ *                     Mat( B ) = B
+ *                     Mat( B ) = \f$B^T\f$
+ *                     Mat( B ) = \f$B^H\f$
+ * @param[in] M_array Array of dimension (group_count); each entry is the number of rows of matrices A and of matrices C.
+ * @param[in] N_array Array of dimension (group_count); each entry is the number of columns of matrices B and of matrices C.
+ * @param[in] K_array Array of dimension (group_count); each entry is the number of columns of matrices A and the number of rows of matrices B.
+ * @param[in] alpha_array Array of dimension (group_count); each entry is the scalar alpha for each GEMM.
+ * @param[in] A Array of pointers, dimension (group_count); each points to a matrix A of float datatype.
+ * @param[in] lda_array Array of dimension (group_count); each f77_int entry specifies the leading dimension of matrix A.
+ * @param[in] B Array of pointers, dimension (group_count); each points to a matrix B of float datatype.
+ * @param[in] ldb_array Array of dimension (group_count); each f77_int entry specifies the leading dimension of matrix B.
+ * @param[in] beta_array Array of dimension (group_count); each entry is the scalar beta for each GEMM.
+ * @param[in,out] C Array of pointers, dimension (group_count); each points to a matrix C of float datatype.
+ * @param[in] ldc_array Array of dimension (group_count); each f77_int entry specifies the leading dimension of matrix C.
+ * @param[in] group_count Total number of groups. Groups are typically used to batch GEMMs of variable sizes, where each group batches GEMMs of one fixed size.
+ * @param[in] group_size Array in which each entry is the number of GEMMs to be performed in the corresponding group (batch).
+ * @return None
+ */
 void BLIS_EXPORT_BLAS cblas_sgemm_batch(enum CBLAS_ORDER Order,
                 enum CBLAS_TRANSPOSE *TransA_array,
                 enum CBLAS_TRANSPOSE *TransB_array,
@@ -662,6 +858,37 @@ void BLIS_EXPORT_BLAS cblas_sgemm_batch(enum CBLAS_ORDER Order,
                 f77_int *lda_array, const float **B, f77_int *ldb_array,
                 const float *beta_array, float **C, f77_int *ldc_array,
                 f77_int group_count, f77_int *group_size);
+
+/**
+ * The cblas_dgemm_batch interface resembles the GEMM interface.
+ * Its arguments are arrays of parameters and arrays of pointers to matrices.
+ * It batches multiple independent small GEMM operations of fixed or variable sizes into a group
+ * and then spawns multiple threads for different GEMM instances within the group.
+ *
+ * @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+ * @param[in] TransA_array Array of dimension (group_count); each entry specifies the form of Mat( A ) to be used in the matrix multiplication as follows:
+ *                     Mat( A ) = A
+ *                     Mat( A ) = \f$A^T\f$
+ *                     Mat( A ) = \f$A^H\f$
+ * @param[in] TransB_array Array of dimension (group_count); each entry specifies the form of Mat( B ) to be used in the matrix multiplication as follows:
+ *                     Mat( B ) = B
+ *                     Mat( B ) = \f$B^T\f$
+ *                     Mat( B ) = \f$B^H\f$
+ * @param[in] M_array Array of dimension (group_count); each entry is the number of rows of matrices A and of matrices C.
+ * @param[in] N_array Array of dimension (group_count); each entry is the number of columns of matrices B and of matrices C.
+ * @param[in] K_array Array of dimension (group_count); each entry is the number of columns of matrices A and the number of rows of matrices B.
+ * @param[in] alpha_array Array of dimension (group_count); each entry is the scalar alpha for each GEMM.
+ * @param[in] A Array of pointers, dimension (group_count); each points to a matrix A of double datatype.
+ * @param[in] lda_array Array of dimension (group_count); each f77_int entry specifies the leading dimension of matrix A.
+ * @param[in] B Array of pointers, dimension (group_count); each points to a matrix B of double datatype.
+ * @param[in] ldb_array Array of dimension (group_count); each f77_int entry specifies the leading dimension of matrix B.
+ * @param[in] beta_array Array of dimension (group_count); each entry is the scalar beta for each GEMM.
+ * @param[in,out] C Array of pointers, dimension (group_count); each points to a matrix C of double datatype.
+ * @param[in] ldc_array Array of dimension (group_count); each f77_int entry specifies the leading dimension of matrix C.
+ * @param[in] group_count Total number of groups. Groups are typically used to batch GEMMs of variable sizes, where each group batches GEMMs of one fixed size.
+ * @param[in] group_size Array in which each entry is the number of GEMMs to be performed in the corresponding group (batch).
+ * @return None
+ */
 void BLIS_EXPORT_BLAS cblas_dgemm_batch(enum CBLAS_ORDER Order,
                 enum CBLAS_TRANSPOSE *TransA_array,
                 enum CBLAS_TRANSPOSE *TransB_array,
@@ -671,6 +898,38 @@ void BLIS_EXPORT_BLAS cblas_dgemm_batch(enum CBLAS_ORDER Order,
                 const double **B, f77_int *ldb_array,
                 const double *beta_array, double **C, f77_int *ldc_array,
                 f77_int group_count, f77_int *group_size);
+
+/**
+ * The cblas_cgemm_batch interface resembles the GEMM interface.
+ * Its arguments are arrays of pointers to matrices and arrays of parameters.
+ * It batches multiple independent small GEMM operations of fixed or variable sizes into a group
+ * and then spawns multiple threads for the different GEMM instances within the group.
+ *
+ * @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+ * @param[in] TransA_array Array of dimension (group_count); each element specifies the form of Mat( A ) to be used in the matrix multiplication, one of:
+ *                     Mat( A ) = A
+ *                     Mat( A ) = \f$A^T\f$
+ *                     Mat( A ) = \f$A^H\f$
+ * @param[in] TransB_array Array of dimension (group_count); each element specifies the form of Mat( B ) to be used in the matrix multiplication, one of:
+ *                     Mat( B ) = B
+ *                     Mat( B ) = \f$B^T\f$
+ *                     Mat( B ) = \f$B^H\f$
+ * @param[in] M_array Array of dimension (group_count); each element is the number of rows of the matrices A and of the matrices C.
+ * @param[in] N_array Array of dimension (group_count); each element is the number of columns of the matrices B and of the matrices C.
+ * @param[in] K_array Array of dimension (group_count); each element is the number of columns of the matrices A and the number of rows of the matrices B.
+ * @param[in] alpha_array Array of dimension (group_count); each element is the scalar alpha for the corresponding GEMM.
+ * @param[in] A Array of pointers, dimension (group_count); each points to a matrix A of scomplex datatype.
+ * @param[in] lda_array Array of dimension (group_count); each f77_int element specifies the first (leading) dimension of matrix A.
+ * @param[in] B Array of pointers, dimension (group_count); each points to a matrix B of scomplex datatype.
+ * @param[in] ldb_array Array of dimension (group_count); each f77_int element specifies the first (leading) dimension of matrix B.
+ * @param[in] beta_array Array of dimension (group_count); each element is the scalar beta for the corresponding GEMM.
+ * @param[in,out] C Array of pointers, dimension (group_count); each points to a matrix C of scomplex datatype.
+ * @param[in] ldc_array Array of dimension (group_count); each f77_int element specifies the first (leading) dimension of matrix C.
+ * @param[in] group_count Total number of groups. Groups are typically used to batch GEMMs of variable sizes, where each group batches GEMMs of one fixed size.
+ * @param[in] group_size Array of dimension (group_count); each element is the number of GEMMs to be performed in that group (batch).
+ * @return None
+ */
+
 void BLIS_EXPORT_BLAS cblas_cgemm_batch(enum CBLAS_ORDER Order,
                 enum CBLAS_TRANSPOSE *TransA_array,
                 enum CBLAS_TRANSPOSE *TransB_array,
@@ -679,6 +938,37 @@ void BLIS_EXPORT_BLAS cblas_cgemm_batch(enum CBLAS_ORDER Order,
                 f77_int *lda_array, const void **B, f77_int *ldb_array,
                 const void *beta_array, void **C, f77_int *ldc_array,
                 f77_int group_count, f77_int *group_size);
+
+/**
+ * The cblas_zgemm_batch interface resembles the GEMM interface.
+ * Its arguments are arrays of pointers to matrices and arrays of parameters.
+ * It batches multiple independent small GEMM operations of fixed or variable sizes into a group
+ * and then spawns multiple threads for the different GEMM instances within the group.
+ *
+ * @param[in] Order Storage scheme of matrices. CblasRowMajor or CblasColMajor
+ * @param[in] TransA_array Array of dimension (group_count); each element specifies the form of Mat( A ) to be used in the matrix multiplication, one of:
+ *                     Mat( A ) = A
+ *                     Mat( A ) = \f$A^T\f$
+ *                     Mat( A ) = \f$A^H\f$
+ * @param[in] TransB_array Array of dimension (group_count); each element specifies the form of Mat( B ) to be used in the matrix multiplication, one of:
+ *                     Mat( B ) = B
+ *                     Mat( B ) = \f$B^T\f$
+ *                     Mat( B ) = \f$B^H\f$
+ * @param[in] M_array Array of dimension (group_count); each element is the number of rows of the matrices A and of the matrices C.
+ * @param[in] N_array Array of dimension (group_count); each element is the number of columns of the matrices B and of the matrices C.
+ * @param[in] K_array Array of dimension (group_count); each element is the number of columns of the matrices A and the number of rows of the matrices B.
+ * @param[in] alpha_array Array of dimension (group_count); each element is the scalar alpha for the corresponding GEMM.
+ * @param[in] A Array of pointers, dimension (group_count); each points to a matrix A of dcomplex datatype.
+ * @param[in] lda_array Array of dimension (group_count); each f77_int element specifies the first (leading) dimension of matrix A.
+ * @param[in] B Array of pointers, dimension (group_count); each points to a matrix B of dcomplex datatype.
+ * @param[in] ldb_array Array of dimension (group_count); each f77_int element specifies the first (leading) dimension of matrix B.
+ * @param[in] beta_array Array of dimension (group_count); each element is the scalar beta for the corresponding GEMM.
+ * @param[in,out] C Array of pointers, dimension (group_count); each points to a matrix C of dcomplex datatype.
+ * @param[in] ldc_array Array of dimension (group_count); each f77_int element specifies the first (leading) dimension of matrix C.
+ * @param[in] group_count Total number of groups. Groups are typically used to batch GEMMs of variable sizes, where each group batches GEMMs of one fixed size.
+ * @param[in] group_size Array of dimension (group_count); each element is the number of GEMMs to be performed in that group (batch).
+ * @return None
+ */
 void BLIS_EXPORT_BLAS cblas_zgemm_batch(enum CBLAS_ORDER Order,
                 enum CBLAS_TRANSPOSE *TransA_array,
                 enum CBLAS_TRANSPOSE *TransB_array,
@@ -687,6 +977,7 @@ void BLIS_EXPORT_BLAS cblas_zgemm_batch(enum CBLAS_ORDER Order,
                 f77_int *lda_array, const void **B, f77_int *ldb_array,
                 const void *beta_array, void **C, f77_int *ldc_array,
                 f77_int group_count, f77_int *group_size);
+/** @}*/
 void BLIS_EXPORT_BLAS cblas_cgemm3m(enum CBLAS_ORDER Order, enum CBLAS_TRANSPOSE TransA,
                 enum CBLAS_TRANSPOSE TransB, f77_int M, f77_int N,
                 f77_int K, const void *alpha, const void *A,