Updated KernelsHowTo.md, BLISTypedAPI.md.

Details:
- Added missing (basic) information in KernelsHowTo.md for level-1f and level-1v kernels.
- Updated section regarding contexts.
2026-04-19 23:28:52 +00:00 · 2018-07-09 17:55:54 -05:00
parent f8913c2bf9
commit c40d30a6c9
2 changed files with 497 additions and 19 deletions
--- a/docs/BLISTypedAPI.md
+++ b/docs/BLISTypedAPI.md
@@ -87,7 +87,6 @@ The following tables list various types used throughout the BLIS API.
 ### Basic vs expert interfaces

 The functions listed in this document belong to the basic interface subset of the BLIS typed API. There is a companion "expert" interface that mirrors the basic interface, except that it also contains at least one additional parameter that is only of interest to experts and library developers. The expert interfaces use the same name as the basic function names, except for an additional "_ex" suffix. For example, the basic interface for `gemm` is
-#### gemm
 ```c
 void bli_?gemm
     (
@@ -104,7 +103,6 @@ void bli_?gemm
     );
 ```
 while the expert interface is:
-#### gemm
 ```c
 void bli_?gemm_ex
     (
@@ -123,7 +121,17 @@ void bli_?gemm_ex
 ```
 The expert interface contains an additional `cntx_t*` parameter. Note that calling a function from the expert interface with the `cntx_t*` argument set to `NULL` is equivalent to calling the corresponding basic interface.

-The `cntx_t*` argument appears in the interfaces to the `gemm`, `trsm`, and `gemmtrsm` micro-kernels. However, assembly implementations for micro-kernels generally do not make use of the context structure, and therefore `NULL` may be passed in by casual developers.
+### Contexts
+
+In general, it is permissible to pass in `NULL` for a `cntx_t*` parameter when calling an expert interface such as `bli_gemm_ex()`. However, there are cases where `NULL` values are not accepted and may result in a segmentation fault. Specifically, the `cntx_t*` argument appears in the interfaces to the `gemm`, `trsm`, and `gemmtrsm` [level-3 micro-kernels](KernelsHowTo.md#level-3) along with all [level-1v](KernelsHowTo.md#level-1v) and [level-1f](KernelsHowTo.md#level-1f) kernels. There, as a general rule, a valid pointer must be passed in. Whenever a valid context is needed, the developer may query a default context from the global kernel structure (if a context is not already available in the current scope):
+```c
+cntx_t* bli_gks_query_cntx( void );
+```
+When BLIS is configured to target a configuration family (e.g. `intel64`, `x86_64`),
+`bli_gks_query_cntx()` will use `cpuid` or an equivalent heuristic to select
+and return the appropriate context. When BLIS is configured to target a singleton
+sub-configuration (e.g. `haswell`, `skx`), `bli_gks_query_cntx()` will unconditionally
+return a pointer to the context appropriate for the targeted configuration.


 ## BLIS header file
--- a/docs/KernelsHowTo.md
+++ b/docs/KernelsHowTo.md
@@ -46,7 +46,7 @@ The following shows the steps one would take to optimize, to varying degrees, th

 ### Level-1f

-BLIS supports the following five level-1f (fused) kernels. These kernels are used to implement optimized level-2 operations.
+BLIS supports the following five level-1f (fused) kernels. These kernels are used to implement optimized level-2 operations (as well as self-similar level-1f operations; that is, the `axpyf` kernel can be invoked indirectly via the `axpyf` operation).
  * **axpy2v**: Performs and fuses two [axpyv](BLISTypedAPI.md#axpyv) operations, accumulating to the same output vector.
  * **dotaxpyv**: Performs and fuses a [dotv](BLISTypedAPI.md#dotv) followed by an [axpyv](BLISTypedAPI.md#axpyv) operation with x.
  * **axpyf**: Performs and fuses some implementation-dependent number of [axpyv](BLISTypedAPI.md#axpyv) operations, accumulating to the same output vector. Can also be expressed as a [gemv](BLISTypedAPI.md#gemv) operation where matrix A is _m x nf_, where nf is the number of fused operations (fusing factor).
@@ -56,12 +56,21 @@ BLIS supports the following five level-1f (fused) kernels. These kernels are use

 ### Level-1v

-BLIS supports kernels for the following level-1 operations. Aside from their self-similar operations (ie: the use of an `axpyv` kernel to implement the `axpyv` operation), these kernels are used only to implement level-2 operations, and only when the developer decides to forgo more optimized approaches that involve level-1f kernels (where applicable).
-  * **axpyv**: Performs a [scale-and-accumulate vector](BLISTypedAPI.md#axpyv) operation.
+BLIS supports the following 14 level-1v kernels. These kernels are used primarily to implement their self-similar operations. However, they are occasionally used to handle special cases of level-1f kernels or in situations where level-2 operations are partially optimized.
+  * **addv**: Performs a [vector addition](BLISTypedAPI.md#addv) operation.
+  * **amaxv**: Performs a [search for the index of the element with the largest absolute value (or complex modulus)](BLISTypedAPI.md#amaxv).
+  * **axpyv**: Performs a [vector scale-and-accumulate](BLISTypedAPI.md#axpyv) operation.
+  * **axpbyv**: Performs an [extended vector scale-and-accumulate](BLISTypedAPI.md#axpbyv) operation similar to axpyv except that the output vector is scaled by a second scalar.
+  * **copyv**: Performs a [vector copy](BLISTypedAPI.md#copyv) operation.
  * **dotv**: Performs a [dot product](BLISTypedAPI.md#dotv) where the output scalar is overwritten.
  * **dotxv**: Performs an [extended dot product](BLISTypedAPI.md#dotxv) operation where the dot product is first scaled and then accumulated into a scaled output scalar.
-
-There are other level-1v kernels that may be optimized, such as [addv](BLISTypedAPI.md#addv), [subv](BLISTypedAPI.md#subv), and [scalv](BLISTypedAPI.md#scalv), but their use is less common and therefore of much less importance to most users and developers.
+  * **invertv**: Performs an [element-wise vector inversion](BLISTypedAPI.md#invertv) operation.
+  * **scalv**: Performs an [in-place (destructive) vector scaling](BLISTypedAPI.md#scalv) operation.
+  * **scal2v**: Performs an [out-of-place (non-destructive) vector scaling](BLISTypedAPI.md#scal2v) operation.
+  * **setv**: Performs a [vector broadcast](BLISTypedAPI.md#setv) operation.
+  * **subv**: Performs a [vector subtraction](BLISTypedAPI.md#subv) operation.
+  * **swapv**: Performs a [vector swap](BLISTypedAPI.md#swapv) operation.
+  * **xpbyv**: Performs an [alternate vector scale-and-accumulate](BLISTypedAPI.md#xpbyv) operation.


 ### Level-1v/-1f Dependencies for Level-2 operations
@@ -80,6 +89,95 @@ Kernels marked with a "1" for a given level-2 operation are preferred for optimi

 **Note:** The "effective storage" column reflects the orientation of the matrix operand **after** transposition via the corresponding `trans_t` parameter (if applicable). For example, calling `gemv` with a column-stored matrix `A` and the `transa` parameter equal to `BLIS_TRANSPOSE` would be effectively equivalent to row-wise storage.

+---
+
+## Calling kernels
+
+Note that all kernels, whether they be reference implementations or based on fully optimized assembly code, use names that are architecture- and implementation-specific. (This appears as a `<suffix>` in the [kernel reference](KernelsHowTo.md#blis-kernels-reference) below.) Therefore, the easiest way to call the kernel is by querying a pointer from a valid context.
+
+The first step is to obtain a valid context. Contexts store all of the information
+specific to a particular sub-configuration (usually loosely specific to a
+microarchitecture or group of closely-related microarchitectures). If a context is
+not already available in your current scope, a default context for the hardware
+for which BLIS was configured (or, in the case of multi-configuration builds, the
+hardware on which BLIS is currently running) may be queried via:
+```c
+cntx_t* bli_gks_query_cntx( void );
+```
+Once this `cntx_t*` pointer is obtained, you may call one of three functions to query any of the computation kernels described in this document:
+```c
+void* bli_cntx_get_l3_nat_ukr_dt
+     (
+       num_t   dt,
+       l3ukr_t ker_id,
+       cntx_t* cntx
+     );
+
+void* bli_cntx_get_l1f_ker_dt
+     (
+       num_t   dt,
+       l1fkr_t ker_id,
+       cntx_t* cntx
+     );
+
+void* bli_cntx_get_l1v_ker_dt
+     (
+       num_t   dt,
+       l1vkr_t ker_id,
+       cntx_t* cntx
+     );
+```
+The `dt` and `ker_id` parameters specify the floating-point datatype and the
+kernel operation you wish to query, respectively.
+Valid values for `dt` are `BLIS_FLOAT`, `BLIS_DOUBLE`, `BLIS_SCOMPLEX`, and
+`BLIS_DCOMPLEX` for single- and double-precision real, and single- and
+double-precision complex, respectively.
+Valid values for `ker_id` are given in the tables below.
+
+Also, note that the return values of `bli_cntx_get_l1v_ker_dt()`,
+`bli_cntx_get_l1f_ker_dt()`, and `bli_cntx_get_l3_nat_ukr_dt()`
+are of type `void*` and must be typecast to typed function pointers before being called.
+As a convenience, BLIS defines function pointer types appropriate for usage in these
+situations. The function pointer type for each operation is given in the third
+column of each table, with the `?` taking the place of one of the supported
+datatype characters.
+
+| kernel operation |  l3ukr_t              | function pointer type |
+|:-----------------|:----------------------|:----------------------|
+| gemm             | `BLIS_GEMM_UKR`       | `?gemm_ukr_ft`        |
+| trsm_l           | `BLIS_TRSM_L_UKR`     | `?trsm_ukr_ft`        |
+| trsm_u           | `BLIS_TRSM_U_UKR`     | `?trsm_ukr_ft`        |
+| gemmtrsm_l       | `BLIS_GEMMTRSM_L_UKR` | `?gemmtrsm_ukr_ft`    |
+| gemmtrsm_u       | `BLIS_GEMMTRSM_U_UKR` | `?gemmtrsm_ukr_ft`    |
+
+| kernel operation |  l1fkr_t              | function pointer type |
+|:-----------------|:----------------------|:----------------------|
+| axpy2v           | `BLIS_AXPY2V_KER`     | `?axpy2v_ft`          |
+| dotaxpyv         | `BLIS_DOTAXPYV_KER`   | `?dotaxpyv_ft`        |
+| axpyf            | `BLIS_AXPYF_KER`      | `?axpyf_ft`           |
+| dotxf            | `BLIS_DOTXF_KER`      | `?dotxf_ft`           |
+| dotxaxpyf        | `BLIS_DOTXAXPYF_KER`  | `?dotxaxpyf_ft`       |
+
+| kernel operation |  l1vkr_t              | function pointer type |
+|:-----------------|:----------------------|:----------------------|
+| addv             | `BLIS_ADDV_KER`       | `?addv_ft`            |
+| amaxv            | `BLIS_AMAXV_KER`      | `?amaxv_ft`           |
+| axpyv            | `BLIS_AXPYV_KER`      | `?axpyv_ft`           |
+| axpbyv           | `BLIS_AXPBYV_KER`     | `?axpbyv_ft`          |
+| copyv            | `BLIS_COPYV_KER`      | `?copyv_ft`           |
+| dotv             | `BLIS_DOTV_KER`       | `?dotv_ft`            |
+| dotxv            | `BLIS_DOTXV_KER`      | `?dotxv_ft`           |
+| invertv          | `BLIS_INVERTV_KER`    | `?invertv_ft`         |
+| scalv            | `BLIS_SCALV_KER`      | `?scalv_ft`           |
+| scal2v           | `BLIS_SCAL2V_KER`     | `?scal2v_ft`          |
+| setv             | `BLIS_SETV_KER`       | `?setv_ft`            |
+| subv             | `BLIS_SUBV_KER`       | `?subv_ft`            |
+| swapv            | `BLIS_SWAPV_KER`      | `?swapv_ft`           |
+| xpbyv            | `BLIS_XPBYV_KER`      | `?xpbyv_ft`           |
+
+The identity of the implementation behind a queried function pointer is typically not exposed to the caller.
+However, the returned function pointer is guaranteed to be valid; it usually refers to either an optimized assembly implementation or a reference implementation.
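The query-then-cast pattern described in this section can be illustrated without linking against BLIS. The following is a minimal, self-contained sketch; every name in it (`generic_fp`, `demo_cntx`, `demo_get_ker`, and so on) is a hypothetical stand-in and not a BLIS identifier. It also assumes a generic function-pointer type for the untyped handle, whereas the BLIS query functions documented above return `void*`, and the kernel signature is simplified (no `conj_t`, strides, or `cntx_t*`).

```c
#include <stddef.h>

/* Stand-ins for illustration only; none of these names belong to BLIS. */
typedef void (*generic_fp)(void);   /* untyped handle returned by a query */
typedef void (*demo_daxpyv_ft)(size_t n, const double *alpha,
                               const double *x, double *y);

typedef enum { DEMO_AXPYV_KER = 0, DEMO_NUM_KERS } demo_ker_id;

/* A toy "context": a table of kernel pointers indexed by kernel id. */
typedef struct { generic_fp kers[DEMO_NUM_KERS]; } demo_cntx;

/* A plain C "reference" kernel computing y := y + alpha * x. */
static void demo_daxpyv_ref(size_t n, const double *alpha,
                            const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += (*alpha) * x[i];
}

/* Analogue of querying a default context. */
demo_cntx *demo_query_cntx(void)
{
    static demo_cntx cntx = { { (generic_fp)demo_daxpyv_ref } };
    return &cntx;
}

/* Analogue of querying a kernel pointer from a context. */
generic_fp demo_get_ker(demo_ker_id id, const demo_cntx *cntx)
{
    return cntx->kers[id];
}

/* Query the context, cast the handle to its typed pointer, then call it. */
void demo_call_axpyv(size_t n, double alpha, const double *x, double *y)
{
    demo_cntx *cntx = demo_query_cntx();
    demo_daxpyv_ft f = (demo_daxpyv_ft)demo_get_ker(DEMO_AXPYV_KER, cntx);
    f(n, &alpha, x, y);
}
```

The caller never names the concrete implementation; it only knows the kernel id and the typed function-pointer signature, which is the property the tables above rely on.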
+

 ---

@@ -92,15 +190,26 @@ This section seeks to provide developers with a complete reference for each of t
    * [trsm](KernelsHowTo.md#trsm-micro-kernels)
    * [gemmtrsm](KernelsHowTo.md#gemmtrsm-micro-kernels)
  * [Level-1f kernels](KernelsHowTo.md#level-1f-kernels)
-    * axpy2v
-    * dotaxpyv
-    * axpyf
-    * dotxf
-    * dotxaxpyf
+    * [axpy2v](KernelsHowTo.md#axpy2v-kernel)
+    * [dotaxpyv](KernelsHowTo.md#dotaxpyv-kernel)
+    * [axpyf](KernelsHowTo.md#axpyf-kernel)
+    * [dotxf](KernelsHowTo.md#dotxf-kernel)
+    * [dotxaxpyf](KernelsHowTo.md#dotxaxpyf-kernel)
  * [Level-1v kernels](KernelsHowTo.md#level-1v-kernels)
-    * axpyv
-    * dotv
-    * dotxv
+    * [addv](KernelsHowTo.md#addv-kernel)
+    * [amaxv](KernelsHowTo.md#amaxv-kernel)
+    * [axpyv](KernelsHowTo.md#axpyv-kernel)
+    * [axpbyv](KernelsHowTo.md#axpbyv-kernel)
+    * [copyv](KernelsHowTo.md#copyv-kernel)
+    * [dotv](KernelsHowTo.md#dotv-kernel)
+    * [dotxv](KernelsHowTo.md#dotxv-kernel)
+    * [invertv](KernelsHowTo.md#invertv-kernel)
+    * [scalv](KernelsHowTo.md#scalv-kernel)
+    * [scal2v](KernelsHowTo.md#scal2v-kernel)
+    * [setv](KernelsHowTo.md#setv-kernel)
+    * [subv](KernelsHowTo.md#subv-kernel)
+    * [swapv](KernelsHowTo.md#swapv-kernel)
+    * [xpbyv](KernelsHowTo.md#xpbyv-kernel)

 The function prototypes in this section follow the same guidelines as those listed in the [BLIS typed API reference](BLISTypedAPI.md#Notes_for_using_this_reference). Namely:
  * Any occurrence of `?` should be replaced with `s`, `d`, `c`, or `z` to form an actual function name.
@@ -494,11 +603,372 @@ Note that these implementations are coded in C99 and lack several kinds of optim



-
 ### Level-1f kernels

-_This section has yet to be written._
+#### axpy2v kernel
+```
+void bli_?axpy2v_<suffix>
+     (
+       conj_t           conjx,
+       conj_t           conjy,
+       dim_t            n,
+       ctype*  restrict alphax,
+       ctype*  restrict alphay,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       ctype*  restrict z, inc_t incz,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  z := z + alphax * conjx(x) + alphay * conjy(y)
+```
+where `x`, `y`, and `z` are vectors of length _n_ stored with strides `incx`, `incy`, and `incz`, respectively. This kernel is typically implemented as the fusion of two `axpyv` operations on different input vectors `x` and `y` and with different scalars `alphax` and `alphay` to update the same output vector `z`.
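As a rough illustration of the arithmetic only, the operation can be written as a single loop in plain C. This sketch is not a BLIS kernel: the name `sketch_daxpy2v` is hypothetical, it handles only real double precision (where conjugation is a no-op, so `conjx` and `conjy` are dropped), it passes scalars by value, it omits the `cntx_t*` argument, and it assumes non-negative strides.

```c
#include <stddef.h>

/* z := z + alphax * x + alphay * y for real doubles.
   Both updates are applied in one pass over z, which is the point of
   fusing the two axpyv operations. */
void sketch_daxpy2v(size_t n, double alphax, double alphay,
                    const double *x, size_t incx,
                    const double *y, size_t incy,
                    double *z, size_t incz)
{
    for (size_t i = 0; i < n; ++i)
        z[i * incz] += alphax * x[i * incx] + alphay * y[i * incy];
}
```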
+
+#### dotaxpyv kernel
+```
+void bli_?dotaxpyv_<suffix>
+     (
+       conj_t           conjxt,
+       conj_t           conjx,
+       conj_t           conjy,
+       dim_t            n,
+       ctype*  restrict alpha,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       ctype*  restrict rho,
+       ctype*  restrict z, inc_t incz,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  rho := conjxt(x)^T * conjy(y)
+  z   := z + alpha * conjx(x)
+```
+where `x`, `y`, and `z` are vectors of length _n_ stored with strides `incx`, `incy`, and `incz`, respectively, and `rho` is a scalar. This kernel is typically implemented as a `dotv` operation fused with an `axpyv` operation.
+
+#### axpyf kernel
+```
+void bli_?axpyf_<suffix>
+     (
+       conj_t           conja,
+       conj_t           conjx,
+       dim_t            m,
+       dim_t            b,
+       ctype*  restrict alpha,
+       ctype*  restrict a, inc_t inca, inc_t lda,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := y + alpha * conja(a) * conjx(x)
+```
+where `a` is an _m_ x _b_ matrix, `x` is a vector of length _b_, and `y` is a vector of length _m_. Vectors `x` and `y` are stored with strides `incx` and `incy`, respectively. Matrix `a` is stored with row stride `inca` and column stride `lda`, though `inca` is most often (in practice) unit. This kernel is typically implemented as a fused series of _b_ `axpyv` operations updating the same vector `y` (with the elements of `x` serving as the scalars and the columns of `a` serving as the vectors to be scaled).
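The "series of _b_ `axpyv` operations" view can be sketched in plain C as follows. This is an illustrative reference loop under simplifying assumptions, not BLIS code: `sketch_daxpyf` is a hypothetical name, only real double precision is handled (so `conja` and `conjx` are omitted), scalars are passed by value, the `cntx_t*` argument is dropped, and strides are assumed non-negative.

```c
#include <stddef.h>

/* y := y + alpha * a * x, where a is m x b with row stride inca and
   column stride lda. Each iteration of the outer loop is one axpyv:
   column j of a, scaled by alpha * x[j], accumulated into y. */
void sketch_daxpyf(size_t m, size_t b, double alpha,
                   const double *a, size_t inca, size_t lda,
                   const double *x, size_t incx,
                   double *y, size_t incy)
{
    for (size_t j = 0; j < b; ++j) {
        const double *aj  = a + j * lda;          /* column j of a     */
        const double  chi = alpha * x[j * incx];  /* scalar for axpyv  */
        for (size_t i = 0; i < m; ++i)
            y[i * incy] += chi * aj[i * inca];
    }
}
```

An optimized kernel would typically keep several columns in flight at once so that each element of `y` is loaded and stored only once per group of fused columns; the loop order here is chosen for readability.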
+
+#### dotxf kernel
+```
+void bli_?dotxf_<suffix>
+     (
+       conj_t           conjat,
+       conj_t           conjx,
+       dim_t            m,
+       dim_t            b,
+       ctype*  restrict alpha,
+       ctype*  restrict a, inc_t inca, inc_t lda,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict beta,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := beta * y + alpha * conjat(a)^T conjx(x)
+```
+where `a` is an _m_ x _b_ matrix, `x` is a vector of length _m_, `y` is a vector of length _b_, and `alpha` and `beta` are scalars.
+Vectors `x` and `y` are stored with strides `incx` and `incy`, respectively. Matrix `a` is stored with row stride `inca` and column stride `lda`, though `inca` is most often (in practice) unit.
+This kernel is typically implemented as a series of _b_ `dotxv` operations with the same right-hand operand vector `x` (contracted with the rows of `a^T` and accumulating to the corresponding elements of vector `y`).
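A plain C sketch of the same computation, written as _b_ scaled dot products, may help make the operand shapes concrete. It is an illustration under stated assumptions rather than a BLIS kernel: the name `sketch_ddotxf` is hypothetical, only real double precision is handled (so `conjat` and `conjx` are omitted), scalars are passed by value, the `cntx_t*` argument is dropped, and strides are assumed non-negative.

```c
#include <stddef.h>

/* y := beta * y + alpha * a^T * x, where a is m x b with row stride
   inca and column stride lda, x has length m, and y has length b.
   Each iteration of the outer loop is one dotxv between column j of a
   and the shared vector x. */
void sketch_ddotxf(size_t m, size_t b, double alpha,
                   const double *a, size_t inca, size_t lda,
                   const double *x, size_t incx,
                   double beta,
                   double *y, size_t incy)
{
    for (size_t j = 0; j < b; ++j) {
        const double *aj  = a + j * lda;   /* column j of a */
        double        rho = 0.0;
        for (size_t i = 0; i < m; ++i)
            rho += aj[i * inca] * x[i * incx];
        y[j * incy] = beta * y[j * incy] + alpha * rho;
    }
}
```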
+
+#### dotxaxpyf kernel
+```
+void bli_?dotxaxpyf_<suffix>
+     (
+       conj_t           conjat,
+       conj_t           conja,
+       conj_t           conjw,
+       conj_t           conjx,
+       dim_t            m,
+       dim_t            b,
+       ctype*  restrict alpha,
+       ctype*  restrict a, inc_t inca, inc_t lda,
+       ctype*  restrict w, inc_t incw,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict beta,
+       ctype*  restrict y, inc_t incy,
+       ctype*  restrict z, inc_t incz,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := beta * y + alpha * conjat(a)^T conjw(w)
+  z :=        z + alpha *  conja(a)   conjx(x)
+```
+where `a` is an _m_ x _b_ matrix, `w` and `z` are vectors of length _m_, `x` and `y` are vectors of length _b_, and `alpha` and `beta` are scalars.
+Vectors `w`, `z`, `x` and `y` are stored with strides `incw`, `incz`, `incx`, and `incy`, respectively. Matrix `a` is stored with row stride `inca` and column stride `lda`, though `inca` is most often (in practice) unit.
+This kernel is typically implemented as a series of _b_ `dotxv` operations with the same right-hand operand vector `w` fused with a series of _b_ `axpyv` operations updating the same vector `z`.
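The fusion can be sketched in plain C by performing both updates while each element of `a` is in hand. This is a hedged illustration, not BLIS code: `sketch_ddotxaxpyf` is a hypothetical name, only real double precision is handled (so the four `conj_t` parameters are omitted), scalars are passed by value, the `cntx_t*` argument is dropped, strides are assumed non-negative, and `y` and `z` are assumed not to alias `w` or `x`.

```c
#include <stddef.h>

/* y := beta * y + alpha * a^T * w
   z :=        z + alpha * a   * x
   where a is m x b, w and z have length m, and x and y have length b.
   Each element a(i,j) is read once and used for both updates. */
void sketch_ddotxaxpyf(size_t m, size_t b, double alpha,
                       const double *a, size_t inca, size_t lda,
                       const double *w, size_t incw,
                       const double *x, size_t incx,
                       double beta,
                       double *y, size_t incy,
                       double *z, size_t incz)
{
    for (size_t j = 0; j < b; ++j) {
        const double *aj  = a + j * lda;          /* column j of a    */
        const double  chi = alpha * x[j * incx];  /* scalar for axpyv */
        double        rho = 0.0;                  /* dotxv accumulator */
        for (size_t i = 0; i < m; ++i) {
            const double aij = aj[i * inca];
            rho         += aij * w[i * incw];
            z[i * incz] += chi * aij;
        }
        y[j * incy] = beta * y[j * incy] + alpha * rho;
    }
}
```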
+
+

 ### Level-1v kernels

-_This section has yet to be written._
+#### addv kernel
+```
+void bli_?addv_<suffix>
+     (
+       conj_t           conjx,
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := y + conjx(x)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively.
+
+#### amaxv kernel
+```
+void bli_?amaxv_<suffix>
+     (
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       dim_t*  restrict i,
+       cntx_t* restrict cntx
+     )
+```
+Given a vector of length _n_, this kernel returns the zero-based index `i` of the element of vector `x` that contains the largest absolute value (or, in the complex domain, complex modulus).
+If `NaN` is encountered, it is treated as if it were a valid value that was smaller than any other value in the vector.
+If more than one element contains the same maximum value, the index of the latter element is returned via `i`.
+
+#### axpyv kernel
+```
+void bli_?axpyv_<suffix>
+     (
+       conj_t           conjx,
+       dim_t            n,
+       ctype*  restrict alpha,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := y + alpha * conjx(x)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively, and `alpha` is a scalar.
+
+#### axpbyv kernel
+```
+void bli_?axpbyv_<suffix>
+     (
+       conj_t           conjx,
+       dim_t            n,
+       ctype*  restrict alpha,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict beta,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := beta * y + alpha * conjx(x)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively, and `alpha` and `beta` are scalars.
+
+#### copyv kernel
+```
+void bli_?copyv_<suffix>
+     (
+       conj_t           conjx,
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := conjx(x)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively.
+
+#### dotv kernel
+```
+void bli_?dotv_<suffix>
+     (
+       conj_t           conjx,
+       conj_t           conjy,
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       ctype*  restrict rho,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  rho := conjx(x)^T * conjy(y)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively, and `rho` is a scalar.
+
+#### dotxv kernel
+```
+void bli_?dotxv_<suffix>
+     (
+       conj_t           conjx,
+       conj_t           conjy,
+       dim_t            n,
+       ctype*  restrict alpha,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       ctype*  restrict beta,
+       ctype*  restrict rho,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  rho := beta * rho + alpha * conjx(x)^T * conjy(y)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively, and `alpha`, `beta`, and `rho` are scalars.
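To show how the two conjugation parameters enter the computation, here is a small sketch for the double-precision complex case using C99 `<complex.h>`. It is illustrative only and makes several assumptions that differ from the kernel signature above: the name `sketch_zdotxv` is hypothetical, `bool` flags stand in for `conj_t`, scalars are passed by value, the updated `rho` is returned rather than written through a pointer, the `cntx_t*` argument is dropped, and strides are assumed non-negative.

```c
#include <complex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns beta * rho + alpha * conjx(x)^T * conjy(y).
   Each flag, when true, conjugates the elements of the corresponding
   vector before they are multiplied. */
double complex sketch_zdotxv(bool conjx, bool conjy, size_t n,
                             double complex alpha,
                             const double complex *x, size_t incx,
                             const double complex *y, size_t incy,
                             double complex beta,
                             double complex rho)
{
    double complex acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double complex xi = conjx ? conj(x[i * incx]) : x[i * incx];
        const double complex yi = conjy ? conj(y[i * incy]) : y[i * incy];
        acc += xi * yi;
    }
    return beta * rho + alpha * acc;
}
```

With `alpha` equal to one and `beta` equal to zero, this reduces to the `dotv` operation.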
+
+#### invertv kernel
+```
+void bli_?invertv_<suffix>
+     (
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  x := inv(x)
+```
+where `x` is a vector of length _n_ stored with stride `incx` and `inv()` denotes element-wise inversion.
+
+#### scalv kernel
+```
+void bli_?scalv_<suffix>
+     (
+       conj_t           conjalpha,
+       dim_t            n,
+       ctype*  restrict alpha,
+       ctype*  restrict x, inc_t incx,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  x := conjalpha(alpha) * x
+```
+where `x` is a vector of length _n_ stored with stride `incx` and `alpha` is a scalar.
+
+#### scal2v kernel
+```
+void bli_?scal2v_<suffix>
+     (
+       conj_t           conjx,
+       dim_t            n,
+       ctype*  restrict alpha,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := alpha * conjx(x)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively, and `alpha` is a scalar.
+
+#### setv kernel
+```
+void bli_?setv_<suffix>
+     (
+       conj_t           conjalpha,
+       dim_t            n,
+       ctype*  restrict alpha,
+       ctype*  restrict x, inc_t incx,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  x := conjalpha(alpha)
+```
+where `x` is a vector of length _n_ stored with stride `incx` and `alpha` is a scalar. Note that here, the `:=` operator represents a broadcast of `conjalpha(alpha)` to every element in `x`.
+
+#### subv kernel
+```
+void bli_?subv_<suffix>
+     (
+       conj_t           conjx,
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := y - conjx(x)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively.
+
+#### swapv kernel
+```
+void bli_?swapv_<suffix>
+     (
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  t := x
+  x := y
+  y := t
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively, and `t` represents a temporary vector of length _n_ for illustrative purposes only. (No additional memory is allocated as part of this operation.)
+
+#### xpbyv kernel
+```
+void bli_?xpbyv_<suffix>
+     (
+       conj_t           conjx,
+       dim_t            n,
+       ctype*  restrict x, inc_t incx,
+       ctype*  restrict beta,
+       ctype*  restrict y, inc_t incy,
+       cntx_t* restrict cntx
+     )
+```
+This kernel performs the following operation:
+```
+  y := beta * y + conjx(x)
+```
+where `x` and `y` are vectors of length _n_ stored with strides `incx` and `incy`, respectively, and `beta` is a scalar.
+