Added unified test suite, and many fixes.

Details:
- Added a highly configurable, unified test suite.
- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.
- Changed semantics of her2k and syr2k such that these operations no
  longer expect the B matrix to already be conjugate-transposed (or just
  transposed for syr2k). However, these semantics are preserved for the
  internal mechanics of the implementations, including the internal
  back-end and all blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.
- Relaxed general object structure constraints in _basic_check() for gemv,
  ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this
  is replaced by selecting either the real part or both parts within the
  unblocked algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when
  to verify that alpha has an imaginary component equal to zero (for her,
  but not syr).
- Changed control tree for her to forgo packing.
- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex
  square root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for
  "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.
- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.
- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit
  cases of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The
  code now mimics the rank-k strategy of gemm, whereby alpha is applied
  during the first iteration of variant 3, with BLIS_ONE passed in instead
  for subsequent iterations. This also required passing alpha into the
  macro-kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually
  packed from bottom to top due to the reverse (BR->TL) nature of the
  algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv
  kernel. Also changed the copyv kernel invocation to scal2v so that these
  edge cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also
  fixed a potential future bug whereby a mem_t object that is actually no
  longer "allocated" from the static pool is mistaken for being allocated
  due to failure to NULLify the buffer when the block was most recently
  released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was
  mistakenly toggled when the requested subpartition needed to be
  "reflected" due to it residing in an unstored region.
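The her2k beta fix described above can be illustrated with a small sketch. It is not the BLIS macro-kernel: it uses the real symmetric analogue (syr2k-style, no conjugation) on plain row-major arrays, and the helper names `rank_k_update` and `syr2k_two_pass` are hypothetical. The point it shows is that when a rank-2k update is built from two rank-k updates, beta should scale C in the first update only, and the second update should accumulate with a unit beta.

```c
#include <stddef.h>

/* Hypothetical helper: C := beta*C + alpha * A * B^T,
   with A and B n-by-k and C n-by-n, all stored row-major. */
static void rank_k_update(size_t n, size_t k, double alpha,
                          const double *A, const double *B,
                          double beta, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double dot = 0.0;
            for (size_t p = 0; p < k; ++p)
                dot += A[i * k + p] * B[j * k + p];
            C[i * n + j] = beta * C[i * n + j] + alpha * dot;
        }
    }
}

/* C := beta*C + alpha*A*B^T + alpha*B*A^T, expressed as two rank-k updates. */
static void syr2k_two_pass(size_t n, size_t k, double alpha,
                           const double *A, const double *B,
                           double beta, double *C)
{
    rank_k_update(n, k, alpha, A, B, beta, C); /* beta applied here only */
    rank_k_update(n, k, alpha, B, A, 1.0, C);  /* unit beta: pure accumulation */
}
```

With 1-by-1 operands A = 2, B = 3, C = 10, alpha = 1 and beta = 0.5, the two-pass routine yields 0.5*10 + 6 + 6 = 17. Passing beta to both updates would instead give 0.5*(0.5*10 + 6) + 6 = 11.5.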
2026-04-20 07:38:53 +00:00 · 2013-02-11 13:20:44 -06:00
parent be94fb84c0
commit 768fcebaa8
347 changed files with 27234 additions and 958 deletions
--- a/frame/include/level0/bl2_abval2s.h
+++ b/frame/include/level0/bl2_abval2s.h
@@ -44,13 +44,11 @@

 #define bl2_ssabval2s( x, a ) \
 { \
-	bl2_ssabsq2s( x, a ); \
-	bl2_sssqrt2s( a, a ); \
+	a = ( float  )fabsf( ( float  )x ); \
 }
 #define bl2_dsabval2s( x, a ) \
 { \
-	bl2_dsabsq2s( x, a ); \
-	bl2_sssqrt2s( a, a ); \
+	a = ( float  )fabs( ( double )x ); \
 }
 #define bl2_csabval2s( x, a ) \
 { \
@@ -66,13 +64,11 @@

 #define bl2_sdabval2s( x, a ) \
 { \
-	bl2_sdabsq2s( x, a ); \
-	bl2_ddsqrt2s( a, a ); \
+	a = ( double )fabsf( ( float  )x ); \
 }
 #define bl2_ddabval2s( x, a ) \
 { \
-	bl2_ddabsq2s( x, a ); \
-	bl2_ddsqrt2s( a, a ); \
+	a = ( double )fabs( ( double )x ); \
 }
 #define bl2_cdabval2s( x, a ) \
 { \
@@ -88,13 +84,13 @@

 #define bl2_scabval2s( x, a ) \
 { \
-	bl2_scabsq2s( x, a ); \
-	bl2_ccsqrt2s( a, a ); \
+	(a).real = ( float  )fabsf( ( float  )x ); \
+	(a).imag = 0.0F; \
 }
 #define bl2_dcabval2s( x, a ) \
 { \
-	bl2_dcabsq2s( x, a ); \
-	bl2_ccsqrt2s( a, a ); \
+	(a).real = ( float  )fabs( ( double )x ); \
+	(a).imag = 0.0F; \
 }
 #define bl2_ccabval2s( x, a ) \
 { \
@@ -110,13 +106,13 @@

 #define bl2_szabval2s( x, a ) \
 { \
-	bl2_szabsq2s( x, a ); \
-	bl2_zzsqrt2s( a, a ); \
+	(a).real = ( double )fabsf( ( float  )x ); \
+	(a).imag = 0.0F; \
 }
 #define bl2_dzabval2s( x, a ) \
 { \
-	bl2_dzabsq2s( x, a ); \
-	bl2_zzsqrt2s( a, a ); \
+	(a).real = ( double )fabs( ( double )x ); \
+	(a).imag = 0.0F; \
 }
 #define bl2_czabval2s( x, a ) \
 { \
--- a/frame/include/level0/bl2_addjs.h
+++ b/frame/include/level0/bl2_addjs.h
@@ -0,0 +1,139 @@
+/*
+
+   BLIS    
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2012, The University of Texas
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#ifndef BLIS_ADDJS_H
+#define BLIS_ADDJS_H
+
+// addjs
+
+// Notes:
+// - The first char encodes the type of a.
+// - The second char encodes the type of x.
+
+#define bl2_ssaddjs( a, y ) \
+{ \
+	(y)      += ( float  )(a); \
+}
+#define bl2_sdaddjs( a, y ) \
+{ \
+	(y)      += ( double )(a); \
+}
+#define bl2_scaddjs( a, y ) \
+{ \
+	(y).real += ( float  )(a); \
+	/*(y).imag += 0.0F;*/ \
+}
+#define bl2_szaddjs( a, y ) \
+{ \
+	(y).real += ( double )(a); \
+	/*(y).imag += 0.0F;*/ \
+}
+
+#define bl2_dsaddjs( a, y ) \
+{ \
+	(y)      += ( float  )(a); \
+}
+#define bl2_ddaddjs( a, y ) \
+{ \
+	(y)      += ( double )(a); \
+}
+#define bl2_dcaddjs( a, y ) \
+{ \
+	(y).real += ( float  )(a); \
+	/*(y).imag += 0.0F;*/ \
+}
+#define bl2_dzaddjs( a, y ) \
+{ \
+	(y).real += ( double )(a); \
+	/*(y).imag += 0.0F;*/ \
+}
+
+#define bl2_csaddjs( a, y ) \
+{ \
+	(y)      += ( float  )(a).real; \
+}
+#define bl2_cdaddjs( a, y ) \
+{ \
+	(y)      += ( double )(a).real; \
+}
+#define bl2_ccaddjs( a, y ) \
+{ \
+	(y).real += ( float  )(a).real; \
+	(y).imag -= ( float  )(a).imag; \
+}
+#define bl2_czaddjs( a, y ) \
+{ \
+	(y).real += ( double )(a).real; \
+	(y).imag -= ( double )(a).imag; \
+}
+
+#define bl2_zsaddjs( a, y ) \
+{ \
+	(y)      += ( float  )(a).real; \
+}
+#define bl2_zdaddjs( a, y ) \
+{ \
+	(y)      += ( double )(a).real; \
+}
+#define bl2_zcaddjs( a, y ) \
+{ \
+	(y).real += ( float  )(a).real; \
+	(y).imag -= ( float  )(a).imag; \
+}
+#define bl2_zzaddjs( a, y ) \
+{ \
+	(y).real += ( double )(a).real; \
+	(y).imag -= ( double )(a).imag; \
+}
+
+
+#define bl2_saddjs( a, y ) \
+{ \
+	bl2_ssaddjs( a, y ); \
+}
+#define bl2_daddjs( a, y ) \
+{ \
+	bl2_ddaddjs( a, y ); \
+}
+#define bl2_caddjs( a, y ) \
+{ \
+	bl2_ccaddjs( a, y ); \
+}
+#define bl2_zaddjs( a, y ) \
+{ \
+	bl2_zzaddjs( a, y ); \
+}
+
+
+#endif
--- a/frame/include/level0/bl2_getris.h
+++ b/frame/include/level0/bl2_getris.h
@@ -0,0 +1,107 @@
+/*
+
+   BLIS    
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2012, The University of Texas
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#ifndef BLIS_GETRIS_H
+#define BLIS_GETRIS_H
+
+// getris
+
+// Notes:
+// - The first char encodes the type of x.
+// - The second char encodes the type of y.
+
+#define bl2_ssgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( float  ) (x); \
+	(yi)     = 0.0F; \
+}
+#define bl2_dsgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( float  ) (x); \
+	(yi)     = 0.0F; \
+}
+#define bl2_csgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( float  ) (x).real; \
+	(yi)     = ( float  ) (x).imag; \
+}
+#define bl2_zsgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( float  ) (x).real; \
+	(yi)     = ( float  ) (x).imag; \
+}
+
+
+#define bl2_sdgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( double ) (x); \
+	(yi)     = 0.0; \
+}
+#define bl2_ddgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( double ) (x); \
+	(yi)     = 0.0; \
+}
+#define bl2_cdgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( double ) (x).real; \
+	(yi)     = ( double ) (x).imag; \
+}
+#define bl2_zdgetris( x, yr, yi ) \
+{ \
+	(yr)     = ( double ) (x).real; \
+	(yi)     = ( double ) (x).imag; \
+}
+
+
+
+#define bl2_sgetris( x, yr, yi ) \
+{ \
+	bl2_ssgetris( x, yr, yi ); \
+}
+#define bl2_dgetris( x, yr, yi ) \
+{ \
+	bl2_ddgetris( x, yr, yi ); \
+}
+#define bl2_cgetris( x, yr, yi ) \
+{ \
+	bl2_csgetris( x, yr, yi ); \
+}
+#define bl2_zgetris( x, yr, yi ) \
+{ \
+	bl2_zdgetris( x, yr, yi ); \
+}
+
+
+#endif
--- a/frame/include/level0/bl2_inverts.h
+++ b/frame/include/level0/bl2_inverts.h
@@ -47,7 +47,7 @@

 #define bl2_dinverts( x ) \
 { \
-	(x) = 1.0 / ( float  ) (x); \
+	(x) = 1.0  / ( double ) (x); \
 }

 #define bl2_cinverts( x ) \
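The effect of the stray float cast fixed in this hunk can be seen in isolation with the sketch below. The two functions are hypothetical stand-ins for the old and new macro bodies, written as functions so they can be called directly.

```c
/* Pre-fix pattern: the operand is narrowed to float before the division. */
static double invert_via_float(double x)
{
    return 1.0 / (float)x;
}

/* Post-fix pattern: the division is carried out entirely in double. */
static double invert_double(double x)
{
    return 1.0 / (double)x;
}
```

For x = 0.1, which is not exactly representable in float, the narrowed version differs from 10.0 by roughly 1.5e-7, while the double version matches 10.0 to double precision. An error of that size in each inverted diagonal element is consistent with the note that non-unit dtrsm cases were affected.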
--- a/frame/include/level0/bl2_setris.h
+++ b/frame/include/level0/bl2_setris.h
@@ -0,0 +1,106 @@
+/*
+
+   BLIS    
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2012, The University of Texas
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#ifndef BLIS_SETRIS_H
+#define BLIS_SETRIS_H
+
+// setris
+
+// Notes:
+// - The first char encodes the type of x.
+// - The second char encodes the type of y.
+
+#define bl2_sssetris( xr, xi, y ) \
+{ \
+	(y)      = ( float  ) (xr); \
+}
+#define bl2_dssetris( xr, xi, y ) \
+{ \
+	(y)      = ( float  ) (xr); \
+}
+
+
+#define bl2_sdsetris( xr, xi, y ) \
+{ \
+	(y)      = ( double ) (xr); \
+}
+#define bl2_ddsetris( xr, xi, y ) \
+{ \
+	(y)      = ( double ) (xr); \
+}
+
+
+#define bl2_scsetris( xr, xi, y ) \
+{ \
+	(y).real = ( float  ) (xr); \
+	(y).imag = ( float  ) (xi); \
+}
+#define bl2_dcsetris( xr, xi, y ) \
+{ \
+	(y).real = ( float  ) (xr); \
+	(y).imag = ( float  ) (xi); \
+}
+
+
+#define bl2_szsetris( xr, xi, y ) \
+{ \
+	(y).real = ( double ) (xr); \
+	(y).imag = ( double ) (xi); \
+}
+#define bl2_dzsetris( xr, xi, y ) \
+{ \
+	(y).real = ( double ) (xr); \
+	(y).imag = ( double ) (xi); \
+}
+
+
+#define bl2_ssetris( xr, xi, y ) \
+{ \
+	bl2_sssetris( xr, xi, y ); \
+}
+#define bl2_dsetris( xr, xi, y ) \
+{ \
+	bl2_ddsetris( xr, xi, y ); \
+}
+#define bl2_csetris( xr, xi, y ) \
+{ \
+	bl2_scsetris( xr, xi, y ); \
+}
+#define bl2_zsetris( xr, xi, y ) \
+{ \
+	bl2_dzsetris( xr, xi, y ); \
+}
+
+
+#endif
--- a/frame/include/level0/bl2_sqrt2s.h
+++ b/frame/include/level0/bl2_sqrt2s.h
@@ -52,11 +52,17 @@
 }
 #define bl2_cssqrt2s( x, a ) \
 { \
-	(a) = ( float  )sqrtf( (x).real ); \
+	float  mag = sqrtf( (x).real * (x).real + \
+	                    (x).imag * (x).imag ); \
+\
+	(a)      = ( float  )sqrt( ( mag + (x).real ) / 2.0F ); \
 }
 #define bl2_zssqrt2s( x, a ) \
 { \
-	(a) = ( float  )sqrt( (x).real ); \
+	double mag = sqrt( (x).real * (x).real + \
+	                   (x).imag * (x).imag ); \
+\
+	(a)      = ( float  )sqrt( ( mag + (x).real ) / 2.0 ); \
 }


@@ -70,11 +76,17 @@
 }
 #define bl2_cdsqrt2s( x, a ) \
 { \
-	(a) = ( double )sqrtf( (x).real ); \
+	float  mag = sqrtf( (x).real * (x).real + \
+	                    (x).imag * (x).imag ); \
+\
+	(a)      = ( double )sqrt( ( mag + (x).real ) / 2.0F ); \
 }
 #define bl2_zdsqrt2s( x, a ) \
 { \
-	(a) = ( double )sqrt( (x).real ); \
+	double mag = sqrt( (x).real * (x).real + \
+	                   (x).imag * (x).imag ); \
+\
+	(a)      = ( double )sqrt( ( mag + (x).real ) / 2.0 ); \
 }


@@ -90,13 +102,19 @@
 }
 #define bl2_ccsqrt2s( x, a ) \
 { \
-	(a).real = ( float  )sqrtf( (x).real ); \
-	(a).imag = 0.0F; \
+	float  mag = sqrtf( (x).real * (x).real + \
+	                    (x).imag * (x).imag ); \
+\
+	(a).real = ( float  )sqrtf( ( mag + (x).real ) / 2.0F ); \
+	(a).imag = ( float  )sqrtf( ( mag - (x).imag ) / 2.0F ); \
 }
 #define bl2_zcsqrt2s( x, a ) \
 { \
-	(a).real = ( float  )sqrt( (x).real ); \
-	(a).imag = 0.0F; \
+	double mag = sqrt( (x).real * (x).real + \
+	                   (x).imag * (x).imag ); \
+\
+	(a).real = ( float  )sqrt( ( mag + (x).real ) / 2.0 ); \
+	(a).imag = ( float  )sqrt( ( mag - (x).imag ) / 2.0 ); \
 }


@@ -112,13 +130,19 @@
 }
 #define bl2_czsqrt2s( x, a ) \
 { \
-	(a).real = ( double )sqrtf( (x).real ); \
-	(a).imag = 0.0F; \
+	float  mag = sqrtf( (x).real * (x).real + \
+	                    (x).imag * (x).imag ); \
+\
+	(a).real = ( double )sqrtf( ( mag + (x).real ) / 2.0F ); \
+	(a).imag = ( double )sqrtf( ( mag - (x).imag ) / 2.0F ); \
 }
 #define bl2_zzsqrt2s( x, a ) \
 { \
-	(a).real = ( double )sqrt( (x).real ); \
-	(a).imag = 0.0F; \
+	double mag = sqrt( (x).real * (x).real + \
+	                   (x).imag * (x).imag ); \
+\
+	(a).real = ( double )sqrt( ( mag + (x).real ) / 2.0 ); \
+	(a).imag = ( double )sqrt( ( mag - (x).imag ) / 2.0 ); \
 }


--- a/frame/include/level0/bl2_subjs.h
+++ b/frame/include/level0/bl2_subjs.h
@@ -0,0 +1,139 @@
+/*
+
+   BLIS    
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2012, The University of Texas
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#ifndef BLIS_SUBJS_H
+#define BLIS_SUBJS_H
+
+// subjs
+
+// Notes:
+// - The first char encodes the type of a.
+// - The second char encodes the type of x.
+
+#define bl2_sssubjs( a, y ) \
+{ \
+	(y)      -= ( float  )(a); \
+}
+#define bl2_sdsubjs( a, y ) \
+{ \
+	(y)      -= ( double )(a); \
+}
+#define bl2_scsubjs( a, y ) \
+{ \
+	(y).real -= ( float  )(a); \
+	/*(y).imag -= 0.0F;*/ \
+}
+#define bl2_szsubjs( a, y ) \
+{ \
+	(y).real -= ( double )(a); \
+	/*(y).imag -= 0.0F;*/ \
+}
+
+#define bl2_dssubjs( a, y ) \
+{ \
+	(y)      -= ( float  )(a); \
+}
+#define bl2_ddsubjs( a, y ) \
+{ \
+	(y)      -= ( double )(a); \
+}
+#define bl2_dcsubjs( a, y ) \
+{ \
+	(y).real -= ( float  )(a); \
+	/*(y).imag -= 0.0F;*/ \
+}
+#define bl2_dzsubjs( a, y ) \
+{ \
+	(y).real -= ( double )(a); \
+	/*(y).imag -= 0.0F;*/ \
+}
+
+#define bl2_cssubjs( a, y ) \
+{ \
+	(y)      -= ( float  )(a).real; \
+}
+#define bl2_cdsubjs( a, y ) \
+{ \
+	(y)      -= ( double )(a).real; \
+}
+#define bl2_ccsubjs( a, y ) \
+{ \
+	(y).real -= ( float  )(a).real; \
+	(y).imag += ( float  )(a).imag; \
+}
+#define bl2_czsubjs( a, y ) \
+{ \
+	(y).real -= ( double )(a).real; \
+	(y).imag += ( double )(a).imag; \
+}
+
+#define bl2_zssubjs( a, y ) \
+{ \
+	(y)      -= ( float  )(a).real; \
+}
+#define bl2_zdsubjs( a, y ) \
+{ \
+	(y)      -= ( double )(a).real; \
+}
+#define bl2_zcsubjs( a, y ) \
+{ \
+	(y).real -= ( float  )(a).real; \
+	(y).imag += ( float  )(a).imag; \
+}
+#define bl2_zzsubjs( a, y ) \
+{ \
+	(y).real -= ( double )(a).real; \
+	(y).imag += ( double )(a).imag; \
+}
+
+
+#define bl2_ssubjs( a, y ) \
+{ \
+	bl2_sssubjs( a, y ); \
+}
+#define bl2_dsubjs( a, y ) \
+{ \
+	bl2_ddsubjs( a, y ); \
+}
+#define bl2_csubjs( a, y ) \
+{ \
+	bl2_ccsubjs( a, y ); \
+}
+#define bl2_zsubjs( a, y ) \
+{ \
+	bl2_zzsubjs( a, y ); \
+}
+
+
+#endif
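The complex-complex cases of addjs and subjs above amount to y := y + conj(a) and y := y - conj(a). A minimal function-based sketch of the same arithmetic follows. The type `zpair` and the function names are hypothetical and are used here only so the behavior can be exercised outside the macro framework.

```c
/* Hypothetical stand-in for a double-precision complex type. */
typedef struct { double real; double imag; } zpair;

/* y := y + conj(a): the imaginary part of a enters with flipped sign. */
static void add_conj(zpair a, zpair *y)
{
    y->real += a.real;
    y->imag -= a.imag;
}

/* y := y - conj(a): the exact inverse of add_conj. */
static void sub_conj(zpair a, zpair *y)
{
    y->real -= a.real;
    y->imag += a.imag;
}
```

Starting from y = 10 + 20i and a = 1 + 2i, add_conj gives 11 + 18i, and a following sub_conj restores 10 + 20i.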