amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	d1f8e5d9b2	Merge pull request #60 from esauvage/master sgemm µkernel for bulldozer : bug correction for k%4 != 0	2016-04-05 12:21:27 -05:00
Etienne Sauvage	c11d28eed8	cgemm µkernel for bulldozer : bug correction for k%4 != 0	2016-04-02 21:15:48 +02:00
Field G. Van Zee	36c3abb05f	Merge pull request #58 from esauvage/master cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer confi…	2016-03-31 10:26:17 -05:00
Etienne Sauvage	917ce75482	cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel	2016-03-30 22:03:09 +02:00
Field G. Van Zee	3090fff64c	Merge pull request #44 from esauvage/master sgemm micro-kernel for FMA4 instruction set	2016-03-28 12:36:25 -05:00
figual	af92773f4f	Updated and improved ARMv8 micro-kernels.	2016-03-23 22:07:02 +01:00
Etienne Sauvage	4ca5d5b1fd	sgemm micro-kernel for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel	2016-03-01 21:33:01 +01:00
Field G. Van Zee	f0a4f41b5a	Fixed unimplemented case in core2 sgemm ukernel. Details: - Implemented the "beta == 0" case for general stride output for the dunnington sgemm micro-kernel. This case had been, up until now, identical to the "beta != 0" case, which does not work when the output matrix has nan's and inf's. It had manifested as nan residuals in the test suite for right-side tests of ctrsm4m1a. Thanks to Devin Matthews for reporting this bug.	2015-11-12 15:22:50 -06:00
Francisco Igual	a0a7b85ac3	Fixed incomplete code in the double precision ARMv8 microkernel.	2015-10-27 08:59:15 +00:00
Field G. Van Zee	b489152e11	Use vzeroall in haswell micro-kernels.	2015-10-21 14:53:17 -05:00
Field G. Van Zee	fdfe14f1e1	Added support for Intel Haswell/Broadwell. Details: - Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors and FMA instructions. (Complex support is currently provided by default induced method, 4m1a.) - Added a 'haswell' configuration, which uses the aforementioned kernels. - Inserted auto-detection support for haswell configuration in build/auto-detect/cpuid_x86.c. - Modified configure script to explicitly echo when automatic or manual configuration is in progress. - Changed beta scalar in test_gemm.c module of test suite to -1.0 to 0.9.	2015-07-09 13:52:39 -05:00
Francisco D. Igual	483e4d6a3f	Adding armv8a configuration and micro-kernels. Only sgemm micro-kernel is fully functional at this point.	2014-12-07 20:27:49 +01:00
Field G. Van Zee	7bbc95a54f	Added new piledriver micro-kernels. Details: - Added new micro-kernels for the AMD piledriver architecture (one for each datatype). - Updates and tweaks to piledriver configuration. - Added 3xk packm micro-kernel support. - Explicitly unrolled some of the smaller packm micro-kernels. - Added notes to avx/sandybridge and piledriver micro-kernel files acknowledging the influence of the corresponding kernel code in OpenBLAS.	2014-10-29 10:52:23 -05:00
Tyler Smith	95cdae65d6	Fixed bug in KNC microkernel where k=0 and beta != 1	2014-10-22 16:30:16 -05:00
Field G. Van Zee	d1e86e1876	More minor tweaks to sandybridge/avx micro-kernel. Details: - Re-enabled use of b_next for dgemm and cgemm micro-kernels.	2014-10-12 13:43:47 -05:00
Field G. Van Zee	7b6fe4cae5	Minor tweaks to sandybridge/avx micro-kernels. Details: - Changed the MC blocksize for zgemm micro-kernel from 128 to 64. - Removed usage of b_next in all x86_64/avx gemm micro-kernels.	2014-10-12 12:01:51 -05:00
Field G. Van Zee	a6a156e9fe	Added cgemm ukernel for avx/sandybridge. Details: - Implemented AVX-based cgemm micro-kernel (via GNU extended inline assembly syntax). - Updated sandybridge configuration accordingly.	2014-10-10 14:26:41 -05:00
Field G. Van Zee	6f8575ab25	Added zgemm ukernel for avx/sandybridge. Details: - Implemented AVX-based zgemm micro-kernel (via GNU extended inline assembly syntax). - Updated sandybridge configuration accordingly.	2014-10-10 10:01:45 -05:00
Tyler Smith	7a8ad47fb2	Minor changes to knc configuration, including preference for row-major storage. Also fixed a bug in the knc micro-kernel where it would fail if k == 0	2014-10-08 15:52:13 -05:00
Field G. Van Zee	e80a453784	Fixed bug introduced by bugfix in `25b258d`. Details: - We actually need to check alignment of lda*sizeof(double) and NOT a+lda because in the latter case, alignment could cancel out and still allow the optimized code to run when it shouldn't. Thanks to Devin for pointing this out.	2014-09-18 10:24:20 -05:00
Field G. Van Zee	25b258d61f	Fixed a non-fatal problem with bugfix in `a68b316c`. Details: - The bugfix in `a68b316c` was inadvertently checking alignment of the leading dimension itself, rather than the byte size of the leading dimension. Now, we simply check alignment of a+lda.	2014-09-18 10:10:49 -05:00
Field G. Van Zee	a68b316ca4	Fixed alignment bugs in level-1f kernels. Details: - Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels were attempting to compute problems with unaligned leading dimensions with optimized code, rather than (correctly) using the reference implementations. Thanks to Devin Matthews for reporting this bug.	2014-09-17 11:10:07 -05:00
Tyler Smith	86fc7e4076	Added bulldozer configuration and updated piledriver micro-kernel	2014-09-15 10:35:46 -05:00
Field G. Van Zee	bc1d86b2d4	Sandy Bridge configuration, micro-kernel update. Details: - Minor updates to bli_config and bli_kernel.h for sandybridge configuration. - Renamed existing AVX intrinsic-based micro-kernel file to bli_gemm_int_d8x4.c. - Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based gemm micro-kernels for single- and double-precision real.	2014-08-07 19:01:20 -05:00
Field G. Van Zee	45692e3ad4	Reverted some accidental changes. Details: - Reverted some changes that were unintentionally included in the previous commit (`9526ce98`). Thanks to Tony Kelman for pointing this out. (Note: a few select changes were not reverted.)	2014-08-07 13:21:15 -05:00
Field G. Van Zee	9526ce9881	Updated copyright headers of emscripten configuration files.	2014-08-06 14:15:34 -05:00
Field G. Van Zee	c73261f17e	More minor cleanups post-copyright update.	2014-07-14 16:23:51 -05:00
Field G. Van Zee	2a09d24463	Reverted power7 symlinks destroyed by sed script. Details: - Reverted two symlinks, in kernels/power7/3/test, back to being symlinks after recursive-sed.sh mistakenly replaced them with copies of the actual files to which they referred. Meant to include this in previous commit.	2014-07-14 16:17:09 -05:00
Field G. Van Zee	7ed415824d	Updated copyright headers (continued). Details: - Inserted "at Austin" into third clause of license declarations. Meant to include this change in previous commit.	2014-07-14 16:14:33 -05:00
Field G. Van Zee	5c2c6c8561	Updated copyright headers to contain "at Austin". Details: - Updated copyright headers to include "at Austin" in the name of the University of Texas. - Updated the copyright years of a few headers to 2014 (from 2011 and 2012).	2014-07-14 16:05:03 -05:00
Marat Dukhan	b693b0cddc	[SC]AXPY kernels for PNaCl	2014-06-22 13:44:25 -07:00
Marat Dukhan	020a831bc5	Code clean-up in PNaCl port	2014-06-19 00:58:26 -07:00
Marat Dukhan	491be4f91e	Optimized dot product kernels for PNaCl	2014-06-19 00:45:44 -07:00
Marat Dukhan	b2ffb4de8b	Reformatted PNaCl GEMM kernels	2014-06-15 18:41:30 -04:00
Marat Dukhan	6de2d472d9	CGEMM and ZGEMM kernels for PNaCl	2014-06-15 08:44:31 -04:00
Marat Dukhan	f064711a5e	SGEMM and DGEMM kernels for PNaCl	2014-06-15 06:27:37 -04:00
Tyler Smith	00f232f8ed	Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi	2014-06-02 13:40:57 -05:00
Field G. Van Zee	3fc60e4914	Fixed ldim alignment bug in core2 gemm ukernel. Details: - Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in a segmentation fault if a column-stored matrix's starting address was aligned, but its leading dimension was such that its second column was unaligned. Basically, the micro-kernel was assuming that aligned load instructions were safe when they actually were not. An extra condition that checks the alignment of cs_c (ie: the leading dimension in the column storage case) has now been added. Thanks to Michael Lehn for reporting this bug.	2014-05-21 11:34:42 -05:00
Tyler Michael Smith	20e24430a7	Some fixes for the bgq kernels	2014-04-08 17:50:44 +00:00
Tyler Smith	2b6848b239	Merge http://github.com/flame/blis Conflicts: kernels/bgq/1/bli_axpyv_opt_var1.c kernels/bgq/1/bli_dotv_opt_var1.c	2014-04-04 09:54:54 -05:00
Tyler Michael Smith	4e3eb39aca	Some fixes to the bgq config. MR and NR for double complex were wrong. Default fusing factor for double precision was wrong as well.	2014-04-04 14:50:03 +00:00
Field G. Van Zee	21a0efb33d	Fixed follow-up to issue #6 .	2014-04-03 16:38:44 -05:00
Field G. Van Zee	c318157a9b	Fixed issue #6 (incorrect 'restrict' usage). Details: - Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels. (However, there may be other instances of similar misuse elsewhere in BLIS.) Thanks to Jeff Hammond for reporting this issue.	2014-04-03 16:24:34 -05:00
Field G. Van Zee	b5150a1bf3	Added #include "arm_neon.h" to ARM gemm ukernel. Details: - Inserted #include "arm_neon.h" into gemm ukernel source file for arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.	2014-04-03 12:25:45 -05:00
Field G. Van Zee	d27b4f690c	Use generic paths for toolchain in POWER7. Details: - Fixed issue #4. Thanks to Jeff Hammond for contributing changes.	2014-04-01 12:57:24 -05:00
Tyler Michael Smith	73b3db5948	Some fixes for the bgq configuration	2014-03-26 15:39:05 +00:00
Field G. Van Zee	fde5f1fdec	Added extensive support for configuration defaults. Details: - Standard names for reference kernels (levels-1v, -1f and 3) are now macro constants. Examples: BLIS_SAXPYV_KERNEL_REF BLIS_DDOTXF_KERNEL_REF BLIS_ZGEMM_UKERNEL_REF - Developers no longer have to name all datatype instances of a kernel with a common base name; [sdcz] datatype flavors of each kernel or micro-kernel (level-1v, -1f, or 3) may now be named independently. This means you can now, if you wish, encode the datatype-specific register blocksizes in the name of the micro-kernel functions. - Any datatype instances of any kernel (1v, 1f, or 3) that is left undefined in bli_kernel.h will default to the corresponding reference implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined, it will be defined to be BLIS_DGEMM_UKERNEL_REF. - Developers no longer need to name level-1v/-1f kernels with multiple datatype chars to match the number of types the kernel WOULD take in a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is sufficient, as in bli_daxpyv_opt(). - There is no longer a need to define an obj_t wrapper to go along with your level-1v/-1f kernels. The framework now provides a _kernel() function which serves as the obj_t wrapper for whatever kernels are specified (or defaulted to) via bli_kernel.h - Developers no longer need to prototype their kernels, and thus no longer need to include any prototyping headers from within bli_kernel.h. The framework now generates kernel prototypes, with the proper type signature, based on the kernel names defined (or defaulted to) via bli_kernel.h. - If the complex datatype x (of [cz]) implementation of the gemm micro-kernel is left undefined by bli_kernel.h, but its same-precision real domain equivalent IS defined, BLIS will use a 4m-based implementation for the datatype x implementations of all level-3 operations, using only the real gemm micro-kernel.	2014-02-25 13:34:56 -06:00
Field G. Van Zee	6363a9f658	Added level-3 support for complex via 4m-/3m. Details: - Added the ability to induce complex domain level-3 operations via new virtual complex micro-kernels which are implemented via only real domain micro-kernels. Two new implementations are provided: 4m and 3m. 4m implements complex matrix multiplication in terms of four real matrix multiplications, whereas 3m uses only three and thus is capable of even higher (than peak) performance. However, the 3m method has somewhat weaker numerical properties, making it less desirable in general. - Further refined packing routines, which were recently revamped, and added packing functionality for 4m and 3m. - Some modifications to trmm and trsm macro-kernels to facilitate indexing into micro-panels which were packed for 4m/3m virtual kernels. - Added 4m and 3m interfaces for each level-3 operation. - Various other minor changes to facilitate 4m/3m methods.	2014-02-19 17:00:52 -06:00
Tyler Smith	ce06686368	Fixed more Xeon Phi bugs, especially with scattered update	2014-02-14 13:52:18 -06:00
Tyler Smith	31134b5c70	Some fixes, changes, and improvements to the microkernel to the Xeon Phi	2014-02-14 11:19:44 -06:00
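
Commit 6363a9f658 above describes inducing complex matrix multiplication from real products only: four real multiplications (4m) or three (3m). The following is a minimal sketch of that arithmetic, not BLIS code. It uses plain Python lists rather than packed micro-panels, and every function name here (`matmul`, `gemm_4m`, `gemm_3m`) is hypothetical and chosen for illustration.

```python
def matmul(a, b):
    """Plain real matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(x, y):
    """Elementwise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    """Elementwise difference of two equally sized matrices."""
    return [[u - v for u, v in zip(r, s)] for r, s in zip(x, y)]


def gemm_4m(ar, ai, br, bi):
    """(Ar + i*Ai)(Br + i*Bi) using four real matrix products."""
    cr = sub(matmul(ar, br), matmul(ai, bi))   # real part
    ci = add(matmul(ar, bi), matmul(ai, br))   # imaginary part
    return cr, ci


def gemm_3m(ar, ai, br, bi):
    """Same product using three real matrix products."""
    p1 = matmul(ar, br)
    p2 = matmul(ai, bi)
    p3 = matmul(add(ar, ai), add(br, bi))
    cr = sub(p1, p2)
    ci = sub(sub(p3, p1), p2)   # the imaginary part is recovered by subtraction
    return cr, ci


# Small integer inputs, so both methods agree exactly.
ar, ai = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
br, bi = [[2, 0], [1, 3]], [[1, 1], [0, 2]]
print(gemm_4m(ar, ai, br, bi))
print(gemm_3m(ar, ai, br, bi))
```

In floating point, the subtraction that recovers the imaginary part in the 3m variant can lose accuracy to cancellation, which is one plausible reading of the "somewhat weaker numerical properties" the commit message mentions.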

87 Commits
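
Commit f0a4f41b5a in the log explains that treating "beta == 0" the same as "beta != 0" fails when the output matrix already holds nan's or inf's. The sketch below illustrates why, under the assumption that the update has the usual form C := beta*C + alpha*(A*B). It is not the micro-kernel itself; the function names are hypothetical and the matrices are flattened to short lists.

```python
import math


def update_naive(c, ab, alpha, beta):
    """Always scale the old C by beta, even when beta is zero."""
    return [beta * c_old + alpha * prod for c_old, prod in zip(c, ab)]


def update_beta_aware(c, ab, alpha, beta):
    """Skip reading the old C entirely when beta is zero."""
    if beta == 0.0:
        return [alpha * prod for prod in ab]
    return [beta * c_old + alpha * prod for c_old, prod in zip(c, ab)]


# Output buffer containing a nan and an inf, as uninitialized memory might.
c = [math.nan, math.inf, 1.0]
ab = [1.0, 2.0, 3.0]

print(update_naive(c, ab, 1.0, 0.0))       # nan and inf survive, since 0 * nan and 0 * inf are nan
print(update_beta_aware(c, ab, 1.0, 0.0))  # old contents of C are ignored
```

Under IEEE 754 arithmetic, multiplying nan or inf by zero yields nan, so the naive form propagates whatever was in the output buffer. That matches the nan residuals the commit message reports.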