amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-13 10:35:38 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	d10e05bbd1	Sandbox header edits trigger full library rebuild. Details: - Adjusted the top-level Makefile so that any change to a sandbox header file will result in blis.h being regenerated along with a full recompilation of the library. Previously, sandbox files were omitted from the list of header files that, when touched, could trigger a full rebuild. Why was it like that previously? Because originally we only envisioned using sandboxes to replace gemm, not augment the library with new functionality. When replacing gemm, blis.h does not need to contain any local sandbox definitions in order for the user to be able to (indirectly) use that sandbox. But if you are adding functions to the library, those functions need to be prototyped so the compiler can perform type checking against the user's invocation of those new functions. Thanks to Jeff Diamond for helping us discover this deficiency in the build system.	2021-06-13 19:36:16 -05:00
Devin Matthews	7c3eb44efa	Add vhsubpd/vhsubpd. Horizontal subtraction instructions added to bli_x86_asm_macros.h, currently unused [ci skip].	2021-06-02 11:28:22 -05:00
Field G. Van Zee	7f7d72610c	Fixed bugs in cpackm kernels, gemmlike code. Details: - Fixed intermittent bugs in bli_packm_haswell_asm_c3xk.c and bli_packm_haswell_asm_c8xk.c whereby the imaginary component of the kappa scalar was incorrectly loaded at an offset of 8 bytes (instead of 4 bytes) from the real component. This was almost certainly a copy-paste bug carried over from the corresponding zpackm kernels. Thanks to Devin Matthews for bringing this to my attention. - Added missing code to gemmlike sandbox files bls_gemm_bp_var1.c and bls_gemm_bp_var2.c that initializes the elements of the temporary microtile to zero. (This bug was never observed in output but rather noticed analytically. It probably would have also manifested as intermittent failures, this time involving edge cases.) - Minor commented-out/disabled changes to testsuite/src/test_gemm.c relating to debugging.	2021-05-31 16:50:18 -05:00
Field G. Van Zee	213dce32d2	Added a new 'gemmlike' sandbox. Details: - Added a new sandbox called 'gemmlike', which implements sequential and multithreaded gemm in the style of gemmsup but also unconditionally employs packing. The purpose of this sandbox is to (1) avoid select abstractions, such as objects and control trees, in order to allow readers to better understand how a real-world implementation of high-performance gemm can be constructed; (2) provide a starting point for expert users who wish to build something that is gemm-like without "reinventing the wheel." Thanks to Jeff Diamond, Tze Meng Low, Nicholai Tukanov, and Devangi Parikh for requesting and inspiring this work. - The functions defined in this sandbox currently use the "bls_" prefix instead of "bli_" in order to avoid any symbol collisions in the main library. - The sandbox contains two variants, each of which implements gemm via a block-panel algorithm. The only difference between the two is that variant 1 calls the microkernel directly while variant 2 calls the microkernel indirectly, via a function wrapper, which allows the edge case handling to be abstracted away from the classic five loops. - This sandbox implementation utilizes the conventional gemm microkernel (not the skinny/unpacked gemmsup kernels). - Updated some typos in the comments of a few files in the main framework.	2021-05-28 14:49:57 -05:00
Field G. Van Zee	82af05f54c	Updated Fugaku (a64fx) performance results. Details: - Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx entry within Performance.md, and also updated the experiment details accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2 experiments reflected in this commit. - In Performance.md, added an English translation of the project name under which the Fugaku results were gathered, courtesy of RuQing Xu.	2021-05-25 15:25:08 -05:00
Devin Matthews	e5c85da376	Merge pull request #503 from flame/windows-compiler-check Add explicit compiler check for Windows.	2021-05-24 16:56:22 -05:00
Devin Matthews	cbd8d39325	Merge pull request #500 from xrq-phys/armsve+travis Upgrade Travis CI for Arm SVE	2021-05-24 16:32:42 -05:00
Devin Matthews	5feb04e233	Add explicit compiler check for Windows. Check the C compiler for a predefined macro `_WIN32` to indicate (cross-)compilation for Windows. Fixes #463.	2021-05-23 18:46:56 -05:00
Devin Matthews	6d4ab0223d	Merge pull request #502 from flame/rm-rm-dupls Remove `rm-dupls` function in common.mk.	2021-05-23 18:39:53 -05:00
Devin Matthews	859fb77a32	Remove `rm-dupls` function in common.mk. AMD requested removal due to unclear licensing terms; original code was from stackoverflow. The function is unused but could easily be replaced by new implementation.	2021-05-23 18:15:23 -05:00
RuQing Xu	932dfe6abb	Travis CI Revert Unnecessary Extras from `91d3636` - Removed `V=1` in make line - Removed `CFLAGS` in configure line - Restored `pwd` surrounding OOT line	2021-05-20 02:07:31 +09:00
RuQing Xu	bd156a210d	Adjust TravisCI - ArmSVE don't test gemmt (seems Qemu-only problem); - Clang use TravisCI-provided version instead of fixing to clang-8 due to that clang-8 seems conflicting with TravisCI's clang-7.	2021-05-20 00:52:04 +09:00
RuQing Xu	91d3636031	Travis Support Arm SVE - Updated distro to 20.04 focal aarch64-gcc-10. This is minimal version required by aarch64-gcc-10. SVE intrinsics would not compile without GCC >=10. - x86 toolchains use official repo instead of ubuntu-toolchain-r/test. 20.04 focal is not supported by that PPA at the moment. - Add extra configuration-time options to .travis.yml. - Add Arm SVE entry to .travis.yml.	2021-05-20 00:52:01 +09:00
RuQing Xu	61584deddf	Added 512b SVE-based a64fx subconfig + SVE kernels. Details: - Added 512-bit specific 'a64fx' subconfiguration that uses empirically tuned block size by Stepan Nassyr. This subconfig also sets the sector cache size and enables memory-tagging code in SVE gemm kernels. This subconfig utilizes (16, k) and (10, k) DPACKM kernels. - Added a vector-length agnostic 'armsve' subconfiguration that computes blocksizes according to the analytical model. This part is ported from Stepan Nassyr's repository. - Implemented vector-length-agnostic [d/s/sh] gemm kernels for Arm SVE at size (2*VL, 10). These kernels use unindexed FMLA instructions because indexed FMLA takes 2 FMA units in many implementations. PS: There are indexed-FMLA kernels in Stepan Nassyr's repository. - Implemented 512-bit SVE dpackm kernels with in-register transpose support for sizes (16, k) and (10, k). - Extended 256-bit SVE dpackm kernels by Linaro Ltd. to 512-bit for size (12, k). This dpackm kernel is not currently used by any subconfiguration. - Implemented several experimental dgemmsup kernels which would improve performance in a few cases. However, those dgemmsup kernels generally underperform hence they are not currently used in any subconfig. - Note: This commit squashes several commits submitted by RuQing Xu via PR #424.	2021-05-19 09:52:29 -05:00
Devin Matthews	5d46dbee4a	Replace bli_dlamch with something less archaic (#498) Details: - Added new implementations of bli_slamch() and bli_dlamch() that use constants from the standard C library in lieu of dynamically-computed values (via code inherited from netlib). The previous implementation is still available when the cpp macro BLIS_ENABLE_LEGACY_LAMCH is defined by the subconfiguration at compile-time. Thanks to Devin Matthews for providing this patch, and to Stefano Zampini for reporting the issue (#497) that prompted Devin to propose the patch.	2021-05-12 18:42:09 -05:00
Field G. Van Zee	6a4aa986ff	Fixed typo in Table of Contents.	2021-04-23 13:10:01 -05:00
Field G. Van Zee	f6424b5b82	Added dedicated Performance section to README.md. Details: - Spun off the Performance.md and PerformanceSmall.md links in the Documentation section into a new Performance section dedicated to those two links. (The previous entries remain redundantly listed within Documentation section.) Thanks to Robert van de Geijn for suggesting this change.	2021-04-23 13:08:06 -05:00
Devin Matthews	40ce5fd241	Merge pull request #493 from cassiersg/patch-1 Fix typo in FAQ.md	2021-04-21 09:54:25 -05:00
Gaëtan Cassiers	1f3461a5a5	Fix typo in FAQ.md	2021-04-21 16:49:05 +02:00
Field G. Van Zee	6280757be3	Minor updates to a64fx section of Performance.md.	2021-04-07 13:03:56 -05:00
RuQing Xu	1e6ed823c6	Additional A64fx Comments (#490) * Performance.md Update A64fx Comments - Reason for ARMPL's missing data; - Additional envs / flags for kernel selection; - Update BLIS SRC commit. * Include Another Fix in armsve-cfg-vendor A prototype was forgotten, causing that void* pointer was not fully returned.	2021-04-07 12:59:26 -05:00
Field G. Van Zee	2688f21a5b	Added Fujitsu A64fx (512-bit SVE) perf results. Details: - Added single-threaded and multithreaded performance results to docs/Performance.md. These results were gathered on the "Fugaku" Fujitsu A64fx supercomputer at the RIKEN Center for Computational Science in Kobe, Japan. Special thanks to RuQing Xu and Stepan Nassyr for their work in developing and optimizing A64fx support in BLIS and RuQing for gathering the performance data that is reflected in these new graphs.	2021-04-06 19:02:37 -05:00
Field G. Van Zee	ba3ba8da83	Minor updates and fixes to test/3/octave scripts. Details: - Fixed an issue where the wrong string was being passed in for the vendor legend string. - Changed the graph in which the legends appear. - Updates to runthese.m.	2021-04-06 18:39:58 -05:00
Devin Matthews	90508192f2	Update do_sde.sh (#489) Update to a newer version of SDE, and do a direct download as it seems you don't have to click-through the license anymore.	2021-03-30 21:16:44 -05:00
Nicholai Tukanov	22c6b5dc4c	Fixed bug in power10 microkernel I/O. (#488) Details: - Fixed a bug in the POWER10 DGEMM kernel whereby the microkernel did not store the microtile result correctly due to incorrect indices calculations. (The error was introduced when I reorganized the 'kernels/power10/3' directory.)	2021-03-30 19:07:42 -05:00
Field G. Van Zee	159ca6f01a	Made test/3/octave scripts robust to missing data. Details: - Modified the octave scripts in test/3 so that the script does not choke when one or more of the expected OpenBLAS, Eigen, or vendor data files is missing. (The BLIS data set, however, must be complete.) When a file is missing, that data series is simply not included on that particular graph. Also factored out a lot of the redundant logic from plot_panel_4x5.m into a separate function in read_data.m.	2021-03-24 15:57:32 -05:00
Field G. Van Zee	545e6c2f6d	CHANGELOG update (0.8.1)	2021-03-22 17:42:33 -05:00
Field G. Van Zee	8535b3e11d	Version file update (0.8.1)	2021-03-22 17:42:33 -05:00
Field G. Van Zee	e56d9f2d94	ReleaseNotes.md update in advance of next version.	2021-03-22 17:40:50 -05:00
Field G. Van Zee	ca83f955d4	CREDITS file update.	2021-03-22 17:21:21 -05:00
Field G. Van Zee	57ef61f6cd	Merge branch 'master' of github.com:flame/blis	2021-03-19 13:05:43 -05:00
Field G. Van Zee	bf1b578ea3	Reduced KC on skx from 384 to 256. Details: - Reduced the KC cache blocksize for double real on the skx subconfig from 384 to 256. The maximum (extended) KC was also reduced accordingly from 480 to 320. Thanks to Tze Meng Low for suggesting this change.	2021-03-19 13:03:17 -05:00
Nicholai Tukanov	e7a4a8edc9	Fix calculation of new pb size (#487) Details: - Added missing parentheses to the i8 and i4 instantiations of the GENERIC_GEMM macro in sandbox/power10/generic_gemm.c.	2021-03-17 19:43:31 -05:00
Field G. Van Zee	4493cf516e	Redefined BLIS_NUM_ARCHS to update automatically. Details: - Changed BLIS_NUM_ARCHS from a cpp macro definition to the last enum value in the arch_t enum. This means that it no longer needs to get updated manually whenever new subconfigurations are added to BLIS. Also removed the explicit initial index assignment of 0 from the first enum value, which was unnecessary due to how the C language standard mandates indexing of enum values. Thanks to Devin Matthews for originally submitting this as a PR in #446. - Updated docs/ConfigurationHowTo.md to reflect the aforementioned change.	2021-03-15 13:12:49 -05:00
Field G. Van Zee	a4b73de84c	Disabled _self() and _equal() in bli_pthread API. Details: - Disabled the _self() and _equal() extensions to the bli_pthread API introduced in d479654. These functions were disabled after I realized that they aren't actually needed yet. Thanks to Devin Matthews for helping me reason through the appropriate consumer code that will appear in BLIS (eventually) in a future commit. (Also, I could never get the Windows branch to link properly in clang builds in AppVeyor. See the comment I left in the code, and #485, for more info.)	2021-03-12 19:47:39 -06:00
Field G. Van Zee	f9d604679d	Added _self() and _equal() to bli_pthread API. Details: - Expanded the bli_pthread API to include equivalents to pthread_self() and pthread_equal(). Implemented these two functions for all three cpp branches present within bli_pthread.c: systemless, Windows, and Linux/BSD.	2021-03-12 19:47:39 -06:00
Field G. Van Zee	fa9b3c8f6b	Shuffled code in Windows branch of bli_pthreads.c. Details: - Reordered the definitions in the cpp branch in bli_pthreads.c that defines the bli_pthreads API in terms of Windows API calls. Also added missing comments that mark sections of the API, which brings the code into harmony with other cpp branches (as well as bli_pthread.h).	2021-03-11 15:13:51 -06:00
Field G. Van Zee	95d4f3934d	Moved cpp macro redef of strerror_r to bli_env.c. Details: - Relocated the _MSC_VER-guarded cpp macro re-definition of strerror_r (in terms of strerror_s) from bli_thread.h to bli_env.c. It was likely left behind in bli_thread.h in a previous commit, when code that now resides in bli_env.c was moved from bli_thread.c. (I couldn't find any other instance of strerror_r being used in BLIS, so I moved the #define directly to bli_env.c rather than place it in bli_env.h.) The code that uses strerror_r is currently disabled, though, so this commit should have no effect on BLIS.	2021-03-11 13:50:40 -06:00
Field G. Van Zee	8a3066c315	Relocated gemmsup_ref general stride handling. Details: - Moved the logic that checks for general stridedness in any of the matrix operands in a gemmsup problem. The logic previously resided near the top of bli_gemmsup_int(), which is the thread entry point for the parallel region of the current gemmsup implementation. The problem with this setup was that the code would attempt to reject problems with any general-strided operands by returning BLIS_FAILURE, and that return value was then being ignored by the l3_sup thread decorator, which unconditionally returns BLIS_SUCCESS. To solve this issue, rather than try to manage n return values, one from each of n threads, I simply moved the logic into bli_gemmsup_ref(). I didn't move it any higher (e.g. bli_gemmsup()) because I still want the logic to be part of the current gemmsup handler implementation. That is, perhaps someone else will create a different handler, and that author wants to handle general stride differently. (We don't want to force them into a particular way of handling general stride.) - Removed the general stride handling from bli_gemmtsup_int(), even though this function is inoperative for now. - This commit addresses issue #484. Thanks to RuQing Xu for reporting this issue.	2021-03-09 17:52:59 -06:00
Nicholai Tukanov	670bc7b60f	Add low-precision POWER10 gemm kernels (#467) Details: - This commit adds a new BLIS sandbox that (1) provides implementations based on low-precision gemm kernels, and (2) extends the BLIS typed API for those new implementations. Currently, these new kernels can only be used for the POWER10 microarchitecture; however, they may provide a template for developing similar kernels for other microarchitectures (even those beyond POWER), as changes would likely be limited to select places in the microkernel and possibly the packing routines. The new low-precision operations that are now supported include: shgemm, sbgemm, i16gemm, i8gemm, i4gemm. For more information, refer to the POWER10.md document that is included in 'sandbox/power10'.	2021-03-05 13:53:43 -06:00
RuQing Xu	b8dcc5bc75	Fixed typed API definition for gemmt (#476) Details: - Fixed incorrect definition and prototype of bli_?gemmt() in frame/3/bli_l3_tapi.c and .h, respectively. gemmt was previously defined identically to gemm, which was wrong because it did not take into account the uplo property of C. - Fixed incorrect API documentation for her2k/syr2k in BLISTypedAPI.md. Specifically, the document erroneously listed only a single transab parameter instead of transa and transb.	2021-03-01 16:58:24 -06:00
Ilknur	a0e4fe2340	Fixed double free() in level1v example (#482) Details: - In examples/tapi/00level1v.c, pointer 'z' was being freed twice and pointer 'a' was not being freed at all. This commit correctly frees each pointer exactly once.	2021-03-01 16:06:56 -06:00
Field G. Van Zee	f5871c7e06	Added complex asm packm kernels for 'haswell' set. Details: - Implemented assembly-based packm kernels for single- and double-precision complex domain (c and z) and housed them in the 'haswell' kernel set. This means c3xk, c8xk, z3xk, and z4xk are now all optimized. - Registered the aforementioned packm kernels in the haswell, zen, and zen2 subconfigs. - Minor modifications to the corresponding s and d packm kernels that were introduced in `426ad67`. - Thanks to AMD, who originally contributed the double-precision real packm kernels (d6xk and d8xk), upon which these complex kernels are partially based.	2021-02-28 17:03:57 -06:00
Field G. Van Zee	426ad679f5	Added assembly packm kernels for 'haswell' set. Details: - Implemented assembly-based packm kernels for single- and double-precision real domain (s and d) and housed them in the 'haswell' kernel set. This means s6xk, s16xk, d6xk, and d8xk are now all optimized. - Registered the aforementioned packm kernels in the haswell, zen, and zen2 subconfigs. - Thanks to AMD, who originally contributed the double-precision real packm kernels (d6xk and d8xk), which I have now tweaked and used to create comparable single-precision real kernels (s6xk and s16xk).	2021-02-27 18:39:56 -06:00
Devin Matthews	f50c1b7e58	Merge pull request #473 from ajaypanyala/pkgconfig build: generate pkgconfig file	2021-02-01 11:55:51 -06:00
Field G. Van Zee	8f39aea11f	Merge branch 'dev'	2021-01-30 17:59:56 -06:00
Field G. Van Zee	f8db9fb33b	Fixed missing parentheses in README.md Citations.	2021-01-28 08:04:52 -06:00
Ajay Panyala	b3953b938e	drop CFLAGS in the generated pkgconfig file	2021-01-12 17:07:04 -08:00
Ajay Panyala	b02d9376ba	add datadir	2021-01-12 11:47:58 -08:00
Ajay Panyala	d8d8deeb6d	generate pkgconfig file	2021-01-11 17:47:50 -08:00
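
Note on commit 4493cf516e (BLIS_NUM_ARCHS as the last enum value): the pattern that message describes can be sketched as below. This is only an illustration under stated assumptions; the type name `arch_id_t`, the enumerator names, and the helper `arch_name()` are hypothetical and are not copied from the BLIS sources.

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for an architecture-id enum.
   None of these names are taken from BLIS. */
typedef enum
{
    ARCH_SKX,      /* first enumerator is 0 even without an explicit "= 0" */
    ARCH_HASWELL,
    ARCH_ZEN2,
    ARCH_GENERIC,

    NUM_ARCHS      /* kept last: its value equals the number of entries above */
} arch_id_t;

/* A table sized by the trailing enumerator grows automatically when a
   new enumerator is added before NUM_ARCHS. */
static const char* arch_names[ NUM_ARCHS ] =
{
    "skx", "haswell", "zen2", "generic"
};

/* Hypothetical lookup helper that relies on NUM_ARCHS as the bound. */
const char* arch_name( arch_id_t id )
{
    assert( (int)id >= 0 && (int)id < (int)NUM_ARCHS );
    return arch_names[ id ];
}
```

Because C numbers enumerators consecutively from 0 unless told otherwise, the trailing enumerator doubles as a count, so no separate macro has to be edited when an entry is added.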

1997 Commits
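
Note on commit 5d46dbee4a (bli_slamch()/bli_dlamch() built on standard C library constants): the general idea of reading machine parameters from `<float.h>` instead of computing them at run time might look like the sketch below. It is an assumption-laden illustration, not the BLIS implementation: the function names `machine_param_d()` and `check_epsilon()` are hypothetical, only a handful of LAPACK-style selector characters are handled, and whether 'E' should return the epsilon or half of it is a convention this sketch does not try to settle.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper (not a BLIS function): return a double-precision
   machine parameter chosen by a LAPACK-style selector character, using
   compile-time constants from <float.h>. */
double machine_param_d( char which )
{
    switch ( which )
    {
        case 'E': case 'e': return DBL_EPSILON;           /* spacing of doubles near 1.0 */
        case 'B': case 'b': return ( double )FLT_RADIX;    /* base of the representation */
        case 'N': case 'n': return ( double )DBL_MANT_DIG; /* base-B digits in the mantissa */
        case 'U': case 'u': return DBL_MIN;                /* smallest normalized positive value */
        case 'O': case 'o': return DBL_MAX;                /* largest finite value */
        default:            return 0.0;                    /* unrecognized selector (sketch's choice) */
    }
}

/* Sanity check: adding the reported epsilon to 1.0 must change the value. */
void check_epsilon( void )
{
    volatile double one_plus_eps = 1.0 + machine_param_d( 'E' );
    assert( one_plus_eps > 1.0 );
}
```

Every value here is a constant the compiler already knows, so no probing loop of the kind used by run-time detection code is needed.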