amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
prangana	3620e472e3	Replace back major version number variable in Makefile Change-Id: I0f902e32085058ec618d08470793f5e5e49719b3	2020-06-10 13:11:14 +05:30
Dipal M Zambare	305c744131	Added traces in dgemm and sgemm paths. Added traces from blas/cblas API's till kernels for dgemm and sgemm. By default the traces will be disabled, user need to enable them in their local workspace, please check aocl_dtl/aocldtlcf.h file. AMD Internal : CPUPL-806 Change-Id: I83b310509fb1a599c114387192bcf882ef0480f9	2020-06-08 12:01:22 +05:30
Meghana	9fce1ec4a4	Optimized SGEMV kernel and changed BLAS interface call Details: - Optimized saxpyf kernel with fuse_factor=5 and iter_unroll=2. - Modified framework files of sgemv to remove dependency on cntx variable. - Updated cntx_init file of zen2 to choose optimized kernels. - Modified BLAS interface call for SGEMV to reduce framework overhead. - Currently these changes are applicable for zen2 configuration. Change-Id: Iabc36ae640e82e65f8764f3c6dee513ad64b22fd Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-707]	2020-06-04 02:49:08 -04:00
Dipal Madhukar Zambare	8a367c993e	Merge "Checking for zero dimension is moved to bli_gemm_xx call." into amd-staging-rome-2.2	2020-06-04 02:16:56 -04:00
Meghana	f4d2bb2fed	Enabled AOCC specific flags for all versions of AOCC compiler Change-Id: Icad0ff1c1858c1762792ba8f2c5c3e846909cbb5	2020-06-03 10:50:00 -04:00
dzambare	5d57d67cb3	Checking for zero dimension is moved to bli_gemm_xx call. This will ensure early return in case full gemm processing is not needed. Based on dimension which is found to be zero following actions will be taken: If 'c' has zero dimension, no further processing is required. If alpha is zero or if 'a' or 'b' has zero dimension, we perform scalm operation instead of gemm. (c = alpha*a + beta*b) Change-Id: Icc031944fc4e80138adf991974547f2d57ab570b AMD-Internal: [CPUPL-904]	2020-06-03 16:50:11 +05:30
managalv	b4e599ecc2	CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices Failure was seen in libflame function (FLASH_UDdate_UT_inc) Due to typecasting double complex pointer as double pointer Change-Id: If6e2f4663575450a13a9a07dddd5622628f5c6b0	2020-06-02 22:27:54 +05:30
Nallani Bhaskar	6f01cd2c54	Fix for sblat3.x failure in make check Details: Using of ymm registers storing 8 float values than 4 floats values Changed register from ymm to xmm in required places. This can be found only when leading dimension is greater than the actual dimension. Change-Id: I39f04eac18c4fa3a8c93048c977d6a83aa92b800	2020-06-01 17:04:59 +05:30
managalv	f7bc37ea32	CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices Details Added Support of N SUP kernel for complex float and complex double Removed prefetching in M SUP kernels for complex float and complex double Removed all warnings Change-Id: I05ffde0f0613681927fe7576db7f5f1a4486fd05	2020-06-01 06:24:12 -04:00
Kiran Varaganti	c8f3cec5f7	Merge "Code cleanup in 6xk DGEMM pack Kernel" into amd-staging-rome-2.2	2020-06-01 05:08:58 -04:00
Nallani Bhaskar	5e0ad13f8e	Code Cleanup and replaced vzeroall with vxorps Change-Id: I74c2cc2183a407aad86eab5c3285c33690de9abd	2020-06-01 10:14:06 +05:30
Nallani Bhaskar	2413c31672	CPUPL-923: Implemented dot Product Kernels in SGEMM SUP for transpose cases. Details: Added two new kernels bli_sgemmsup_rd_zen_asm_6x16m and bli_sgemmsup_rd_zen_asm_6x16n to support dot product in Row Major (A * Transpose(B)) and in Column Major (Transpose(A) * B) Change-Id: I264fd75c4c4b68fb7dc4fd229eaa44d09e9f3432	2020-05-31 22:37:03 +05:30
Kiran Varaganti	3ebd5f8aa0	Code cleanup in 6xk DGEMM pack Kernel Removed conditional check if(*kappa_cast==0.0) in 6xk dgemm packing kernel Change-Id: Ie543787133d303aeb2532e67b83d6ba96e3d558e	2020-05-31 21:41:45 +05:30
prangana	711f26129e	Update AMD BLIS version to 2.2 Also updated Makefile to fix issue of multiple symbolic links being created Change-Id: Ie9a680cedd5c96fcd7f6af1ce0f849a58c3ed4d3	2020-05-31 21:37:32 +05:30
Kiran Varaganti	f8ddd48594	Code Clean-up in DGEMM packing kernels Removed conditional check for (*kappa_cast == 1.0) because it's always 1.0 in DGEMM packing kernels. [CPUPL-636] Change-Id: Ib04f2a3cdbb0f138036a8b0486d1dec073e40407	2020-05-30 21:55:29 +05:30
prangana	0c52aaefe1	Merge branch 'ref/heads/amd-staging-rome-2.2' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging-rome-2.2 Change-Id: I46acf48354ff73fb4eaeac255132d21095ea4d98	2020-05-30 10:31:10 +05:30
prangana	bb7eeec843	Change loop test expression in bli_packm_zen_int.c PRAGMA SIMD loop has issues with test expression (k !=0) Changed usage to (k > 0) Change-Id: I50204dbd0194de43f0d6cdcbfc586bb16aa25968	2020-05-30 10:00:21 +05:30
Kiran Varaganti	739803a441	DGEMM Packing Kernels for Native DGEMM implementation [CPUPL-858] Packing kernels for dgemm 6x8 kernel are added explicitly for zen2 configuration. Apart from generic packing kernels used by level-3 routines and for all combinations of the input parameters, introduced DGEMM specific packing kernels for the case op(A) & op(B) is no transpose. This helps us to vectorize these packing kernels and eliminate unnecessary branch conditional checks. The packed kernels are also optimized at the boundary. These boundary condition optimizations help when the input matrix dimensions "m" and "n" are not multiples of register block-sizes "MR & NR". Typical DGEMM operation is C = beta*C + alpha* op(A) * op(B). Kindly note the multiplication with alpha is handled inside kernel, hence in these dgemm packing routines alpha is always considered 1.0. These routines are "bli_dpackm_8xk_nn_zen" & "bli_dpackm_6xk_nn_zen". The generic packing routines are "bli_dpackm_6xk_gen_zen" & "bli_dpackm_8xk_gen_zen". These routines are enabled from "bli_cntx_init_zen2()" through bli_cntx_set_packm_kers(). In this checkout the generic packing kernels are enabled by default. Later will introduce run-time mechanism to change these packing kernels based on the DGEMM input parameters. Change-Id: I079b4dce0757d558224cb8c55d024bfea6a4de91	2020-05-28 02:01:43 -04:00
managalv	9b09dd7d6c	CPUPL-929:Improve Complex GEMM performance Added context for CCR format Change-Id: I81ac1b882f176235b1c48f4952ec30f44c6b138c	2020-05-23 11:10:35 +05:30
managalv	154bedc785	CPUPL-929:Improve Complex GEMM performance Removed print which was part of kernel Change-Id: I288e0151ba8da8d6dd4415734c88ed3474ba3a5b	2020-05-22 14:39:12 +05:30
dzambare	8ce6e49a34	Added file and copyright header for aoclflist.c file. AMD Internal: CPUPL-968 Change-Id: I1d0905fac62cac8224e38ae8d0912b9c5663799d	2020-05-22 12:54:18 +05:30
managalv	11570dbc14	CPUPL-929:Improve Complex GEMM performance Updated BLIS_MC value and created SUP context for CCC storage format Change-Id: I5032b29834ea545d7b5f7a9469bc5655c71b7fe5	2020-05-22 10:47:28 +05:30
Guodong Xu	72443e7173	avoid loading twice in armv8a gemm kernel (#403 ) This bug happens at a corner case, when k_iter == 0 and we jump to CONSIDERKLEFT. In current design, first row/col. of a and b are loaded twice. The fix is to rearrange a and b (first row/col.) loading instructions. Change-Id: I4a985a3abf9b1e7a0ee29e17c7d39a4a27138c4c Signed-off-by: Guodong Xu <guodong.xu@linaro.org>	2020-05-21 12:37:53 +05:30
Field G. Van Zee	f973f00d94	Defined netlib equivalent of xerbla_array(). Details: - Added a function definition for xerbla_array_(), which largely mirrors its netlib implementation. Thanks to Isuru Fernando for suggesting the addition of this function. Change-Id: Ie9c619f5604e60a32edfda2db2b66f0c762581d3	2020-05-21 11:57:54 +05:30
Field G. Van Zee	994a2d8de5	Documented Perl prerequisite for build system. Details: - Added Perl to list of prerequisites for building BLIS. This is in part (and perhaps completely?) due to some substitution commands used at the end of configure that include '\n' characters that are not properly interpreted by the version of sed included on some versions of OS X. This new documentation addresses issue #398.	2020-05-21 11:56:45 +05:30
Guodong Xu	66ec22705b	New kernel set for Arm SVE using assembly (#396 ) Here adds two kernels for Arm SVE vector extensions. 1. a gemm kernel for double at sizes 8x8. 2. a packm kernel for double at dimension 8xk. To achieve best performance, variable length agnostic programming is not used. Vector length (VL) of 256 bits is mandated in both kernels. Kernels to support other VLs can be added later. "SVE is a vector extension for AArch64 execution mode for the A64 instruction set of the Armv8 architecture. Unlike other SIMD architectures, SVE does not define the size of the vector registers, but constrains into a range of possible values, from a minimum of 128 bits up to a maximum of 2048 in 128-bit wide units. Therefore, any CPU vendor can implement the extension by choosing the vector register size that better suits the workloads the CPU is targeting. Instructions are provided specifically to query an implementation for its register size, to guarantee that the applications can run on different implementations of the ISA without the need to recompile the code." [1] [1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning Signed-off-by: Guodong Xu <guodong.xu@linaro.org>	2020-05-21 11:56:45 +05:30
Yingbo Ma	562b9eeaaf	Update KernelsHowTo.md (#395 )	2020-05-21 11:56:45 +05:30
Field G. Van Zee	8e3f143439	Adding missing conjy to her2/syr2 in typed API doc. Details: - Fixed a missing argument (conjy) in the function signatures of bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert van de Geijn for reporting this omission. Change-Id: Ifd1e01d5d7f943db4b1d67b467eb57e4a5c44165	2020-05-21 11:56:36 +05:30
Field G. Van Zee	93023d0e02	README.md update to promote supmt dgemm. Details: - Updated the sup entry in the "What's New" section of the README.md file to promote the multithreaded dgemm sup feature introduced in `c0558fd`.	2020-05-21 11:55:32 +05:30
Field G. Van Zee	4907b32c2c	CHANGELOG update (0.7.0)	2020-05-21 11:55:32 +05:30
Field G. Van Zee	052a3c589f	ReleaseNotes.md update in advance of next version. Details: - Updated docs/ReleaseNotes.md in preparation for next version. Change-Id: I6c3e0dbaebcb855dff9420196092da5cb0bcce89	2020-05-21 11:55:20 +05:30
Field G. Van Zee	27b2911ab3	Rename more bli_thread_obarrier(), _obroadcast(). Details: - Renamed instances of bli_thread_obarrier() and bli_thread_obroadcast() that were made in the supmt-specific code committed to the 'amd' branch, which has now been merged with 'master'. Prior to the merge, 'master' received commit `c01d249`, which applied these renamings to the existing, non-sup codebase.	2020-05-21 11:54:54 +05:30
Field G. Van Zee	4a5e76e15e	Minor updates/elaborations to RELEASING file.	2020-05-21 11:54:54 +05:30
Satish Balay	d560d105d2	OSX: specify the full path to the location of libblis.dylib (#390 ) * OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime Before this change: Application gives runtime error [when linked with blis] dyld: Library not loaded: libblis.3.dylib balay@kpro lib % otool -L libblis.dylib libblis.dylib: libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0) After this change: balay@kpro lib % otool -L libblis.dylib libblis.dylib: /Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0) * INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR Co-Authored-By: Jed Brown <jed@jedbrown.org> * CREDITS file update. Co-authored-by: Jed Brown <jed@jedbrown.org> Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>	2020-05-21 11:54:54 +05:30
Field G. Van Zee	3597284b9d	Updates, tweaks to runme.sh in test/1m4m. Details: - Made several updates to test/1m4m/runme.sh, including: - Added missing handling for 1m and 4m1a implementations when setting the BLIS_??_NT environment variables. - Added support for using numactl to run the test executables. - Several other cleanups.	2020-05-21 11:54:53 +05:30
Field G. Van Zee	b325f1ea62	Warn user when auto-detection returns 'generic'. Details: - Added logic to configure that causes the script to output a warning to the user if/when "./configure auto" is run and the underlying hardware feature detection code is unable to identify the hardware. In these cases, the auto-detect code will return 'generic', which is likely not what the user expected, and a flag will be set so that a message is printed at the end of the configure output. (Thankfully, we don't expect this scenario to play out very often.) Thanks to Devin Matthews for suggesting this fix #384.	2020-05-21 11:54:53 +05:30
Field G. Van Zee	9e76059f15	Renamed bli_thread_obarrier(), _obroadcast(). Details: - Renamed two bli_thread_*() APIs: bli_thread_obarrier() -> bli_thread_barrier() bli_thread_obroadcast() -> bli_thread_broadcast() The 'o' was a leftover from when thrcomm_t objects tracked both "inner" and "outer" communicators. They have long since been simplified to only support the latter, and thus the 'o' is superfluous. Change-Id: If9ec9a2383dfb02e1cfc74918f87a1fabddbd55b	2020-05-21 11:54:37 +05:30
Field G. Van Zee	6a957d7247	List Gentoo under supported external packages. Details: - Add mention of Gentoo Linux under the list of external packages in the README.md file. Thanks to M. Zhou for maintaining this package.	2020-05-21 11:50:37 +05:30
Field G. Van Zee	c7faae9442	Merged test/sup, test/supmt into test/sup. Details: - Updated the Makefile, test_gemm.c, and runme.sh in test/sup to be able to compile and run both single-threaded and multithreaded experiments. This should help with maintenance going forward. - Created a test/sup/octave_st directory of scripts (based on the previous test/sup/octave scripts) as well as a test/sup/octave_mt directory (based on the previous test/supmt/octave scripts). The octave scripts are slightly different and not easily mergeable, and thus for now I'll maintain them separately. - Preserved the previous test/sup directory as test/sup/old/supst and the previous test/supmt directory as test/sup/old/supmt. Change-Id: Ia230fc65185fd9a34eec714721004aa9e0bd40ed	2020-05-21 11:50:19 +05:30
Field G. Van Zee	6d369532e3	Updated sup[mt] Makefiles for variable dim ranges. Details: - Updated test/sup/Makefile and test/supmt/Makefile to allow specifying different problem size ranges for the drivers where one, two, or three matrix dimensions is large. This will facilitate the generation of more meaningful graphs, particularly when two dimensions are tiny.	2020-05-21 11:46:36 +05:30
Field G. Van Zee	2096f41aa6	Updates to octave scripts in test/sup[mt]/octave. Details: - Optimized scripts in test/sup/octave and test/supmt/octave for use with octave 5.2.0 on Ubuntu 18.04. - Fixed stray 'end' keywords in gen_opsupnames.m and plot_l3sup_perf.m, which were not only unnecessary but also causing issues with versions 5.x.	2020-05-21 11:46:35 +05:30
Field G. Van Zee	08709d4117	Removed sorting on LDFLAGS in common.mk (#373 ). Details: - Removed a line of code in common.mk that passed LDFLAGS through the sort function. The purpose was not to sort the contents, but rather to remove duplicates. However, there is valid syntax in a string of linker flags that, when sorted, yields different/broken behavior. So I've removed the line in common.mk that sorts LDFLAGS. Also, for future use, I've added a new function, rm-dupls, that removes duplicates without sorting. (This function was based on code from a stackoverflow thread that is linked to in the comments for that code.) Thanks to Isuru Fernando for reporting this issue (#373). Change-Id: Ie355cc111fd2c6669f0c3088e8fa5dc7c407a3b9	2020-05-21 11:45:48 +05:30
Field G. Van Zee	b3c0309009	CHANGELOG update (0.6.1)	2020-05-21 11:42:04 +05:30
Field G. Van Zee	d6496d55cc	ReleaseNotes.md update in advance of next version. Details: - Updated ReleaseNotes.md in preparation for next version. Change-Id: I2aa6f944ce2584de85ae7b6921ff0193b3b7020a	2020-05-21 11:41:49 +05:30
Field G. Van Zee	51f87f3e42	Removed 'attic/windows' (to prevent confusion). Details: - Finally removed 'attic/windows' and its contents. This directory once contained "proto" Windows support for BLIS, but we've since moved on to (thanks to Isuru Fernando) providing Windows DLL support via AppVeyor's build artifacts. Furthermore, since 'windows' was the only subdirectory within 'attic', the directory path would show up in GitHub's listing at https://github.com/flame/blis, which probably led to someone being confused about how BLIS provides Windows support. I assume (but don't know for sure) that nobody is using these files, so this is admittedly a case of shoot first and ask questions later.	2020-05-21 11:41:10 +05:30
Field G. Van Zee	142df1b1e9	CREDITS file update.	2020-05-21 11:41:10 +05:30
Dave Love	291ee5f748	Fix parsing in vpu_count on workstation SKX (#351 ) * Fix parsing in vpu_count on workstation SKX * Document Skylake-X as Haswell for single FMA * Update vpu_count for Skylake and Cascade Lake models * Support printing the configuration selected, controlled by the environment Intended particularly for diagnosing mis-selection of SKX through unknown, or incorrect, number of VPUs. * Move bli_log outside the cpp condition, and use it where intended * Add Fixme comment (Skylake D) * Mostly superficial edits to commits towards #351. Details: - Moved architecture/sub-config logging-related code from bli_cpuid.c to bli_arch.c, tweaked names, and added more set/get layering. - Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c. - Content, whitespace changes to new bullet in HardwareSupport.md that relates to single-VPU Skylake-Xs. * Fix comment typos Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>	2020-05-21 11:40:57 +05:30
Field G. Van Zee	99da76fd64	Fixed 'configure' breakage introduced in `6433831`. Details: - Added a missing 'fi' (endif) keyword to a conditional block added in the configure script in commit `6433831`.	2020-05-21 11:40:00 +05:30
Field G. Van Zee	38ecda47e7	Updated 1m draft article link in README.md.	2020-05-21 11:40:00 +05:30
Jeff Hammond	570d51483b	blacklist ICC 18 for knl/skx due to test failures Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>	2020-05-21 11:40:00 +05:30

2064 Commits
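
The zero-dimension handling described in commit 5d57d67cb3 can be illustrated with a minimal sketch. This is not the BLIS implementation: `gemm_with_early_exit` and `scale_matrix` are hypothetical names, the layout is assumed to be column-major with no transposition, and the scale-only branch is assumed to follow the reference BLAS convention of scaling C by beta. The return code exists only to make the chosen path visible.

```c
#include <stddef.h>

/* Hypothetical helper: scale an m x n column-major matrix C by beta.
   beta == 0 overwrites C with zeros (reference BLAS convention). */
static void scale_matrix(size_t m, size_t n, double beta,
                         double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            c[i + j * ldc] = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
}

/* Hypothetical reference gemm: C = beta*C + alpha*A*B.
   A is m x k, B is k x n, C is m x n, all column-major.
   Returns 0 if nothing was done, 1 if only scaling was done,
   2 if the full product was computed. */
static int gemm_with_early_exit(size_t m, size_t n, size_t k,
                                double alpha,
                                const double *a, size_t lda,
                                const double *b, size_t ldb,
                                double beta,
                                double *c, size_t ldc)
{
    /* C has a zero dimension: no further processing is required. */
    if (m == 0 || n == 0)
        return 0;

    /* alpha is zero, or A and B share a zero dimension: scale only. */
    if (alpha == 0.0 || k == 0) {
        scale_matrix(m, n, beta, c, ldc);
        return 1;
    }

    /* Full product, written as a plain triple loop for clarity. */
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            double scaled = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
            c[i + j * ldc] = alpha * sum + scaled;
        }
    }
    return 2;
}
```

Checking the dimensions before any packing or kernel dispatch is what makes the early return cheap; the triple loop stands in for the optimized path that the real library would take.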