amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 01:59:59 +00:00

Author	SHA1	Message	Date
Meghana	e56cf63a3f	Optimized "bli_dotv_zen_int10" kernels Details: - Fixed issues in "bli_dotv_zen_int10" kernels and optimized them. - Changed cntx_init file to choose "bli_dotv_zen_int10" kernel for dotv API call. Change-Id: Iee8d7519f3a22a2d41166390be6047e9cb37557f Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-824]	2020-04-14 09:52:57 +05:30
Meghana	b5fe75e104	Closing input and output files in test_gemm.c and test_trsm.c Change-Id: I75cdd5adc2bd2dac7d0eca9c050e06dbd52bec26 Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>	2020-03-24 09:09:58 +05:30
Meghana Vankadari	ddcb3d8a52	Modified test_trsm.c file in test folder to read input sizes from a file Details: -A Macro 'FILE_IN_OUT' is defined to read matrix dimensions and strides from a csv file. Format for input file if 'FILE_IN_OUT' is defined: Each line defines a TRSM problem with the following parameters: m n cs_a cs_b The operation implemented by default is AX=B where A is lower-triangular and matrices are in column-major order. When macro is disabled, it reverts back to original implementation. Usage: ./test_trsm_<mkl/blis/openblas>.x input.csv output.csv -A macro 'READ_ALL_PARAMS_FROM_FILE' is defined to read all the parameters for TRSM from a csv file. This macro can be defined only when 'FILE_IN_OUT' is already defined. Format for the input file if 'READ_ALL_PARAMS_FROM_FILE' is defined: Each line defines a TRSM problem with the following parameters: sideA uploA transA diagA m n cs_a cs_b By default, column-major order is chosen as storage scheme for matrices. Usage: ./test_trsm_<mkl/blis/openblas>.x input.csv output.csv Change-Id: I349bc69ca968911c16e04d1ce70974d01e65a2fb Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>	2020-03-20 07:15:17 -04:00
Meghana	c20c96d9c0	Made some critical changes to small_gemm kernels Details: - In case of GEMM, whenever beta is zero, we need to perform C = alpha * (A * B) instead of C = beta * C + alpha * (A * B). Added conditions to check the value of beta at different levels inside small_gemm kernels and decide whether to perform scaling C with beta or not. -Modified small_gemm kernels to use BLIS specific functions to retrieve different fields of objects. -Calling bli_gemm_check before entering bli_gemm_small to facilitate early return in case of invalid inputs. -For corner cases inside small_gemm kernels, a buffer called f_temp is used to load and store data to and from registers. The buffer is now populated with zeroes before use. -In bli_gemm_front, datatypes of status and return value from bli_gemm_small are not matching. Corrected the datatype of the variable 'status' inside bli_gemm_front to err_t. Change-Id: I8b52ad55008f028d6c8b7e0d20f746a869d9daea Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-689,SWLCSG-104]	2020-03-19 16:30:04 +05:30
Field G. Van Zee	efe85b3c19	Added missing return to bli_thread_partition_2x2(). Details: - Added a missing return statement to the body of an early case handling branch in bli_thread_partition_2x2(). This bug only affected cases where n_threads < 4, and even then, the code meant to handle cases where n_threads >= 4 executes and does the right thing, albeit using more CPU cycles than needed. Nonetheless, thanks to Kiran Varaganti for reporting this bug via issue #377. - Whitespace changes to bli_thread.c (spaces -> tabs). Change-Id: I2182be0911f76861dd14bec9b6bacb6c20c2725d	2020-03-16 12:28:25 +05:30
Field G. Van Zee	a7c5723e77	Skip building thrinfo_t tree when mt is disabled. Details: - Return early from bli_thrinfo_sup_grow() if the thrinfo_t object address is equal to either &BLIS_GEMM_SINGLE_THREADED or &BLIS_PACKM_SINGLE_THREADED. - Added preprocessor logic to bli_l3_sup_thread_decorator() in bli_l3_sup_decor_single.c that (by default) disables code that creates and frees the thrinfo_t tree and instead passes &BLIS_GEMM_SINGLE_THREADED as the thrinfo_t pointer into the sup implementation. - The net effect of the above changes is that a small amount of thrinfo_t overhead is avoided when running small/skinny dgemm problems when BLIS is compiled with multithreading disabled. Change-Id: Ia1066752849f1dfc0cd98f8ac0302e2f7b0f8bf0	2020-03-13 01:10:34 -04:00
Kiran Devrajegowda	04fc9d3710	Merge "Fixed bug(s) in mt sup when single-threaded." into amd-staging-rome-2.2	2020-03-13 01:10:22 -04:00
Field G. Van Zee	574de9e29e	Fixed bug(s) in mt sup when single-threaded. Details: - Fixed a syntax bug in bli_l3_sup_decor_single.c as a result of changing function interface for the thread entry point function (of type l3supint_t). - Unfortunately, fixing the interface was not enough, as it caused a memory leak in the sba at bli_finalize() time. It turns out that, due to the new multithreading-capable variant code using thrinfo_t objects--specifically, their calling of bli_thrinfo_grow()--we have to pass in a real thrinfo_t object rather than the global objects &BLIS_PACKM_SINGLE_THREADED or &BLIS_GEMM_SINGLE_THREADED. Thus, I inserted the appropriate logic from the OpenMP and pthreads versions so that single-threaded execution would work as intended with the newly upgraded variants. Change-Id: I2bfff849abf3fa30c73e0c5876128400854bbcb5	2020-03-13 01:10:04 -04:00
Field G. Van Zee	1a284828d1	Support multithreading within the sup framework. Details: - Added multithreading support to the sup framework (via either OpenMP or pthreads). Both variants 1n and 2m now have the appropriate threading infrastructure, including data partitioning logic, to parallelize computation. This support handles all four combinations of packing on matrices A and B (neither, A only, B only, or both). This implementation tries to be a little smarter when automatic threading is requested (e.g. via BLIS_NUM_THREADS) in that it will recalculate the factorization in units of micropanels (rather than using the raw dimensions) in bli_l3_sup_int.c, when the final problem shape is known and after threads have already been spawned. - Implemented bli_?packm_sup_var2(), which packs to conventional row- or column-stored matrices. (This is used for the rrc and crc storage cases.) Previously, copym was used, but that would no longer suffice because it could not be parallelized. - Minor reorganization of packing-related sup functions. Specifically, bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]() instead of from the variant functions. This has the effect of making the variant functions more readable. - Added additional bli_thrinfo_set_() static functions to bli_thrinfo.h and inserted usage of these functions within bli_thrinfo_init(), which previously was accessing thrinfo_t fields via the -> operator. - Renamed bli_partition_2x2() to bli_thread_partition_2x2(). - Added an auto_factor field to the rntm_t struct in order to track whether automatic thread factorization was originally requested. - Added new test drivers in test/supmt that perform multithreaded sup tests, as well as appropriate octave/matlab scripts to plot the resulting output files. - Added additional language to docs/Multithreading.md to make it clear that specifying any BLIS_*_NT variable, even if it is set to 1, will be considered manual specification for the purposes of determining whether to auto-factorize via BLIS_NUM_THREADS. - Minor comment updates. AMD-Internal: [CPUPL-713] Change-Id: I9536648e7befac4d2dc17805e44ef34470961662	2020-03-13 01:09:29 -04:00
Nallani Bhaskar	83745c7ffc	Beta Zero Check for sgemm small. Core Software Group SWLCSG-137 BLIS-ST validation failures Change-Id: I21d5eae6ec390438be847f2dca42350b97059d6e	2020-03-09 02:55:51 -04:00
Nallani Bhaskar	e0c95d77e1	Beta Zero Checks for sgemm_small Change-Id: I111b66ad54a27b1977d155904738a55a351e6689	2020-03-09 02:55:25 -04:00
Meghana Vankadari	cc98047fd6	Made framework changes to initialize specific cache block sizes for TRSM. Details: -This commit addresses the performance optimization(single-thread and multi-thread) for DTRSM on zen2. -This new optimization employs different MC, KC & NC values for TRSM than what is being used in other Level-3 routines like DGEMM. -Changed TRSM framework code to choose these blocksizes for TRSM on zen family configurations. -Added a new field called "trsm_blkszs" to cntx structure in order to store TRSM specific block sizes. -Implemented routines to initialize, set and query the TRSM-specific block sizes. -Defined a new macro "AOCL_BLIS_ZEN" in configure script. This macro is automatically defined for zen family architectures. It enables us to choose different cache block sizes for TRSM instead of common level-3 block sizes. Change-Id: Id8557b1c962a316b1edecca9cd582675eaf35fe6 Signed-off-by: Meghana Vankadari <meghana.vankadari@amd.com> AMD-Internal: [CPUPL-656]	2020-03-09 10:33:42 +05:30
dzambare	f965b95d8b	CPUPL-587: Corrected condition for A packing in sgemm_small Change-Id: I1e5dc4a1dbe2f1d17f9c72e8dd0c6728ac1fd750	2020-01-27 11:08:20 +05:30
Meghana	b3e2938b9e	Fix for CPUPL-549: TRSM for AlXB case results in NaN values For the kernel of size 4x8, cs_b is used instead of cs_a to calculate address of diagonal elements of matrix A. Correcting the mistake. Change-Id: Ie74e0f6a397fcd32fefb5804cd00f1e90bfe5523 AOCL-2.1 2.1	2019-12-21 23:12:09 +05:30
Dipal M Zambare	72f4a7ab1e	Increased pool buffer size to accommodate packing buffers needed in small_gemm to make it reentrant. Change-Id: I96ac19ce97c39becce2c6e7ab47c3e7624560b30	2019-12-19 14:45:13 +05:30
Meghana Vankadari	62e00b4d64	Merge "Change in threshold condition for trsm_small kernels" into amd-staging-rome-rel-2.1	2019-12-17 23:54:01 -05:00
Meghana Vankadari	8eb264f78b	Change in threshold condition for trsm_small kernels Change-Id: I396e246b1639d300fcb94bdf7e5fa8bc8c87e994	2019-12-16 18:54:48 +05:30
Devrajegowda, Kiran	1fe8edbed0	"Merge Selective Packing code from amd branch flame/blis" Change-Id: Ifbdf49735f56a66fbbc96dab6d3ca6069302daed	2019-12-16 14:48:53 +05:30
Nallani Bhaskar	a8af07f68c	Added support to handle unsupported storage formats in sgemmsup using normal/small gemm path Change-Id: I8762059c89e50f60e765a2a2983c5b2bdcdd8bc1	2019-12-13 15:31:28 +05:30
Kiran Devrajegowda	21224e8264	Merge "Revert " Merge Selective Packing code from amd branch flame/blis"" into amd-staging-rome-rel-2.1	2019-12-13 00:45:34 -05:00
Nallani Bhaskar	10a26a7357	Merge "Fix for CPUPL-550: AOCC clang compiler error. Resolved: Duplicate back to back declaration of a label in asm file" into amd-staging-rome-rel-2.1	2019-12-13 00:25:49 -05:00
Kiran Varaganti	1650bcb623	Revert " Merge Selective Packing code from amd branch flame/blis" This reverts commit `e4a6af33f5`. Reason for revert: <Review not done> Change-Id: Iae548f949a81a66281023c860c2bcffdfdae21b2	2019-12-13 00:01:35 -05:00
Nallani Bhaskar	dc4e7d1203	Fix for CPUPL-550: AOCC clang compiler error. Resolved: Duplicate back to back declaration of a label in asm file Change-Id: I82c386d5fc00139da74fa031980d65c6a3874bd0	2019-12-12 20:43:47 +05:30
Devrajegowda, Kiran	e4a6af33f5	Merge Selective Packing code from amd branch flame/blis Change-Id: I6d577f67ec84febe6af3635b10e5c9c77844ccd2	2019-12-12 15:22:21 +05:30
Kiran Varaganti	edc8f04dea	Merge "Fix for CPUPL-541,When threading is enabled blis-mt library gets generated otherwise blis.a gets generated for sequential builds. However blis.h header file will differ for both type of libraries. The difference is about enable/disable of defining BLIS_ENABLE_OPENMP or BLIS_ENABLE_PTHREADS when threading is enabled. Appropriate header file needs to be included in the application." into amd-staging-rome-rel-2.1	2019-12-11 01:26:06 -05:00
Nallani Bhaskar	44edee7404	Added support to handle 7x16,8x16,9x16 efficiently in 6x16n kernel	2019-12-10 16:09:46 +05:30
Kiran Varaganti	82ec21f1c7	Fix for CPUPL-541,When threading is enabled blis-mt library gets generated otherwise blis.a gets generated for sequential builds. However blis.h header file will differ for both type of libraries. The difference is about enable/disable of defining BLIS_ENABLE_OPENMP or BLIS_ENABLE_PTHREADS when threading is enabled. Appropriate header file needs to be included in the application. Change-Id: I82a4ae2bf529eedd83868e059a43749714cbe246	2019-12-10 15:40:27 +05:30
Kiran Varaganti	9b6c04d075	Merge " change in threshold condition for SUP and small kernels" into amd-staging-rome-rel-2.1	2019-12-08 23:42:25 -05:00
Devrajegowda, Kiran	3192914a1c	change in threshold condition for SUP and small kernels Change-Id: I7dbd30b2004c67122a639f081efc36e0f0d69fad	2019-12-09 01:31:58 +05:30
Kiran Varaganti	27d2b5a0db	Merge "Made some improvements to trsm_small kernels" into amd-staging-rome-rel-2.1	2019-12-06 05:21:34 -05:00
Meghana	17b3a2639e	Made some improvements to trsm_small kernels Interchanged some loops to favour column-major storage. Added check condition to identify last column and load it using a 'for' loop to avoid memory accesses out of buffer Change-Id: Id5d2e16c65017a7f4b641d33228d23903efd09ac	2019-12-06 14:48:28 +05:30
Nallani Bhaskar	af94ba29cf	Added sup support for sgemm under zen and related frame work changes. Change-Id: Ia7e88b96d3a3617e8d24754f50db081ffe2e9955	2019-12-04 10:56:10 +05:30
Meghana Vankadari	31bfe8985f	re-enabling the boundary check condition for bli_dtrsm_small_AlXB. It was disabled by mistake in previous commits. Change-Id: Ib7d2d0c5e133ff10559ce3dc5f7e624707e43c11	2019-12-03 17:07:37 +05:30
Meghana	cef185250e	Fixed Segmentation fault in trsm_small kernels for the case AlXB. For matrix sizes which are not multiples of 4, trsm_small kernels access memory outside the allocated buffers which causes segmentation fault. This is fixed by handling each of the corner cases separately. Change-Id: Ia7cfad5d65339a209a7376cc1654382593c933af	2019-12-03 17:05:57 +05:30
Meghana	fb75044ea2	Removed zen and zen2 configurations from amd64 family amd64 family supports all the architectures before zen. Assigned (BLIS_ARCH_GENERIC+1) to BLIS_NUM_ARCHS in order to avoid update for every new architecture. Change-Id: I8241e643f6dfd0ebe272e053ca8b6a9c1463d9dc	2019-12-03 16:48:34 +05:30
Devrajegowda, Kiran	b074c5e09c	Added a macro MATRIX_INITIALISATION for matrix initialisation in test application Change-Id: I8e5c9902e603a549218d4e8509a481288792266d	2019-12-01 13:12:02 +05:30
prangana	d72b509fbb	Pass actual enum type to bli_mem_set_buf_type function if C++ Change-Id: I63b4926963c361429b001f7ae93d9b544e9be95b	2019-11-30 17:57:42 +05:30
prangana	13249e83e2	Replace bli_thread_init_rntm with bli_rntm_init_from_global in zen small gemm Change-Id: I14fb2795b483368580ff3fcf5f537723f3845377	2019-11-30 16:33:10 +05:30
prangana	e0fb039a60	Merge branch 'amd' of https://github.com/flame/blis into amd-blis-nov-mergetest Change-Id: I59325783883d67bb33e938aea8c34d8e3d6832fb	2019-11-30 12:52:14 +05:30
Field G. Van Zee	efa61a6c8b	Added missing bli_l3_sup_thread_decorator() symbol. Details: - Defined dummy versions of bli_l3_sup_thread_decorator() for OpenMP and pthreads so that those builds don't fail when performing shared library linking (especially for Windows DLLs via AppVeyor). For now, these dummy implementations of bli_l3_sup_thread_decorator() are merely carbon-copies of the implementation provided for single-threaded execution (i.e., the one found in bli_l3_sup_decor_single.c). Thus, an OpenMP or pthreads build will be able to use the gemmsup code (including the new selective packing functionality), as it did before `39fa7136`, even though it will not actually employ any multithreaded parallelism.	2019-11-29 16:17:04 -06:00
Field G. Van Zee	39fa7136f4	Added support for selective packing to gemmsup. Details: - Implemented optional packing for A or B (or both) within the sup framework (which currently only supports gemm). The request for packing either matrix A or matrix B can be made via setting environment variables BLIS_PACK_A or BLIS_PACK_B (to any non-zero value; if set, zero means "disable packing"). It can also be made globally at runtime via bli_pack_set_pack_a() and bli_pack_set_pack_b() or with individual rntm_t objects via bli_rntm_set_pack_a() and bli_rntm_set_pack_b() if using the expert interface of either the BLIS typed or object APIs. (If using the BLAS API, environment variables are the only way to communicate the packing request.) - One caveat (for now) with the current implementation of selective packing is that any blocksize extension registered in the _cntx_init function (such as is currently used by haswell and zen subconfigs) will be ignored if the affected matrix is packed. The reason is simply that I didn't get around to implementing the necessary logic to pack a larger edge-case micropanel, though this is entirely possible and should be done in the future. - Spun off the variant-choosing portion of bli_gemmsup_ref() into bli_gemmsup_int(), in bli_l3_sup_int.c. - Added new files, bli_l3_sup_packm_a.c, bli_l3_sup_packm_b.c, along with corresponding headers, in which higher-level packm-related functions are defined for use within the sup framework. The actual packm variant code resides in bli_l3_sup_packm_var.c. - Pass the following new parameters into var1n and var2m: packa, packb bool_t's, pointer to a rntm_t, pointer to a cntl_t (which is for now always NULL), and pointer to a thrinfo_t* (which for now is the address of the global single-threaded packm thread control node). - Added panel strides ps_a and ps_b to the auxinfo_t structure so that the millikernel can query the panel stride of the packed matrix and step through it accordingly. If the matrix isn't packed, the panel stride of interest for the given millikernel will be set to the appropriate value so that the mkernel may step through the unpacked matrix as it normally would. - Modified the rv_6x8m and rv_6x8n millikernels to read the appropriate panel strides (ps_a and ps_b, respectively) instead of computing them on the fly. - Spun off the environment variable getting and setting functions into a new file, bli_env.c (with a corresponding prototype header). These functions are now used by the threading infrastructure (e.g. BLIS_NUM_THREADS, BLIS_JC_NT, etc.) as well as the selective packing infrastructure (e.g. BLIS_PACK_A, BLIS_PACK_B). - Added a static initializer for mem_t objects, BLIS_MEM_INITIALIZER. - Added a static initializer for pblk_t objects, BLIS_PBLK_INITIALIZER, for use within the definition of BLIS_MEM_INITIALIZER. - Moved the global_rntm object to bli_rntm.c and extern it where needed. This means that the function bli_thread_init_rntm() was renamed to bli_rntm_init_from_global() and relocated accordingly. - Added a new bli_pack.c file, which serves as the home for functions that manage the pack_a and pack_b fields of the global rntm_t, including from environment variables, just as we have functions to manage the threading fields of the global rntm_t in bli_thread.c. - Reorganized naming for files in frame/thread, which mostly involved spinning off the bli_l3_thread_decorator() functions into their own files. This change makes more sense when considering the further addition of bli_l3_sup_thread_decorator() functions (for now limited only to the single-threaded form found in the _single.c file). - Explicitly initialize the reference sup handlers in both bli_cntx_init_haswell.c and bli_cntx_init_zen.c so that it's more obvious how to customize to a different handler, if desired. - Removed various snippets of disabled code. - Various comment updates.	2019-11-29 15:27:07 -06:00
Devrajegowda, Kiran	c4047e491a	Merge branch 'amd-blis-nov-mergetest' into amd-staging-rome2.1 Change-Id: I1e04592dd9494faa34555008dd1edbca8a092a44	2019-11-29 23:01:51 +05:30
Dipal M Zambare	e6e66fb1f9	Fixed reentrancy issues with bli_sgemm_small() and bli_dgemm_small(). Replaced global buffer used for packing with the buffer provided by memory pools. These buffers are checked out at the beginning of each call and returned to the pool once done. Please check comment in the above functions for details. Change-Id: I76b3560f7efcc621a4455e834fce06f629c38f50	2019-11-27 19:10:16 +05:30
Dipal M Zambare	37badee648	Updated build infra to use python detected by auto config. Even though the configure script checks the availability of the correct version of python, this information is not passed to the makefiles. This results in python scripts getting invoked without an interpreter. This normally works fine as the script uses the path from the shebang; however, it doesn't work if the command specified by the shebang is an alias. This also causes confusion: even though configure has found python, we end up with a python not found error during build. This fix passes the detected version of the python interpreter to the makefiles, which solves both issues mentioned above. Change-Id: Ic04da77601ff8ad2a461e9f2f936470109cda22c	2019-11-26 14:57:47 +05:30
Meghana Vankadari	764d6f4643	changed configure script to support AOCC Change-Id: I86d2f36f42bc6cc7e6b950f4e85087753ce5bc40	2019-11-25 15:17:04 +05:30
Devrajegowda, Kiran	85fa9e4107	resolved merge conflicts when merged with public repo master branch Change-Id: Iad6ba809680ba5081cc9d7879794ef58cc8f8a40	2019-11-25 14:46:48 +05:30
Field G. Van Zee	bbb21fd0a9	Tweaked SIAM/SC Best Prize language in README.md.	2019-11-21 18:15:16 -06:00
Field G. Van Zee	043366f92d	Fixed typo in previous commit (SIAM/SC prize).	2019-11-21 18:13:51 -06:00
Field G. Van Zee	05a4d583e6	Added SIAM/SC prize to "What's New" in README.md.	2019-11-21 18:12:24 -06:00
Field G. Van Zee	881b05ecd4	Fixed blastest failure for 'generic' subconfig. Details: - Fixed a subtle and complicated bug that only manifested via the BLAS test drivers in the generic subconfiguration, and possibly any other subconfiguration that did not register complex-domain gemm ukernels, or registered ONLY real-domain ukernels as row-preferential. This is a long story, but it boils down to an exception to the "transpose the operation to bring storage of C into agreement with ukernel pref" optimization in bli_hemm_front.c and bli_symm_front.c sabotaging the proper functioning of the 1m method, but only when the imaginary component of beta is zero. See the comments in issue #342 for more details. Thanks to Dave Love for identifying the commit in which this bug was introduced, and other feedback related to this bug.	2019-11-21 16:34:27 -06:00
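
The beta-zero rule described in commit c20c96d9c0 above (compute C = alpha * (A * B) when beta is zero, rather than scaling C by beta) can be illustrated with a small reference routine. This is only a sketch under stated assumptions: `ref_dgemm` is a hypothetical helper, not a BLIS function and not the small_gemm kernel code, and it assumes column-major storage with no transposition. The point it shows is that when beta is zero the old contents of C are never read, so NaN or Inf values left in C cannot leak into the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not part of BLIS): column-major
   C = beta*C + alpha*A*B, with A of size m x k, B of size k x n and
   C of size m x n. lda, ldb and ldc are the column strides. */
static void ref_dgemm(size_t m, size_t n, size_t k,
                      double alpha, const double *a, size_t lda,
                      const double *b, size_t ldb,
                      double beta, double *c, size_t ldc)
{
    /* Column strides must cover a full column of each matrix. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            /* Dot product of row i of A with column j of B. */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];

            if (beta == 0.0)
                /* Overwrite: C is not read, so stale NaN/Inf is discarded. */
                c[i + j * ldc] = alpha * acc;
            else
                c[i + j * ldc] = beta * c[i + j * ldc] + alpha * acc;
        }
    }
}
```

Without the `beta == 0.0` branch, an uninitialized C holding NaN would give 0 * NaN = NaN in every output element, which is the kind of failure a beta-zero check guards against.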

1982 Commits