amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-12 01:59:59 +00:00

Author	SHA1	Message	Date
phakumar	ccf0772d6e	BLIS library port to Windows: this library was ported to Windows 10 using CMake scripts and Visual Studio 2019 with the clang compiler. AMD internal: [CPUPL-657] Change-Id: Ie701f52ebc0e0585201ba703b6284ac94fc0feb9	2020-06-16 18:29:00 +05:30
Dipal M Zambare	dad7e2f235	Added support for multiple trace levels & optimization of file size requirements. Multiple trace levels allow the user to set the nested call level up to which traces are limited, which also reduces file size requirements. Also optimized auto trace output to reduce file size by removing thread IDs from individual lines. AMD Internal: [CPUPL-806] Change-Id: I28e08a5bdf1b147469d8ce290ff7cde7f74481bd	2020-06-10 16:00:49 +05:30
Dipal M Zambare	305c744131	Added traces in dgemm and sgemm paths. Added traces from the blas/cblas APIs down to the kernels for dgemm and sgemm. By default the traces are disabled; users need to enable them in their local workspace, please check the aocl_dtl/aocldtlcf.h file. AMD Internal : CPUPL-806 Change-Id: I83b310509fb1a599c114387192bcf882ef0480f9	2020-06-08 12:01:22 +05:30
Meghana	9fce1ec4a4	Optimized SGEMV kernel and changed BLAS interface call Details: - Optimized saxpyf kernel with fuse_factor=5 and iter_unroll=2. - Modified framework files of sgemv to remove dependency on cntx variable. - Updated cntx_init file of zen2 to choose optimized kernels. - Modified BLAS interface call for SGEMV to reduce framework overhead. - Currently these changes are applicable for zen2 configuration. Change-Id: Iabc36ae640e82e65f8764f3c6dee513ad64b22fd Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-707]	2020-06-04 02:49:08 -04:00
managalv	b4e599ecc2	CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices Failure was seen in libflame function (FLASH_UDdate_UT_inc) Due to typecasting double complex pointer as double pointer Change-Id: If6e2f4663575450a13a9a07dddd5622628f5c6b0	2020-06-02 22:27:54 +05:30
Nallani Bhaskar	6f01cd2c54	Fix for sblat3.x failure in make check Details: ymm registers, which store 8 float values, were used where only 4 float values are needed. Changed registers from ymm to xmm in the required places. The failure shows up only when the leading dimension is greater than the actual dimension. Change-Id: I39f04eac18c4fa3a8c93048c977d6a83aa92b800	2020-06-01 17:04:59 +05:30
managalv	f7bc37ea32	CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices Details Added Support of N SUP kernel for complex float and complex double Removed prefetching in M SUP kernels for complex float and complex double Removed all warnings Change-Id: I05ffde0f0613681927fe7576db7f5f1a4486fd05	2020-06-01 06:24:12 -04:00
Kiran Varaganti	c8f3cec5f7	Merge "Code cleanup in 6xk DGEMM pack Kernel" into amd-staging-rome-2.2	2020-06-01 05:08:58 -04:00
Nallani Bhaskar	5e0ad13f8e	Code Cleanup and replaced vzeroall with vxorps Change-Id: I74c2cc2183a407aad86eab5c3285c33690de9abd	2020-06-01 10:14:06 +05:30
Nallani Bhaskar	2413c31672	CPUPL-923: Implemented dot Product Kernels in SGEMM SUP for transpose cases. Details: Added two new kernels bli_sgemmsup_rd_zen_asm_6x16m and bli_sgemmsup_rd_zen_asm_6x16n to support dot product in Row Major (A * Transpose(B)) and in Column Major (Transpose(A) * B) Change-Id: I264fd75c4c4b68fb7dc4fd229eaa44d09e9f3432	2020-05-31 22:37:03 +05:30
Kiran Varaganti	3ebd5f8aa0	Code cleanup in 6xk DGEMM pack Kernel Removed conditional check if(*kappa_cast==0.0) in 6xk dgemm packing kernel Change-Id: Ie543787133d303aeb2532e67b83d6ba96e3d558e	2020-05-31 21:41:45 +05:30
Kiran Varaganti	f8ddd48594	Code Clean-up in DGEMM packing kernels Removed conditional check for (*kappa_cast == 1.0) because it's always 1.0 in DGEMM packing kernels. [CPUPL-636] Change-Id: Ib04f2a3cdbb0f138036a8b0486d1dec073e40407	2020-05-30 21:55:29 +05:30
prangana	0c52aaefe1	Merge branch 'ref/heads/amd-staging-rome-2.2' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging-rome-2.2 Change-Id: I46acf48354ff73fb4eaeac255132d21095ea4d98	2020-05-30 10:31:10 +05:30
prangana	bb7eeec843	Change loop test expression in bli_packm_zen_int.c PRAGMA SIMD loop has issues with test expression (k !=0) Changed usage to (k > 0) Change-Id: I50204dbd0194de43f0d6cdcbfc586bb16aa25968	2020-05-30 10:00:21 +05:30
Kiran Varaganti	739803a441	DGEMM Packing Kernels for Native DGEMM implementation [CPUPL-858] Packing kernels for the dgemm 6x8 kernel are added explicitly for the zen2 configuration. Apart from the generic packing kernels used by level-3 routines for all combinations of the input parameters, introduced DGEMM-specific packing kernels for the case where op(A) & op(B) are no transpose. This helps us to vectorize these packing kernels and eliminate unnecessary conditional branch checks. The packing kernels are also optimized at the boundary. These boundary condition optimizations help when the input matrix dimensions "m" and "n" are not multiples of the register block sizes "MR" & "NR". The typical DGEMM operation is C = beta * C + alpha * op(A) * op(B). Kindly note the multiplication with alpha is handled inside the kernel, hence in these dgemm packing routines alpha is always considered 1.0. These routines are "bli_dpackm_8xk_nn_zen" & "bli_dpackm_6xk_nn_zen". The generic packing routines are "bli_dpackm_6xk_gen_zen" & "bli_dpackm_8xk_gen_zen". These routines are enabled from "bli_cntx_init_zen2()" through bli_cntx_set_packm_kers(). In this checkout the generic packing kernels are enabled by default. A run-time mechanism to change these packing kernels based on the DGEMM input parameters will be introduced later. Change-Id: I079b4dce0757d558224cb8c55d024bfea6a4de91	2020-05-28 02:01:43 -04:00
managalv	154bedc785	CPUPL-929:Improve Complex GEMM performance Removed print which was part of kernel Change-Id: I288e0151ba8da8d6dd4415734c88ed3474ba3a5b	2020-05-22 14:39:12 +05:30
Guodong Xu	72443e7173	avoid loading twice in armv8a gemm kernel (#403) This bug happens at a corner case, when k_iter == 0 and we jump to CONSIDERKLEFT. In current design, first row/col. of a and b are loaded twice. The fix is to rearrange a and b (first row/col.) loading instructions. Change-Id: I4a985a3abf9b1e7a0ee29e17c7d39a4a27138c4c Signed-off-by: Guodong Xu <guodong.xu@linaro.org>	2020-05-21 12:37:53 +05:30
Guodong Xu	66ec22705b	New kernel set for Arm SVE using assembly (#396) Here adds two kernels for Arm SVE vector extensions. 1. a gemm kernel for double at sizes 8x8. 2. a packm kernel for double at dimension 8xk. To achieve best performance, vector length agnostic programming is not used. Vector length (VL) of 256 bits is mandated in both kernels. Kernels to support other VLs can be added later. "SVE is a vector extension for AArch64 execution mode for the A64 instruction set of the Armv8 architecture. Unlike other SIMD architectures, SVE does not define the size of the vector registers, but constrains into a range of possible values, from a minimum of 128 bits up to a maximum of 2048 in 128-bit wide units. Therefore, any CPU vendor can implement the extension by choosing the vector register size that better suits the workloads the CPU is targeting. Instructions are provided specifically to query an implementation for its register size, to guarantee that the applications can run on different implementations of the ISA without the need to recompile the code." [1] [1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning Signed-off-by: Guodong Xu <guodong.xu@linaro.org>	2020-05-21 11:56:45 +05:30
Field G. Van Zee	9e76059f15	Renamed bli_thread_obarrier(), _obroadcast(). Details: - Renamed two bli_thread_*() APIs: bli_thread_obarrier() -> bli_thread_barrier() bli_thread_obroadcast() -> bli_thread_broadcast() The 'o' was a leftover from when thrcomm_t objects tracked both "inner" and "outer" communicators. They have long since been simplified to only support the latter, and thus the 'o' is superfluous. Change-Id: If9ec9a2383dfb02e1cfc74918f87a1fabddbd55b	2020-05-21 11:54:37 +05:30
managalv	f630b3fc36	CPUPL-929:Improve Complex GEMM performance Details: SUP support added for ZGEMM for different storage formats in M direction SUP kernels and sub kernels are implemented to cover all dimensions of square matrix SUP kernels supports RRR, RCR, CRR, CCR storage formats Change-Id: I2c846a430dfcf356cac8ebf62015b1f743157381	2020-05-20 17:36:04 +05:30
Meghana Vankadari	9ea0472f4c	Replaced all the instances of zen_basic with zen_ref_c Change-Id: Id53f2c1ce7e9878991a831c3651061f0b679b080 Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-885]	2020-05-19 20:27:17 +05:30
Meghana Vankadari	4fcc4e499d	Optimized DGEMV kernel and changed BLAS interface call Details: - Optimized daxpyf kernel with fuse_factor=5 and iter_unroll=2. - Modified framework files of dgemv to remove dependency on cntx variable. - Updated cntx_init file of zen2 to choose optimized kernels. - Modified BLAS interface call for DGEMV to reduce framework overhead. - Currently these changes are applicable for zen2 configuration. They will be enabled for zen family processors in future. - Changed naming convention for new BLAS macros to indicate their use. - Added new optimized kernel for axpyf under zen2 folder. - Implemented basic GEMV kernel without using axpyv or axpyf. This kernel is chosen for small sizes. Change-Id: I4278d37e494854879c71499b8b9da8c5dbe3bf5b Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-885]	2020-05-19 06:40:44 -04:00
managalv	af1ad806f2	CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices Details: Supports cgemm SUP all storage formats for XXR format Change-Id: I1f1ac6b47f0b54141acac65e2cb4f3a2aaa3bac6	2020-05-18 21:06:57 +05:30
managalv	310dda928f	CPUPL-709: Improve Complex GEMM performance - Level 1 Optimization Details: Added SUP support for cgemm in M direction. SUP kernels 3x8m, 3x4m, 3x2m are implemented. Sub kernels are implemented to support various dimensions. SUP CGEMM supports matrix C & A row/col major and Matrix B is row major matrix Change-Id: Ia6854b929d3b5741a4900422d05df1257f5d014d	2020-05-18 20:43:49 +05:30
Nallani Bhaskar	b3a308b689	CPUPL-948: Selective Packing changes are implemented in sgemm sup Description: Panel strides are updated using variables rather than constant values to support selective packing in sgemm sup kernels Change-Id: Ic098eb70592d12d7d2174a1166aebf3bc749140c	2020-05-18 11:46:33 +05:30
Devrajegowda, Kiran	6f33fd6aac	Modified Function definition for BLAS and CBLAS interfaces of ?SCALV API Details: -Kernel is called directly from API call to avoid framework overhead in case of single and double precisions. -Currently these changes are applicable only for zen2 configuration. They will be enabled for zen family processors in future. -These changes improve performance of BLAS and CBLAS interfaces of API. They do not affect BLIS-specific APIs. -setv simd kernel is added for single and double precision elements Change-Id: I1b343aa232f2571717c2b01ada5914f869883e1a Signed-off-by: Kiran ND <Kiran.Devrajegowda@amd.com> AMD-Internal: [CPUPL-817]	2020-05-13 01:51:48 -04:00
Nallani Bhaskar	49cd7a96d5	CPUPL-866: ZenDNN gtest cases failing with blis 2.1 and later releases Change-Id: Ib9ddfb133576d06cea6642fc3fefd818317fe922	2020-05-03 13:00:43 +05:30
Devrajegowda, Kiran	4caee59466	Adding a simd kernel for copyv function Details: - Separate kernel for copyv function added to improve performance. - Modified cntx_init file in zen and zen2 configuration - Added test_copyv.c in test folder - Modified test/Makefile to include test_copyv.c Change-Id: I297f539f2ddd2d71997b127a71a460991cd07b41 Signed-off-by: Kiran N D <kiran.Devrajegowda@amd.com> AMD-Internal: [CPUPL-818]	2020-04-24 01:55:25 -04:00
Meghana	b846059bcf	Added opt kernels for SWAPV Details: -Added SIMD kernels for SWAPV for both single and double precisions. -Modified cntx_init file for zen and zen2 configurations to choose opt kernels for SWAPV. -Added test_swapv.c in test folder. -Modified test/Makefile to include test_swapv.c Change-Id: Ida786eec722e634aee0dacdd51c327823c80f01a Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-847]	2020-04-20 01:21:44 -05:00
Meghana	e56cf63a3f	Optimized "bli_dotv_zen_int10" kernels Details: - Fixed issues in "bli_dotv_zen_int10" kernels and optimized them. - Changed cntx_init file to choose "bli_dotv_zen_int10" kernel for dotv API call. Change-Id: Iee8d7519f3a22a2d41166390be6047e9cb37557f Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-824]	2020-04-14 09:52:57 +05:30
Meghana	c20c96d9c0	Made some critical changes to small_gemm kernels Details: - In case of GEMM, whenever beta is zero, we need to perform C = alpha * (A * B) instead of C = beta * C + alpha * (A * B). Added conditions to check the value of beta at different levels inside small_gemm kernels and decide whether to perform scaling C with beta or not. - Modified small_gemm kernels to use BLIS specific functions to retrieve different fields of objects. - Calling bli_gemm_check before entering bli_gemm_small to facilitate early return in case of invalid inputs. - For corner cases inside small_gemm kernels, a buffer called f_temp is used to load and store data to and from registers. The buffer is now populated with zeroes before use. - In bli_gemm_front, datatypes of status and return value from bli_gemm_small are not matching. Corrected the datatype of the variable 'status' inside bli_gemm_front to err_t. Change-Id: I8b52ad55008f028d6c8b7e0d20f746a869d9daea Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com> AMD-Internal: [CPUPL-689,SWLCSG-104]	2020-03-19 16:30:04 +05:30
Nallani Bhaskar	83745c7ffc	Beta Zero Check for sgemm small. Core Software Group SWLCSG-137 BLIS-ST validation failures Change-Id: I21d5eae6ec390438be847f2dca42350b97059d6e	2020-03-09 02:55:51 -04:00
Nallani Bhaskar	e0c95d77e1	Beta Zero Checks for sgemm_small Change-Id: I111b66ad54a27b1977d155904738a55a351e6689	2020-03-09 02:55:25 -04:00
dzambare	f965b95d8b	CPUPL-587: Corrected condition for A packing in sgemm_small Change-Id: I1e5dc4a1dbe2f1d17f9c72e8dd0c6728ac1fd750	2020-01-27 11:08:20 +05:30
Meghana	b3e2938b9e	Fix for CPUPL-549: TRSM for AlXB case results in NaN values For the kernel of size 4x8, cs_b is used instead of cs_a to calculate address of diagonal elements of matrix A. Correcting the mistake. Change-Id: Ie74e0f6a397fcd32fefb5804cd00f1e90bfe5523	2019-12-21 23:12:09 +05:30
Dipal M Zambare	72f4a7ab1e	Increased pool buffer size to accommodate packing buffers needed in small_gemm to make it reentrant. Change-Id: I96ac19ce97c39becce2c6e7ab47c3e7624560b30	2019-12-19 14:45:13 +05:30
Meghana Vankadari	62e00b4d64	Merge "Change in threshold condition for trsm_small kernels" into amd-staging-rome-rel-2.1	2019-12-17 23:54:01 -05:00
Meghana Vankadari	8eb264f78b	Change in threshold condition for trsm_small kernels Change-Id: I396e246b1639d300fcb94bdf7e5fa8bc8c87e994	2019-12-16 18:54:48 +05:30
Devrajegowda, Kiran	1fe8edbed0	"Merge Selective Packing code from amd branch flame/blis" Change-Id: Ifbdf49735f56a66fbbc96dab6d3ca6069302daed	2019-12-16 14:48:53 +05:30
Kiran Devrajegowda	21224e8264	Merge "Revert " Merge Selective Packing code from amd branch flame/blis"" into amd-staging-rome-rel-2.1	2019-12-13 00:45:34 -05:00
Nallani Bhaskar	10a26a7357	Merge "Fix for CPUPL-550: AOCC clang compiler error. Resolved: Duplicate back to back declaration of a label in asm file" into amd-staging-rome-rel-2.1	2019-12-13 00:25:49 -05:00
Kiran Varaganti	1650bcb623	Revert " Merge Selective Packing code from amd branch flame/blis" This reverts commit `e4a6af33f5`. Reason for revert: <Review not done> Change-Id: Iae548f949a81a66281023c860c2bcffdfdae21b2	2019-12-13 00:01:35 -05:00
Nallani Bhaskar	dc4e7d1203	Fix for CPUPL-550: AOCC clang compiler error. Resolved: Duplicate back to back declaration of a label in asm file Change-Id: I82c386d5fc00139da74fa031980d65c6a3874bd0	2019-12-12 20:43:47 +05:30
Devrajegowda, Kiran	e4a6af33f5	Merge Selective Packing code from amd branch flame/blis Change-Id: I6d577f67ec84febe6af3635b10e5c9c77844ccd2	2019-12-12 15:22:21 +05:30
Nallani Bhaskar	44edee7404	Added support to handle 7x16,8x16,9x16 efficiently in 6x16n kernel	2019-12-10 16:09:46 +05:30
Kiran Varaganti	9b6c04d075	Merge " change in threshold condition for SUP and small kernels" into amd-staging-rome-rel-2.1	2019-12-08 23:42:25 -05:00
Devrajegowda, Kiran	3192914a1c	change in threshold condition for SUP and small kernels Change-Id: I7dbd30b2004c67122a639f081efc36e0f0d69fad	2019-12-09 01:31:58 +05:30
Kiran Varaganti	27d2b5a0db	Merge "Made some improvements to trsm_small kernels" into amd-staging-rome-rel-2.1	2019-12-06 05:21:34 -05:00
Meghana	17b3a2639e	Made some improvements to trsm_small kernels Interchanged some loops to favour column-major storage. Added check condition to identify last column and load it using a 'for' loop to avoid memory accesses out of buffer Change-Id: Id5d2e16c65017a7f4b641d33228d23903efd09ac	2019-12-06 14:48:28 +05:30
Nallani Bhaskar	af94ba29cf	Added sup support for sgemm under zen and related framework changes. Change-Id: Ia7e88b96d3a3617e8d24754f50db081ffe2e9955	2019-12-04 10:56:10 +05:30

320 Commits
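
Several commits in this log (c20c96d9c0, 83745c7ffc, e0c95d77e1) add a beta-zero check to the small GEMM kernels: when beta is zero the kernel should compute C = alpha * (A * B) and skip scaling C by beta. The following is a minimal sketch of that idea in plain C, not the BLIS kernel code. The function name `ref_dgemm_beta_check`, the row-major layout and the scalar loops are all assumptions made for illustration. One likely motivation, not stated in the commit messages, is that under IEEE 754 arithmetic 0 * NaN is NaN, so reading an uninitialized C could otherwise corrupt the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not a BLIS function).
 * Computes C = beta * C + alpha * (A * B) for row-major matrices:
 * A is m x k, B is k x n, C is m x n.
 * When beta == 0.0 the old contents of C are never read. */
void ref_dgemm_beta_check(size_t m, size_t n, size_t k,
                          double alpha, const double *a, const double *b,
                          double beta, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];

            if (beta == 0.0)
                c[i * n + j] = alpha * acc;                    /* overwrite C */
            else
                c[i * n + j] = beta * c[i * n + j] + alpha * acc;
        }
    }
}
```

Calling this with beta set to 0.0 on a C buffer filled with NaN values yields finite results, whereas the unconditional form beta * C + alpha * (A * B) would propagate the NaNs.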