amd/blis - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-07-18 09:37:52 +00:00

Author	SHA1	Message	Date
Dipal Madhukar Zambare	1638ff7605	Merge "DTL logs corrections" into amd-staging-milan-3.1	2021-06-06 23:20:22 -04:00
Nageshwar Singh	61b7584580	Bench addition for amaxv API AOCL-Internal: [CPUPL-1591] Change-Id: Ia9754dfed1a7302d5c267858f9005c8f64e28b46	2021-06-04 17:45:04 +05:30
Nageshwar Singh	ecfbdd16a8	Added bench utility for trsv API AOCL-Internal: [CPUPL-1591] Change-Id: I5953e13e9c75f620987ea92d92d1b1d7b5bfd043	2021-06-04 08:05:37 -04:00
Dipal M Zambare	2f344f5df1	DTL logs corrections -- Fixed issues in printing the values of side, uploa and diaga parameters for hemm, hemv, her, her2, her2k, herk, symm, symv, syr, syr2, syr2k, syrk, trmm, trmv, trsm, trsv. -- For above API's logging was called with MKSTR() for side, uploa and diaga parameters. MKSTR is needed only for macro arguments but not for function's arguments. -- Added space between function name and data type where it was missing. Bench expects logs in this format. AMD-Internal: [CPUPL-1585] Change-Id: Ib6ab66890e68cfa52860f869d6a1c34e78036a2d	2021-06-04 15:24:13 +05:30
Dipal M Zambare	849e1cee0a	Updated version number to 3.0.1. Change-Id: I07d5c26bb96b590854e1f81d41ed49a5e320f60e	2021-06-03 15:48:05 +05:30
Nagarapu Phanikumar	7ea32e6d0b	Merge " Unifying BLIS Windows and Linux codebase" into amd-staging-milan-3.1	2021-06-03 06:03:26 -04:00
nphaniku	2bdee3cd6c	Unifying BLIS Windows and Linux codebase 1. Removed dependency on bli_config.h inclusion in blis.h 2. Provided AOCL DYNAMIC / TRSM PRE INVERSION / COMPLEX RETURN configuration flags. 3. CMAKE changes to incorporate new changes as per 3.1 code base. 4. Removed zen2 folder from Windows directory. AMD Internal : [CPUPL-1532] Change-Id: I9261851087d10f73ab563d466fa3f7bb72ddee47	2021-06-03 15:28:10 +05:30
mkurumel	9afbb11b4f	DTL Logging bug in GEMV Details : - Fixed Incorrect Macro used in dgemv and cgemv Trace logging exit. AMD-Internal: [CPUPL-1403] Change-Id: Icac502d8d4adad112754d9c764a30d3db56a743f	2021-06-02 21:21:00 +05:30
mkurumel	99e3bce065	SGEMV : single Precision axpyf kernel optimization for SGEMV Details : - Implemented saxpyf kernel with fuse factor=6 for sgemv. AMD-Internal: [CPUPL-1403] Change-Id: I72fd30c08a789603267cf58910138549d45d231a	2021-06-02 07:55:48 -04:00
Nageshwar Singh	2e1a5bc1dd	Optimized double complex axpyf kernel for zgemv Details: - Implemented zaxpyf kernel with fuse factor=4 for zgemv. - Modified BLAS interface call for zgemv to reduce framework overhead. - Directed gemv to dotv in the case where dimension of y vector is 1. - when alpha = 0, gemv becomes scalv of Y with beta. Added code to return early after scaling Y vector with beta. AMD-Internal: [CPUPL-1402] Change-Id: I2231285fe3060982d4434466346a040b7ab803fc	2021-06-01 18:03:29 +05:30
Meghana Vankadari	3804e301c9	Fixed a bug in Level-3 bench files where ldc = 1 Details: - To determine whether matrices are col-stored, we were checking ldc == 1. This is incorrect as a matrix can be col-stored with ldc = 1 if dimension is 1. - Modified the condition to check row_stride instead of col stride. if row-stride != 1, we can assume that matrices are not col-stored and ignore those inputs by printing an error message. Change-Id: Id4d5b971104eb11cbcdd6d22c5c620febefd3a87	2021-06-01 10:57:18 +05:30
Kiran Varaganti	ff84d37930	Merge "SUP GEMM - Enable only block panel (var2m)" into amd-staging-milan-3.1	2021-05-31 06:46:04 -04:00
Meghana Vankadari	887ecb46e0	Added threshold logic for SYRK Details: - Added decision logic to choose between SUP and native implementations of SYRK for zen2 architectures. - For architectures other than zen2 it will be redirected to gemm threshold function. Change-Id: I350578cc4f930e85b9581e4d9aed220e71a2171d	2021-05-31 05:34:38 -04:00
Kiran Varaganti	aa9f5b8b37	SUP GEMM - Enable only block panel (var2m) Completely disabling supvar1n (Panel Block) gemm to simplify things. supvar1n performs better only when m is very large and n and k are small (<10). This simplification will improve performance for m = n shape dgemm. Change-Id: I523fcb211e8ab92718ea7367f9707a38275e24b1	2021-05-30 21:22:44 +05:30
Madan mohan Manokar	6d6f746190	3m1 turning OFF since 3m1 is turned off in bla_gemm.c, setting FALSE for 3m1 in bli_l3_ind_oper_st AMD-Internal: [CPUPL-1592] Change-Id: I80dfe7c993f9edfbf752b7351cfdaa22a9e60035	2021-05-26 10:06:54 +05:30
Kiran Varaganti	ae6b6a7b7c	Merge "Fix a bug in bench_gemm.c" into amd-staging-milan-3.1	2021-05-25 05:00:05 -04:00
Meghana Vankadari	4446395047	Redirecting dgemv to axpyf based implementation for smaller sizes. AMD-Internal: [CPUPL-1403] Change-Id: I0ff2763c41c5ae598c58bc250adc317d7f8a4994	2021-05-25 01:39:12 -04:00
satish kumar nuggu	82087773a0	Optimized single threaded dtrsm small for right cases Details: 1. Added optimized dtrsm kernels for all 8 right side cases Below are few notable optimizations which improved performance a. Loading, transposing (for transa cases), packing and reusing of a01 block required for GEMM operation. The block size increases from 0 to 6X(n-6) in steps of 6x6 while solving TRSM from one end of A to other end of triangular A b. Packing of 6 diagonal elements in one location helped to utilize cache line efficiently AMD-Internal: [CPUPL-1563] Change-Id: Iabd37536216d5215fc69ee1f8ec671b52f1be9d3	2021-05-25 01:09:50 -04:00
Meghana Vankadari	8c9a7c21b4	Optimized axpyf kernel for scomplex datatype Details: - Implemented axpyf kernel with fuse factor=4 for scomplex datatype. - Modified BLAS interface call for cgemv to reduce framework overhead. - Directed gemv to dotv in the case where dimension of y vector is 1. - when alpha = 0, gemv becomes scalv of Y with beta. Added code to return early after scaling Y vector with beta. AMD-Internal: [CPUPL-1402] Change-Id: Ibaab078008d76953332ba4da3515993578c0e586	2021-05-24 14:40:17 +05:30
Kiran Varaganti	492f54fb5e	Fix a bug in bench_gemm.c When op(A) or op(B) = transpose - the leading dimensions of these matrices altered. Commented out the statements "if(transa) lda = ..." similarly for matrix B and corrected this mistake in both column and row storages. Provide a provision to call BLIS interfaces when row-major inputs are used. Change-Id: Id2041af219a64567471c14190f283274d1df2f7f	2021-05-24 12:59:28 +05:30
Dipal M Zambare	5f53d14971	Added bench utility for dotv and scalv APIs. - Added bench utility for dotv and scalv API's - Corrected logging for scalv to handle complex types - Corrected logging to remove transpose field from dotv logs AOCL-Internal: [CPUPL-1577] Change-Id: Ieb29e773309de1520c7fa5b79b97c943d894ba07	2021-05-21 10:00:32 +05:30
Dipal Madhukar Zambare	dac15bdb3f	Merge "Added bench utility for ger API." into amd-staging-milan-3.1	2021-05-19 08:17:09 -04:00
Dipal Madhukar Zambare	b2f7c7f019	Merge "Fixed crash issue in bench utility for gemv API" into amd-staging-milan-3.1	2021-05-19 08:15:40 -04:00
Dipal M Zambare	413814fe70	Fixed crash issue in bench utility for gemv API - incx and incy was not considered while allocating memory for x and y vectors. - Updated test data set AMD-Internal: [CPUPL-1578] Change-Id: I374a75aaa66f951f0f8353434d94c135d09b2f05	2021-05-19 14:21:09 +05:30
Dipal M Zambare	0e82783f1c	Added bench utility for ger API. AOCL-Internal: [CPUPL-1577] Change-Id: Icc7a4590f605d7273077a7d2a42d4ecbafed2248	2021-05-19 14:05:01 +05:30
Nallani Bhaskar	b2e68b9812	Merge "Added optimized single threaded dtrsm small for left cases" into amd-staging-milan-3.1	2021-05-19 00:47:56 -04:00
Nallani Bhaskar	3a2e4c3db8	Added optimized single threaded dtrsm small for left cases Details: 1. Added optimized dtrsm kernels for all 8 left side cases. Below are a few notable optimizations which improved performance: a. Loading, transposing (for transa cases), packing and reusing of a10 block required for GEMM operation. The block size increases from 0 to 8X(m-8) in steps of 8x8 while solving TRSM from one end of A to other end of triangular A b. Performing in-register transpose whenever required c. Packing of 8 diagonal elements in one location helped to utilize cache line efficiently 2. Enabled calling dtrsm small for smaller sizes at cblas level itself to avoid framework overhead, which is significant for very small sizes 3. Thanks to SatishKumar.Nuggu@amd.com for implementing lln, llt, lun and manideep.kurumella@amd.com for implementing lut kernels using intrinsics. 4. Removed all older implementations of strsm which were not developed as per the guidelines; they can be referred to in older releases if required. Change-Id: I66ad6ef364cbcf5c99a3c4a4dcac12929865ade6	2021-05-18 16:16:00 +05:30
Dipal Madhukar Zambare	1605fea83e	Merge "Re-merged the gemmt testsuite file." into amd-staging-milan-3.1	2021-05-18 04:20:11 -04:00
Dipal Madhukar Zambare	6d8e2a36b3	Merge "Fixed blastest failure for amd64 configuration" into amd-staging-milan-3.1	2021-05-18 03:57:18 -04:00
Dipal M Zambare	9e27065c2b	Fixed blastest failure for amd64 configuration - When building for amd64 configuration, small matrix support for dgemm is not enabled (yet). Functions supporting small matrix implementation are called even when small matrix support is disabled. Update code to prevent this. AMD-Internal: [CPUPL-1575] Change-Id: I3a1692e965679cfde44938b1d26951145c790aa0	2021-05-18 03:24:06 -04:00
Nallani Bhaskar	a59796ef16	Updated leading dimensions for transpose case in gemm bench 1. Updated lda, ldb based on trans flags 2. Updated deriving storage type using leading dimension 3. Cleanup and alignment 4. Included transpose and row major cases in inputgemm.txt Change-Id: I25f5cd522eb64f212445d98f4682132bf5a330b6	2021-05-14 15:26:20 +05:30
Dipal M Zambare	21130ebece	Added configure option for AOCL Dynamic feature. - AOCL Dynamic feature is added in BLIS which determines optimal number of threads for the current problem size. - This feature can be enabled/disabled by modifying the source code - This change adds support to enable/disable this feature during configuration time by adding a new option in configure script AOCL-Internal : [CPUPL-1565] Change-Id: I590693f793cabc44d27a7f815adc41631dd01bbe	2021-05-12 00:41:13 -04:00
Meghana Vankadari	a3600d395d	Added bench app for syrk - input is a log file generated from AOCL_DTL Change-Id: I25dd695dea267a89a5c666d66abc4b91a57956c8	2021-05-11 14:57:51 +05:30
Dipal Madhukar Zambare	2b80e8824a	Merge "Added bench utility for gemv API." into amd-staging-milan-3.1	2021-05-11 01:09:22 -04:00
Dipal M Zambare	08424e8896	Added bench utility for gemv API. AMD-Internal: [CPUPL-1558] Change-Id: Iaba1aa164fa589fa7f5047f314b26a24c4c2c3a7	2021-05-10 15:01:47 +05:30
Nageshwar Singh	a88cb82cec	Revert "Adding trans h support in bench_gemm.c" This reverts commit `791903b31c`. Change-Id: I24403cced67ea9e851adb58a8bf01a3e17bb4e85	2021-05-07 04:11:30 -04:00
Meghana Vankadari	dc2d6ee763	Moved dynamic threading function from GEMMT to GEMM Details: - Current tuning for choosing optimal number of threads is done for GEMM. - Dynamic thread calculation function was placed in gemmt code flow instead of gemm by mistake. Fixing it with this commit. AMD-Internal: [CPUPL-1376] Change-Id: Iccb42a7a617b9b4cdb4c4af9be21aa82aaaabbcc	2021-05-07 12:10:53 +05:30
Meghana Vankadari	33ddf2e448	Fixed blastest failure for haswell configuration Details: - Placed optimized version of BLAS DGEMM, ZGEMM definitions under BLIS_CONFIG_EPYC as they use gemm small which are defined only for zen family configurations. - Added code to query and set cntx in gemv and trsv framework before cntx is referred for any function pointers to avoid querying from NULL pointer. AMD-Internal: [CPUPL-1562] Change-Id: I977d028ec4ddb57dcdc70e443e7708f36c01cca9	2021-05-07 01:49:54 -04:00
Meghana Vankadari	eea347b02e	Added dynamic threading support for GEMM SUP code path Details: - Introduced new feature called AOCL_DYNAMIC. - When this macro is defined, the optimum number of threads to solve DGEMM is estimated based on the dimensions (M,N,K). - Range of optimum number of threads will be [1, num_threads], where "num_threads" is the number of threads set by the application. - num_threads is derived from either the environment variable "OMP_NUM_THREADS" or "BLIS_NUM_THREADS", or the bli_set_num_threads() API. - Only the local copy of rntm is modified by the AOCL_DYNAMIC feature. The global_rntm data structure remains unchanged in order to keep track of the original number of threads set by the application. - Optimum number of threads calculation is done only for SUP. - Since the 'native' code path handles larger problem sizes, we use the max number of threads recommended by the application. AMD-Internal: [CPUPL-1376] Change-Id: I665ce14543d6719857d70325c4a9f959c08e66e3	2021-05-07 09:52:51 +05:30
Dipal M Zambare	29bfedad30	Re-merged the gemmt testsuite file. - Verified merge of all gemmt related files - Corrected testsuite/src/test_gemmt.c AMD-Internal: [CPUPL-1561] Change-Id: I5fe03b8e3754e4ed96c927ef7570be6f9d4f528b	2021-05-06 18:08:28 +05:30
Kiran Varaganti	433f17b6cd	bench_gemmt Bug Fix Fix reading input parameters Interchange the reading of n and k, first n appears and then k appears in the logs. Added comments to explain the format of the input gemmt log. Change-Id: I44c6081d4449ba210728bc089c4215d5eef18834	2021-05-06 14:54:15 +05:30
managalv	c420bd63e2	Enabled optimised packed routines on zen3 Change-Id: I5eb57f8ab2cccd20d0f778ada539fd1474cf6338	2021-05-06 01:25:08 -04:00
Madan mohan Manokar	c1fa9abe32	zgemm native path tuning 1. NC and MC values are tuned for both single-instance and multi-instance run. 2. zen2 and zen3 configs updated. 3. SUP path disabled for zgemm, since tuned native path performed better. To be re-enabled after setting right threshold for SUP selection. AMD-Internal: [CPUPL-1442] Change-Id: I0eb86926744d2983530a443e20e3e4e2ee3f3239	2021-05-06 01:15:35 -04:00
Dipal Madhukar Zambare	821fa267c9	Merge "Updated makefiles to fix issues introduced in merge" into amd-staging-milan-3.1	2021-05-05 23:42:15 -04:00
Meghana Vankadari	dc71602895	Merge "Added sup functionality for SYRK" into amd-staging-milan-3.1	2021-05-05 06:26:03 -04:00
Dipal M Zambare	7454cca9e7	Updated makefiles to fix issues introduced in merge - Updated Makefile to include DTL files in library build - Updated Makefile to include cpp header file installation - Updated test/makefile to include extra API added by AMD team. AMD-Internal: [CPUPL-1559] Change-Id: I249c6935d5ff5fb645f9deec7e0218575484be13	2021-05-05 14:59:15 +05:30
Nallani Bhaskar	f917d826b5	Updated test application to work with row major cblas Details: 1. Fixed reading leading dimensions in test_gemm.c based on row/col major 2. Reduced redundant code and adjusted alignment Change-Id: I8ca8c81223386fc21c6cc7c1d8f8a2109c9f5343	2021-05-02 23:09:13 +05:30
Meghana Vankadari	1303732e83	Added sup functionality for SYRK Details: - Added bli_syrksup function that internally uses gemmt implementation. - Modified OAPI of syrk to call SUP before proceeding to the conventional implementation. - Copied gemmsup threshold function for syrk temporarily. Thresholds are yet to be derived for syrk. Change-Id: I751c6bd62bc76a3e4717f77c5cb33f19b759151d	2021-04-29 12:35:30 +05:30
Nallani Bhaskar	b239a5aee7	Bug fix in sgemmsup 1x16n kernel Details: Address increment was missing in bli_sgemmsup_rv_zen_asm_1x16 kernel while storing output in column major order in beta zero case JIRA: CPUPL-1548 Change-Id: I36269cd28de6fbef2256451e399f90f0437b0ce1	2021-04-28 21:33:30 +05:30
lcpu	7401effc03	BLIS:merge: Fixed merge conflicts that arose while downstreaming BLIS code from master to the milan-3.1 branch. Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond) Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations. Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations. Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu) Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu) Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs. Minor code consolidation in all level-3 _front() functions. Reorganized Windows cpp branch of bli_pthreads.c. Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS. Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion. Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv. AMD-internal-[CPUPL-1523] Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd	2021-04-27 11:09:48 +05:30

2485 Commits