amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-13 10:35:38 +00:00

Author	SHA1	Message	Date
Madan mohan Manokar	a424e8b426	3m_sqp vectorization 1. bli_malloc modified to normal malloc and address alignment within 3m_sqp. 2. function added to pack A real,imag and sum. 3. function added to pack B real,imag and sum. 4. function added to pack C real,imag and beta handling. 4. sum and sub vectorized. AMD-Internal: [CPUPL-1352] Change-Id: I514e9efb053d529caef2de413d74d0dac2ceca54	2021-03-10 11:54:50 +05:30
Kiran Varaganti	12d13629f9	Fix Debug Trace Log in dgemm_ and zgemm_ Replaced "MKSTR(ch)" in the DTL call "AOCL_DTL_LOG_GEMM_INPUTS(AOCL_DTL_LEVEL_TRACE_1, MKSTR(ch)...)" with "D" and "Z" for dgemm_ and zgemm_ respectively to prevent printing wrong data-type. [CPUPL-1449] Change-Id: Ic91537189352bdb164411799e127de990a5c9a08	2021-03-02 15:16:21 +05:30
Nageshwar Singh	791903b31c	Adding trans h support in bench_gemm.c Change-Id: If340d515c38a593df26d5075e29685ef044601a5	2021-03-02 02:33:06 +05:30
Meghana Vankadari	22d4689360	Implemented 16x3 based gemm kernel for the case where A has transpose Details: - This implementation does a transpose operation while packing 16xk of A buffer and passes it to 16x3-nn kernel. - The same implementation works for the case where B has transpose. AMD-Internal: [CPUPL-1376] Change-Id: I81f74deb609926598f62c30f5bd6fc80fb1b9a17	2021-02-18 16:47:14 +05:30
Kiran Varaganti	851ab8b39f	Merge "Code fixes for single-thread and multi-thread builds." into amd-staging-milan-3.1	2021-02-16 23:29:41 -05:00
Kiran Varaganti	e1a5e96c7f	Code fixes for single-thread and multi-thread builds. Made changes to dgemm_ and zgemm_ interfaces to support multi-thread GEMM implementations. When number of threads is greater than one, we call multi-threaded gemm (sup or native) and for single thread version we call different flavors of single-thread gemm implementations decided based on the matrix dimensions. [CPUPL-1376] Change-Id: I2e37145ec9a07d6b7e7be1719bd49239e813aa8a	2021-02-16 12:44:31 +05:30
Meghana Vankadari	cf7d9c7314	Disabled calling of bli_dgemm_small from gemm_front Details: - Decision logic to choose small_gemm has been moved to blas interface. - Redirecting all the calls to small_gemm from gemm_front to native implementation. AMD-Internal: [CPUPL-1376] Change-Id: I6490f67113e9f7c272269f441c86f2a0b3c89a53	2021-02-16 11:30:20 +05:30
Madan mohan Manokar	95e0fb3a05	sqp commenting 1. Added comments. AMD-Internal: [CPUPL-1429] Change-Id: Ie37e24e58cd8bf836038a2258ebd09c3912fab9e	2021-02-15 04:22:23 -05:00
Meghana Vankadari	42a0a6bc6f	Added a basic dgemm implementation for smaller matrices. Details: - This kernel works best for cases where k = 1. - This implementation is called directly from blas interface when A, B matrices have no-transpose and k = 1. AMD-Internal: [CPUPL-1376] Change-Id: I3b31673a28290c81d4a4cb64c8605d56e50b5d3d	2021-02-15 09:43:47 +05:30
Meghana Vankadari	943b1362c7	Enabled vectorized pack kernels for zen2 configuration. Details: - These kernels are implemented by Field G. Van Zee as part of TRSM SUP implementation with commit-ID 9e31f5e8553f8ae99cfe8a80052fc63499e0891a. AMD-Internal: [CPUPL-1376] Change-Id: Ib39a87fc20571ae9aeff82c9b87516ac583093c2	2021-02-12 19:16:57 +05:30
Madan mohan Manokar	4c8b823972	gemm_sqp(gemm_squarePacked): 3m_sqp and dgemm_sqp 1. SquarePacked algorithm focuses on efficient zgemm/dgemm implementation for square matrix sizes (m=k=n) 2. Variation of 3m algorithm (3m_sqp) is implemented to allow single load and store of C matrix in kernel. 3. Currently the method supports only m multiple of 8. Residues cases to be implemented later. 4. dgemm Real kernel (dgemm_sqp) implementation without alpha, beta multiple is done, since real alpha and beta scaling are in 3m_sqp framework. 5. gemm_sqp supports dgemm when alpha = +/-1.0 and beta = 1.0. Change-Id: I49becaf6079da4be29be5b06057ff4e50770a7d8 AMD-Internal: [CPUPL-1352]	2021-02-12 15:57:59 +05:30
Kiran Varaganti	29ddec241a	Merge "DGEMM Optimizations for smaller dimensions" into amd-staging-milan-3.1	2021-02-11 08:22:36 -05:00
Kiran Varaganti	a7d43cf720	DGEMM Optimizations for smaller dimensions Modified dgemm_ to be able to call small_gemm 16x3 kernel. small_gemm will be called if ((m + n - k) < 2000 && (m + k - n) < 2000 && (n + k - m) < 2000) && n > 2. small_gemm kernel - if m or n or k = 0 we return and this case will be handled by sup or native kernel. [CPUPL-1376] Change-Id: I61c2b36ad0ae4fb3dd23bc37c2b6c78556b3105b	2021-02-11 11:05:42 +05:30
Mangala V	503e912fc5	Merge "Modified blas interface of TRSM to call TRSV whenever m=1 or n=1." into amd-staging-milan-3.1	2021-02-11 00:21:45 -05:00
managalv	8face536fd	Modified blas interface of TRSM to call TRSV whenever m=1 or n=1. TRSM API: AX = B, where X=B Case1: Call TRSV when matrix B is vector & A is matrix, When n = 1 for left side and when m = 1 for right side Case2: Divide B/A when matrix B is vector & A is scalar(Diagonal element), When m = 1 for left side and when n = 1 for right side For right side, Transpose complete operation, Change upper to lower and vice versa when A is being transposed Change-Id: Ib020f2a568f04a6e8d8f75bfc38adbfd7c5d175a	2021-02-11 18:47:37 +05:30
Madan mohan Manokar	3ab9104dae	Handling zgemm real(+/-1) alpha and beta 1.Improved performance when zgemm's alpha and beta are real and equal to +/-1. 2.change done in bli_zgemmsup_rv_zen_asm_3x4n. 3.change done in bli_zgemmsup_rv_zen_asm_3x4m. 4.change done in bli_zgemm_haswell_asm_3x4. Change-Id: Ic14d8507b264c24a8748febf6bc73eb60e476430 AMD-Internal: [CPUPL-1352]	2021-02-10 02:58:58 -05:00
managalv	1ff4981203	Modified blas interface of TRSM to call TRSV whenever m=1 or n=1. Case1: Call TRSV when matrix C & B are vector & A is matrix, When n = 1 for left side and when m = 1 for right side Case2: Divide B/A when matrix C & B are vector & A is scalar(Diagonal element), When m = 1 for left side and when n = 1 for right side For right side, Transpose complete operation, Change upper to lower and vice versa when A is being transposed Change-Id: Ie87e4a263c287ba554832ccc56b629f982e3ac4c	2021-02-08 19:02:25 +05:30
Madan mohan Manokar	f1ea1f1d34	Adaptive zgemm 1. 3m1 chosen for (m<=128) & (68>n<=128) & (k<=128) 2. Default blis3.1 path for rest of the sizes. Change-Id: I1e50dece013e72a67f1162faef5cbeb9bfbbc23a AMD-Internal: [CPUPL-1352]	2021-02-03 12:43:57 +05:30
Meghana Vankadari	2e7cf8d82f	Added 16x4 AXPYF kernel for zen2 config Details: - Added a new AXPYF kernel with fuse_factor = 4 and iter_unroll = 4. - Modified blas interface of GEMM to call GEMV whenever m=1 or n=1. Change-Id: I3f5acd37b009f53cf63f462cec79fd3e73676dbc	2021-02-02 21:22:44 +05:30
dzambare	48f2366b6f	Updated BLIS version string to "AOCL BLIS X.x" format AMD-Internal : [CPUPL-1394] Change-Id: Ifebcb14d9eb064d231b831f5a1e151853ad5a009	2021-01-07 12:38:32 +05:30
Nagendra Prasad M	566f586547	Merge "Blis: DOTC Additional argument for Complex types when using FLANG" into amd-staging-milan-3.1	2020-12-21 06:03:11 -05:00
nprasadm	10ac4e2aba	Blis: DOTC Additional argument for Complex types when using FLANG Merged the changes done in UT Austin BLIS repo for DOTC Additional argument. Other modifications related to test application included. Verified the above code changes through scalapack test applications 'xztrd' , 'xctrd' Change-Id: I7e16f3953db71890f9e8fbb0f7b363eaad899f62 Signed-off-by: Nagendra <Nagendra.PrasadM@amd.com> AMD-Internal: [CPUPL-1323]	2020-12-16 14:03:10 +05:30
Kiran Varaganti	fc80892bb2	Improve sup GEMM performance (CCC - row prefer kernel) Column-storage (CCC) case m is large and n & k are relatively small - row preferred kernels, in this case var1n sup kernels are called. But actually block-panel var2m works better here. After induced transposition the n becomes m which is large and m becomes n which is smaller. The micropanels of induced B are larger than micropanels of induced A, therefore var2m is better option than var1n. [CPUPL-1376] Change-Id: I9214140d340ea4ac3edfefc31c465c926ba93326	2020-12-10 19:16:44 +05:30
Dipal M Zambare	66fd5e547a	Update AMD copyright notice for current year. Change-Id: I2ffd3d3306499922be15638d37c4d1e806acd36c AMD-Internal: [CPUPL-1367]	2020-12-10 13:44:29 +05:30
Dipal M Zambare	38a8008cd8	Enabled znver3 flag for zen3 architecture znver3 flag will be enabled if compiler is AOCC Clang version 3.0 and configuration is zen3 Change-Id: Ie164f4d469bf3f8df31ccf8fed9f80dfc62efb39 AMD-Internal: [CPUPL-1353]	2020-12-04 12:28:22 +05:30
Meghana Vankadari	e083caf01d	Merge "Correcting zdotc definition error for configs other than zen family" into amd-staging-milan-3.0	2020-12-01 06:20:10 -05:00
Dipal M Zambare	c2f63fcc54	Update amd64 bundle configuration The configuration is updated to - Enable EPYC architecture optimizations - Macros to override block sizes. AMD-Internal : [CPUPL-1350] Change-Id: Id712f9abe6e81c9ece2baaab9d965b405e72977a	2020-12-01 14:37:13 +05:30
Meghana Vankadari	11b4cd8fc5	Correcting zdotc definition error for configs other than zen family Details: - when BLIS_CONFIG_EPYC is not defined, zdotc is defined twice. - One definition is part of macro based code. - Other definition is implemented as part of framework optimizations. - Modified the bla_dot.c file to choose macro based code for configs other than zen family. AMD-Internal: [CPUPL-1348] Change-Id: I9ef6a590a6199e173d38248c3fb72feddfb20922	2020-12-01 13:33:59 +05:30
bhaskarn	91909c1562	Fix for segmentation crash in dgemmsup kernels Description: [AMD Internal]: CPUPL-1336 Removed extra/unnecessary loads in dgemmsup kernels which are accessing the memory beyond the boundaries and causing segmentation issue. Kernels: bli_dgemmsup_rd_haswell_asm_1x4 bli_dgemmsup_rv_haswell_asm_1x6 Change-Id: Idaeed36ebd9f13550943394a37e372b8d015b2d3	2020-11-24 10:15:57 -05:00
Kumar, Phani	477fc41fff	Cmake script changes and blis.h changes for amd-staging-milan-3.0 AMD Internal : [CPUPL-1083] Change-Id: Ia29a1f328ee32e2aec59a7fc70c04400d6ee6580	2020-11-24 06:12:25 -05:00
Dipal M Zambare	0a3d94c9a2	Updated test drivers for dotv, scalv and swapv. Added traces in cblas layer for these API's. These test drivers didn't have calls for complex data types, the drivers are updated to support them. AMD-Internal : [CPUPL-1315] Change-Id: Ia52ecca68ea17314315d626b57c46a2f5973985b	2020-11-24 10:26:32 +05:30
Meghana Vankadari	97753d8e6b	Modified log routines for gemm, gemmt and trsm Details: - Modified log routines to accept inputs from blas layer instead of oapi level. AMD-Internal: [CPUPL-1332] Change-Id: If33c3585af92e617910ae8f7d442d1275119bbfc	2020-11-23 04:53:15 -05:00
Madan mohan Manokar	1d8fab0996	Test driver fix for her and her2 Fixed test driver code for her, her2 Support added to handle complex and double complex data type in test driver. Change-Id: If65939e99d8cf77e0fb70561166d84bf67d0321d AMD-Internal: [CPUPL-1326]	2020-11-23 04:10:43 -05:00
Dipal Madhukar Zambare	22270aa9e4	Merge "Added debug log and trace for gemv and dotv for blis and cblas interface" into amd-staging-milan-3.0	2020-11-23 03:51:48 -05:00
managalv	fdc0e70cd8	Added debug log and trace for gemv and dotv for blis and cblas interface AMD Internal: [CPUPL-1314] Change-Id: I2708fd9c73419c968c8e02ff11545645dc639052	2020-11-23 19:55:21 +05:30
Kiran Varaganti	80a516382e	Fixed wrong dimensions check in bench/bench_gemm.c application Verifying the valid values of m, n, k, lda, ldb and ldc is removed. Since the bench app is run on logs collected from AOCL traces. The correct way of checking should consider transpose parameter and storage order. Change-Id: If0fbf733c2650c6f328661293eb99d062685d638	2020-11-20 20:39:20 +05:30
Madan Mohan Manokar	35d33bab6a	Merge "Test driver fix for her, her2, herk and her2k" into amd-staging-milan-3.0	2020-11-19 05:47:31 -05:00
Madan mohan Manokar	38698f0dfd	Test driver fix for her, her2, herk and her2k Fixed test driver code for her, her2, herk and her2k function. Above functions supports only complex and double complex data type, test code is updated accordingly. Change-Id: Iee7b79abda4a2959a265c420d23879bf47f2c38d AMD-Internal: [CPUPL-1313]	2020-11-19 12:58:21 +05:30
satish kumar nuggu	17f994bd15	Added Blas interface for ?imatcopy, ?omatcopy, ?omatadd, ?omatcopy2 AMD-Internal: [CPUPL-1116] Original review was in this commit http://gerrit-git.amd.com/c/cpulibraries/er/blis/+/428165. Added new commit for transpose API's Change-Id: I322389cc0be0aaccf82d1d0bb4476beea8694cd8	2020-11-18 12:55:36 +05:30
Dipal M Zambare	ce99b1ecef	Added dynamic block size selection logic for DGEMM. Block sizes (MC, KC, NC) for DGEMM are determined at runtime based on following parameters - Single or multithreaded build - Processor Architecture (currently support only zen3) - Number of threads requested while running the library Change-Id: Ia793484b77adb87486e630d0d3b4c7856ae52094 AMD-Internal: [CPUPL-660, CPUPL-661]	2020-11-12 22:40:38 +05:30
Dipal M Zambare	6abd193144	Corrected thread id generation in DTL for BLIS. Added blis.h in aoclos.c in order to check if BLIS was built with openmp support. AOCL-Internal: [CPUPL-1238] Change-Id: I366da030266b9d7f2ad09dc722847a7d86b85933	2020-11-12 09:13:15 +05:30
managalv	5a57eeadfa	Disable 3m1 method for complex GEMM Details: Native method is being enabled for complex gemm Need to run performance for large dataset to enable induced method AMD-Internal: [CPUPL-1300] Change-Id: I5444dd31e8b8e73da73f789da8b64276e8e40de8	2020-11-11 19:57:59 +05:30
bhaskarn	008fe49df6	Added bench application for trsm Description: Added bench_trsm.c to read inputs from AOCL DTL logs to benchmark Added sample input file Change-Id: I6806e42244bf775cbed457553ca07fb0222ef597	2020-11-09 13:06:39 -05:00
Madan mohan Manokar	ac6dbdcdfb	Fixing the logs fixing data type issue in logs. Change-Id: I3b9fb2921fd9db57a734c7a2866b53f1b51adfdb AMD-Internal: [CPUPL-1249]	2020-11-09 20:39:23 +05:30
mkurumel	39f7a4eecf	Add AOCL DTL logging. Added logging for syr,syr2,syrk,syr2k,trmm,trmv,trsv. AMD-Internal: [CPUPL-1256] Change-Id: I628ef5d48796cfc68ec68886b8c1b0555261b3d1	2020-11-09 14:40:43 +05:30
Madan mohan Manokar	bded7f9392	Log fix Added function definition of her2k Change-Id: Ia7ccd72772cdcafdcf5cb8a21c6746b13c70b158 AMD-Internal: [CPUPL-1249]	2020-11-09 10:33:08 +05:30
Mangala V	1578e8b874	Merge "Optimised AXPYF routine for complex float and complex double" into amd-staging-milan-3.0	2020-11-06 02:05:02 -05:00
managalv	aae48c2221	Optimised AXPYF routine for complex float and complex double Details: - Added SIMD code - Processing 5 rows at a time in SIMD loop to improve performance AMD-Internal: [CPUPL-1054] Change-Id: I2ac93f25895dccfc42e14be0689e6d4e655d6a0a	2020-11-06 18:42:13 +05:30
Dipal Madhukar Zambare	1fb9e1d029	Merge "Re-enable support for Intel 19+ compiler." into amd-staging-milan-3.0	2020-11-06 02:00:34 -05:00
Kiran Varaganti	a15e531374	Merge "Benchmark using AOCL Logs as input" into amd-staging-milan-3.0	2020-11-06 01:51:38 -05:00
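
The size test quoted in commit a7d43cf720 (routing dgemm_ to the small_gemm 16x3 kernel) can be restated as a small predicate. This is only a sketch of the condition as written in that commit message: the helper name `use_small_gemm` and the `long` dimension type are hypothetical and not taken from the BLIS sources, and folding the zero-dimension case into the predicate is an assumption (the message says the kernel itself returns in that case and leaves the work to the sup or native path).

```c
#include <stdbool.h>

/* Hypothetical helper, not a BLIS function: restates the size test from
   the commit message of a7d43cf720. */
static bool use_small_gemm(long m, long n, long k)
{
    /* Assumption: treat a zero dimension as "do not use small_gemm",
       since the message says sup or native handles that case. */
    if (m == 0 || n == 0 || k == 0)
        return false;

    /* Each pairwise sum minus the third dimension must stay below 2000,
       and n must be greater than 2. */
    return (m + n - k) < 2000 &&
           (m + k - n) < 2000 &&
           (n + k - m) < 2000 &&
           n > 2;
}
```

Under this reading, a 100x100x100 problem qualifies, while a problem with n = 2 or with m = 3000 and n = k = 100 does not.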

2242 Commits