sthangar
997628ed97
Reducing the framework overhead of GEMV routines
...
Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684
2017-12-11 12:08:58 +05:30
sthangar
3485abba4b
Checked in the small-matrix code to compute GEMM for the A-transpose case
...
Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462
2017-12-11 12:07:31 +05:30
Devin Matthews
de16beb83b
PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.
2017-12-11 12:07:31 +05:30
Devin Matthews
25d0e61854
Revert "Change PACKDIM_MR (double) for haswell to 8."
...
This reverts commit 681eec913d.
2017-12-11 12:07:31 +05:30
Devin Matthews
c5bdd84b35
Change PACKDIM_MR (double) for haswell to 8.
2017-12-11 12:07:31 +05:30
Field G. Van Zee
db4a0bb8ba
Whitespace reformatting to armv8a kernels file.
...
Details:
- Updated formatting of function signature/header in
kernels/armv8a/3/bli_gemm_opt_4x4.c.
2017-12-11 11:58:33 +05:30
sthangar
42e7f6fb2a
fixed license attribute issues in AMD added files
...
Change-Id: I303f870a777c7cd1c1af29ea0b93f3e0a27948e4
2017-03-31 14:33:02 +05:30
Kiran Varaganti
0b19029342
Code cleanup, removed warnings from trsm, removed unused routines in axpyv & scalv
...
Change-Id: I02867f394c5f416194c4b1769a6c75f39243ec81
2017-03-14 14:51:31 +05:30
praveeng
825363bd2a
Merge code from master to amd-staging as on 2017_03_08 by praveeng
...
Change-Id: I80740081b2cb54c9b77a3e78b9fe540e170be23d
2017-03-08 15:43:42 +05:30
sthangar
093bdb80c8
Checked in Unpacked DGEMM code
...
Change-Id: I39dcc7b238b328f73ee2675d21a5e521d0488723
2017-03-07 13:35:50 +05:30
Kiran Varaganti
33923da9a1
Added variant 10 for double precision axpyv microkernel
...
Change-Id: I7a20cc113a422603250bc450825c965136354974
2017-03-06 14:31:31 +05:30
Kiran Varaganti
bc828f7f8e
Added new axpyv (single precision) microkernel that performs 10 FMAs per loop. This gives better performance than all other implementations of axpyv
...
Change-Id: Ic4f0e4c67e367d67d0b24febcf34f81a70a39972
2017-03-03 14:45:35 +05:30
sthangar
c9949f4603
Checked in DGEMMTRSM and edge case handling routine in DDOTXF
...
Change-Id: I65f00661af6c09b2507294fd43e0a10641c0597e
2017-03-01 11:14:34 +05:30
Devin Matthews
0e18f68cf1
Handle k=0 correctly in KNL dgemm ukernel.
2017-02-20 09:03:21 -06:00
Devin Matthews
7d42fc0796
Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.
2017-02-19 21:10:55 -05:00
Kiran Varaganti
04245c9ff7
Reoptimized scalv routines - two vector multiplies are done per iteration, and these routines are enabled in bli_kernel.h
...
Change-Id: Ic5654508573d1f6bde2edef06aefe117e581feb5
2017-02-10 14:24:30 +05:30
Kiran Varaganti
58b5b77e5f
Fixed a bug in axpyv: the arguments passed to the intrinsic fmad instruction are corrected
...
Change-Id: If12f24c6bc74b22ac9e4acd6b9378e06d79f2f5e
2017-02-08 21:43:34 +05:30
Kiran Varaganti
85de4ebf74
variant 4 axpyv single precision modified: explicitly used FMA intrinsics, replaced vector multiply and add operations
...
Change-Id: I975feef56696d479d2b9e9441b0660021cf4f6ff
2017-02-08 14:41:04 +05:30
Kiran Varaganti
3fa53e8af3
Merged axpyv and gemm small in bli_kernel.h
...
Merge branch 'amd-staging' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging
modified: config/zen/bli_kernel.h
modified: frame/3/gemm/bli_gemm_front.c
modified: kernels/x86_64/zen/3/bli_gemm_small_matrix.c
Change-Id: If181cf9345178c448b3530beb8bef453917fe295
2017-02-08 11:51:57 +05:30
sthangar
95be7b0470
Added logic for packing matrix A and prefetching matrix C in Unpacked SGEMM code
...
Change-Id: I99efeca9eb5b4449286ec0ec133fd554ef1bb4f0
2017-02-08 11:24:10 +05:30
Kiran Varaganti
b5291a445b
Added optimization variant 4 for axpyv single precision - this performs 5 FMA per loop, keeping the IPC always full
...
Change-Id: Ie77ed22584271136a257e673bcd3b1ba71136bc9
2017-02-07 12:39:31 +05:30
Kiran Varaganti
f4bfc1662a
New routines implemented for axpyv to improve performance for small vector sizes. Vectorization is done for vectors as small as 8 (single precision) or 4 (double precision) elements. Since this operation has a low compute-to-memory ratio, memory operations dominate at larger sizes, so there is not much gain there. This still needs some work. Added saxpyv and daxpyv var 3 routines in the file bli_axpyv_opt_var1.c
...
Change-Id: Ic1b33bd5516e10113b00e44ab41b97eb19d46072
2017-02-06 15:04:27 +05:30
sthangar
574472ba5a
checked in unpacked SGEMM optimization
...
Change-Id: I8e4ea374415c0c402c660b656fb076af15354181
2017-01-27 14:32:02 +05:30
praveeng
41595e98ee
Merge master code as on 2016_12_07 to amd-staging
...
Change-Id: I5d9ecef9bff960aeb9b51ca4e4b21714e789e44f
2016-12-07 15:14:02 +05:30
sthangar
d625c49e20
checked-in SGEMMTRSM microkernel for Zen
...
Change-Id: Ib61936418dea911b2154aa99f703b66e9669f94f
2016-12-01 16:17:09 +05:30
Francisco Igual
7f31a6307b
Fixed missing cntx argument in ARMv8 microkernels.
2016-11-27 14:40:47 +01:00
praveeng
d8f13beeea
Merge master code till 2016_11_25 to amd-staging
2016-11-25 17:31:08 +05:30
Field G. Van Zee
b3e58ee303
Reimplemented 4x12 haswell ukernels (real only).
...
Details:
- Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
defines 4x24 single real and 4x12 double real gemm microkernels, with
broadcast-based implementations. (The previous microkernel file has been
moved to an 'old' subdirectory.)
2016-11-23 17:58:26 -06:00
sthangar
9772218cae
Added optimized DAMAX routines for Zen
...
Change-Id: I499c0c8f0f4ce6c19235c47b86d5608db6ba50f8
2016-11-16 15:19:19 +05:30
Kiran Varaganti
e35d3c23f2
Added new optimized micro-kernel for dotxv routine
...
Change-Id: I2c544e9b25a454d971ad690353502a55cd668391
2016-11-10 14:30:53 +05:30
praveeng
0d13e9a4f6
bli_kernel.h
...
Change-Id: I425d089f79497a0de7d1622e829c3ca9edf7f091
2016-11-07 14:40:41 +05:30
Field G. Van Zee
8a11a2174a
Updates to non-default haswell microkernels.
...
Details:
- Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
constraints.
- Added missing c and z microkernels, which are based on the corresponding
kernels in the d6x8 set.
- This completes the d8x6 set (which may be used for situations when it
is desirable to have a microkernel with a column preference).
2016-10-31 19:07:55 -05:00
Devin Matthews
11eb7957ab
Merge branch 'master' into knl
...
# Conflicts:
# frame/thread/bli_thread.h
2016-10-25 13:51:07 -05:00
Devin Matthews
cd5b668183
Don't use %rbp in KNL packing kernels.
2016-10-25 13:49:27 -05:00
Devin Matthews
5117d444f7
Change .align to .p2align in Bulldozer ukernels
...
Apparently OSX doesn't allow .align directives for >16B, so I've changed these to their .p2align counterparts.
2016-10-24 16:20:47 -05:00
Kiran Varaganti
d250e6a3af
Merged TRSM and scalv routines into zen folder
...
Change-Id: Ice897bc83e8fb70b90f23cc3ce892c39883aceb9
2016-10-20 14:34:39 +05:30
sthangar
1c2f7b57d5
Removed symlinks to zen kernels from haswell kernel folder and also modified the bli_kernel.h file accordingly
...
Change-Id: Ib3736af48e851c8243bbe10d937fb942c49ad048
2016-10-18 15:06:35 +05:30
sthangar
7e04490002
Checked in the SAMAX optimizations
...
Change-Id: I7faf8c3adf52ff01432188ad3b9866ee4b9a9dfd
2016-10-13 10:07:51 +05:30
Field G. Van Zee
b922d75634
Avoid compiling BLAS/CBLAS files when disabled.
...
Details:
- Updated the top-level Makefile, build/config.mk.in template, and
configure script so that object files corresponding to source files
belonging to the BLAS compatibility layer are not compiled (or archived)
when the compatibility layer is disabled. (Same for CBLAS.) Thanks
to Devin Matthews for suggesting this optimization.
- Slight change to the way configure handles internal variables. Instead
of converting (overwriting) some, such as enable_blas2blis and
enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
now stored in new variables that live alongside the originals (with the
suffix "_01"). This is convenient since some values need to be
sed-substituted into the config.mk.in template, which requires "yes" or
"no", while some need to be written to the bli_config.h.in template,
which requires "0" or "1".
Updated BLIS4 TOMS citation in README.md.
Added complex gemm micro-kernels for haswell.
Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
architectures. As with their real domain brethren, these kernels prefer
row storage (though this doesn't affect most users due to high-level
optimizations in most level-3 operations that induce a transpose to
whatever storage preference the kernel may have).
Change-Id: I512ab90784ecbb7cdaee24928d2ccebb544ba5c1
2016-09-15 12:24:07 +05:30
Pradeep Rao
69826110ba
Merge "Implemented trsm single precision for lower triangular matrices. Files added: bli_trsm_l_int_6x16.c. Files modified: bli_kernel.h (to enable the optimized trsm microkernel) and test_trsm.c (to test trsm single precision)" into amd-staging
2016-09-14 03:26:25 -04:00
Field G. Van Zee
121c39d455
Added complex gemm micro-kernels for haswell.
...
Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
architectures. As with their real domain brethren, these kernels prefer
row storage (though this doesn't affect most users due to high-level
optimizations in most level-3 operations that induce a transpose to
whatever storage preference the kernel may have).
2016-09-05 13:11:42 -05:00
sthangar
64598ee4cf
fixed the symlink issue
...
Change-Id: I2186d529f295c576597c189e1ae219bc1a83f955
2016-08-31 12:54:50 +05:30
sthangar
fdc6639023
Placed 1 and 1f AMD optimized AVX routines under zen folder
...
Change-Id: I26795211ef11d232ed794ce36dd0a9c1f8706328
2016-08-29 10:43:38 +05:30
Kiran Varaganti
a58dd35ed7
Implemented trsm single precision for lower triangular matrices. Files added: bli_trsm_l_int_6x16.c. Files modified: bli_kernel.h (to enable the optimized trsm microkernel) and test_trsm.c (to test trsm single precision)
...
Change-Id: Ibddf989f4aad577e89558673e1038cf6ece654d9
2016-08-26 14:55:12 +05:30
Devin Matthews
c8e4ef9395
Add prefetchw to 30x8 kernel.
2016-08-03 16:13:03 -05:00
Devin Matthews
4b5a2f3d6e
Merge remote-tracking branch 'origin/knl' into knl
...
# Conflicts:
# kernels/x86_64/knl/3/bli_dgemm_opt_24x8.c
2016-08-03 16:09:51 -05:00
Devin Matthews
380736bfe9
Add (new) 30x8 KNL kernel and fix non-scatter prefetch bug.
2016-08-03 16:08:28 -05:00
Devin Matthews
9f52a587de
Try prefetchw[t1] instead of regular prefetch for C.
2016-08-03 16:03:53 -05:00
Devin Matthews
8945a1512d
This version gets ~1550 GFLOPs on KNL with 16x4.
2016-08-03 11:28:24 -05:00
praveeng
cdfb3c3f29
Merge master code as on 2016_07_29 to amd-staging branch by praveeng
...
Change-Id: Ic78b84d8b8d10158fb2a612f9a64bbc7b1f9b486
2016-07-29 12:46:21 +05:30