blis/kernels at 7dca25d056c235186f48f2bb216e312cb4afcaf7 - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 18:15:37 +00:00

Harihara Sudhan S d4901f53ce Improved performance of double complex AXPYV kernel

- Increased unroll by reusing the X registers that were previously used
  for performing shuffles.
- Added loops with smaller increment steps for better problem
  decomposition.
- Added X vector and Y vector prefetch to the kernel.
- Removed redundant code that handled the fringe for incx = 1 and
  incy = 1. That remainder is now processed by the loop that handles
  non-unit stride cases.
- Vectorized loops that handle non-unit stride cases using SSE
  instructions.

AMD-Internal: [CPUPL-2773]
Change-Id: Ifb5dc128e17b4e21315789bfaa147e3a7ec976f0

2023-03-02 00:35:42 -05:00
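The commit message describes a particular loop structure for y := y + alpha * x on double complex vectors:

- a wide unit-stride main loop;
- a narrower unit-stride loop for smaller pieces of the problem;
- one general-stride loop that finishes the unit-stride fringe and also serves non-unit strides.

The portable C sketch below illustrates that structure under stated assumptions. It is not the BLIS zen kernel:

- The type `dcomplex_t` and the function `zaxpyv_sketch` are hypothetical names.
- The real kernel uses SIMD intrinsics (SSE for the non-unit-stride path, per the commit message) and prefetching, which this scalar version omits.
- The unroll factors 4 and 2 are placeholders, not the kernel's actual values.
- Strides are assumed to be non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved double-complex type (real part, imaginary part). */
typedef struct { double real; double imag; } dcomplex_t;

/* One element: y := y + alpha * x (complex multiply-add). */
static inline void zaxpy1(dcomplex_t alpha, const dcomplex_t *x, dcomplex_t *y)
{
    double xr = x->real, xi = x->imag;
    y->real += alpha.real * xr - alpha.imag * xi;
    y->imag += alpha.real * xi + alpha.imag * xr;
}

/* Hypothetical sketch of the loop decomposition, not the real kernel. */
void zaxpyv_sketch(size_t n, dcomplex_t alpha,
                   const dcomplex_t *x, ptrdiff_t incx,
                   dcomplex_t *y, ptrdiff_t incy)
{
    /* This sketch only supports non-negative strides. */
    assert(incx >= 0 && incy >= 0);

    size_t i = 0;
    if (incx == 1 && incy == 1) {
        /* Large step: 4 elements per iteration (stand-in for the wide unroll). */
        for (; i + 4 <= n; i += 4) {
            zaxpy1(alpha, &x[i],     &y[i]);
            zaxpy1(alpha, &x[i + 1], &y[i + 1]);
            zaxpy1(alpha, &x[i + 2], &y[i + 2]);
            zaxpy1(alpha, &x[i + 3], &y[i + 3]);
        }
        /* Smaller step: 2 elements per iteration. */
        for (; i + 2 <= n; i += 2) {
            zaxpy1(alpha, &x[i],     &y[i]);
            zaxpy1(alpha, &x[i + 1], &y[i + 1]);
        }
    }
    /* General-stride loop: finishes the unit-stride fringe (at most one
       element at this point) or processes the whole vector when a stride
       is not 1, so no separate fringe handler is needed. */
    for (; i < n; ++i)
        zaxpy1(alpha, &x[(ptrdiff_t)i * incx], &y[(ptrdiff_t)i * incy]);
}
```

The 2-element loop runs after the 4-element loop, so at most one unit-stride element reaches the general-stride loop. This is why, in this sketch, a dedicated unit-stride fringe loop would be redundant.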

New kernel set for Arm SVE using assembly (#396)

2020-05-21 11:56:45 +05:30

Squash-merge 'pr' into 'squash'. (#457)

2020-11-14 09:39:48 -06:00

avoid loading twice in armv8a gemm kernel (#403)

2020-05-21 12:37:53 +05:30

Replaced use of bool_t type with C99 bool.

2020-08-03 11:27:13 +05:30

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Added a dummy file to kernels/generic.

2017-11-21 12:34:20 -06:00

Obliterated usage of rbp register in SUP gemm kernel

2023-03-01 11:09:57 -05:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Type saga continues; fixed sgemm ukernel signature.

2020-09-12 17:48:15 -05:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Optionally disable trsm diagonal pre-inversion.

2020-12-04 16:08:15 -06:00

BLIS library porting on to Windows:

2020-06-16 18:29:00 +05:30

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Add POWER10 support to BLIS (#450)

2020-09-29 16:52:18 -05:00

Fixed bug in power10 microkernel I/O. (#488)

2021-03-30 19:07:42 -05:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Bug Fix to replace vzeroall

2022-07-22 03:42:17 -04:00

Improved performance of double complex AXPYV kernel

2023-03-02 00:35:42 -05:00

BLIS:merge:

2021-04-27 11:09:48 +05:30

Added support for zen3 configuration

2020-07-22 18:24:26 +05:30

Added AVX512 DTRSM small RLNN/RUTN variant kernels

2023-02-28 01:40:03 +05:30

CMakeLists.txt

Added support for AVX512 for Windows and AMAVX

2022-06-08 11:09:48 +05:30