blis/kernels at 18ae57305ebece998ca4c6c1786e6ba8983df976 - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Harihara Sudhan S 18ae57305e ZAXPYF4 optimization

- Vectorized alpha scaling of X vector using SSE instructions. This
  can be done irrespective of incx.
- Added code to prefetch A matrix and Y vector to L1 cache
- Vectorized fringe case computation and non-unit stride computation
  with SSE instructions.
- Increased unroll in unit stride cases for better register
  utilization.

AMD-Internal: [CPUPL-2773]
Change-Id: I217e6ce9e3f5753ebe271c684abd9a2274fd2715

2023-02-04 12:34:50 -05:00
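
The first bullet of the commit message above says alpha can be applied to X regardless of `incx`. A minimal scalar sketch of why, assuming the usual axpyf definition (y := y + alpha * A * x, with A having four columns), is given below. The name `zaxpyf4_ref` and its signature are hypothetical, conjugation options are left out, and this is not the SSE kernel from the commit. Only four elements of X are read, so they can be scaled once into a small buffer whatever their stride, and the loop over rows never touches `incx` again.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference routine (not the BLIS kernel):
   y := y + alpha * A * x, where A is m x 4 and conjugation is ignored. */
static void zaxpyf4_ref(size_t m, double complex alpha,
                        const double complex *a, ptrdiff_t inca, ptrdiff_t lda,
                        const double complex *x, ptrdiff_t incx,
                        double complex *y, ptrdiff_t incy)
{
    double complex ax[4];

    /* Scale the four x elements once, up front. The stride only matters
       here, when gathering x into the local buffer. */
    for (int j = 0; j < 4; ++j)
        ax[j] = alpha * x[j * incx];

    /* Each row of A contributes one fused update to the matching y element. */
    for (size_t i = 0; i < m; ++i) {
        const double complex *ai = a + (ptrdiff_t)i * inca;
        y[(ptrdiff_t)i * incy] += ai[0]       * ax[0]
                                + ai[lda]     * ax[1]
                                + ai[2 * lda] * ax[2]
                                + ai[3 * lda] * ax[3];
    }
}
```

A vectorized kernel would keep the same structure and replace the scalar arithmetic with SIMD loads, multiplies, and stores.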

New kernel set for Arm SVE using assembly (#396)

2020-05-21 11:56:45 +05:30

Squash-merge 'pr' into 'squash'. (#457)

2020-11-14 09:39:48 -06:00

avoid loading twice in armv8a gemm kernel (#403)

2020-05-21 12:37:53 +05:30

Replaced use of bool_t type with C99 bool.

2020-08-03 11:27:13 +05:30

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Added a dummy file to kernels/generic.

2017-11-21 12:34:20 -06:00

AVX2 dgemm kernel optimization for AOCC

2023-01-09 07:49:41 -05:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Type saga continues; fixed sgemm ukernel signature.

2020-09-12 17:48:15 -05:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Optionally disable trsm diagonal pre-inversion.

2020-12-04 16:08:15 -06:00

BLIS library porting on to Windows:

2020-06-16 18:29:00 +05:30

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Add POWER10 support to BLIS (#450)

2020-09-29 16:52:18 -05:00

Fixed bug in power10 microkernel I/O. (#488)

2021-03-30 19:07:42 -05:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Bug Fix to replace vzeroall

2022-07-22 03:42:17 -04:00

ZAXPYF4 optimization

2023-02-04 12:34:50 -05:00

BLIS:merge:

2021-04-27 11:09:48 +05:30

Added support for zen3 configuration

2020-07-22 18:24:26 +05:30

Integrated 32x6 DGEMM kernel for zen4 and its related changes are added.

2023-01-19 23:11:36 +05:30

CMakeLists.txt

Added support for AVX512 for Windows and AMAVX

2022-06-08 11:09:48 +05:30