blis/kernels at 595f7b7edf0981e4b03bb01a864d59d264896f90 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Files

mkurumel 595f7b7edf dnrm2 optimization with dot method

    1.  Added a new kernel, bli_dnorm2fv_unb_var1, to compute the
	norm with a dot operation.
    2.  Added vectorization to compute the squares of the elements of
	vector X in blocks of 32 doubles.
    3.  Defined a new macro, BLIS_ENABLE_DNRM2_FAST, in the config
	header to compute nrm2 using the new kernel.
    4.  The dot-based kernel may have accuracy issues. To switch back
	to the traditional implementation of the L2-norm of vector X,
	disable the macro BLIS_ENABLE_DNRM2_FAST.

    AMD-Internal: [CPUPL-1757]

Change-Id: I1adcaf1b3b4e33837758593c998c25705ff0fe11

2021-11-12 08:58:53 +05:30

armsve

New kernel set for Arm SVE using assembly (#396)

2020-05-21 11:56:45 +05:30

armv7a

Squash-merge 'pr' into 'squash'. (#457)

2020-11-14 09:39:48 -06:00

armv8a

Avoid loading twice in armv8a gemm kernel (#403)

2020-05-21 12:37:53 +05:30

bgq

Replaced use of bool_t type with C99 bool.