blis/frame/3 at 1dbeee4d194628d6ff296b9d5ec44eed0cd4d76f - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-26 07:25:28 +00:00

Mangala V e6cc2a3e22 ZGEMMT SUP Optimizations for AVX512

Existing Design:
 - GEMM AVX2 kernel performs computation and updates temporary C buffer
 - Portion of temporary C buffer is copied to output C buffer
   based on UPLO parameter
 - For diagonal blocks, using GEMM kernels is not efficient

New Design: Implemented in current patch when UPLO='L'
 - GEMMT kernel used for computation, temporary buffer is not required.
 - Only required elements are computed using mask load store for all
   fringe cases
 - Exception: AVX2 code path is used when storage format is RRC, CRR, CRC

- AOCL-Dynamic is added based on dimension
- Check for AVX platform is added in SUP interface; it returns to
  native implementation if hardware does not support AVX platform
- SUP ref_var2m is expanded for dcomplex datatype to avoid condition
  check which exists for double datatype
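
Illustration (not part of the patch): the difference between the existing and new designs above can be sketched in scalar C for the UPLO='L' case. This is a minimal sketch under stated assumptions, not the BLIS kernels. The names gemmt_lower_via_temp, gemmt_lower_direct and dot_row_col are hypothetical, storage is assumed column-major with leading dimension equal to the row count, and the AVX512 mask load/store used for fringe cases is not modelled; a plain loop bound stands in for it.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>

typedef double complex zc; /* stands in for the dcomplex datatype */

/* Entry (i, j) of A*B. A is n x k, B is k x n, both column-major. */
static zc dot_row_col(size_t n, size_t k, const zc *a, const zc *b,
                      size_t i, size_t j)
{
    zc acc = 0.0;
    for (size_t p = 0; p < k; ++p)
        acc += a[i + p * n] * b[p + j * k];
    return acc;
}

/* Existing design, sketched: compute the full n x n product into a
 * temporary buffer, then copy only the lower triangle into C.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int gemmt_lower_via_temp(size_t n, size_t k, zc alpha, const zc *a,
                         const zc *b, zc beta, zc *c)
{
    if (n == 0)
        return 0;
    assert(a && b && c);

    zc *tmp = malloc(n * n * sizeof *tmp);
    if (!tmp)
        return -1;

    /* All n*n entries are computed, including the unused upper part. */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i)
            tmp[i + j * n] = dot_row_col(n, k, a, b, i, j);

    /* Only entries with i >= j are written back to C. */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = j; i < n; ++i)
            c[i + j * n] = alpha * tmp[i + j * n] + beta * c[i + j * n];

    free(tmp);
    return 0;
}

/* New design, sketched: compute only the required (lower-triangular)
 * entries and update C in place, so no temporary buffer is needed. */
void gemmt_lower_direct(size_t n, size_t k, zc alpha, const zc *a,
                        const zc *b, zc beta, zc *c)
{
    assert(n == 0 || (a && b && c));

    for (size_t j = 0; j < n; ++j)
        for (size_t i = j; i < n; ++i)
            c[i + j * n] = alpha * dot_row_col(n, k, a, b, i, j)
                         + beta * c[i + j * n];
}
```

Both routines leave the strictly upper part of C untouched. The direct variant skips roughly half of the inner products and the extra copy, which is the saving the new design targets for diagonal blocks.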

AMD_Internal: [CPUPL-5006]

Change-Id: I3e21404b732b8f2df9cbdba394303752fdf36286

2024-05-07 23:00:29 +05:30

Fix for build issue when Mixed Datatypes are disabled

2024-02-23 04:02:49 -05:00

ZGEMMT SUP Optimizations for AVX512

2024-05-07 23:00:29 +05:30

CMake: Adding new portable CMake system.

2023-11-09 15:49:45 +05:30

CMake: Adding new portable CMake system.

2023-11-09 15:49:45 +05:30

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

CMake: Adding new portable CMake system.

2023-11-09 15:49:45 +05:30

CMake: Adding new portable CMake system.

2023-11-09 15:49:45 +05:30

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

CMake: Adding new portable CMake system.

2023-11-09 15:49:45 +05:30

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_blocksize.c

Removed local copy of cntx in TRSM

2023-08-16 08:09:01 -04:00

bli_l3_blocksize.h

Removed export macros from all internal prototypes.

2020-08-03 11:47:18 +05:30

bli_l3_check.c

Added functionality support for dzgemm

2022-05-17 18:01:55 +05:30

bli_l3_check.h

Squash-merge 'pr' into 'squash'. (#457)

2020-11-14 09:39:48 -06:00

bli_l3_cntl.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_cntl.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_compute.c

Added Parameter Checks and DTL Trace for Extension APIs

2023-11-09 18:53:59 +05:30

bli_l3_compute.h

Code cleanup: No newline at end of file

2023-11-22 17:11:10 -05:00

bli_l3_direct.c

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_direct.h

Removed export macros from all internal prototypes.

2020-08-03 11:47:18 +05:30

bli_l3_ft_ukr.h

Added a new field in cntx to store l3 threshold function pointers

2021-08-16 00:10:01 -04:00

bli_l3_oapi_ba.c

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_oapi_ex.c

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_oapi.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_oapi.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_oft_var.h

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_oft.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_packm.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_packm.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_prune.c

Export functions without def file (#303)

2020-08-03 11:46:07 +05:30

bli_l3_prune.h

Removed export macros from all internal prototypes.

2020-08-03 11:47:18 +05:30

bli_l3_smart_threading.c

BLIS: Implement zen5 sub-configuration

2024-04-12 07:26:31 -04:00

bli_l3_smart_threading.h

Smart Threading for GEMM (sgemm) v1.

2022-05-17 18:10:39 +05:30

bli_l3_sup_ft_ker.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_int_amd.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_int.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_int.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_ker_prot.h

Added ZTRSM AVX512 small code path

2024-05-03 05:10:41 -04:00

bli_l3_sup_ker.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_oft.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_packm_a.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_packm_a.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_packm_b.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_packm_b.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_packm_var.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_packm_var.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_ref.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_ref.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_var1n2m.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_var12.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup_vars.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_sup.c

ZGEMMT SUP Optimizations for AVX512

2024-05-07 23:00:29 +05:30

bli_l3_sup.h

Added sup functionality for SYRK

2021-04-29 12:35:30 +05:30

bli_l3_tapi_ba.c

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_tapi_ex.c

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_tapi.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_tapi.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_thrinfo.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_thrinfo.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bli_l3_ukr_fpa.c

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_ukr_fpa.h

Removed export macros from all internal prototypes.

2020-08-03 11:47:18 +05:30

bli_l3_ukr_oapi.c

Export functions without def file (#303)

2020-08-03 11:46:07 +05:30

bli_l3_ukr_oapi.h

Add symbol export macro for all functions (#302)

2019-08-23 14:18:07 +05:30

bli_l3_ukr_prot.h

Add low-precision POWER10 gemm kernels (#467)

2021-03-05 13:53:43 -06:00

bli_l3_ukr_tapi.c

Export functions without def file (#303)

2020-08-03 11:46:07 +05:30

bli_l3_ukr_tapi.h

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3_ukr.h

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

bli_l3.h

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00