amd/blis - blis - Public git mirror


mirror of https://github.com/amd/blis.git synced 2026-05-21 17:08:17 +00:00

Author	SHA1	Message	Date
Arnav Sharma	66b2231b65	Fixed CMake files for HER - Removed subdirectory addition Change-Id: I419085db0b9034777409207a7d79b7ffa91eb8f1	2022-06-01 12:25:43 +05:30
Arnav Sharma	e5d5a43eab	Optimized ZHER Implementation - Implemented optimized her framework calls for double-precision complex numbers. - The zher kernel operates over 4 columns at a time. It first computes the diagonal elements of the matrix, then the 4x4 triangular part, and finally the remaining part as 4x4 tiles of the matrix up to m rows. AMD-Internal: [CPUPL-2151] Change-Id: I27430ee33ffb901b3ef4bdd97b034e3f748e9cca	2022-05-25 14:03:01 +05:30
S, HariharaSudhan	a8bc55c373	Multithreaded SGEMV var 1 with smart threading - Implemented an OpenMP-based standalone SGEMV kernel for row-major (var 1) in multithreaded scenarios - Smart threading is enabled when AOCL DYNAMIC is defined - The number of threads is decided from the input dimensions using smart threading AMD-Internal: [CPUPL-1984] Change-Id: I9b191e965ba7468e95aabcce21b35a533017502e	2022-05-17 18:10:39 +05:30
Dipal M Zambare	31921b9974	Updated Windows build system to define the BLIS_CONFIG_EPYC flag. All AMD-specific optimizations in BLIS are enclosed in the BLIS_CONFIG_EPYC preprocessor macro, but it was not defined in CMake, which resulted in lower overall performance. Updated version number to 3.1.1 Change-Id: I9848b695a599df07da44e77e71a64414b28c75b9	2022-05-17 18:03:09 +05:30
Harsh Dave	351269219f	Optimized dher2 implementation - Implemented her2 framework calls for transposed and non-transposed kernel variants. - The dher2 kernel operates over 4 columns at a time. It computes the 4x4 triangular part of the matrix first, and the remainder is computed in chunks of 4x4 tiles up to m rows. - Remainder cases (m < 4) are handled serially. AMD-Internal: [CPUPL-1968] Change-Id: I12ae97b2ad673a7fd9b733c607f27b1089142313	2022-01-05 05:51:15 -06:00
Nageshwar Singh	cbd9ea76af	Complex single standalone gemv implementation independent of axpyf. Details - The axpyf-based implementation incurs function-call (axpyf) overhead. - The new implementation reduces this function-call overhead. - This implementation uses a kernel of size 8x4. - This implementation gives better performance for smaller sizes than the axpyf-based implementation AMD-Internal: [CPUPL-1402] Change-Id: Ic9a5e59363290caf26284548638da9065952fd48	2021-11-12 08:58:55 +05:30
Nageshwar Singh	a3d04a21a0	Complex double standalone gemv implementation independent of axpyf. Details - The axpyf-based implementation incurs function-call (axpyf) overhead. - The new implementation reduces this function-call overhead. - This implementation uses a kernel of size 4x4. - This implementation gives better performance for smaller sizes than the axpyf-based implementation AMD-Internal: [CPUPL-1402] Change-Id: I5fa421b8c1d2b44c991c2a05e8f5b01b83eb4b37	2021-11-12 08:58:54 +05:30
Meghana Vankadari	47744663d9	Enabling framework optimizations for zen family architectures. Details: - Introduced a new macro 'BLIS_CONFIG_EPYC' to enable BLAS and CBLAS framework optimizations for zen family configurations. - The macro needs to be defined in the family.h files of the respective arch configs. - Moved zen2-specific optimized kernels to the zen folder so that they are accessible to all zen family architectures. Change-Id: I8da2db6b7ab22ef350a01d86c214006e812eb06d	2020-10-07 13:10:50 +05:30

8 Commits
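The ZHER commit (e5d5a43eab) describes a blocking order: diagonal elements first, then the 4x4 triangular part, then 4x4 tiles down to row m. Below is a minimal scalar C sketch of that order only, assuming the operation A := alpha * x * x^H + A with real alpha, updating the lower triangle of a column-major matrix. The function names `zher_diag` and `zher_lower_blocked` are hypothetical, and this does not attempt to reproduce the optimized kernel from the commit.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Diagonal entry of a Hermitian rank-1 update: the result is real, so the
 * imaginary part is dropped, as reference BLAS ZHER does. */
static double complex zher_diag(double complex ajj, double alpha, double complex xj)
{
    return creal(ajj) + alpha * (creal(xj) * creal(xj) + cimag(xj) * cimag(xj));
}

/* Hypothetical scalar sketch of the blocking order only.
 * A := alpha * x * x^H + A, lower triangle, column-major, leading dim lda. */
static void zher_lower_blocked(size_t m, double alpha, const double complex *x,
                               double complex *a, size_t lda)
{
    assert(lda >= m);
    size_t j0 = 0;
    for (; j0 + 4 <= m; j0 += 4) {
        /* 1. The four diagonal elements of this column block. */
        for (size_t j = j0; j < j0 + 4; ++j)
            a[j + j * lda] = zher_diag(a[j + j * lda], alpha, x[j]);

        /* 2. Strictly lower part of the 4x4 diagonal block. */
        for (size_t j = j0; j < j0 + 4; ++j) {
            double complex t = alpha * conj(x[j]); /* alpha * conj(x_j) */
            for (size_t i = j + 1; i < j0 + 4; ++i)
                a[i + j * lda] += x[i] * t;
        }

        /* 3. Rows below the diagonal block, processed as 4x4 tiles. */
        size_t i0 = j0 + 4;
        for (; i0 + 4 <= m; i0 += 4)
            for (size_t j = j0; j < j0 + 4; ++j) {
                double complex t = alpha * conj(x[j]);
                for (size_t i = i0; i < i0 + 4; ++i)
                    a[i + j * lda] += x[i] * t;
            }

        /* Fewer than 4 rows left below the last full tile. */
        for (size_t j = j0; j < j0 + 4; ++j) {
            double complex t = alpha * conj(x[j]);
            for (size_t i = i0; i < m; ++i)
                a[i + j * lda] += x[i] * t;
        }
    }

    /* Leftover columns when m is not a multiple of 4: plain scalar update. */
    for (size_t j = j0; j < m; ++j) {
        a[j + j * lda] = zher_diag(a[j + j * lda], alpha, x[j]);
        double complex t = alpha * conj(x[j]);
        for (size_t i = j + 1; i < m; ++i)
            a[i + j * lda] += x[i] * t;
    }
}
```

Handling the diagonal separately fits HER because the BLAS definition requires the diagonal to stay real after the update, so those entries cannot simply reuse the off-diagonal multiply-add.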
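For real data, the her2 operation in the dher2 commit (351269219f) is the symmetric rank-2 update A := alpha * x * y^T + alpha * y * x^T + A, because conjugation has no effect on real numbers. The sketch below follows the order the commit message describes: for each block of 4 columns, the 4x4 triangular part (diagonal included) first, then 4x4 tiles below it, with problems where m < 4 falling through to a serial loop. It assumes a column-major matrix with only the lower triangle referenced; `dher2_lower_sketch` is a hypothetical name, not the BLIS kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sketch: A := alpha*x*y^T + alpha*y*x^T + A,
 * lower triangle of a column-major m x m matrix with leading dim lda. */
static void dher2_lower_sketch(size_t m, double alpha, const double *x,
                               const double *y, double *a, size_t lda)
{
    assert(lda >= m);
    size_t j0 = 0;
    /* Blocked path; the loop body never runs when m < 4. */
    for (; j0 + 4 <= m; j0 += 4) {
        /* Pre-scale the four column coefficients once per block. */
        double ax[4], ay[4];
        for (int c = 0; c < 4; ++c) {
            ax[c] = alpha * x[j0 + c];
            ay[c] = alpha * y[j0 + c];
        }
        /* 4x4 triangular part, diagonal included. */
        for (int c = 0; c < 4; ++c)
            for (size_t i = j0 + c; i < j0 + 4; ++i)
                a[i + (j0 + c) * lda] += x[i] * ay[c] + y[i] * ax[c];
        /* Rows below the block in 4x4 tiles. */
        size_t i0 = j0 + 4;
        for (; i0 + 4 <= m; i0 += 4)
            for (int c = 0; c < 4; ++c)
                for (size_t i = i0; i < i0 + 4; ++i)
                    a[i + (j0 + c) * lda] += x[i] * ay[c] + y[i] * ax[c];
        /* Fewer than 4 rows left. */
        for (int c = 0; c < 4; ++c)
            for (size_t i = i0; i < m; ++i)
                a[i + (j0 + c) * lda] += x[i] * ay[c] + y[i] * ax[c];
    }
    /* Remainder columns, including the whole problem when m < 4: serial. */
    for (size_t j = j0; j < m; ++j)
        for (size_t i = j; i < m; ++i)
            a[i + j * lda] += alpha * (x[i] * y[j] + y[i] * x[j]);
}
```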
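The SGEMV commit (a8bc55c373) picks the thread count from the input dimensions. The commit does not give the heuristic, so the thresholds in `pick_gemv_threads` below are purely illustrative assumptions, not the values used by AOCL DYNAMIC. The sketch shows the general shape: a cheap size-based choice of thread count, then an OpenMP loop over rows of a row-major matrix, where each output element is an independent dot product. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: thresholds are illustrative, not AOCL's. */
static int pick_gemv_threads(size_t m, size_t n, int max_threads)
{
    size_t work = m * n;
    int nt;
    if (work < 4096)          nt = 1;   /* too small to amortize threading */
    else if (work < 65536)    nt = 4;
    else if (work < 1048576)  nt = 8;
    else                      nt = max_threads;
    if (nt > max_threads) nt = max_threads;
    if (nt < 1) nt = 1;
    /* Never give a thread less than one row. */
    if ((size_t)nt > m) nt = (int)(m > 0 ? m : 1);
    return nt;
}

/* y := beta*y + alpha*A*x, A row-major m x n with row stride lda. */
static void sgemv_rowmajor(size_t m, size_t n, float alpha, const float *a,
                           size_t lda, const float *x, float beta, float *y,
                           int nthreads)
{
    assert(lda >= n);
    (void)nthreads; /* unused when compiled without OpenMP */
#ifdef _OPENMP
    #pragma omp parallel for num_threads(nthreads) schedule(static)
#endif
    for (long i = 0; i < (long)m; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; ++j)
            acc += a[(size_t)i * lda + j] * x[j];
        /* BLAS convention: y is not read when beta == 0. */
        y[i] = (beta == 0.0f ? 0.0f : beta * y[i]) + alpha * acc;
    }
}
```

When built with `-fopenmp`, a caller would typically pass `omp_get_max_threads()` as `max_threads`; without OpenMP the pragma is compiled out and the loop runs serially with identical results.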
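The two standalone complex gemv commits (cbd9ea76af, a3d04a21a0) replace a loop that calls an axpyf kernel per column block with a single function that does the fused work inline. The scalar sketch below illustrates that structure for y := y + alpha * A * x with a column-major double-complex matrix, assuming "4x4" means tiles of 4 rows by 4 columns. `zgemv_n_fused` is a hypothetical name, and the sketch shows loop structure only, not the optimized kernel.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical sketch: y := y + alpha*A*x, A column-major m x n. Four
 * columns are fused per pass and rows are walked in blocks of 4, all in
 * one function, so there is no per-block axpyf call. */
static void zgemv_n_fused(size_t m, size_t n, double complex alpha,
                          const double complex *a, size_t lda,
                          const double complex *x, double complex *y)
{
    assert(lda >= m);
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        /* Scale the four x elements once per column block. */
        double complex c0 = alpha * x[j],     c1 = alpha * x[j + 1],
                       c2 = alpha * x[j + 2], c3 = alpha * x[j + 3];
        const double complex *a0 = a + j * lda, *a1 = a0 + lda,
                             *a2 = a1 + lda,    *a3 = a2 + lda;
        size_t i = 0;
        for (; i + 4 <= m; i += 4)          /* 4x4 tile */
            for (size_t k = i; k < i + 4; ++k)
                y[k] += a0[k] * c0 + a1[k] * c1 + a2[k] * c2 + a3[k] * c3;
        for (; i < m; ++i)                  /* leftover rows */
            y[i] += a0[i] * c0 + a1[i] * c1 + a2[i] * c2 + a3[i] * c3;
    }
    for (; j < n; ++j) {                    /* leftover columns */
        double complex c = alpha * x[j];
        for (size_t i = 0; i < m; ++i)
            y[i] += a[i + j * lda] * c;
    }
}
```

Each element of y is loaded and stored once per four columns rather than once per column, and no function boundary separates the column blocks. This is one plausible reading of why the commits report gains at small sizes, where call overhead is a larger share of the runtime.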