amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-07-16 08:42:21 +00:00

Author	SHA1	Message	Date
kdevraje	13806ba3b0	This check in has changes w.r.t Copyright information, which is changed to (start year) - 2019 Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041	2019-05-27 16:24:43 +05:30
Field G. Van Zee	0645f239fb	Remove UT-Austin from copyright headers' clause 3. Details: - Removed explicit reference to The University of Texas at Austin in the third clause of the license comment blocks of all relevant files and replaced it with a more all-encompassing "copyright holder(s)". - Removed duplicate words ("derived") from a few kernels' license comment blocks. - Homogenized license comment block in kernels/zen/3/bli_gemm_small.c with format of all other comment blocks.	2018-12-04 14:31:06 -06:00
Field G. Van Zee	4fa4cb0734	Trivial comment header updates. Details: - Removed four trailing spaces after "BLIS" that occurs in most files' commented-out license headers. - Added UT copyright lines to some files. (These files previously had only AMD copyright lines but were contributed to by both UT and AMD.) - In some files' copyright lines, expanded 'The University of Texas' to 'The University of Texas at Austin'. - Fixed various typos/misspellings in some license headers.	2018-08-29 18:06:41 -05:00
Field G. Van Zee	017548314f	Replaced function chooser macros w/ func ptr arrays. Details: - Previously, most object API functions (_oapi.c) used a function chooser macro that would expand out to an if-elseif-elseif-else conditional that used a num_t datatype to call the appropriate type-specific API (_tapi.c). This always felt a little hackish, and would get in the way somewhat of adding support for new num_t datatypes in the future. So, I've replaced that functionality with code that queries a function pointer that is then typecast appropriately. This model of function calling was already pervasive for kernels queried from the cntx_t structure. It was also already in use in various other functions, such as macrokernels, and this commit simply extends that pattern. - The above change required many new files, mostly header files, that define the function types (mostly _ft.h) for the queriable functions as well as some source files to define the function pointer arrays and their corresponding query functions (_fpa.c). Various other function types, mostly for kernel function types, were renamed to reduce the potential for confusion with the function types for expert and basic (non-expert) typed API functions. - Removed definitions for all of the "bli_call_ft_*()" function chooser macros from bli_misc_macro_defs.h.	2018-08-07 14:13:25 -05:00
Isuru Fernando	14648e1376	Native windows support using clang (#227) * Add appveyor file * Build script * Remove fPIC for now * copy as * set CC and CXX * Change the order of immintrin.h * Fix testsuite header * Move testsuite defs to .c * Fix appveyor file * Remove fPIC again and fix strerror_r missing bug * Remove appveyor script * cd to blis directory * Fix sleep implementation * Add f2c_types_win.h * Fix f2c compilation * Remove rdp and rename appveyor.yml * Remove setenv declaration in test header * set CPICFLAGS to empty * Fix another immintrin.h issue * Escape CFLAGS and LDFLAGS * Fix more ?mmintrin.h issues * Build x86_64 in appveyor * override LIBM LIBPTHREAD AR AS * override pthreads in configure * Move windows definitions to bli_winsys.h * Fix LIBPTHREAD default value * Build intel64 in appveyor for now	2018-07-04 17:48:42 -05:00
Field G. Van Zee	3f02af0905	Row storage optimizations to zen dotxf kernels. Details: - Split the main loop bodies of zen's [sd]dotxf kernels into two cases: one to handle a column-stored matrix A and one to handle a row-stored matrix A. This allows vector instructions to be employed even if A is stored by rows (and A^T appears stored as columns). Both storage cases use a common edge case loop. Thanks to Devin Matthews for this idea and for prototyping the change needed for sdotxf kernel.	2018-03-26 17:40:04 -05:00
Field G. Van Zee	e2192a8fd5	Removed vzeroupper intrinsics from zen kernels. Details: - Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a vzeroupper instruction destroyed part of the intermediate result stored by the vdpps instructions that came right before. (The vzeroupper intrinsic was removed.) - Removed remaining vzeroupper intrinsics from other zen kernels. Previously, the vzeroupper instructions were included because BLIS is typically compiled with -mfpmath=sse. But it was brought to my attention that inserting these vzeroupper instructions is unnecessary for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar code rather than literal SSE instructions, and (b) compilers already (likely) insert vzeroupper instructions where necessary. Thanks to Devin Matthews for zeroing in on the dotxf bug. - Removed -malign-double from bulldozer make_defs.mk. This alignment was already happening by default since bulldozer is an x86_64 system.	2018-03-23 12:53:48 -05:00
Field G. Van Zee	5112e1859e	Added missing 'restrict' to some kernels' cntx_t. Details: - Added missing 'restrict' keyword to cntx_t argument of function signatures corresponding to level-1v, level-1f, and level-1m kernels. This affected bli_l1v_ker_prot.h, bli_l1f_ker_prot.h, and bli_l1m_ker_prot.h. (The 'restrict' was already being used to qualify cntx_t* arguments for kernels defined in bli_l3_ker_prot.h.) - Added comments to bli_l1v_ker.h, bli_l1f_ker.h, bli_l1m_ker.h, and bli_l3_ukr.h that help explain how those headers function to produce kernel prototypes using the prototype macros defined in the files mentioned above.	2018-02-23 14:31:26 -06:00
Field G. Van Zee	16813335bd	Merge branch 'amd' into rt Details: - Merged contributions made by AMD via 'amd' branch (see summary below). Special thanks to AMD for their contributions to-date, especially with regard to intrinsic- and assembly-based kernels. - Added column storage output cases to microkernels in bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with the extra cost of transposing the microtile in registers, this is much faster than using the general storage case when the underlying matrix is column-stored. - Added s and d assembly-based zen gemmtrsm_u microkernel (including column storage optimization mentioned above). - Updated zen sub-configuration to reflect presence of new native kernels. - Temporarily reverted zen sub-configuration's level-3 cache blocksizes to smaller haswell values. - Temporarily disabled small matrix handling for zen configuration family in config/zen/bli_family_zen.h. - Updated zen CFLAGS according to changes in `1e4365b`. - Updated haswell microkernels such that: - only one vzeroupper instruction is called prior to returning - movapd/movupd are used in lieu of movaps/movups for double-real microkernels. (Note that single-real microkernels still use movaps/movups.) - Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is now included via frame/include/bli_arch_config.h. - Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation in testsuite/src/test_amaxv.c). - Added early return for alpha == 0 in bli_dotxv_ref.c. - Integrated changes from `f07b176`, including a fix for undefined behavior when executing the 1m method under certain conditions. - Updated config_registry; no longer need haswell kernels for zen sub-configuration. - Tweaked marginal and pass thresholds for dotxf. - Reformatted level-1v, -1f, and -3 amd kernels and inserted additional comments. - Updated LICENSE file to explicitly mention that parts are copyright UT-Austin and AMD. - Added AMD copyright to header templates in build/templates. Summary of previous changes from 'amd' branch. - Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and s and d assembly-based zen gemmtrsm_l microkernels (d6x8). - Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv, and scalv, with extra-unrolling variants for axpyv and scalv. - Added a small matrix handler to bli_gemm_front(), with the handler implemented in kernels/zen/3/bli_gemm_small_matrix.c. - Added additional logic to sumsqv that first attempts to compute the sum of the squares via dotv(). If there is a floating-point exception (FE_OVERFLOW), then the previous (numerically conservative) code is used; otherwise, the result of dotv() is square-rooted and stored as the result. This new implementation is only enabled when FE_OVERFLOW is #defined. If the macro is not #defined, then the previous implementation is used. - Added axpyv and dotv standalone test drivers to test directory. - Added zen support to old cpuid_x86.c driver in build/auto-detect/old. - Added thread-local and __attribute__-related macros to bli_macro_defs.h.	2018-02-21 17:43:32 -06:00

9 Commits
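
Commit 017548314f replaces if-elseif datatype chooser macros with a table of function pointers that is queried and then typecast. A minimal sketch of that general pattern follows; it does not reproduce BLIS's actual types or file layout, and dt_t, void_fp, scal_ft, scal_fpa, scal_query(), and scal() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a num_t-style datatype enum. */
typedef enum { DT_FLOAT = 0, DT_DOUBLE = 1, DT_COUNT } dt_t;

/* Generic function pointer type stored in the table. */
typedef void (*void_fp)(void);

/* Function type shared by the type-specific routines (the "_ft" idea). */
typedef void (*scal_ft)(size_t n, const void *alpha, void *x);

/* Type-specific implementations (the "_tapi" idea). */
static void scal_float(size_t n, const void *alpha, void *x)
{
    const float a = *(const float *)alpha;
    float *xf = x;
    for (size_t i = 0; i < n; i++)
        xf[i] *= a;
}

static void scal_double(size_t n, const void *alpha, void *x)
{
    const double a = *(const double *)alpha;
    double *xd = x;
    for (size_t i = 0; i < n; i++)
        xd[i] *= a;
}

/* Function pointer array indexed by datatype (the "_fpa" idea). */
static void_fp scal_fpa[DT_COUNT] = {
    (void_fp)scal_float,
    (void_fp)scal_double,
};

/* Query function: returns the untyped pointer for a datatype. */
static void_fp scal_query(dt_t dt)
{
    assert((int)dt >= 0 && dt < DT_COUNT);
    return scal_fpa[dt];
}

/* Object-style entry point: query, cast back to the real type, call. */
void scal(dt_t dt, size_t n, const void *alpha, void *x)
{
    scal_ft f = (scal_ft)scal_query(dt);
    f(n, alpha, x);
}
```

Under this layout, supporting a new datatype means adding one enum value and one table entry rather than editing every conditional chain, which matches the motivation given in the commit message.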