amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 09:39:59 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	a4e8801d08	Increased MT sup threshold for double to 201. Details: - Fine-tuned the double-precision real MT threshold (which controls whether the sup implementation kicks in for smaller m dimension values) from 180 to 201 for haswell and 180 to 256 for zen. - Updated octave scripts in test/sup/octave to include a seventh column to display performance for m = n = k.	2019-05-31 17:30:51 -05:00
Field G. Van Zee	b9c9f03502	Implemented gemm on skinny/unpacked matrices. Details: - Implemented a new sub-framework within BLIS to support the management of code and kernels that specifically target matrix problems for which at least one dimension is deemed to be small, which can result in long and skinny matrix operands that are ill-suited for the conventional level-3 implementations in BLIS. The new framework tackles the problem in two ways. First the stripped-down algorithmic loops forgo the packing that is famously performed in the classic code path. That is, the computation is performed by a new family of kernels tailored specifically for operating on the source matrices as-is (unpacked). Second, these new kernels will typically (and in the case of haswell and zen, do in fact) include separate assembly sub-kernels for handling of edge cases, which helps smooth performance when performing problems whose m and n dimension are not naturally multiples of the register blocksizes. In a reference to the sub-framework's purpose of supporting skinny/unpacked level-3 operations, the "sup" operation suffix (e.g. gemmsup) is typically used to denote a separate namespace for related code and kernels. NOTE: Since the sup framework does not perform any packing, it targets row- and column-stored matrices A, B, and C. For now, if any matrix has non-unit strides in both dimensions, the problem is computed by the conventional implementation. - Implemented the default sup handler as a front-end to two variants. bli_gemmsup_ref_var2() provides a block-panel variant (in which the 2nd loop around the microkernel iterates over n and the 1st loop iterates over m), while bli_gemmsup_ref_var1() provides a panel-block variant (2nd loop over m and 1st loop over n). However, these variants are not used by default and provided for reference only. 
Instead, the default sup handler calls _var2m() and _var1n(), which are similar to _var2() and _var1(), respectively, except that they defer to the sup kernel itself to iterate over the m and n dimension, respectively. In other words, these variants rely not on microkernels, but on so-called "millikernels" that iterate along m and k, or n and k. The benefit of using millikernels is a reduction of function call and related (local integer typecast) overhead as well as the ability for the kernel to know which micropanel (A or B) will change during the next iteration of the 1st loop, which allows it to focus its prefetching on that micropanel. (In _var2m()'s millikernel, the upanel of A changes while the same upanel of B is reused. In _var1n()'s, the upanel of B changes while the upanel of A is reused.) - Added a new configure option, --[en|dis]able-sup-handling, which is enabled by default. However, the default thresholds at which the default sup handler is activated are set to zero for each of the m, n, and k dimensions, which effectively disables the implementation. (The default sup handler only accepts the problem if at least one dimension is smaller than or equal to its corresponding threshold. If all dimensions are larger than their thresholds, the problem is rejected by the sup front-end and control is passed back to the conventional implementation, which proceeds normally.) - Added support to the cntx_t structure to track new fields related to the sup framework, most notably: - sup thresholds: the thresholds at which the sup handler is called. - sup handlers: the address of the function to call to implement the level-3 skinny/unpacked matrix implementation. - sup blocksizes: the register and cache blocksizes used by the sup implementation (which may be the same or different from those used by the conventional packm-based approach). - sup kernels: the kernels that the handler will use in implementing the sup functionality. 
- sup kernel prefs: the IO preference of the sup kernels, which may differ from the preferences of the conventional gemm microkernels' IO preferences. - Added a bool_t to the rntm_t structure that indicates whether sup handling should be enabled/disabled. This allows per-call control of whether the sup implementation is used, which is useful for test drivers that wish to switch between the conventional and sup codes without having to link to different copies of BLIS. The corresponding accessor functions for this new bool_t are defined in bli_rntm.h. - Implemented several row-preferential gemmsup kernels in a new directory, kernels/haswell/3/sup. These kernels include two general implementation types--'rd' and 'rv'--for the 6x8 base shape, with two specialized millikernels that embed the 1st loop within the kernel itself. - Added ref_kernels/3/bli_gemmsup_ref.c, which provides reference gemmsup microkernels. NOTE: These microkernels, unlike the current crop of conventional (pack-based) microkernels, do not use constant loop bounds. Additionally, their inner loop iterates over the k dimension. - Defined new typedef enums: - stor3_t: captures the effective storage combination of the level-3 problem. Valid values are BLIS_RRR, BLIS_RRC, BLIS_RCR, etc. A special value of BLIS_XXX is used to denote an arbitrary combination which, in practice, means that at least one of the operands is stored according to general stride. - threshid_t: captures each of the three dimension thresholds. - Changed bli_adjust_strides() in bli_obj.c so that bli_obj_create() can be passed "-1, -1" as a lazy request for row storage. (Note that "0, 0" is still accepted as a lazy request for column storage.) - Added support for various instructions to bli_x86_asm_macros.h, including imul, vhaddps/pd, and other instructions related to integer vectors. 
- Disabled the older small matrix handling code inserted by AMD in bli_gemm_front.c, since the sup framework introduced in this commit is intended to provide a more generalized solution. - Added test/sup directory, which contains standalone performance test drivers, a Makefile, a runme.sh script, and an 'octave' directory containing scripts compatible with GNU Octave. (They also may work with matlab, but if not, they are probably close to working.) - Reinterpret the storage combination string (sc_str) in the various level-3 testsuite modules (e.g. src/test_gemm.c) so that the order of each matrix storage char is "cab" rather than "abc". - Comment updates in level-3 BLAS API wrappers in frame/compat.	2019-04-27 18:44:50 -05:00
Field G. Van Zee	74e513eb6a	Support row storage in Eigen gemm test/3 driver. Details: - Added preprocessor branches to test/3/test_gemm.c to explicitly support row-stored matrices. Column-stored matrices are also still supported (and are the default for now). (This is mainly residual work left over from initial integration of Eigen into the test drivers, so if we ever want to test Eigen with row-stored matrices, the code will be ready to use, even if it is not yet integrated into the Makefile in test/3.)	2019-04-17 13:34:44 -05:00
Field G. Van Zee	7bc75882f0	Updated Eigen results in docs/graphs with 3.3.90. Details: - Updated the level-3 performance graphs in docs/graphs with new Eigen results, this time using a development version cloned from their git mirror on March 27, 2019 (version 3.3.90). Performance is improved over 3.3.7, though still noticeably short of BLIS/MKL in most cases. - Very minor updates to docs/Performance.md and matlab scripts in test/3/matlab.	2019-03-28 17:40:50 -05:00
Field G. Van Zee	bfac7e385f	Added ability to plot with Eigen in test/3/matlab. Details: - Updated matlab scripts in test/3/matlab to optionally plot/display Eigen performance curves. Whether Eigen is plotted is determined by a new boolean function parameter, with_eigen. - Updated runme.m scratchpad to reflect the latest invocations of the plot_panel_4x5() function (with Eigen plotting enabled).	2019-03-27 16:04:48 -05:00
Field G. Van Zee	67535317b9	Fixed mislabeled eigen output from test/3 drivers. Details: - Fixed the Makefile in test/3 so that it no longer incorrectly labels the matlab output variables from Eigen-linked hemm, herk, trmm, and trsm driver output as "vendor". (The gemm drivers were already correctly outputting matlab variables containing the "eigen" label.)	2019-03-27 13:32:18 -05:00
Field G. Van Zee	5e6b160c8a	Link to Eigen BLAS for non-gemm drivers in test/3. Details: - Adjusted test/3/Makefile so that the test drivers are linked against Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do this since Eigen's headers don't define implementations to the standard BLAS APIs. - Simplified #included headers in hemm, herk, trmm, and trsm source driver files, since nothing specific to Eigen is needed at compile-time for those operations.	2019-03-26 19:10:59 -05:00
Field G. Van Zee	92fb9c87bf	Add more support for Eigen to drivers in test/3. Details: - Use compile-time implementations of Eigen in test_gemm.c via new EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS library is not necessary.) However, as of Eigen 3.3.7, Eigen only parallelizes the gemm operation and not hemm, herk, trmm, trsm, or any other level-3 operation. - Fixed a bug in trmm and trsm drivers whereby the wrong function (bli_does_trans()) was being called to determine whether the object for matrix A should be created for a left- or right-side case. This was corrected by changing the function to bli_is_left(), as is done in the hemm driver. - Added support for running Eigen test drivers from runme.sh.	2019-03-26 15:43:23 -05:00
Field G. Van Zee	288843b06d	Added Eigen support to test/3 Makefile, runme.sh. Details: - Added targets to test/3/Makefile that link against a BLAS library built by Eigen. It appears, however, that Eigen's BLAS library does not support multithreading. (It may be that multithreading is only available when using the native C++ APIs.) - Updated runme.sh with a few Eigen-related tweaks. - Minor tweaks to docs/Performance.md.	2019-03-20 17:52:23 -05:00
Field G. Van Zee	913cf97653	Added docs/Performance.md and docs/graphs subdir. Details: - Added a new markdown document, docs/Performance.md, which reports performance of a representative set of level-3 operations across a variety of hardware architectures, comparing BLIS to OpenBLAS and a vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs, in pdf and png formats, reside in docs/graphs. - Updated README.md to link to new Performance.md document. - Minor updates to CREDITS, docs/Multithreading.md. - Minor updates to matlab scripts in test/3/matlab.	2019-03-19 16:15:24 -05:00
Field G. Van Zee	b938c16b0c	Renamed test/3m4m to test/3. Details: - Renamed '3m4m' directory to '3', which captures the directory nicely since it builds test drivers to test level-3 operations. - These test drivers ceased to be used to test the 3m and 4m (or even 1m) induced methods long ago, hence the name change.	2019-03-07 16:40:39 -06:00
Field G. Van Zee	ab89a40582	More minor updates and edits to test/3m4m. Details: - Further updates to matlab scripts, mostly for compatibility with GNU Octave. - More tweaks to runme.sh. - Updates to runme.m that allow copy-paste into matlab interactive session to generate graphs.	2019-03-07 16:26:12 -06:00
Field G. Van Zee	f0e70dfbf3	Very minor updates to test/3m4m for ul252. Details: - Very minor updates to the newly revamped test/3m4m drivers when used on a Xeon Platinum (SkylakeX).	2019-03-07 01:04:05 +00:00
Field G. Van Zee	9f1dbe572b	Overhauled test/3m4m Makefile and scripts. Details: - Rewrote much of Makefile to generate executables for single- and dual- socket multithreading as well as single-threaded. Each of the three can also use a different problem size range/increment, as is often appropriate when doubling/halving the number of threads. - Rewrote runme.sh script to flexibly execute as many threading parameter scenarios as is given in the input parameter string (currently set within the script itself). The string also encodes the maximum problem size for each threading scenario, which is used to identify the executable to run. Also improved the "progress" output of the script to reduce redundant info and improve readability in terminals that are not especially wide. - Minor updates to test_*.c source files. - Updated matlab scripts according to changes made to the Makefile, test drivers, and runme.sh script, and renamed 'plot_all.m' to 'runme.m'.	2019-03-05 17:47:55 -06:00
Field G. Van Zee	e2a02ebd00	Updates (from ls5) to test/3m4m/runme.sh. Details: - Lonestar5-specific updates to runme.sh.	2019-02-28 13:58:59 -06:00
Field G. Van Zee	8e023bc914	Updates to 3m4m/matlab scripts. Details: - Minor updates to matlab graph-generating scripts. - Added a plot_all.m script that is more of a scratchpad for copying and pasting function invocations into matlab to generate plots that are presently of interest to us.	2019-02-22 16:55:30 -06:00
Field G. Van Zee	b1f5ce8622	Minor updates to scripts in test/mixeddt/matlab.	2019-02-05 17:38:50 -06:00
Devangi N. Parikh	38203ecd15	Added thunderx2 system in the mixeddt test scripts Details: - Added thunderx2 (tx2) as a system in the runme.sh in test/mixeddt	2019-02-04 15:28:28 -05:00
Field G. Van Zee	58c7fb4788	Added more matlab scripts for mixeddt paper. Details: - Added a variant set of matlab scripts geared to producing plots that reflect performance data gathered with and without extra memory optimizations enabled. These scripts reside (for now) in test/mixeddt/matlab/wawoxmem.	2019-01-08 17:00:27 -06:00
Field G. Van Zee	6885051a16	Generalizations/cleanup to mixeddt matlab scripts. Details: - Parameterized, reorganized, and added comments to matlab scripts in test/mixeddt/matlab. - Reordered some lines of code and added comments to plot_l3_perf.m in test/3m4m/matlab.	2018-12-05 14:45:39 -06:00
Field G. Van Zee	cbdb0566bf	Updates to 3m4m, mixeddt test driver files. Details: - Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to port recent changes to the former to the latter. - Disabled (for now) code in 3m4m/test_*.c files that disables all induced methods except for the one that is requested from the Makefile via the IND macro. This is done because usually, we want to test whatever method is enabled automatically for complex datatypes. (That is, when native complex microkernels are missing, we usually want to test performance of 1m.)	2018-12-05 20:06:32 +00:00
Field G. Van Zee	0645f239fb	Remove UT-Austin from copyright headers' clause 3. Details: - Removed explicit reference to The University of Texas at Austin in the third clause of the license comment blocks of all relevant files and replaced it with a more all-encompassing "copyright holder(s)". - Removed duplicate words ("derived") from a few kernels' license comment blocks. - Homogenized license comment block in kernels/zen/3/bli_gemm_small.c with format of all other comment blocks.	2018-12-04 14:31:06 -06:00
Field G. Van Zee	22384fd2b7	Minor updates to test_gemm.c in test/mixeddt.	2018-12-04 13:09:04 -06:00
Field G. Van Zee	279deae18f	Added 4x5 matlab plotting scripts to test/3m4m. Details: - Added a new directory, test/3m4m/matlab, containing matlab scripts for plotting 4x5 panels of performance graphs (using the subplot() function) for gemm, hemm, herk, trmm, and trsm across all four floating-point datatypes. I expect to further refine these scripts as time goes on, but their current state constitutes a good start.	2018-11-16 11:34:19 -06:00
Field G. Van Zee	7b5ba7319b	Merge branch 'dev' of github.com:flame/blis into dev	2018-11-14 12:32:01 -06:00
Field G. Van Zee	52392932dc	Minor fixes to test/3m4m drivers. Details: - Cleanups to Makefile to allow all test drivers to be built for OpenBLAS and MKL in addition to BLIS. - Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_(). - Fixed incorrect types for betap in BLAS cpp macro branch of test_herk.c.	2018-11-13 22:23:38 +00:00
Field G. Van Zee	4f12e36a0d	Fixed number of columns in first output line. Details: - In previous commit, forgot to remove output column corresponding to the k dimension.	2018-11-13 14:23:12 -06:00
Field G. Van Zee	a2e0cdd7de	Added hemm test driver to test/3m4m. Details: - Added a new test_hemm.c test driver to test/3m4m, which was modeled after the driver by the similar name in test. Also updated Makefile so that blis-nat-[sm]t would trigger builds for the new driver.	2018-11-13 14:15:11 -06:00
Field G. Van Zee	ce719f816d	More edits to mixeddt matlab scripts. Details: - Renamed scripts in test/mixeddt/matlab: plot_case_all.m -> plot_dom_all.m plot_case_md.m -> plot_dom_case.m plot_all_md.m -> plot_dt_all.m - Added plot_dt_select.m in order to plot select graphs for the main body of the mixeddt paper, and added additional related legend handling in plot_gemm_perf.m. - Added test/mixeddt/matlab/output and a .gitkeep file within in order to force git to recognize the directory.	2018-11-10 14:48:43 -06:00
Field G. Van Zee	bf99e7c14b	Minor updates to test/mixeddt driver. Details: - Cleaned up test/mixeddt Makefile in preparation for gathering new data for mixeddt paper, including renaming implementations to "internal" and "ad-hoc" to match the terminology to be used in the paper. - Added new matlab scripts for generating 8 figures, each covering all mixed-precision cases for each mixed-domain case. - Updated the runme.sh script according to changes to Makefile. - Fixed a minor bug in test_gemm.c that may have given incorrect performance in complex, homogeneous storage datatype cases where the computation precision was equal to the storage precisions. (Examples: zzzd, cccs.)	2018-11-08 18:47:17 -06:00
Field G. Van Zee	06c23954e6	Defined unified bli_pthreads_() API for all OSes. Details: - Expanded the bli_pthread_() -> pthread_() wrappers in frame/thread/bli_pthread.c to include cases for Windows taken from frame/base/bli_pthread_wrap.c. Now, bli_thread_() is always defined and always used by BLIS and the BLIS testsuite (in lieu of calling pthreads directly, as before). The implementation used in this new API depends on whether we are building for Windows, and to a lesser extent, whether we are building on OS X. For the core API, Windows uses Windows threads, non-Windows (Linux, OS X) uses pthreads. OS X and Windows get barriers implemented in terms of other bli_pthread_() functions, and Linux gets barriers implemented in terms of pthread_barrier(). This commit addresses issue #273. - Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(), which was erroneously calling pthread_mutex_lock(). - Minor changes to configure so that the auto-detection executable can be built given the above changes (most notably, turning on POSIX extensions via -D_GNU_SOURCE). - Removed temporary play-test code for shiftd that accidentally got committed into test/3m4m/test_gemm.c.	2018-10-23 19:16:54 -05:00
Field G. Van Zee	090e4f08fc	Merge branch 'master' into dev	2018-10-19 18:41:10 -05:00
Field G. Van Zee	bb6df2814f	Defined a new level-1d operation: shiftd. Details: - Defined a new level-1d operation called 'shiftd', including object and typed APIs. This operation adds a scalar value to every element along an arbitrary diagonal of a matrix. Currently, shiftd is implemented in terms of the addv kernel. (The scalar is passed in as the x vector with an increment of zero.) - Replaced ad-hoc usage of setd and addd (after creating a temporary matrix object) with use of shiftd, which is much more concise, in various test driver files in the testsuite. Similar changes were made to the standalone test drivers and the example code. - Added documentation entries in BLISObjectAPI.md and BLISTypedAPI.md for bli_shiftd() and bli_?shiftd(), respectively. - Added observed object properties to level-1d documentation in BLISObjectAPI.md.	2018-10-18 17:11:39 -05:00
Field G. Van Zee	49d3f9fcbb	Merge branch 'master' into dev	2018-10-17 18:00:40 -05:00
Field G. Van Zee	5fec95b99f	Implemented mixed-datatype support for gemm. Details: - Implemented support for gemm where A, B, and C may have different storage datatypes, as well as a computational precision (and implied computation domain) that may be different from the storage precision of either A or B. This results in 128 different combinations, all of which are implemented within this commit. (For now, the mixed-datatype functionality is only supported via the object API.) If desired, the mixed-datatype support may be disabled at configure-time. - Added a memory-intensive optimization to certain mixed-datatype cases that requires a single m-by-n matrix be allocated (temporarily) per call to gemm. This optimization aims to avoid the overhead involved in repeatedly updating C with general stride, or updating C after a typecast from the computation precision. This memory optimization may be disabled at configure-time (provided that the mixed-datatype support is enabled in the first place). - Added support for testing mixed-datatype combinations to testsuite. The user may test gemm with mixed domains, precisions, both, or neither. - Added a standalone test driver directory for building and running mixed-datatype performance experiments. - Defined a new variation of castm, castnzm, which operates like castm except that imaginary values are not touched when casting a real operand to a complex operand. (By contrast, in these situations castm sets the imaginary components of the destination matrix to zero.) - Defined bli_obj_imag_is_zero() and substituted calls in lieu of all usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and also simplified the implementation of bli_obj_imag_equals(). - Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex() when given BLIS_CONSTANT objects. - Disabled dt_on_output field in auxinfo_t structure as well as all accessor functions. Also commented out all usage of accessor functions within macrokernels. 
(Typecasting in the microkernel is still feasible, though probably unrealistic for now given the additional complexity required.) - Use void function pointer type (instead of void*) for storing function pointers in bli_l0_fpa.c. - Added documentation for using gemm with mixed datatypes in docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c. - Defined level-1d operation xpbyd and level-1m operation xpbym. - Added xpbym test module to testsuite. - Updated frame/include/bli_x86_asm_macros.h with additional macros (courtesy of Devin Matthews).	2018-10-15 16:37:39 -05:00
Field G. Van Zee	98e01ea04b	Merge branch 'master' into amd	2018-10-04 20:44:12 -05:00
Devangi N. Parikh	8bf30eb473	Fixed runme.sh in test/studies/thunderx2 Details: - Fixed the setting of threads for a single core run.	2018-10-03 22:22:29 -04:00
Devangi N. Parikh	f6f2456ba2	Fixed the Makefile in test/studies/thunderx2 Details: - Fixed target for make-all-st and make-all-mt so that the armpl targets are built	2018-10-03 21:43:46 -04:00
Field G. Van Zee	ac18949a4b	Multithreading optimizations for l3 macrokernels. Details: - Adjusted the method by which micropanels are assigned to threads in the 2nd (jr) and 1st (ir) loops around the microkernel to (mostly) employ contiguous "slab" partitioning rather than interleaved (round robin) partitioning. The new partitioning schemes and related details for specific families of operations are listed below: - gemm: slab partitioning. - herk: slab partitioning for region corresponding to non-triangular region of C; round robin partitioning for triangular region. - trmm: slab partitioning for region corresponding to non-triangular region of B; round robin partitioning for triangular region. (NOTE: This affects both left- and right-side macrokernels: trmm_ll, trmm_lu, trmm_rl, trmm_ru.) - trsm: slab partitioning. (NOTE: This affects only left-side macrokernels trsm_ll, trsm_lu; right-side macrokernels were not touched.) Also note that the previous macrokernels were preserved inside of the 'other' directory of each operation family directory (e.g. frame/3/gemm/other, frame/3/herk/other, etc). - Updated gemm macrokernel in sandbox/ref99 in light of above changes and fixed a stale function pointer type in blx_gemm_int.c (gemm_voft -> gemm_var_oft). - Added standalone test drivers in test/3m4m for herk, trmm, and trsm and minor changes to test/3m4m/Makefile. - Updated the arguments and definitions of bli__get_next_[ab]_upanel() and bli_trmm_?_?r_my_iter() macros defined in bli_l3_thrinfo.h. - Renamed bli_thread_get_range() APIs to bli_thread_range*().	2018-09-30 18:54:56 -05:00
Devangi N. Parikh	02adab427c	Created a 'thunderx2' subdirectory within test/studies Details: - Created a 'thunderx2' subdirectory within test/studies to house various level-3 test drivers used to measure performance on ThunderX2.	2018-09-20 14:38:50 -04:00
Devangi N. Parikh	dad07245db	Fixed yet another bug in runme script in test/studies Details: - Fixed another copy-paste bug	2018-09-12 04:16:58 -05:00
Devangi N. Parikh	e669057fe3	Fixed bug in runme script in test/studies Details: - Fixed bug in runme script for skx studies that set the number of threads incorrectly	2018-09-11 22:29:42 -05:00
Devangi N. Parikh	232fdc3df3	Updated runme script in test/studies. Details: - Updated runme script for skx studies to run multithreading tests on 1 and 2 sockets.	2018-09-10 18:45:50 -05:00
Field G. Van Zee	4b5437ec7a	Define a cpp macro specific to BLIS compilation. Details: - Tweaked the cflags functions in common.mk so that a new preprocessor macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS itself is being built. This macro will not be defined when, for example, the testsuite or example code compiles code local to those applications. This was done in part by defining a new cflags function get-user-cflags-for(), which is now the designated function for application Makefiles if they wish to inherit a basic set of CFLAGS from BLIS. (The compiler flags returned are identical to that of get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is omitted.) - Updated all test driver-like makefiles to call get-user-cflags-for() instead of get-frame-cflags-for().	2018-09-07 17:24:32 -05:00
Field G. Van Zee	4fa4cb0734	Trivial comment header updates. Details: - Removed four trailing spaces after "BLIS" that occurs in most files' commented-out license headers. - Added UT copyright lines to some files. (These files previously had only AMD copyright lines but were contributed to by both UT and AMD.) - In some files' copyright lines, expanded 'The University of Texas' to 'The University of Texas at Austin'. - Fixed various typos/misspellings in some license headers.	2018-08-29 18:06:41 -05:00
Field G. Van Zee	0f491e994a	Allow lesser Makefiles to reference installed BLIS. Details: - Updated the build system so that "lesser" Makefiles, such as those belonging to example code or the testsuite, may be run even if the directory is orphaned from the original build tree. This allows a user to configure, compile, and install BLIS, delete the build tree (that is, the source distribution, or the build directory for out-of-tree builds) and then compile example or testsuite code and link against the installed copy of BLIS (provided the example or testsuite directory was preserved or obtained from another source). The only requirement is that make be invoked while setting the BLIS_INSTALL_PATH variable to the same installation prefix used when BLIS was configured. The easiest syntax is: make BLIS_INSTALL_PATH=/install/prefix though it's also permissible to set BLIS_INSTALL_PATH as an environment variable prior to running 'make'. - Updated all lesser Makefiles to implement the new aforementioned build behavior. - Relocated check-blastest.sh and check-blistest.sh from build to blastest and testsuite, respectively, so that if those directories are copied elsewhere the user can still run 'make check' locally. - Updated docs/Testsuite.md with language that mentions this new option of building/linking against an installed copy of BLIS.	2018-08-25 20:12:36 -05:00
Devangi N. Parikh	0bbe69d5ed	Updated plotting scripts in test/studies. Details: - Fixed indexing on plots to correspond to the removal of dtime in the test drivers.	2018-08-14 14:49:58 -05:00
Field G. Van Zee	addce08966	Format spec and other updates in test, test/3m4m. Details: - Removed the dtime (delta time, or wallclock time) column from the matlab output of all test drivers in test, test/3m4m, test/studies. This value was rarely (if ever) really needed and usually only served to take up screen space. - Updated format specifier in test/studies/skx to use %7.2f instead of %6.3f. - For the test drivers in 'test' directory, added an initial line of output that sets last entry of matlab matrix to zero in order to induce a pre-allocation of the entire array of performance results.	2018-08-06 13:18:20 -05:00
Field G. Van Zee	94d5ef42c8	Adjusted gflops format spec in testsuite, test/3m4m. Details: - Changed the format specifier for the gflops column in the testsuite output from %7.3f to %7.2f. This was done mainly to keep the output aligned properly when the expected performance exceeded 1000 gflops. Also, two decimal places still conveys plenty of precision for all practical applications, including just eyeballing performance deltas between two executions (let alone two implementations). - Changed the format specifier for gflops in the test/3m4m drivers from %6.3f to %7.2f (for the same reasons listed above).	2018-08-04 15:57:17 -05:00
Devangi N. Parikh	323eaaab99	Removed left over code from plotting scripts.	2018-07-13 11:40:06 -05:00

121 Commits