Commit Graph

128 Commits

Author SHA1 Message Date
Field G. Van Zee
22768bf959 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
ee44719d43 Added ability to plot with Eigen in test/3/matlab.
Details:
- Updated matlab scripts in test/3/matlab to optionally plot/display
  Eigen performance curves. Whether Eigen is plotted is determined by
  a new boolean function parameter, with_eigen.
- Updated runme.m scratchpad to reflect the latest invocations of the
  plot_panel_4x5() function (with Eigen plotting enabled).
2019-08-23 14:18:08 +05:30
Field G. Van Zee
bd6cdd884b Fixed mislabeled eigen output from test/3 drivers.
Details:
- Fixed the Makefile in test/3 so that it no longer incorrectly labels
  the matlab output variables from Eigen-linked hemm, herk, trmm, and
  trsm driver output as "vendor". (The gemm drivers were already
  correctly outputing matlab variables containing the "eigen" label.)
2019-08-23 14:18:08 +05:30
Field G. Van Zee
b495ca9b76 Link to Eigen BLAS for non-gemm drivers in test/3.
Details:
- Adjusted test/3/Makefile so that the test drivers are linked against
  Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
  this since Eigen's headers don't define implementations to the
  standard BLAS APIs.
- Simplified #included headers in hemm, herk, trmm, and trsm source
  driver files, since nothing specific to Eigen is needed at
  compile-time for those operations.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
728e9666e1 Add more support for Eigen to drivers in test/3.
Details:
- Use compile-time implementations of Eigen in test_gemm.c via new
  EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
  library is not necessary.) However, as of Eigen 3.3.7, Eigen only
  parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
  any other level-3 operation.
- Fixed a bug in trmm and trsm drivers whereby the wrong function
  (bli_does_trans()) was being called to determine whether the object
  for matrix A should be created for a left- or right-side case. This
  was corrected by changing the function to bli_is_left(), as is done
  in the hemm driver.
- Added support for running Eigen test drivers from runme.sh.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
a1c8b11b3f Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  build by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
c6793be46e Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
da590746b0 Renamed test/3m4m to test/3.
Details:
- Renamed '3m4m' directory to '3', which captures the directory nicely
  since it builds test drivers to test level-3 operations.
- These test drivers ceased to be used to test the 3m and 4m (or even
  1m) induced methods long ago, hence the name change.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
2ebe4aafe5 More minor updates and edits to test/3m4m.
Details:
- Further updates to matlab scripts, mostly for compatibility with
  GNU Octave.
- More tweaks to runme.sh.
- Updates to runme.m that allow copy-paste into matlab interactive
  session to generate graphs.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
09ed72c5a7 Very minor updates to test/3m4m for ul252.
Details:
- Very minor updates to the newly revamped test/3m4m drivers when used
  on a Xeon Platinum (SkylakeX).
2019-08-23 14:18:07 +05:30
Field G. Van Zee
8beda64ea5 Overhauled test/3m4m Makefile and scripts.
Details:
- Rewrote much of Makefile to generate executables for single- and dual-
  socket multithreading as well as single-threaded. Each of the three
  can also use a different problem size range/increment, as is often
  appropriate when doubling/halving the number of threads.
- Rewrote runme.sh script to flexibly execute as many threading
  parameter scenarios as is given in the input parameter string
  (currently set within the script itself). The string also encodes
  the maximum problem size for each threading scenario, which is used
  to identify the executable to run. Also improved the "progress" output
  of the script to reduce redundant info and improve readability in
  terminals that are not especially wide.
- Minor updates to test_*.c source files.
- Updated matlab scripts according to changes made to the Makefile,
  test drivers, and runme.sh script, and renamed 'plot_all.m' to
  'runme.m'.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
7918a6deca Updates (from ls5) to test/3m4m/runme.sh.
Details:
- Lonestar5-specific updates to runme.sh.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
84282bba54 Updates to 3m4m/matlab scripts.
Details:
- Minor updates to matlab graph-generating scripts.
- Added a plot_all.m script that is more of a scratchpad for copying and
  pasting function invocations into matlab to generate plots that are
  presently of interest to us.
2019-08-23 14:18:07 +05:30
kdevraje
13806ba3b0 This check in has changes w.r.t Copyright information, which is changed to (start year) - 2019
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
2019-05-27 16:24:43 +05:30
kdevraje
a3554eb1dc Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis to configure zen2
Change-Id: I97e17bca9716b80b862925f97bb513c07b4b0cae
2019-05-23 11:53:32 +05:30
Kiran Varaganti
b80bd5bcb2 config/zen/bli_cntx_init_zen.c: removed BLIS_ENBLE_ZEN_BLOCK_SIZES macro. We have different configurations for both zen and zen2
config/zen/bli_family_zen.h: deleted macro BLIS_ENBLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME family is 17H but model # is from 0x30H.
test/test_gemm.c - commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64)
Now we can use single configuration has ./configure amd64 - this will work both for ROME & Naples

Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
2019-05-22 05:51:22 -04:00
kdevraje
df755848b8 Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis into rome2.0
Change-Id: Ie8aad1ab810f0f3c0b90ec67f9dd3dfb8dcc74cc
2019-05-22 13:30:07 +05:30
Kiran Varaganti
53842c7e7d Removed printing alpha and beta values
Change-Id: I49102db510311a30f6a936f9d843f35838f50d23
2019-03-22 13:57:14 +05:30
Kiran Varaganti
6805db45e3 Corrected setting alpha & beta values- alpha = -1 and beta = 1 - bli_setc(-1.0, 0, &alpha) should be used rather than bli_setc(0.0, -1.0, &alpha). This corrected now
Change-Id: Ic1102dfd6b50ccf212386a1211c6f31e8d987ef9
2019-03-22 12:55:35 +05:30
Kiran Varaganti
20153cd4b5 Modified test_gemm.c file in test folder
A Macro 'FILE_IN_OUT" is defined to read input parameters from a csv file.
Format for input file:
Each line defines a gemm problem with following parameters: m k n cs_a cs_b cs_c
The operation always implemented is C = C - A*B and column-major format.
When macro is disabled - it reverts back to original implementation.
Usage: ./test_gemm_<mkl/blis/openblas>.x input.csv output.csv
GEMM is called through BLAS interface
For BLIS - the test application also prints either 'S' indicating small gemm routine or 'N' - conventional BLIS gemm
for MKL/OpenBLAS - ignore this character

Change-Id: I0924ef2c1f7bdea48d4cdb230b888e2af2c86a36
2019-03-21 16:23:53 +05:30
Kiran Varaganti
f5ed95ecd7 Merged BLIS Release 1.3
Modified config/zen/make_defs.mk, now CKVECFLAGS     := -mavx2 -mfpmath=sse -mfma -march=znver1

Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81
2019-03-05 15:03:57 +05:30
Field G. Van Zee
b1f5ce8622 Minor updates to scripts in test/mixeddt/matlab. 2019-02-05 17:38:50 -06:00
Devangi N. Parikh
38203ecd15 Added thunderx2 system in the mixeddt test scripts
Details:
 - Added thunderx2 (tx2) as a system in the runme.sh in test/mixeddt
2019-02-04 15:28:28 -05:00
Field G. Van Zee
58c7fb4788 Added more matlab scripts for mixeddt paper.
Details:
- Added a variant set of matlab scripts geared to producing plots that
  reflect performance data gathered with and without extra memory
  optimizations enabled. These scripts reside (for now) in
  test/mixeddt/matlab/wawoxmem.
2019-01-08 17:00:27 -06:00
Field G. Van Zee
6885051a16 Generalizations/cleanup to mixeddt matlab scripts.
Details:
- Parameterized, reorganized, and added comments to matlab scripts in
  test/mixeddt/matlab.
- Reordered some lines of code and added comments to plot_l3_perf.m in
  test/3m4m/matlab.
2018-12-05 14:45:39 -06:00
Field G. Van Zee
cbdb0566bf Updates to 3m4m, mixeddt test driver files.
Details:
- Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to
  port recent changes to the former to the latter.
- Disabled (for now) code in 3m4m/test_*.c files that disables all
  induced methods except for the one that is requested from the
  Makefile via the IND macro. This is done because usually, we want to
  test whatever method is enabled automatically for complex datatypes.
  (That is, when native complex microkernels are missing, we usually
  want to test performance of 1m.)
2018-12-05 20:06:32 +00:00
Field G. Van Zee
0645f239fb Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
  third clause of the license comment blocks of all relevant files and
  replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
  comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
  with format of all other comment blocks.
2018-12-04 14:31:06 -06:00
Field G. Van Zee
22384fd2b7 Minor updates to test_gemm.c in test/mixeddt. 2018-12-04 13:09:04 -06:00
Field G. Van Zee
279deae18f Added 4x5 matlab plotting scripts to test/3m4m.
Details:
- Added a new directory, test/3m4m/matlab, containing matlab scripts for
  plotting 4x5 panels of performance graphs (using the subplot()
  function) for gemm, hemm, herk, trmm, and trsm across all four
  floating-point datatypes. I expect to further refine these scripts as
  time goes on, but their current state constitutes a good start.
2018-11-16 11:34:19 -06:00
Field G. Van Zee
7b5ba7319b Merge branch 'dev' of github.com:flame/blis into dev 2018-11-14 12:32:01 -06:00
Field G. Van Zee
52392932dc Minor fixes to test/3m4m drivers.
Details:
- Cleanups to Makefile to allow all test drivers to be built for
  OpenBLAS and MKL in addition to BLIS.
- Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_().
- Fixed incorrect types for betap in BLAS cpp macro branch of
  test_herk.c.
2018-11-13 22:23:38 +00:00
Field G. Van Zee
4f12e36a0d Fixed number of columns in first output line.
Details:
- In previous commit, forgot to remove output column corresponding to
  the k dimension.
2018-11-13 14:23:12 -06:00
Field G. Van Zee
a2e0cdd7de Added hemm test driver to test/3m4m.
Details:
- Added a new test_hemm.c test driver to test/3m4m, which was modeled
  after the driver by the similar name in test. Also updated Makefile
  so that blis-nat-[sm]t would trigger builds for the new driver.
2018-11-13 14:15:11 -06:00
Field G. Van Zee
ce719f816d More edits to mixeddt matlab scripts.
Details:
- Renamed scripts in test/mixeddt/matlab:
    plot_case_all.m -> plot_dom_all.m
    plot_case_md.m  -> plot_dom_case.m
    plot_all_md.m   -> plot_dt_all.m
- Added plot_dt_select.m in order to plot select graphs for the main
  body of the mixeddt paper, and added additional related legend
  handling in plot_gemm_perf.m.
- Added test/mixeddt/matlab/output and a .gitkeep file within in order
  to force git to recognize the directory.
2018-11-10 14:48:43 -06:00
Field G. Van Zee
bf99e7c14b Minor updates to test/mixeddt driver.
Details:
- Cleaned up test/mixeddt Makefile in preparation for gathering new
  data for mixeddt paper, including renaming implementations to
  "internal" and "ad-hoc" to match the terminology to be used in the
  paper.
- Added new matlab scripts for generating 8 figures, each covering all
  mixed-precision cases for each mixed-domain case.
- Updated the runme.sh script according to changes to Makefile.
- Fixed a minor bug in test_gemm.c that may have given incorrect
  performance in complex, homogeneous storage datatype cases where
  the computation precision was equal to the storage precisions.
  (Examples: zzzd, cccs.)
2018-11-08 18:47:17 -06:00
Field G. Van Zee
06c23954e6 Defined unified bli_pthreads_*() API for all OSes.
Details:
- Expanded the bli_pthread_*() -> pthread_*() wrappers in
  frame/thread/bli_pthread.c to include cases for Windows taken from
  frame/base/bli_pthread_wrap.c. Now, bli_thread_*() is always defined
  and always used by BLIS and the BLIS testsuite (in lieu of calling
  pthreads directly, as before). The implementation used in this new
  API depends on whether we are building for Windows, and to a lesser
  extent, whether we are building on OS X. For the core API, Windows
  uses Windows threads, non-Windows (Linux, OS X) uses pthreads.
  OS X and Windows get barriers implemented in terms of other
  bli_pthread_*() functions, and Linux gets barriers implemented in
  terms of pthread_barrier*(). This commit addresses issue #273.
- Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(),
  which was erroneously calling pthread_mutex_lock().
- Minor changes to configure so that the auto-detection executable
  can be built given the above changes (most notably, turning on
  POSIX extensions via -D_GNU_SOURCE).
- Removed temporary play-test code for shiftd that accidentally got
  committed into test/3m4m/test_gemm.c.
2018-10-23 19:16:54 -05:00
Field G. Van Zee
090e4f08fc Merge branch 'master' into dev 2018-10-19 18:41:10 -05:00
Field G. Van Zee
bb6df2814f Defined a new level-1d operation: shiftd.
Details:
- Defined a new level-1d operation called 'shiftd', including object and
  typed APIs. This operation adds a scalar value to every element along
  an arbitrary diagonal of a matrix. Currently, shiftd is implemented in
  terms of the addv kernel. (The scalar is passed in as the x vector
  with an increment of zero.)
- Replaced ad-hoc usage of setd and addd (after creating a temporary
  matrix object) with use of shiftd, which is much more concise, in
  various test driver files in the testsuite. Similar changes were made
  to the standalone test drivers and the example code.
- Added documentation entries in BLISObjectAPI.md and BLISTypedAPI.md
  for bli_shiftd() and bli_?shiftd(), respectively.
- Added observed object properties to level-1d documentation in
  BLISObjectAPI.md.
2018-10-18 17:11:39 -05:00
Field G. Van Zee
49d3f9fcbb Merge branch 'master' into dev 2018-10-17 18:00:40 -05:00
Field G. Van Zee
5fec95b99f Implemented mixed-datatype support for gemm.
Details:
- Implemented support for gemm where A, B, and C may have different
  storage datatypes, as well as a computational precision (and implied
  computation domain) that may be different from the storage precision
  of either A or B. This results in 128 different combinations, all
  which are implemented within this commit. (For now, the mixed-datatype
  functionality is only supported via the object API.) If desired, the
  mixed-datatype support may be disabled at configure-time.
- Added a memory-intensive optimization to certain mixed-datatype cases
  that requires a single m-by-n matrix be allocated (temporarily) per
  call to gemm. This optimization aims to avoid the overhead involved in
  repeatedly updating C with general stride, or updating C after a
  typecast from the computation precision. This memory optimization may
  be disabled at configure-time (provided that the mixed-datatype
  support is enabled in the first place).
- Added support for testing mixed-datatype combinations to testsuite.
  The user may test gemm with mixed domains, precisions, both, or
  neither.
- Added a standalone test driver directory for building and running
  mixed-datatype performance experiments.
- Defined a new variation of castm, castnzm, which operates like castm
  except that imaginary values are not touched when casting a real
  operand to a complex operand. (By contrast, in these situations castm
  sets the imaginary components of the destination matrix to zero.)
- Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
  usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
  also simplified the implementation of bli_obj_imag_equals().
- Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
  when given BLIS_CONSTANT objects.
- Disabled dt_on_output field in auxinfo_t structure as well as all
  accessor functions. Also commented out all usage of accessor
  functions within macrokernels. (Typecasting in the microkernel is
  still feasible, though probably unrealistic for now given the
  additional complexity required.)
- Use void function pointer type (instead of void*) for storing function
  pointers in bli_l0_fpa.c.
- Added documentation for using gemm with mixed datatypes in
  docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
- Defined level-1d operation xpbyd and level-1m operation xpbym.
- Added xpbym test module to testsuite.
- Updated frame/include/bli_x86_asm_macros.h with additional macros
  (courtsey of Devin Matthews).
2018-10-15 16:37:39 -05:00
Field G. Van Zee
98e01ea04b Merge branch 'master' into amd 2018-10-04 20:44:12 -05:00
Devangi N. Parikh
8bf30eb473 Fixed runme.sh in test/studies/thunderx2
Details:
- Fixed the setting of threads for a single core run.
2018-10-03 22:22:29 -04:00
Devangi N. Parikh
f6f2456ba2 Fixed the Makefile in test/studies/thunderx2
Details:
- Fixed target for make-all-st and make-all-mt so that the armpl
  targets are built
2018-10-03 21:43:46 -04:00
Field G. Van Zee
ac18949a4b Multithreading optimizations for l3 macrokernels.
Details:
- Adjusted the method by which micropanels are assigned to threads in
  the 2nd (jr) and 1st (ir) loops around the microkernel to (mostly)
  employ contiguous "slab" partitioning rather than interleaved (round
  robin) partitioning. The new partitioning schemes and related details
  for specific families of operations are listed below:
  - gemm: slab partitioning.
  - herk: slab partitioning for region corresponding to non-triangular
          region of C; round robin partitioning for triangular region.
  - trmm: slab partitioning for region corresponding to non-triangular
          region of B; round robin partitioning for triangular region.
          (NOTE: This affects both left- and right-side macrokernels:
          trmm_ll, trmm_lu, trmm_rl, trmm_ru.)
  - trsm: slab partitioning.
          (NOTE: This only affects only left-side macrokernels trsm_ll,
          trsm_lu; right-side macrokernels were not touched.)
  Also note that the previous macrokernels were preserved inside of
  the 'other' directory of each operation family directory (e.g.
  frame/3/gemm/other, frame/3/herk/other, etc).
- Updated gemm macrokernel in sandbox/ref99 in light of above changes
  and fixed a stale function pointer type in blx_gemm_int.c
  (gemm_voft -> gemm_var_oft).
- Added standalone test drivers in test/3m4m for herk, trmm, and trsm
  and minor changes to test/3m4m/Makefile.
- Updated the arguments and definitions of bli_*_get_next_[ab]_upanel()
  and bli_trmm_?_?r_my_iter() macros defined in bli_l3_thrinfo.h.
- Renamed bli_thread_get_range*() APIs to bli_thread_range*().
2018-09-30 18:54:56 -05:00
praveeng
86330953b1 Resolved conflicts and modified bli_trsm_small.c
Change-Id: I578d419cff658003e0fdd4c4cdc93145d951ce31
2018-09-28 10:08:06 +05:30
Devangi N. Parikh
02adab427c Created a 'thunderx2' subdirectory within test/studies
Details:
- Created a 'thunderx2' subdirectory within test/studies to house
  various level-3 test driver used to measure performance on
  ThunderX2.
2018-09-20 14:38:50 -04:00
Devangi N. Parikh
dad07245db Fixed yet another bug in runme script in test/studies
Details:
- Fixed another copy-paste bug
2018-09-12 04:16:58 -05:00
Devangi N. Parikh
e669057fe3 Fixed bug in runme script in test/studies
Details:
- Fixed bug in runme script for skx studies that set the number of
  threads incorrectly
2018-09-11 22:29:42 -05:00
Devangi N. Parikh
232fdc3df3 Updated runme script in test/studies.
Details:
- Updated runme script for skx studies to run multithreading tests
  on 1 and 2 sockets.
2018-09-10 18:45:50 -05:00
Field G. Van Zee
4b5437ec7a Define a cpp macro specific to BLIS compilation.
Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
  macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
  itself is being built. This macro will not be defined when, for
  example, the testsuite or example code compiles code local to those
  applications. This was done in part by defining a new cflags function
  get-user-cflags-for(), which is now the designated function for
  application Makefiles if they wish to inherit a basic set of CFLAGS
  from BLIS. (The compiler flags returned are identical to that of
  get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
  omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
  instead of get-frame-cflags-for().
2018-09-07 17:24:32 -05:00