102 Commits

Author SHA1 Message Date
Field G. Van Zee
6885051a16 Generalizations/cleanup to mixeddt matlab scripts.
Details:
- Parameterized, reorganized, and added comments to matlab scripts in
  test/mixeddt/matlab.
- Reordered some lines of code and added comments to plot_l3_perf.m in
  test/3m4m/matlab.
2018-12-05 14:45:39 -06:00
Field G. Van Zee
cbdb0566bf Updates to 3m4m, mixeddt test driver files.
Details:
- Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to
  port recent changes to the former to the latter.
- Disabled (for now) code in 3m4m/test_*.c files that disables all
  induced methods except for the one that is requested from the
  Makefile via the IND macro. This is done because usually, we want to
  test whatever method is enabled automatically for complex datatypes.
  (That is, when native complex microkernels are missing, we usually
  want to test performance of 1m.)
2018-12-05 20:06:32 +00:00
Field G. Van Zee
0645f239fb Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
  third clause of the license comment blocks of all relevant files and
  replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
  comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
  with format of all other comment blocks.
2018-12-04 14:31:06 -06:00
Field G. Van Zee
22384fd2b7 Minor updates to test_gemm.c in test/mixeddt. 2018-12-04 13:09:04 -06:00
Field G. Van Zee
279deae18f Added 4x5 matlab plotting scripts to test/3m4m.
Details:
- Added a new directory, test/3m4m/matlab, containing matlab scripts for
  plotting 4x5 panels of performance graphs (using the subplot()
  function) for gemm, hemm, herk, trmm, and trsm across all four
  floating-point datatypes. I expect to further refine these scripts as
  time goes on, but their current state constitutes a good start.
2018-11-16 11:34:19 -06:00
Field G. Van Zee
7b5ba7319b Merge branch 'dev' of github.com:flame/blis into dev 2018-11-14 12:32:01 -06:00
Field G. Van Zee
52392932dc Minor fixes to test/3m4m drivers.
Details:
- Cleanups to Makefile to allow all test drivers to be built for
  OpenBLAS and MKL in addition to BLIS.
- Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_().
- Fixed incorrect types for betap in BLAS cpp macro branch of
  test_herk.c.
2018-11-13 22:23:38 +00:00
Field G. Van Zee
4f12e36a0d Fixed number of columns in first output line.
Details:
- In previous commit, forgot to remove output column corresponding to
  the k dimension.
2018-11-13 14:23:12 -06:00
Field G. Van Zee
a2e0cdd7de Added hemm test driver to test/3m4m.
Details:
- Added a new test_hemm.c test driver to test/3m4m, which was modeled
  after the driver by the similar name in test. Also updated Makefile
  so that blis-nat-[sm]t would trigger builds for the new driver.
2018-11-13 14:15:11 -06:00
Field G. Van Zee
ce719f816d More edits to mixeddt matlab scripts.
Details:
- Renamed scripts in test/mixeddt/matlab:
    plot_case_all.m -> plot_dom_all.m
    plot_case_md.m  -> plot_dom_case.m
    plot_all_md.m   -> plot_dt_all.m
- Added plot_dt_select.m in order to plot select graphs for the main
  body of the mixeddt paper, and added additional related legend
  handling in plot_gemm_perf.m.
- Added test/mixeddt/matlab/output and a .gitkeep file within in order
  to force git to recognize the directory.
2018-11-10 14:48:43 -06:00
Field G. Van Zee
bf99e7c14b Minor updates to test/mixeddt driver.
Details:
- Cleaned up test/mixeddt Makefile in preparation for gathering new
  data for mixeddt paper, including renaming implementations to
  "internal" and "ad-hoc" to match the terminology to be used in the
  paper.
- Added new matlab scripts for generating 8 figures, each covering all
  mixed-precision cases for each mixed-domain case.
- Updated the runme.sh script according to changes to Makefile.
- Fixed a minor bug in test_gemm.c that may have given incorrect
  performance in complex, homogeneous storage datatype cases where
  the computation precision was equal to the storage precisions.
  (Examples: zzzd, cccs.)
2018-11-08 18:47:17 -06:00
Field G. Van Zee
06c23954e6 Defined unified bli_pthreads_*() API for all OSes.
Details:
- Expanded the bli_pthread_*() -> pthread_*() wrappers in
  frame/thread/bli_pthread.c to include cases for Windows taken from
  frame/base/bli_pthread_wrap.c. Now, bli_thread_*() is always defined
  and always used by BLIS and the BLIS testsuite (in lieu of calling
  pthreads directly, as before). The implementation used in this new
  API depends on whether we are building for Windows, and to a lesser
  extent, whether we are building on OS X. For the core API, Windows
  uses Windows threads, non-Windows (Linux, OS X) uses pthreads.
  OS X and Windows get barriers implemented in terms of other
  bli_pthread_*() functions, and Linux gets barriers implemented in
  terms of pthread_barrier*(). This commit addresses issue #273.
- Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(),
  which was erroneously calling pthread_mutex_lock().
- Minor changes to configure so that the auto-detection executable
  can be built given the above changes (most notably, turning on
  POSIX extensions via -D_GNU_SOURCE).
- Removed temporary play-test code for shiftd that accidentally got
  committed into test/3m4m/test_gemm.c.
2018-10-23 19:16:54 -05:00
Field G. Van Zee
090e4f08fc Merge branch 'master' into dev 2018-10-19 18:41:10 -05:00
Field G. Van Zee
bb6df2814f Defined a new level-1d operation: shiftd.
Details:
- Defined a new level-1d operation called 'shiftd', including object and
  typed APIs. This operation adds a scalar value to every element along
  an arbitrary diagonal of a matrix. Currently, shiftd is implemented in
  terms of the addv kernel. (The scalar is passed in as the x vector
  with an increment of zero.)
- Replaced ad-hoc usage of setd and addd (after creating a temporary
  matrix object) with use of shiftd, which is much more concise, in
  various test driver files in the testsuite. Similar changes were made
  to the standalone test drivers and the example code.
- Added documentation entries in BLISObjectAPI.md and BLISTypedAPI.md
  for bli_shiftd() and bli_?shiftd(), respectively.
- Added observed object properties to level-1d documentation in
  BLISObjectAPI.md.
2018-10-18 17:11:39 -05:00
Field G. Van Zee
49d3f9fcbb Merge branch 'master' into dev 2018-10-17 18:00:40 -05:00
Field G. Van Zee
5fec95b99f Implemented mixed-datatype support for gemm.
Details:
- Implemented support for gemm where A, B, and C may have different
  storage datatypes, as well as a computational precision (and implied
  computation domain) that may be different from the storage precision
  of either A or B. This results in 128 different combinations, all
  which are implemented within this commit. (For now, the mixed-datatype
  functionality is only supported via the object API.) If desired, the
  mixed-datatype support may be disabled at configure-time.
- Added a memory-intensive optimization to certain mixed-datatype cases
  that requires a single m-by-n matrix be allocated (temporarily) per
  call to gemm. This optimization aims to avoid the overhead involved in
  repeatedly updating C with general stride, or updating C after a
  typecast from the computation precision. This memory optimization may
  be disabled at configure-time (provided that the mixed-datatype
  support is enabled in the first place).
- Added support for testing mixed-datatype combinations to testsuite.
  The user may test gemm with mixed domains, precisions, both, or
  neither.
- Added a standalone test driver directory for building and running
  mixed-datatype performance experiments.
- Defined a new variation of castm, castnzm, which operates like castm
  except that imaginary values are not touched when casting a real
  operand to a complex operand. (By contrast, in these situations castm
  sets the imaginary components of the destination matrix to zero.)
- Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
  usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
  also simplified the implementation of bli_obj_imag_equals().
- Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
  when given BLIS_CONSTANT objects.
- Disabled dt_on_output field in auxinfo_t structure as well as all
  accessor functions. Also commented out all usage of accessor
  functions within macrokernels. (Typecasting in the microkernel is
  still feasible, though probably unrealistic for now given the
  additional complexity required.)
- Use void function pointer type (instead of void*) for storing function
  pointers in bli_l0_fpa.c.
- Added documentation for using gemm with mixed datatypes in
  docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
- Defined level-1d operation xpbyd and level-1m operation xpbym.
- Added xpbym test module to testsuite.
- Updated frame/include/bli_x86_asm_macros.h with additional macros
  (courtsey of Devin Matthews).
2018-10-15 16:37:39 -05:00
Field G. Van Zee
98e01ea04b Merge branch 'master' into amd 2018-10-04 20:44:12 -05:00
Devangi N. Parikh
8bf30eb473 Fixed runme.sh in test/studies/thunderx2
Details:
- Fixed the setting of threads for a single core run.
2018-10-03 22:22:29 -04:00
Devangi N. Parikh
f6f2456ba2 Fixed the Makefile in test/studies/thunderx2
Details:
- Fixed target for make-all-st and make-all-mt so that the armpl
  targets are built
2018-10-03 21:43:46 -04:00
Field G. Van Zee
ac18949a4b Multithreading optimizations for l3 macrokernels.
Details:
- Adjusted the method by which micropanels are assigned to threads in
  the 2nd (jr) and 1st (ir) loops around the microkernel to (mostly)
  employ contiguous "slab" partitioning rather than interleaved (round
  robin) partitioning. The new partitioning schemes and related details
  for specific families of operations are listed below:
  - gemm: slab partitioning.
  - herk: slab partitioning for region corresponding to non-triangular
          region of C; round robin partitioning for triangular region.
  - trmm: slab partitioning for region corresponding to non-triangular
          region of B; round robin partitioning for triangular region.
          (NOTE: This affects both left- and right-side macrokernels:
          trmm_ll, trmm_lu, trmm_rl, trmm_ru.)
  - trsm: slab partitioning.
          (NOTE: This only affects only left-side macrokernels trsm_ll,
          trsm_lu; right-side macrokernels were not touched.)
  Also note that the previous macrokernels were preserved inside of
  the 'other' directory of each operation family directory (e.g.
  frame/3/gemm/other, frame/3/herk/other, etc).
- Updated gemm macrokernel in sandbox/ref99 in light of above changes
  and fixed a stale function pointer type in blx_gemm_int.c
  (gemm_voft -> gemm_var_oft).
- Added standalone test drivers in test/3m4m for herk, trmm, and trsm
  and minor changes to test/3m4m/Makefile.
- Updated the arguments and definitions of bli_*_get_next_[ab]_upanel()
  and bli_trmm_?_?r_my_iter() macros defined in bli_l3_thrinfo.h.
- Renamed bli_thread_get_range*() APIs to bli_thread_range*().
2018-09-30 18:54:56 -05:00
Devangi N. Parikh
02adab427c Created a 'thunderx2' subdirectory within test/studies
Details:
- Created a 'thunderx2' subdirectory within test/studies to house
  various level-3 test driver used to measure performance on
  ThunderX2.
2018-09-20 14:38:50 -04:00
Devangi N. Parikh
dad07245db Fixed yet another bug in runme script in test/studies
Details:
- Fixed another copy-paste bug
2018-09-12 04:16:58 -05:00
Devangi N. Parikh
e669057fe3 Fixed bug in runme script in test/studies
Details:
- Fixed bug in runme script for skx studies that set the number of
  threads incorrectly
2018-09-11 22:29:42 -05:00
Devangi N. Parikh
232fdc3df3 Updated runme script in test/studies.
Details:
- Updated runme script for skx studies to run multithreading tests
  on 1 and 2 sockets.
2018-09-10 18:45:50 -05:00
Field G. Van Zee
4b5437ec7a Define a cpp macro specific to BLIS compilation.
Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
  macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
  itself is being built. This macro will not be defined when, for
  example, the testsuite or example code compiles code local to those
  applications. This was done in part by defining a new cflags function
  get-user-cflags-for(), which is now the designated function for
  application Makefiles if they wish to inherit a basic set of CFLAGS
  from BLIS. (The compiler flags returned are identical to that of
  get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
  omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
  instead of get-frame-cflags-for().
2018-09-07 17:24:32 -05:00
Field G. Van Zee
4fa4cb0734 Trivial comment header updates.
Details:
- Removed four trailing spaces after "BLIS" that occurs in most files'
  commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
  only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
  'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.
2018-08-29 18:06:41 -05:00
Field G. Van Zee
0f491e994a Allow lesser Makefiles to reference installed BLIS.
Details:
- Updated the build system so that "lesser" Makefiles, such as those in
  belonging to example code or the testsuite, may be run even if the
  directory is orphaned from the original build tree. This allows a
  user to configure, compile, and install BLIS, delete the build tree
  (that is, the source distribution, or the build directory for out-
  of-tree builds) and then compile example or testsuite code and link
  against the installed copy of BLIS (provided the example or testsuite
  directory was preserved or obtained from another source). The only
  requirement is that make be invoked while setting the
  BLIS_INSTALL_PATH variable to the same installation prefix used when
  BLIS was configured. The easiest syntax is:

    make BLIS_INSTALL_PATH=/install/prefix

  though it's also permissible to set BLIS_INSTALL_PATH as an
  environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
  behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
  blastest and testsuite, respectively, so that if those directories are
  copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
  of building/linking against an installed copy of BLIS.
2018-08-25 20:12:36 -05:00
Devangi N. Parikh
0bbe69d5ed Updated plotting scripts in test/studies.
Details:
- Fixed indexing on plots to correspond to the removal of dtime in
  the test drivers.
2018-08-14 14:49:58 -05:00
Field G. Van Zee
addce08966 Format spec and other updates in test, test/3m4m.
Details:
- Removed the dtime (delta time, or wallclock time) column from the
  matlab output of all test drivers in test, test/3m4m, test/studies.
  This value was rarely (if ever) really needed and usually only served
  to take up screen space.
- Updated format specifier in test/studies/skx to use %7.2f instead of
  %6.3f.
- For the test drivers in 'test' directory, added an initial line of
  output that sets last entry of matlab matrix to zero in order to
  induce a pre-allocation of the entire array of performance results.
2018-08-06 13:18:20 -05:00
Field G. Van Zee
94d5ef42c8 Adjusted gflops format spec in testsuite, test/3m4m.
Details:
- Changed the format specifier for the gflops column in the testsuite
  output from %7.3f to %7.2f. This was done mainly to keep the output
  aligned properly when the expected perfomance exceeded 1000 gflops.
  Also, two decimal places still conveys plenty of precision for all
  practical applications, including just eyeballing performance deltas
  between two executions (let alone two implementations).
- Changed the format specifier for gflops in the test/3m4m drivers
  from %6.3f to %7.2f (for the same reasons listed above).
2018-08-04 15:57:17 -05:00
Devangi N. Parikh
323eaaab99 Removed left over code from plotting scripts. 2018-07-13 11:40:06 -05:00
Devangi N. Parikh
73b0b2a3ac Created hardware-specific test driver directory.
Details:
- Created a 'studies' subdirectory within 'test' to be used to house
   test drivers, makefiles, run scripts, matlab plot code, and related
   files that have been customized for collecting performance data on
   specific host machines or product lines. This new setup will help us
   catalog, track, and share test driver materials over time, and in a
   way that facilitates reproducibility.
- Created an 'skx' subdirectory within 'test/studies' to house various
   level-3 test driver files used to measure performance on SkylakeX
   nodes (specifically, those nodes used by TACC's stampede2 system).
2018-07-12 16:58:41 -05:00
Field G. Van Zee
8749fa0b48 Cleanups to ref99/README.md, test/3m4m/Makefile.
Details:
- Minor edits to sandbox/ref99/README.md.
- Removed cpp guards in sandbox/ref99/thread/blx_gemm_thread.h to be
  consistent with other headers in sandbox/ref99.
- Additional targets and related cleanups in test/3m4m/Makefile.
2018-05-31 12:34:01 -05:00
Field G. Van Zee
4b36e85be9 Converted function-like macros to static functions.
Details:
- Converted most C preprocessor macros in bli_param_macro_defs.h and
  bli_obj_macro_defs.h to static functions.
- Reshuffled some functions/macros to bli_misc_macro_defs.h and also
  between bli_param_macro_defs.h and bli_obj_macro_defs.h.
- Changed obj_t-initializing macros in bli_type_defs.h to static
  functions.
- Removed some old references to BLIS_TWO and BLIS_MINUS_TWO from
  bli_constants.h.
- Whitespace changes in select files (four spaces to single tab).
2018-05-08 14:26:30 -05:00
Field G. Van Zee
60366a3fab Updates to knl kernels and related code.
Details:
- Imported the 24x16 knl sgemm microkernel (and its corresonding spackm
  kernel) from TBLIS and enabled its use in the knl sub-config. Also
  Added sgemm microkernel prototype to bli_kernels_knl.h.
- Updated dgemm and dpackm microkernels from TBLIS, which included an
  important change regarding the offsets array (changed from extern
  declaration to static declaration/definition).
- Activated use of level-1v and -1f zen kernels in skx and knl
  sub-configs.
- Removed some old macros no longer needed in bli_family_skx.h now that
  libmemkind support exists in configure.
- Moved bli_avx512_macros.h to frame/include and adjusted #includes in
  skx and knl kernels accordingly.
- Moved unused kernels in kernels/knl/3 to kernels/knl/3/other
  directory.
- Fixed a minor bug in the 'make' output per compile when verboseness
  is not turned on. The rule-generating function 'make-kernel-rule' was
  previously passing in the name of the config, rather than the name of
  the kernel set returned by get-config-for-kset, which could give
  misleading information to the user when the kconfig_map mapped a
  kernel set to a sub-configuration that did not share the same name.
  (This didn't affect the CFLAGS that were actually used.)
- Updated test/3m4m/Makefile, removing acml targets and renaming the
  remaining targets.
2018-04-16 18:46:21 -05:00
Field G. Van Zee
2b7108a8ef Minor updates to test driver makefiles.
Details:
- Cleaned up and homogenized the various test driver Makefiles in
  testsuite and test directories.
- Very minor updates to test driver code.
2018-04-16 12:35:53 -05:00
Field G. Van Zee
7dc40eafdd Updates to top-level and test driver Makefiles.
Details:
- Added logic to common.mk that will choose a BLIS library against which
  to link (LIBBLIS_LINK). The default choice is the static (.a) library;
  the shared (.so) library is chosen only if the shared library build was
  enabled and the static one was disabled.
- Updated the various test driver Makefiles to reference this common,
  pre-chosen library against which to link. (Previously, these drivers
  unconditionally linked against the static library and would have
  failed if the static library build was disabled at configure-time.)
- Renamed many of the variables in common.mk and the top-level Makefile
  so that variables relating to the libblis.[a|so] files, including
  paths to those files, begin with "LIBBLIS".
- Shuffled around some of the library definitions from the top-level
  Makefile to common.mk.
- Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
  the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in
  and in configure.
- A few other cleanups in the top-level Makefile.
2018-03-21 18:39:16 -05:00
Field G. Van Zee
16813335bd Merge branch 'amd' into rt
Details:
- Merged contributions made by AMD via 'amd' branch (see summary below).
  Special thanks to AMD for their contributions to-date, especially with
  regard to intrinsic- and assembly-based kernels.
- Added column storage output cases to microkernels in
  bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
  the extra cost of transposing the microtile in registers, this is
  much faster than using the general storage case when the underlying
  matrix is column-stored.
- Added s and d assembly-based zen gemmtrsm_u microkernel (including
  column storage optimization mentioned above).
- Updated zen sub-configuration to reflect presence of new native
  kernels.
- Temporarily reverted zen sub-configuration's level-3 cache blocksizes
  to smaller haswell values.
- Temporarily disabled small matrix handling for zen configuration
  family in config/zen/bli_family_zen.h.
- Updated zen CFLAGS according to changes in 1e4365b.
- Updated haswell microkernels such that:
  - only one vzeroupper instruction is called prior to returning
  - movapd/movupd are used in leiu of movaps/movups for double-real
    microkernels. (Note that single-real microkernels still use
    movaps/movups.)
- Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
  now included via frame/include/bli_arch_config.h.
- Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
  in testsuite/src/test_amaxv.c).
- Added early return for alpha == 0 in bli_dotxv_ref.c.
- Integrated changes from f07b176, including a fix for undefined
  behavior when executing the 1m method under certain conditions.
- Updated config_registry; no longer need haswell kernels for zen
  sub-configuration.
- Tweaked marginal and pass thresholds for dotxf.
- Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
  comments.
- Updated LICENSE file to explicitly mention that parts are copyright
  UT-Austin and AMD.
- Added AMD copyright to header templates in build/templates.

Summary of previous changes from 'amd' branch.
- Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
  s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
- Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
  and scalv, with extra-unrolling variants for axpyv and scalv.
- Added a small matrix handler to bli_gemm_front(), with the handler
  implemented in kernels/zen/3/bli_gemm_small_matrix.c.
- Added additional logic to sumsqv that first attempts to compute the
  sum of the squares via dotv(). If there is a floating-point exception
  (FE_OVERFLOW), then the previous (numerically conservative) code is
  used; otherwise, the result of dotv() is square-rooted and stored as
  the result. This new implementation is only enabled when FE_OVERFLOW
  is #defined. If the macro is not #defined, then the previous
  implementation is used.
- Added axpyv and dotv standalone test drivers to test directory.
- Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
- Added thread-local and __attribute__-related macros to bli_macro_defs.h.
2018-02-21 17:43:32 -06:00
Field G. Van Zee
86cd23b737 Fixed testsuite Makefile brokenness from 9091a207.
Details:
- Fixed a makefile error encountered when building the testsuite directly
  in its directory (as opposed to indirectly via 'make test'). The fix
  involves introducing a new variable, BUILD_PATH, alongside the existing
  DIST_PATH variable. By default, BUILD_PATH is set to the current
  directory, and is overridden by other Makefiles used by, for example,
  the testsuite and standalone test drivers in testsuite or test,
  respectively.
- Some files/directories in common.mk were redefined in terms of
  BUILD_DIR, such as the locations of config.mk file and the intermediate
  include directory.
2017-12-14 15:47:41 -06:00
Field G. Van Zee
70640a3710 Implemented library self-initialization.
Details:
- Defined two new functions in bli_init.c: bli_init_once() and
  bli_finalize_once(). Each is implemented with pthread_once(), which
  guarantees that, among the threads that pass in the same pthread_once_t
  data structure, exactly one thread will execute a user-defined function.
  (Thus, there is now a runtime dependency against libpthread even when
  multithreading is not enabled at configure-time.)
- Added calls to bli_init_once() to top-level user APIs for all
  computational operations as well as many other functions in BLIS to
  all but guarantee that BLIS will self-initialize through the normal
  use of its functions.
- Rewrote and simplified bli_init() and bli_finalize() and related
  functions.
- Added -lpthread to LDFLAGS in common.mk.
- Modified the bli_init_auto()/_finalize_auto() functions used by the
  BLAS compatibility layer to take and return no arguments. (The
  previous API that tracked whether BLIS was initialized, and then
  only finalized if it was initialized in the same function, was too
  cute by half and borderline useless because by default BLIS stays
  initialized when auto-initialized via the compatibility layer.)
- Removed static variables that track initialization of the sub-APIs in
  bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and
  bli_ind.c. We don't need to track initialization at the sub-API level,
  especially now that BLIS can self-initialize.
- Added a critical section around the changing of the error checking
  level in bli_error.c.
- Deprecated bli_ind_oper_has_avail() as well as all functions
  bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
  name. These functions had no use cases within BLIS and likely none
  outside of BLIS.
- Commented out calls to bli_init() and bli_finalize() in testsuite's
  main() function, and likewise for standalone test drivers in 'test'
  directory, so that self-initialization is exercised by default.
2017-12-11 17:18:43 -06:00
Field G. Van Zee
3c269f700d Makefile updates for test drivers, testsuite.
Details:
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
  as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
  sub-directories.
- Factored out much of the top-level Makefile into common.mk. A Makefile
  needs only set DIST_PATH to the relative path to the top level of the
  BLIS source distribution before including common.mk in order to acquire
  all of the definitions typically needed in a Makefile that tests BLIS.
2017-10-20 13:57:21 -05:00
Nisanth M P
57e1e5cd51 Merge AMD authored changes 2017-08-22 17:07:44 +05:30
Field G. Van Zee
6e04f9df01 Restored deleted lines from makefile fragments. 2017-05-17 13:03:52 -05:00
Devin Matthews
555ddc30d4 Remove shebangs from makefiles. 2017-05-17 12:27:14 -05:00
Field G. Van Zee
69b4846ae9 Disabled experiment-related 1m code.
Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
  specifically inserted to facilitate the benchmarking of 1m block-panel
  and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
  reflect changes used/needed during benchmarking.
2017-02-21 15:33:39 -06:00
Field G. Van Zee
126482a3b6 Implemented the 1m method.
Details:
- Implemented the 1m method for inducing complex domain matrix
  multiplication. 1m support has been added to all level-3 operations,
  including trsm, and is now the default induced method when native
  complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
  needed for the corresponding function for 1m (because 1m requires us
  to choose between column-oriented or row-oriented execution, which
  requires us to query the context for the storage preference of the
  gemm microkernel, which requires knowing the datatype) but I decided
  that it made sense for consistency to add the parameter to all other
  cntx initialization functions as well, even though those functions
  don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
  a second scalar for each blocksize entry. The semantic meaning of the
  two scalars now is that the first will scale the default blocksize
  while the second will scale the maximum blocksize. This allows scaling
  the two independently, and was needed to support 1m, which requires
  scaling for a register blocksize but not the register storage
  blocksize (ie: "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
  bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
  default and maximum blocksizes to some desired blocksize multiple.
  These functions are needed in the updated definitions of
  bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
  1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
  certain circumstances (specifically, real domain beta and row- or
  column-stored matrix C), the real domain macrokernel and microkernel
  to be called directly, rather than using the virtual microkernel
  via the complex domain macrokernel, which carries a slight additional
  amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
  some code in test_gemm.c driver.
2016-11-25 18:29:49 -06:00
Kiran Varaganti
a58dd35ed7 Implemented trsm single precision for lower triangular matrices, files added bli_trsm_l_int_6x16.cfiles modified bli_kernel.h to enable optimized trsm microkernel and test_trsm.c is modified to test trsm single precision
Change-Id: Ibddf989f4aad577e89558673e1038cf6ece654d9
2016-08-26 14:55:12 +05:30
praveeng
cdfb3c3f29 Merge master code as on 2016_07_29 to amd-staging branch by praveeng
Change-Id: Ic78b84d8b8d10158fb2a612f9a64bbc7b1f9b486
2016-07-29 12:46:21 +05:30
Field G. Van Zee
c31b1e7b9d Relax alignment restrictions for sandybridge ukrs.
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
  in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
  instead of vmovaps/vmovapd. These change mimic those made to the haswell
  microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
  directory to use DBL_MAX as the initial time candidate. Thanks to Devin
  Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
2016-07-27 15:58:07 -05:00
praveeng
1aa77dfc1d Merge master code as on 2016_07_21 to amd-staging branch by praveeng
Change-Id: Ic7d0a21101358f08147736e7f1884e7409937344
2016-07-21 14:23:41 +05:30