21 Commits

Author SHA1 Message Date
Field G. Van Zee
86cd23b737 Fixed testsuite Makefile brokenness from 9091a207.
Details:
- Fixed a makefile error encountered when building the testsuite directly
  in its directory (as opposed to indirectly via 'make test'). The fix
  involves introducing a new variable, BUILD_PATH, alongside the existing
  DIST_PATH variable. By default, BUILD_PATH is set to the current
  directory, and is overridden by other Makefiles used by, for example,
  the testsuite and standalone test drivers in testsuite or test,
  respectively.
- Some files/directories in common.mk were redefined in terms of
  BUILD_PATH, such as the locations of the config.mk file and the
  intermediate include directory.
2017-12-14 15:47:41 -06:00
Field G. Van Zee
70640a3710 Implemented library self-initialization.
Details:
- Defined two new functions in bli_init.c: bli_init_once() and
  bli_finalize_once(). Each is implemented with pthread_once(), which
  guarantees that, among the threads that pass in the same pthread_once_t
  data structure, exactly one thread will execute a user-defined function.
  (Thus, there is now a runtime dependency against libpthread even when
  multithreading is not enabled at configure-time.)
- Added calls to bli_init_once() to top-level user APIs for all
  computational operations as well as many other functions in BLIS to
  all but guarantee that BLIS will self-initialize through the normal
  use of its functions.
- Rewrote and simplified bli_init() and bli_finalize() and related
  functions.
- Added -lpthread to LDFLAGS in common.mk.
- Modified the bli_init_auto()/_finalize_auto() functions used by the
  BLAS compatibility layer to take and return no arguments. (The
  previous API that tracked whether BLIS was initialized, and then
  only finalized if it was initialized in the same function, was too
  cute by half and borderline useless because by default BLIS stays
  initialized when auto-initialized via the compatibility layer.)
- Removed static variables that track initialization of the sub-APIs in
  bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread.c, and
  bli_ind.c. We don't need to track initialization at the sub-API level,
  especially now that BLIS can self-initialize.
- Added a critical section around the changing of the error checking
  level in bli_error.c.
- Deprecated bli_ind_oper_has_avail() as well as all functions
  bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
  name. These functions had no use cases within BLIS and likely none
  outside of BLIS.
- Commented out calls to bli_init() and bli_finalize() in testsuite's
  main() function, and likewise for standalone test drivers in 'test'
  directory, so that self-initialization is exercised by default.
2017-12-11 17:18:43 -06:00
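The self-initialization pattern described in the commit above can be sketched in a few lines of C. This is a minimal illustration of pthread_once() only, not the code in bli_init.c: the names lib_once, lib_init_impl(), lib_init_once(), lib_compute(), and lib_times_initialized() are hypothetical, and the "work" done by the entry point is a placeholder.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical library state; these names are illustrative, not BLIS's. */
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;
static int lib_init_count = 0;

/* One-time setup routine. pthread_once() runs it at most once per
   pthread_once_t object, however many threads race to call it. */
static void lib_init_impl(void)
{
    lib_init_count++;
}

/* Safe to call from any thread, any number of times. */
void lib_init_once(void)
{
    int rc = pthread_once(&lib_once, lib_init_impl);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */
}

/* A public entry point that self-initializes before doing its work. */
int lib_compute(int x)
{
    lib_init_once();
    return 2 * x; /* placeholder computation */
}

/* Reports how many times the setup routine actually ran. */
int lib_times_initialized(void)
{
    return lib_init_count;
}
```

Calling lib_compute() repeatedly leaves lib_times_initialized() at 1, which is the property that lets every top-level API call the initializer unconditionally. On some platforms the program must be linked with -lpthread, consistent with the LDFLAGS change noted in the commit.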
Field G. Van Zee
3c269f700d Makefile updates for test drivers, testsuite.
Details:
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
  as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
  sub-directories.
- Factored out much of the top-level Makefile into common.mk. A Makefile
  needs only set DIST_PATH to the relative path to the top level of the
  BLIS source distribution before including common.mk in order to acquire
  all of the definitions typically needed in a Makefile that tests BLIS.
2017-10-20 13:57:21 -05:00
Field G. Van Zee
69b4846ae9 Disabled experiment-related 1m code.
Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
  specifically inserted to facilitate the benchmarking of 1m block-panel
  and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
  reflect changes used/needed during benchmarking.
2017-02-21 15:33:39 -06:00
Field G. Van Zee
126482a3b6 Implemented the 1m method.
Details:
- Implemented the 1m method for inducing complex domain matrix
  multiplication. 1m support has been added to all level-3 operations,
  including trsm, and is now the default induced method when native
  complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
  needed for the corresponding function for 1m (because 1m requires us
  to choose between column-oriented or row-oriented execution, which
  requires us to query the context for the storage preference of the
  gemm microkernel, which requires knowing the datatype) but I decided
  that it made sense for consistency to add the parameter to all other
  cntx initialization functions as well, even though those functions
  don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
  a second scalar for each blocksize entry. The semantic meaning of the
  two scalars now is that the first will scale the default blocksize
  while the second will scale the maximum blocksize. This allows scaling
  the two independently, and was needed to support 1m, which requires
  scaling for a register blocksize but not the register storage
  blocksize (i.e., "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
  bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
  default and maximum blocksizes to some desired blocksize multiple.
  These functions are needed in the updated definitions of
  bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
  1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
  certain circumstances (specifically, real domain beta and row- or
  column-stored matrix C), the real domain macrokernel and microkernel
  to be called directly, rather than using the virtual microkernel
  via the complex domain macrokernel, which carries a slight additional
  amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
  some code in test_gemm.c driver.
2016-11-25 18:29:49 -06:00
Field G. Van Zee
c31b1e7b9d Relax alignment restrictions for sandybridge ukrs.
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
  in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
  instead of vmovaps/vmovapd. These changes mimic those made to the haswell
  microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
  directory to use DBL_MAX as the initial time candidate. Thanks to Devin
  Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
2016-07-27 15:58:07 -05:00
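The DBL_MAX change in the commit above concerns best-of-N timing: the running minimum starts at the largest finite double so that the first measurement always replaces it. The following is a minimal sketch of that idea, assuming a POSIX system with clock_gettime(); now_seconds(), min_time(), best_time_of(), and sample_workload() are hypothetical names and do not reproduce the testsuite's own timing code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <float.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (POSIX clock_gettime). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Keep the smaller of the running best and the new sample. */
double min_time(double best, double sample)
{
    return sample < best ? sample : best;
}

/* Hypothetical workload standing in for the operation being timed. */
void sample_workload(void)
{
    volatile double acc = 0.0;
    for (int i = 0; i < 1000; ++i)
        acc += (double)i;
}

/* Run fn() n_repeats times and return the fastest observed time. */
double best_time_of(void (*fn)(void), int n_repeats)
{
    assert(n_repeats > 0);
    double best = DBL_MAX; /* any real measurement is smaller than this */
    for (int i = 0; i < n_repeats; ++i) {
        double start = now_seconds();
        fn();
        best = min_time(best, now_seconds() - start);
    }
    return best;
}
```

Starting from DBL_MAX avoids picking an arbitrary "large enough" constant that a slow run could exceed, which would otherwise leave the reported time stuck at the initial value.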
Field G. Van Zee
c3a4d39d03 Updates to haswell gemm micro-kernels.
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
  one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
  by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.
2016-05-04 17:22:56 -05:00
Field G. Van Zee
55329906ec Minor edits to README.md, testsuite.
Details:
- Fixed typos in README.md.
- Fixed column heading alignment for testsuite when matlab output is
  enabled.
- Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.
2015-09-26 20:47:19 -05:00
Field G. Van Zee
9135dfd69d Minor updates to test/3m4m files. 2015-06-05 13:37:44 -05:00
Field G. Van Zee
d62ceece94 Minor update to test/3m4m/runme.sh.
Details:
- Removed some stale script code that should have been removed
  during 590bb3b8c.
2015-06-03 12:56:45 -05:00
Field G. Van Zee
590bb3b8c5 Backed-out adjusted dim changes to test/3m4m.
Details:
- Reverted most changes applied during commit ec25807b.
2015-05-24 16:02:53 -05:00
Field G. Van Zee
ec25807b26 Tweaks to test/3m4m to test with adjusted dims.
Details:
- Updated test/3m4m driver files to build test drivers that allow
  comparison of real "asm_blis" results to complex "asm_blis" results,
  except with the latter's problem sizes adjusted so that problems are
  generated with equal flop counts.
2015-04-10 13:23:50 -05:00
Field G. Van Zee
c84286d5ce More minor tweaks to test/3m4m.
Details:
- Added a line of output that forces matlab to allocate the entire array
  up-front.
- Re-enabled real domain benchmarks in runme.sh, which were temporarily
  disabled.
2015-04-04 15:39:14 -05:00
Field G. Van Zee
309717c8eb More tweaks to test/3m4m, configurations.
Details:
- Fixed incorrect number of mc_x_kc memory blocks in
  sandybridge/bli_config.h.
- Enabled OpenMP multithreading in piledriver/bli_config.h.
- More updates to test/3m4m driver files.
2015-04-03 19:28:49 -05:00
Field G. Van Zee
4baf3b9c69 Tweaked test/3m4m driver, including acml support.
Details:
- Added ACML support to test/3m4m driver Makefile and runme.sh script.
2015-04-03 16:44:32 -05:00
Field G. Van Zee
349e075ad6 Tweaks to sandybridge config, test/3m4m driver.
Details:
- Enable OpenMP support by default in sandybridge's bli_config.h.
- Reorganized sandybridge's bli_kernel.h.
- Updated 3m4m Makefile, runme.sh to also test MKL implementation.
2015-04-02 18:12:28 -05:00
Field G. Van Zee
26a4b8f6f9 Implemented 3m2, 3m3 induced algorithms (gemm only).
Details:
- Defined a new "3ms" (separated 3m) pack schema and added appropriate
  support in packm_init(), packm_blk_var2().
- Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
  as an argument instead of computing it locally. Exception: for trmm,
  is_p must be computed locally, since it changes for triangular
  packed matrices. Also exposed is_p in interface to dt-specific
  packm_blk_var2 (and _var1, even though it does not use imaginary
  stride).
- Renamed many functions/variables from _3mi to _3mis to indicate that
  they work for either interleaved or separated 3m pack schemas.
- Generalized gemm and herk macro-kernels to pass in imaginary stride
  rather than compute them locally.
- Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
  and 3m3-specific virtual micro-kernels.
- Added special gemm macro-kernels to support 3m2 and 3m3.
- Added support for 3m2 and 3m3 to testsuite.
- Corrected the type of the panel dimension (pd_) in various macro-
  kernels from inc_t to dim_t.
- Renamed many functions defined in bli_blocksize.c.
- Moved most induced-related macro defs from frame/include to
  frame/ind/include.
- Updated the _ukernel.c files so that the micro-kernel function pointers
  are obtained from the func_t objects rather than the cpp macros that
  define the function names.
- Updated test/3m4m driver, Makefile, and run script.
2015-04-01 10:44:54 -05:00
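For context on the 3m family named in the commit above: the general 3m approach rests on forming a complex product from three real products instead of four, at the cost of extra additions. The following is a scalar sketch of that identity only. The cplx type and the functions mul_4m(), mul_3m(), and check_3m_matches_4m() are hypothetical, and nothing here reflects how the 3m2 and 3m3 variants organize the computation across packed matrix blocks.

```c
#include <assert.h>

/* Hypothetical complex type for illustration. */
typedef struct { double re, im; } cplx;

/* Conventional product: four real multiplications. */
cplx mul_4m(cplx a, cplx b)
{
    cplx c = { a.re * b.re - a.im * b.im,
               a.re * b.im + a.im * b.re };
    return c;
}

/* 3m-style product: three real multiplications plus extra additions. */
cplx mul_3m(cplx a, cplx b)
{
    double p1 = a.re * b.re;
    double p2 = a.im * b.im;
    double p3 = (a.re + a.im) * (b.re + b.im);
    cplx c = { p1 - p2, p3 - p1 - p2 };
    return c;
}

/* Sanity check. Exact comparison is only appropriate for small integer
   inputs; in general the two forms round differently. */
void check_3m_matches_4m(cplx a, cplx b)
{
    cplx x = mul_4m(a, b);
    cplx y = mul_3m(a, b);
    assert(x.re == y.re && x.im == y.im);
}
```

For a = 1 + 2i and b = 3 + 4i, both functions give -5 + 10i. With general floating-point inputs the two results can differ slightly in the imaginary part, because the 3m form recovers it by subtraction.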
Field G. Van Zee
a86db60ee2 Extensive renaming of 3m/4m-related files, symbols.
Details:
- Renamed all remaining 3m/4m packing files and symbols to 3mi/4mi
  ('i' for "interleaved"). Similar changes to 3M/4M macros.
- Renamed all 3m/4m files and functions to 3m1/4m1.
- Whitespace changes.
2015-02-23 18:42:39 -06:00
Field G. Van Zee
c60619c7c3 Minor tweaks for 3m4m test drivers.
Details:
- Changed gemm_kc blocksizes to be reduced by two-thirds instead of
  half.
- Changed 3m4m/test_gemm.c driver to divide by 3 instead of 2 when
  computing the fixed k dimension.
- Fixed runme.sh so that it would use multiple threads for s/dgemm
  cases.
2014-12-16 17:08:22 -06:00
Field G. Van Zee
c6929ba6a5 Added 4m_1b to test/3m4m test driver and script. 2014-12-16 11:27:50 -06:00
Field G. Van Zee
add16b0e54 Added 3m4m test driver subdir of 'test'.
Details:
- Added a modified test driver for [cz]gemm that will test all 3m/4m
  as well as assembly-based and OpenBLAS implementations of gemm
  in single and multithreaded modes.
2014-10-17 11:49:24 -05:00