1664 Commits

Author SHA1 Message Date
Field G. Van Zee
18c876b989 Version file update (0.6.0) 0.6.0 2019-06-03 18:37:19 -05:00
Field G. Van Zee
0f1b3bf49e ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- CREDITS file update.
2019-06-03 18:35:19 -05:00
Field G. Van Zee
27da2e8400 Minor edits to docs/PerformanceSmall.md.
Details:
- Added performance analysis to "Comments" section of both Kaby Lake and
  Epyc sections.
- Added emphasis to certain passages.
2019-06-03 17:14:56 -05:00
Field G. Van Zee
09ba05c6f8 Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2019-06-03 16:53:19 -05:00
Field G. Van Zee
6bf449cc69 Merge branch 'amd' 2019-05-31 17:42:40 -05:00
Field G. Van Zee
a4e8801d08 Increased MT sup threshold for double to 201.
Details:
- Fine-tuned the double-precision real MT threshold (which controls
  whether the sup implementation kicks for smaller m dimension values)
  from 180 to 201 for haswell and 180 to 256 for zen.
- Updated octave scripts in test/sup/octave to include a seventh column
  to display performance for m = n = k.
2019-05-31 17:30:51 -05:00
Field G. Van Zee
abd8a9fa7d Inadvertantly hidden xerbla_() in blastest (#313).
Details:
- Attempted a fix to issue #313, which reports that when building only
  a shared library (ie: static library build is disabled), running the
  BLAS test drivers can fail because those drivers provide their own
  local version of xerbla_() as a clever (albeit still rather hackish)
  way of checking the error codes that result from the individual tests.
  This local xerbla_() function is never found at link-time because the
  BLAS test drivers' Makefile imports BLIS compilation flags via the
  get-user-cflags-for() function, which currently conveys the
  -fvisibility=hidden flag, which hides symbols unless they are
  explicitly annotated for export. The -fvisibility=hidden flag was
  only ever intended for use when building BLIS (not for applications),
  and so the attempted solution here is to omit the symbol export
  flag(s) from get-user-cflags-for() by storing the symbol export
  flag(s) to a new BULID_SYMFLAGS variable instead of appending it
  to the subconfigurations' CMISCFLAGS variable (which is returned by
  every get-*-cflags-for() function). Thanks to M. Zhou for reporting
  this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
  created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
  text.
2019-05-28 12:49:44 -05:00
Field G. Van Zee
755730608d Minor rewording of language around mt env. vars. 2019-05-23 17:34:36 -05:00
Field G. Van Zee
ba31abe73c Added BLIS theading info to Performance.md.
Details:
- Documented the BLIS environment variables that were set
  (e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
  threading configuration in order to achieve the parallelism reported
  on in docs/Performance.md.
2019-05-23 14:59:53 -05:00
Field G. Van Zee
cb788ffc89 Increased MT sup threshold for double to 180.
Details:
- Increased the double-precision real MT threshold (which controls
  whether the sup implementation kicks for smaller m dimension values)
  from 80 to 180, and this change was made for both haswell and zen
  subconfigurations. This is less about the m dimension in particular
  and more about facilitating a smoother performance transition when
  m = n = k.
2019-05-23 13:00:53 -05:00
Field G. Van Zee
057f5f3d21 Minor build system housekeeping.
Details:
- Commented out redundant setting of LIBBLIS_LINK within all driver-
  level Makefiles. This variable is already set within common.mk, and
  so the only time it should be overridden is if the user wants to link
  to a different copy of libblis.
- Very minor changes to build/gen-make-frags/gen-make-frag.sh.
- Whitespace and inconsequential quoting change to configure.
- Moved top-level 'windows' directory into a new 'attic' directory.
2019-05-23 12:51:17 -05:00
Jeff Hammond
32392cfc72 add info about CXX in configure (#311) 2019-05-14 14:52:30 -05:00
Field G. Van Zee
fa7e6b182b Define _POSIX_C_SOURCE in bli_system.h.
Details:
- Added
    #ifndef _POSIX_C_SOURCE
    #define _POSIX_C_SOURCE 200809L
    #endif
  to bli_system.h so that an application that uses BLIS (specifically,
  an application that #includes blis.h) does not need to remember to
  #define the macro itself (either on the command line or in the code
  that includes blis.h) in order to activate things like the pthreads.
  Thanks to Christos Psarras for reporting this issue and suggesting
  this fix.
- Commented out #include <sys/time.h> in bli_system.h, since I don't
  think this header is used/needed anymore.
- Comment update to function macro for bli_?normiv_unb_var1() in
  frame/util/bli_util_unb_var1.c.
2019-05-01 19:13:00 -05:00
Field G. Van Zee
3df84f1b5d Minor bugfixes in sup dgemm implementation.
Details:
- Fixed an obscure but in the bli_dgemmsup_rv_haswell_asm_5x8n() kernel
  that only affected the beta == 0, column-storage output case. Thanks
  to the BLAS test drivers for catching this bug.
- Previously, bli_gemmsup_ref_var1n() and _var2m() were returning if
  k = 0, when the correct action would be to scale by beta (and then
  return). Thanks to the BLAS test drivers to catching this bug.
- Changed the sup threshold behavior such that the sup implementation
  only kicks in if a matrix dimension is strictly less than (rather than
  less than or equal to) the threshold in question.
- Initialize all thresholds to zero (instead of 10) by default in
  ref_kernels/bli_cntx_ref.c. This, combined with the above change to
  threshold testing means that calls to BLIS or BLAS with one or more
  matrix dimensions of zero will no longer trigger the sup
  implementation.
- Added disabled debugging output to frame/3/bli_l3_sup.c (for future
  use, perhaps).
2019-04-27 21:27:32 -05:00
Field G. Van Zee
ecbdd1c42d Ceased use of BLIS_ENABLE_SUP_MR/NR_EXT macros.
Details:
- Removed already limited use of the BLIS_ENABLE_SUP_MR_EXT and
  BLIS_ENABLE_SUP_NR_EXT macros in bli_gemmsup_ref_var1n() and
  bli_gemmsup_ref_var2m(). Their purpose was merely to avoid a long
  conditional that would determine whether to allow the last iteration
  to be merged with the second-to-last iteration. Functionally, the
  macros were not needed, and they ended up causing problems when
  building configuration families such as intel64 and x86_64.
2019-04-27 19:38:11 -05:00
Field G. Van Zee
aa8a6bec30 Fixed typo in --disable-sup-handling macro guard.
Details:
- Fixed an incorrectly-named macro guard that is intended to allow
  disabling of the sup framework via the configure option
  --disable-sup-handling. In this case, the preprocessor macro,
  BLIS_DISABLE_SUP_HANDLING, was still named by its name from an older
  uncommitted version of the code (BLIS_DISABLE_SM_HANDLING).
2019-04-27 18:53:33 -05:00
Field G. Van Zee
b9c9f03502 Implemented gemm on skinny/unpacked matrices.
Details:
- Implemented a new sub-framework within BLIS to support the management
  of code and kernels that specifically target matrix problems for which
  at least one dimension is deemed to be small, which can result in long
  and skinny matrix operands that are ill-suited for the conventional
  level-3 implementations in BLIS. The new framework tackles the problem
  in two ways. First the stripped-down algorithmic loops forgo the
  packing that is famously performed in the classic code path. That is,
  the computation is performed by a new family of kernels tailored
  specifically for operating on the source matrices as-is (unpacked).
  Second, these new kernels will typically (and in the case of haswell
  and zen, do in fact) include separate assembly sub-kernels for
  handling of edge cases, which helps smooth performance when performing
  problems whose m and n dimension are not naturally multiples of the
  register blocksizes. In a reference to the sub-framework's purpose of
  supporting skinny/unpacked level-3 operations, the "sup" operation
  suffix (e.g. gemmsup) is typically used to denote a separate namespace
  for related code and kernels. NOTE: Since the sup framework does not
  perform any packing, it targets row- and column-stored matrices A, B,
  and C. For now, if any matrix has non-unit strides in both dimensions,
  the problem is computed by the conventional implementation.
- Implemented the default sup handler as a front-end to two variants.
  bli_gemmsup_ref_var2() provides a block-panel variant (in which the
  2nd loop around the microkernel iterates over n and the 1st loop
  iterates over m), while bli_gemmsup_ref_var1() provides a panel-block
  variant (2nd loop over m and 1st loop over n). However, these variants
  are not used by default and provided for reference only. Instead, the
  default sup handler calls _var2m() and _var1n(), which are similar
  to _var2() and _var1(), respectively, except that they defer to the
  sup kernel itself to iterate over the m and n dimension, respectively.
  In other words, these variants rely not on microkernels, but on
  so-called "millikernels" that iterate along m and k, or n and k.
  The benefit of using millikernels is a reduction of function call
  and related (local integer typecast) overhead as well as the ability
  for the kernel to know which micropanel (A or B) will change during
  the next iteration of the 1st loop, which allows it to focus its
  prefetching on that micropanel. (In _var2m()'s millikernel, the upanel
  of A changes while the same upanel of B is reused. In _var1n()'s, the
  upanel of B changes while the upanel of A is reused.)
- Added a new configure option, --[en|dis]able-sup-handling, which is
  enabled by default. However, the default thresholds at which the
  default sup handler is activated are set to zero for each of the m, n,
  and k dimensions, which effectively disables the implementation. (The
  default sup handler only accepts the problem if at least one dimension
  is smaller than or equal to its corresponding threshold. If all
  dimensions are larger than their thresholds, the problem is rejected
  by the sup front-end and control is passed back to the conventional
  implementation, which proceeds normally.)
- Added support to the cntx_t structure to track new fields related to
  the sup framework, most notably:
  - sup thresholds: the thresholds at which the sup handler is called.
  - sup handlers: the address of the function to call to implement
    the level-3 skinny/unpacked matrix implementation.
  - sup blocksizes: the register and cache blocksizes used by the sup
    implementation (which may be the same or different from those used
    by the conventional packm-based approach).
  - sup kernels: the kernels that the handler will use in implementing
    the sup functionality.
  - sup kernel prefs: the IO preference of the sup kernels, which may
    differ from the preferences of the conventional gemm microkernels'
    IO preferences.
- Added a bool_t to the rntm_t structure that indicates whether sup
  handling should be enabled/disabled. This allows per-call control
  of whether the sup implementation is used, which is useful for test
  drivers that wish to switch between the conventional and sup codes
  without having to link to different copies of BLIS. The corresponding
  accessor functions for this new bool_t are defined in bli_rntm.h.
- Implemented several row-preferential gemmsup kernels in a new
  directory, kernels/haswell/3/sup. These kernels include two general
  implementation types--'rd' and 'rv'--for the 6x8 base shape, with
  two specialized millikernels that embed the 1st loop within the kernel
  itself.
- Added ref_kernels/3/bli_gemmsup_ref.c, which provides reference
  gemmsup microkernels. NOTE: These microkernels, unlike the current
  crop of conventional (pack-based) microkernels, do not use constant
  loop bounds. Additionally, their inner loop iterates over the k
  dimension.
- Defined new typedef enums:
  - stor3_t: captures the effective storage combination of the level-3
    problem. Valid values are BLIS_RRR, BLIS_RRC, BLIS_RCR, etc. A
    special value of BLIS_XXX is used to denote an arbitrary combination
    which, in practice, means that at least one of the operands is
    stored according to general stride.
  - threshid_t: captures each of the three dimension thresholds.
- Changed bli_adjust_strides() in bli_obj.c so that bli_obj_create()
  can be passed "-1, -1" as a lazy request for row storage. (Note that
  "0, 0" is still accepted as a lazy request for column storage.)
- Added support for various instructions to bli_x86_asm_macros.h,
  including imul, vhaddps/pd, and other instructions related to integer
  vectors.
- Disabled the older small matrix handling code inserted by AMD in
  bli_gemm_front.c, since the sup framework introduced in this commit
  is intended to provide a more generalized solution.
- Added test/sup directory, which contains standalone performance test
  drivers, a Makefile, a runme.sh script, and an 'octave' directory
  containing scripts compatible with GNU Octave. (They also may work
  with matlab, but if not, they are probably close to working.)
- Reinterpret the storage combination string (sc_str) in the various
  level-3 testsuite modules (e.g. src/test_gemm.c) so that the order
  of each matrix storage char is "cab" rather than "abc".
- Comment updates in level-3 BLAS API wrappers in frame/compat.
2019-04-27 18:44:50 -05:00
Isuru Fernando
0d549ceda8 make unix friendly archives on appveyor (#310) 2019-04-27 17:56:02 -05:00
Field G. Van Zee
945928c650 Merge branch 'amd' of github.com:flame/blis into amd 2019-04-17 15:58:56 -05:00
Field G. Van Zee
74e513eb6a Support row storage in Eigen gemm test/3 driver.
Details:
- Added preprocessor branches to test/3/test_gemm.c to explicitly
  support row-stored matrices. Column-stored matrices are also still
  supported (and is the default for now). (This is mainly residual work
  leftover from initial integration of Eigen into the test drivers, so
  if we ever want to test Eigen with row-stored matrices, the code will
  be ready to use, even if it is not yet integrated into the Makefile
  in test/3.)
2019-04-17 13:34:44 -05:00
Field G. Van Zee
b5d457fae9 Applied forgotten variable rename from 89a70cc.
Details:
- Somehow the variable name change (root_file_name -> root_inputname)
  in flatten-headers.py mentioned in the commit log entry for 89a70cc
  didn't make it into the actual commit. This commit applies that
  change.
2019-04-16 12:50:01 -05:00
Field G. Van Zee
89a70cccf8 GNU-like handling of installation prefix et al.
Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
  includedir, and sharedir (and also added an --exec-prefix option).
  The defaults to these variables are set as follows:
    prefix:      /usr/local
    exec_prefix: ${prefix}
    libdir:      ${exec_prefix}/lib
    includedir:  ${prefix}/include
    sharedir:    ${prefix}/share
  The key change, aside from the addition of exec_prefix and its use to
  define the default to libdir, is that the variables are substituted
  into config.mk with quoting that delays evaluation, meaning the
  substituted values may contain unevaluated references to other
  variables (namely, ${prefix} and ${exec_prefix}). This more closely
  follows GNU conventions, including those used by GNU autoconf, and
  also allows make to override any one of the variables *after*
  configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
  prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
  top-level Makefile to use $(wildcard ...) instead of 'find'. This
  was motivated by the new way of handling prefix and friends, which
  leads to the 'find' command being run on /usr/local (by default),
  which can take a while almost never yielding any benefit (since the
  user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
  non-verbose output) since those statements often end with file or
  directory paths, which get confusing to read when puctuated by a
  period.
- Trival change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
  over the years.)
- In configure script, changed the default state of threading_model
  variable from 'no' to 'off' to match that of debug_type, where there
  are similarly more than two valid states. ('no' is still accepted
  if given via the --enable-debug= option, though it will be
  standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
  32812ff.
- CREDITS file update.
2019-04-11 18:33:08 -05:00
Field G. Van Zee
32812ff5ab Minor bugfix to flatten-headers.py.
Details:
- Fixed a minor bug in flatten-headers.py whereby the script, upon
  encountering a #include directive for the root header file, would
  erroneously recurse and inline the conents of that root header.
  The script has been modified to avoid recursion into any headers
  that share the same name as the root-level header that was passed
  into the script. (Note: this bug didn't actually manifest in BLIS,
  so it's merely a precaution for usage of flatten-headers.py in other
  contexts.)
2019-04-09 12:20:19 -05:00
Field G. Van Zee
bec90e0b6a Minor update to docs/HardwareSupport.md document.
Details:
- Added more details and clarifying language to implications of 1m and
  the recycling of microkernels between microarchitectures.
2019-04-02 17:45:13 -05:00
Field G. Van Zee
89cd650e7b Use void_fp for function pointers instead of void*.
Change void*-typed function pointers to void_fp.
- Updated all instances of void* variables that store function pointers
  to variables of a new type, void_fp. Originally, I wanted to define
  the type of void_fp as "void (*void_fp)( void )"--that is, a pointer
  to a function with no return value and no arguments. However, once
  I did this, I realized that gcc complains with incompatible pointer
  type (-Wincompatible-pointer-types) warnings every time any such a
  pointer is being assigned to its final, type-accurate function
  pointer type. That is, gcc will silently typecast a void* to
  another defined function pointer type (e.g. dscalv_ker_ft) during
  an assignment from the former to the latter, but the same statement
  will trigger a warning when typecasting from a void_fp type. I suspect
  an explicit typecast is needed in order to avoid the warning, which
  I'm not willing to insert at this time.
- Added a typedef to bli_type_defs.h defining void_fp as void*, along
  with a commented-out version of the aborted definition described
  above. (Note that POSIX requires that void* and function pointers
  be interchangeable; it is the C standard that does not provide this
  guarantee.)
- Comment updates to various _oapi.c files.
2019-04-02 17:23:55 -05:00
Field G. Van Zee
ffce3d632b Renamed armv8a gemm kernel filename.
Details:
- Renamed
    kernels/armv8a/3/bli_gemm_armv8a_opt_4x4.c
  to
    kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c.
  This follows the naming convention used by other kernel sets, most
  notably haswell.
2019-04-02 14:40:50 -05:00
Isuru Fernando
77867478af Use pthreads on MinGW and Cygwin (#307) 2019-04-02 13:33:11 -05:00
Field G. Van Zee
7bc75882f0 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-03-28 17:40:50 -05:00
Field G. Van Zee
20ea7a1217 Minor text updates (Eigen) to docs/Performance.md.
Details:
- Added/updated a few more details, mostly regarding Eigen.
2019-03-27 18:09:17 -05:00
Field G. Van Zee
bfb7e1bc6a Merge branch 'dev' 2019-03-27 17:58:19 -05:00
Field G. Van Zee
2c85e1dd9d Added Eigen results to performance graphs.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
  docs/graphs to report on Eigen implementations, where applicable.
  Specifically, Eigen implements all level-3 operations sequentially,
  however, of those operations it only provides multithreaded gemm.
  Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
  omitted. Thanks to Sameer Agarwal for his help configuring and
  using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
2019-03-27 16:29:51 -05:00
Field G. Van Zee
bfac7e385f Added ability to plot with Eigen in test/3/matlab.
Details:
- Updated matlab scripts in test/3/matlab to optionally plot/display
  Eigen performance curves. Whether Eigen is plotted is determined by
  a new boolean function parameter, with_eigen.
- Updated runme.m scratchpad to reflect the latest invocations of the
  plot_panel_4x5() function (with Eigen plotting enabled).
2019-03-27 16:04:48 -05:00
Field G. Van Zee
67535317b9 Fixed mislabeled eigen output from test/3 drivers.
Details:
- Fixed the Makefile in test/3 so that it no longer incorrectly labels
  the matlab output variables from Eigen-linked hemm, herk, trmm, and
  trsm driver output as "vendor". (The gemm drivers were already
  correctly outputing matlab variables containing the "eigen" label.)
2019-03-27 13:32:18 -05:00
Isuru Fernando
044df9506f Test with shared on windows (#306)
Export macros can't support both shared and static at the same time.
When blis is built with both shared and static, headers assume that
shared is used at link time and dllimports the symbols with __imp_
prefix.

To use the headers with static libraries a user can give
-DBLIS_EXPORT= to import the symbol without the __imp_ prefix
2019-03-27 12:39:31 -05:00
Field G. Van Zee
5e6b160c8a Link to Eigen BLAS for non-gemm drivers in test/3.
Details:
- Adjusted test/3/Makefile so that the test drivers are linked against
  Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
  this since Eigen's headers don't define implementations to the
  standard BLAS APIs.
- Simplified #included headers in hemm, herk, trmm, and trsm source
  driver files, since nothing specific to Eigen is needed at
  compile-time for those operations.
2019-03-26 19:10:59 -05:00
Field G. Van Zee
e593221383 Merge branch 'master' into dev 2019-03-26 15:51:45 -05:00
Field G. Van Zee
92fb9c87bf Add more support for Eigen to drivers in test/3.
Details:
- Use compile-time implementations of Eigen in test_gemm.c via new
  EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
  library is not necessary.) However, as of Eigen 3.3.7, Eigen only
  parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
  any other level-3 operation.
- Fixed a bug in trmm and trsm drivers whereby the wrong function
  (bli_does_trans()) was being called to determine whether the object
  for matrix A should be created for a left- or right-side case. This
  was corrected by changing the function to bli_is_left(), as is done
  in the hemm driver.
- Added support for running Eigen test drivers from runme.sh.
2019-03-26 15:43:23 -05:00
Isuru Fernando
c208b9dc46 Fix clang version detection (#305)
clang -dumpversion gives 4.2.1 for all clang versions as clang was
originally compatible with gcc 4.2.1

Apple clang version and clang version are two different things
and the real clang version cannot be deduced from apple clang version
programatically. Rely on wikipedia to map apple clang to clang version

Also fixes assembly detection with clang

clang 3.8 can't build knl as it doesn't recognize zmm0
2019-03-25 13:03:44 -05:00
Field G. Van Zee
feefcab442 Allow disabling of BLAS prototypes at compile-time.
Details:
- Modified bli_blas.h so that:
  - By default, if the BLAS layer is enabled at configure-time, BLAS
    prototypes are also enabled within blis.h;
  - But if the user #defines BLIS_DISABLE_BLAS_DEFS prior to including
    blis.h, BLAS prototypes are skipped over entirely so that, for
    example, the application or some other header pulled in by the
    application may prototype the BLAS functions without causing any
    duplication.
- Updated docs/BuildSystem.md to document the feature above, and
  related text.
2019-03-21 18:11:20 -05:00
Field G. Van Zee
288843b06d Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  build by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-03-20 17:52:23 -05:00
Field G. Van Zee
153e0be21d More minor tweaks to docs/Performance.md.
Details:
- Defined GFLOPS as billions of floating-point operations per second,
  and reworded the sentence after about normalization.
2019-03-19 17:53:18 -05:00
Field G. Van Zee
05c4e42642 CHANGELOG update (0.5.2) 2019-03-19 17:07:20 -05:00
Field G. Van Zee
9204cd0cb0 Version file update (0.5.2) 0.5.2 2019-03-19 17:07:18 -05:00
Field G. Van Zee
64560cd924 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2019-03-19 17:04:20 -05:00
Field G. Van Zee
ab5ad557ea Very minor tweaks to Performance.md. 2019-03-19 16:50:41 -05:00
Field G. Van Zee
03c4a25e1a Minor fixes to docs/Performance.md.
Details:
- Fixed some incorrect labels associated with the pdf/png graphs,
  apparently the result of copy-pasting.
2019-03-19 16:47:15 -05:00
Field G. Van Zee
fe6dd8b132 Fixed broken section links in docs/Performance.md.
Details:
- Fixed a few broken section links in the Contents section.
2019-03-19 16:30:23 -05:00
Field G. Van Zee
913cf97653 Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2019-03-19 16:15:24 -05:00
Field G. Van Zee
9945ef24fd Adjusted cache blocksizes for zen subconfig.
Details:
- Adjusted the zen sub-configuration's cache blocksizes for float,
  scomplex, and dcomplex based on the existing values for double.
  (The previous values were taken directly from the haswell subconfig,
  which targets Intel Haswell/Broadwell/Skylake systems.)
2019-03-19 15:28:44 -05:00
Field G. Van Zee
d202d008d5 Renamed --enable-export-all to --export-shared=[].
Details:
- Replaced the existing --enable-export-all / --disable-export-all
  configure option with --export-shared=[public|all], with the 'public'
  instance of the latter corresponding to --disable-export-all and the
  'all' instance corresponding to --enable-export-all. Nothing else
  semantically about the option, or its default, has changed.
2019-03-18 18:18:25 -05:00