Commit Graph

1829 Commits

Author SHA1 Message Date
Field G. Van Zee
31f11a06ea Updates to octave scripts in test/sup[mt]/octave.
Details:
- Optimized scripts in test/sup/octave and test/supmt/octave for use
  with octave 5.2.0 on Ubuntu 18.04.
- Fixed stray 'end' keywords in gen_opsupnames.m and plot_l3sup_perf.m,
  which were not only unnecessary but also causing issues with versions
  5.x.
2020-02-27 14:33:20 -06:00
Field G. Van Zee
9e5f7296cc Skip building thrinfo_t tree when mt is disabled.
Details:
- Return early from bli_thrinfo_sup_grow() if the thrinfo_t object
  address is equal to either &BLIS_GEMM_SINGLE_THREADED or
  &BLIS_PACKM_SINGLE_THREADED.
- Added preprocessor logic to bli_l3_sup_thread_decorator() in
  bli_l3_sup_decor_single.c that (by default) disables code that
  creates and frees the thrinfo_t tree and instead passes
  &BLIS_GEMM_SINGLE_THREADED as the thrinfo_t pointer into the
  sup implementation.
- The net effect of the above changes is that a small amount of
  thrinfo_t overhead is avoided when running small/skinny dgemm
  problems when BLIS is compiled with multithreading disabled.
2020-02-18 15:16:03 -06:00
Field G. Van Zee
90081e6a64 Fixed bug(s) in mt sup when single-threaded.
Details:
- Fixed a syntax bug in bli_l3_sup_decor_single.c as a result of
  changing function interface for the thread entry point function
  (of type l3supint_t).
- Unfortunately, fixing the interface was not enough, as it caused
  a memory leak in the sba at bli_finalize() time. It turns out that,
  due to the new multithreading-capable variant code using thrinfo_t
  objects--specifically, their calling of bli_thrinfo_grow()--we
  have to pass in a real thrinfo_t object rather than the global
  objects &BLIS_PACKM_SINGLE_THREADED or &BLIS_GEMM_SINGLE_THREADED.
  Thus, I inserted the appropriate logic from the OpenMP and pthreads
  versions so that single-threaded execution would work as intended
  with the newly upgraded variants.
2020-02-17 14:57:25 -06:00
Field G. Van Zee
c0558fde45 Support multithreading within the sup framework.
Details:
- Added multithreading support to the sup framework (via either OpenMP
  or pthreads). Both variants 1n and 2m now have the appropriate
  threading infrastructure, including data partitioning logic, to
  parallelize computation. This support handles all four combinations
  of packing on matrices A and B (neither, A only, B only, or both).
  This implementation tries to be a little smarter when automatic
  threading is requested (e.g. via BLIS_NUM_THREADS) in that it will
  recalculate the factorization in units of micropanels (rather than
  using the raw dimensions) in bli_l3_sup_int.c, when the final
  problem shape is known and after threads have already been spawned.
- Implemented bli_?packm_sup_var2(), which packs to conventional row-
  or column-stored matrices. (This is used for the rrc and crc storage
  cases.) Previously, copym was used, but that would no longer suffice
  because it could not be parallelized.
- Minor reorganization of packing-related sup functions. Specifically,
  bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]()
  instead of from the variant functions. This has the effect of making
  the variant functions more readable.
- Added additional bli_thrinfo_set_*() static functions to bli_thrinfo.h
  and inserted usage of these functions within bli_thrinfo_init(), which
  previously was accessing thrinfo_t fields via the -> operator.
- Renamed bli_partition_2x2() to bli_thread_partition_2x2().
- Added an auto_factor field to the rntm_t struct in order to track
  whether automatic thread factorization was originally requested.
- Added new test drivers in test/supmt that perform multithreaded sup
  tests, as well as appropriate octave/matlab scripts to plot the
  resulting output files.
- Added additional language to docs/Multithreading.md to make it clear
  that specifying any BLIS_*_NT variable, even if it is set to 1, will
  be considered manual specification for the purposes of determining
  whether to auto-factorize via BLIS_NUM_THREADS.
- Minor comment updates.
2020-02-17 14:08:08 -06:00
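The idea in c0558fde45 of factorizing threads in units of micropanels rather than raw dimensions can be sketched as follows. This is only an illustration: factor_threads(), grid_t, and the imbalance cost are hypothetical names and choices, not BLIS code, and mr/nr stand in for the register blocksizes. The helper assumes nt >= 1 and picks the factor pair of nt whose ratio best matches the ratio of micropanel counts along m and n.

```c
#include <stddef.h>

// Hypothetical 2D thread grid: tm ways along m, tn ways along n.
typedef struct { size_t tm; size_t tn; } grid_t;

static size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// Illustrative heuristic (an assumption, not the BLIS algorithm): among all
// factor pairs tm * tn == nt, choose the one minimizing |tm*pn - tn*pm|,
// where pm and pn are the micropanel counts along m and n.
grid_t factor_threads(size_t nt, size_t m, size_t n, size_t mr, size_t nr)
{
    size_t pm = ceil_div(m, mr);   // micropanels along m
    size_t pn = ceil_div(n, nr);   // micropanels along n
    grid_t best = { 1, nt };
    size_t best_cost = (size_t)-1;

    for (size_t tm = 1; tm <= nt; ++tm) {
        if (nt % tm != 0) continue;
        size_t tn = nt / tm;
        // Cross-multiplied imbalance: zero when tm/tn == pm/pn.
        size_t a = tm * pn, b = tn * pm;
        size_t cost = (a > b) ? a - b : b - a;
        if (cost < best_cost) {
            best_cost = cost;
            best.tm = tm;
            best.tn = tn;
        }
    }
    return best;
}
```

For example, with 8 threads, m = 1200, n = 200, mr = 6, and nr = 8, there are 200 micropanels along m and 25 along n, so this sketch assigns all 8 ways of parallelism to the m dimension.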
Field G. Van Zee
d7a7679182 Fixed int-to-packbuf_t conversion error (C++ only).
Details:
- Fixed an error that manifests only when using C++ (specifically,
  modern versions of g++) to compile drivers in 'test' (and likely most
  other application code that #includes blis.h). Thanks to Ajay Panyala
  for reporting this issue (#374).
2020-02-07 17:37:03 -06:00
Field G. Van Zee
d626112b8d Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
  sort function. The purpose was not to sort the contents, but rather
  to remove duplicates. However, there is valid syntax in a string of
  linker flags that, when sorted, yields different/broken behavior.
  So I've removed the line in common.mk that sorts LDFLAGS. Also, for
  future use, I've added a new function, rm-dupls, that removes
  duplicates without sorting. (This function was based on code from a
  stackoverflow thread that is linked to in the comments for that
  code.) Thanks to Isuru Fernando for reporting this issue (#373).
2020-01-15 13:27:02 -06:00
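The distinction drawn in d626112b8d, removing duplicates without reordering, can be illustrated in C. This is a sketch of the general technique only; dedup_keep_order() is a hypothetical name and is unrelated to the actual rm-dupls make function, whose implementation is not shown in this log.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: removes duplicate strings from items[0..n-1] in
// place, keeping the first occurrence of each and preserving the original
// order. Returns the number of entries kept.
size_t dedup_keep_order(const char *items[], size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        int seen = 0;
        // Compare against the entries already kept.
        for (size_t j = 0; j < kept; ++j) {
            if (strcmp(items[i], items[j]) == 0) { seen = 1; break; }
        }
        if (!seen) items[kept++] = items[i];
    }
    return kept;
}
```

Because nothing is sorted, order-sensitive linker flags keep their relative positions; only later repeats are dropped. The quadratic scan is a simplicity trade-off that seems reasonable for short flag lists.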
Field G. Van Zee
e67deb22aa CHANGELOG update (0.6.1) 2020-01-14 16:01:34 -06:00
Field G. Van Zee
10949f528c Version file update (0.6.1) 0.6.1 2020-01-14 16:01:33 -06:00
Field G. Van Zee
5db8e710a2 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2020-01-14 15:59:59 -06:00
Field G. Van Zee
cde4d9d7a2 Removed 'attic/windows' (to prevent confusion).
Details:
- Finally removed 'attic/windows' and its contents. This directory once
  contained "proto" Windows support for BLIS, but we've since moved on
  to (thanks to Isuru Fernando) providing Windows DLL support via
  AppVeyor's build artifacts. Furthermore, since 'windows' was the only
  subdirectory within 'attic', the directory path would show up in
  GitHub's listing at https://github.com/flame/blis, which probably led
  to someone being confused about how BLIS provides Windows support. I
  assume (but don't know for sure) that nobody is using these files, so
  this is admittedly a case of shoot first and ask questions later.
2020-01-14 15:19:25 -06:00
Field G. Van Zee
7d3407d468 CREDITS file update. 2020-01-14 15:17:53 -06:00
Dave Love
f391b3e2e7 Fix parsing in vpu_count on workstation SKX (#351)
* Fix parsing in vpu_count on workstation SKX

* Document Skylake-X as Haswell for single FMA

* Update vpu_count for Skylake and Cascade Lake models

* Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.

* Move bli_log outside the cpp condition, and use it where intended

* Add Fixme comment (Skylake D)

* Mostly superficial edits to commits towards #351.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.

* Fix comment typos

Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-01-06 14:15:48 -06:00
Field G. Van Zee
5ca1a3cfc1 Fixed 'configure' breakage introduced in 6433831.
Details:
- Added a missing 'fi' (endif) keyword to a conditional block added in
  the configure script in commit 6433831.
2020-01-06 12:29:12 -06:00
Field G. Van Zee
e7431b4a83 Updated 1m draft article link in README.md. 2020-01-06 12:01:41 -06:00
Jeff Hammond
6433831cc3 blacklist ICC 18 for knl/skx due to test failures
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-01-03 17:51:05 -08:00
Jeff Hammond
af3589f1f9 blacklist Intel 19+
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-01-03 17:51:05 -08:00
Jeff Hammond
60de939deb fix link to docs
the comment contains an incorrect link, which is trivially fixed here.

@fgvanzee I hope you don't mind that I committed directly to master but this cannot break anything.
2020-01-01 21:30:38 -08:00
Field G. Van Zee
5271107378 Fixed bugs in cblas_sdsdot(), sdsdot_().
Details:
- Fixed a bug in sdsdot_sub() that redundantly added the "alpha" scalar,
  named 'sb'. This value was already being added by the underlying
  sdsdot_() function. Thus, we no longer add 'sb' within sdsdot_sub().
  Thanks to Simon Lukas Märtens for reporting this bug via #367.
- Fixed a second bug in order of typecasting intermediate products in
  sdsdot_(). Previously, the "alpha" scalar was being added after the
  "outer" typecast to float. However, the operation is supposed to first
  add the dot product to the (promoted) scalar and THEN downcast the sum
  to float. Thanks to Devin Matthews for catching this bug.
2019-12-16 16:30:26 -06:00
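The ordering fixed in 5271107378 (add the promoted scalar to the double-precision dot product, then downcast once) can be sketched as below. This shows the intended arithmetic only, assuming unit-stride vectors; sdsdot_sketch() is a hypothetical name, not the BLIS implementation.

```c
#include <stddef.h>

// Sketch of the intended arithmetic: products and the running sum are kept
// in double, the scalar sb is promoted and added to that sum, and the result
// is downcast to float exactly once at the end.
float sdsdot_sketch(size_t n, float sb, const float *x, const float *y)
{
    double acc = (double)sb;               // promoted scalar
    for (size_t i = 0; i < n; ++i)
        acc += (double)x[i] * (double)y[i];
    return (float)acc;                     // single downcast of the sum
}
```

Adding sb after the downcast, as the earlier code did, would round the dot product to float before the scalar is added, which can change the result in the last bits.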
Field G. Van Zee
fe2560a4b1 Annotated missing thread-related symbols for export.
Details:
- Added BLIS_EXPORT_BLIS annotation to function prototypes for

    bli_thrcomm_bcast()
    bli_thrcomm_barrier()
    bli_thread_range_sub()

  so that these functions are exported to shared libraries by default.
  This (hopefully) fixes issue #366. Thanks to Kyungmin Lee for
  reporting this bug.
- CREDITS file update.
2019-12-06 17:12:44 -06:00
Field G. Van Zee
2853825234 Merge branch 'master' into amd 2019-12-06 16:06:46 -06:00
Nicholai Tukanov
61b1f0b060 Add prototypes for POWER9 reference kernels (#365)
Updates and fixes to power9 subconfig.

Details:
- Register s,c,z reference gemm and trsm ukernels that assume elements
  of B have been broadcast.
- Added prototypes for level-3 ukernels that assume elements of B have
  been broadcast. Also added prototype for an spackm function that
  employs a duplication/broadcast factor of 4.
- Register virtual gemmtrsm ukernels that work with broadcasting of B.
- Disable right-side hemm, symm, trmm, and trmm3 in bli_family_power9.h.
- Thanks to Nicholai Tukanov for providing these updates.
2019-12-04 14:18:47 -06:00
Field G. Van Zee
efa61a6c8b Added missing bli_l3_sup_thread_decorator() symbol.
Details:
- Defined dummy versions of bli_l3_sup_thread_decorator() for OpenMP
  and pthreads so that those builds don't fail when performing shared
  library linking (especially for Windows DLLs via AppVeyor). For now,
  these dummy implementations of bli_l3_sup_thread_decorator() are
  merely carbon-copies of the implementation provided for single-
  threaded execution (i.e., the one found in bli_l3_sup_decor_single.c).
  Thus, an OpenMP or pthreads build will be able to use the gemmsup
  code (including the new selective packing functionality), as it did
  before 39fa7136, even though it will not actually employ any
  multithreaded parallelism.
2019-11-29 16:17:04 -06:00
Field G. Van Zee
39fa7136f4 Added support for selective packing to gemmsup.
Details:
- Implemented optional packing for A or B (or both) within the sup
  framework (which currently only supports gemm). The request for
  packing either matrix A or matrix B can be made via setting
  environment variables BLIS_PACK_A or BLIS_PACK_B (to any
  non-zero value; if set, zero means "disable packing"). It can also
  be made globally at runtime via bli_pack_set_pack_a() and
  bli_pack_set_pack_b() or with individual rntm_t objects via
  bli_rntm_set_pack_a() and bli_rntm_set_pack_b() if using the expert
  interface of either the BLIS typed or object APIs. (If using the
  BLAS API, environment variables are the only way to communicate the
  packing request.)
- One caveat (for now) with the current implementation of selective
  packing is that any blocksize extension registered in the _cntx_init
  function (such as is currently used by haswell and zen subconfigs)
  will be ignored if the affected matrix is packed. The reason is
  simply that I didn't get around to implementing the necessary logic
  to pack a larger edge-case micropanel, though this is entirely
  possible and should be done in the future.
- Spun off the variant-choosing portion of bli_gemmsup_ref() into
  bli_gemmsup_int(), in bli_l3_sup_int.c.
- Added new files, bli_l3_sup_packm_a.c, bli_l3_sup_packm_b.c, along
  with corresponding headers, in which higher-level packm-related
  functions are defined for use within the sup framework. The actual
  packm variant code resides in bli_l3_sup_packm_var.c.
- Pass the following new parameters into var1n and var2m: packa, packb
  bool_t's, pointer to a rntm_t, pointer to a cntl_t (which is for now
  always NULL), and pointer to a thrinfo_t* (which for now is the address
  of the global single-threaded packm thread control node).
- Added panel strides ps_a and ps_b to the auxinfo_t structure so that
  the millikernel can query the panel stride of the packed matrix and
  step through it accordingly. If the matrix isn't packed, the panel
  stride of interest for the given millikernel will be set to the
  appropriate value so that the mkernel may step through the unpacked
  matrix as it normally would.
- Modified the rv_6x8m and rv_6x8n millikernels to read the appropriate
  panel strides (ps_a and ps_b, respectively) instead of computing them
  on the fly.
- Spun off the environment variable getting and setting functions into
  a new file, bli_env.c (with a corresponding prototype header). These
  functions are now used by the threading infrastructure (e.g.
  BLIS_NUM_THREADS, BLIS_JC_NT, etc.) as well as the selective packing
  infrastructure (e.g. BLIS_PACK_A, BLIS_PACK_B).
- Added a static initializer for mem_t objects, BLIS_MEM_INITIALIZER.
- Added a static initializer for pblk_t objects, BLIS_PBLK_INITIALIZER,
  for use within the definition of BLIS_MEM_INITIALIZER.
- Moved the global_rntm object to bli_rntm.c and extern it where needed.
  This means that the function bli_thread_init_rntm() was renamed to
  bli_rntm_init_from_global() and relocated accordingly.
- Added a new file, bli_pack.c, which serves as the home for
  functions that manage the pack_a and pack_b fields of the global
  rntm_t, including from environment variables, just as we have
  functions to manage the threading fields of the global rntm_t in
  bli_thread.c.
- Reorganized naming for files in frame/thread, which mostly involved
  spinning off the bli_l3_thread_decorator() functions into their own
  files. This change makes more sense when considering the further
  addition of bli_l3_sup_thread_decorator() functions (for now limited
  only to the single-threaded form found in the _single.c file).
- Explicitly initialize the reference sup handlers in both
  bli_cntx_init_haswell.c and bli_cntx_init_zen.c so that it's more
  obvious how to customize to a different handler, if desired.
- Removed various snippets of disabled code.
- Various comment updates.
2019-11-29 15:27:07 -06:00
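The BLIS_PACK_A / BLIS_PACK_B semantics described in 39fa7136f4 (any non-zero value enables packing; zero, if set, disables it) might be read from the environment along these lines. This is a minimal sketch using only the C standard library; parse_pack_flag() and query_pack_flag() are hypothetical names, not the functions in bli_env.c, and the fallback used when the variable is unset is an assumption.

```c
#include <stdlib.h>

// Hypothetical parser: an unset variable (NULL) yields the fallback; a value
// that parses to zero disables packing; any non-zero integer enables it.
// In this sketch a non-numeric string parses as zero and thus disables.
int parse_pack_flag(const char *value, int fallback)
{
    if (value == NULL) return fallback;
    return strtol(value, NULL, 10) != 0;
}

// Hypothetical wrapper that reads the named environment variable.
int query_pack_flag(const char *name, int fallback)
{
    return parse_pack_flag(getenv(name), fallback);
}
```

A caller could then write query_pack_flag("BLIS_PACK_A", 0) to obtain a 0/1 flag, treating "unset" as "do not pack".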
Field G. Van Zee
bbb21fd0a9 Tweaked SIAM/SC Best Prize language in README.md. 2019-11-21 18:15:16 -06:00
Field G. Van Zee
043366f92d Fixed typo in previous commit (SIAM/SC prize). 2019-11-21 18:13:51 -06:00
Field G. Van Zee
05a4d583e6 Added SIAM/SC prize to "What's New" in README.md. 2019-11-21 18:12:24 -06:00
Field G. Van Zee
881b05ecd4 Fixed blastest failure for 'generic' subconfig.
Details:
- Fixed a subtle and complicated bug that only manifested via the BLAS
  test drivers in the generic subconfiguration, and possibly any other
  subconfiguration that did not register complex-domain gemm ukernels,
  or registered ONLY real-domain ukernels as row-preferential. This is
  a long story, but it boils down to an exception to the "transpose the
  operation to bring storage of C into agreement with ukernel pref"
  optimization in bli_hemm_front.c and bli_symm_front.c sabotaging the
  proper functioning of the 1m method, but only when the imaginary
  component of beta is zero. See the comments in issue #342 for more
  details. Thanks to Dave Love for identifying the commit in which this
  bug was introduced, and other feedback related to this bug.
2019-11-21 16:34:27 -06:00
Field G. Van Zee
0c7165fb01 Fixed obscure bug in bli_acquire_mpart_[mn]dim().
Details:
- Fixed a bug in bli_acquire_mpart_mdim(), bli_acquire_mpart_ndim(),
  and bli_acquire_mpart_mndim() that allowed the use of a blocksize b
  that is too large given the current row/column index (i.e., the i/j
  argument) and the size of the dimension being partitioned (i.e., the
  m/n argument). This bug only affected backwards partitioning/motion
  through the dimension and was the result of a misplaced conditional
  check-and-redirect to the backwards code path. It should be noted
  that this bug was discovered not because it manifested the way it
  could (thanks to the callers in BLIS making sure to always pass in
  the "correct" blocksize b), but could have manifested if the
  functions were used by 3rd party callers. Thanks to Minh Quan Ho for
  reporting the bug via issue #363.
2019-11-14 16:48:14 -06:00
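The class of bug described in 0c7165fb01 can be illustrated with a small partitioning helper. The sketch below is hypothetical (next_block() and part_t are not BLIS names) and assumes i counts the rows already consumed in either direction. The point it shows is that the blocksize is clamped to the remaining extent before the forward/backward choice is made, so both paths see the same safe size.

```c
#include <stddef.h>

// Hypothetical description of one partition: starting offset and size.
typedef struct { size_t offset; size_t size; } part_t;

// m: total dimension, i: amount already consumed, b: requested blocksize,
// backward: non-zero to walk from the end of the dimension toward the start.
part_t next_block(size_t m, size_t i, size_t b, int backward)
{
    size_t remaining = (i < m) ? m - i : 0;
    // Clamp first, so neither direction can use a blocksize that is too large.
    size_t size = (b < remaining) ? b : remaining;

    part_t p;
    p.size = size;
    // Forward blocks start at i; backward blocks end at m - i.
    p.offset = backward ? (remaining - size) : i;
    return p;
}
```

With m = 10 and b = 4, the backward walk produces blocks at offsets 6, 2, and 0 with sizes 4, 4, and 2; the final block is shortened rather than running past the start of the dimension.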
Field G. Van Zee
fb8bef9982 Fixed copy-paste bug in bli_spackm_6xk_bb4_ref().
Details:
- Fixed a copy-paste bug in the new bli_spackm_6xk_bb4_ref() that
  manifested as failures in single-precision real level-3 operations.
  Also replaced the duplication factor constants with a const-qualified
  variable, dfac, so that this won't happen again.
- Changed NC for single-precision real from 4080 to 8160 so that the
  packed matrix B will have the same byte footprint in both single
  and double real.
2019-11-14 13:05:28 -06:00
Field G. Van Zee
8f399c8940 Tweaked/added notes to docs/Multithreading.md.
Details:
- Added language to docs/Multithreading.md cautioning the reader about
  the nuances of setting multithreading parameters via the manual and
  automatic ways simultaneously, and also about how these parameters
  behave when multithreading is disabled at configure-time. These
  changes are an attempt to address the issues that arose in issue #362.
  Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
2019-11-12 15:32:57 -06:00
Field G. Van Zee
bdc7ee3394 Various fixes to support packing duplication in B.
Details:
- Added cpp macros to trmm and trmm3 front-ends to optionally force
  those operations to be cast so the structured matrix is on the left.
  symm and hemm already had such macros, but these too were renamed so
  that the macros were individual to the operation. We now have four
  such macros:
    #define BLIS_DISABLE_HEMM_RIGHT
    #define BLIS_DISABLE_SYMM_RIGHT
    #define BLIS_DISABLE_TRMM_RIGHT
    #define BLIS_DISABLE_TRMM3_RIGHT
  Also, updated the comments in the symm and hemm front-ends related to
  the first two macro guards, and added corresponding comments to the
  trmm and trmm3 front-ends for the latter two guards. (They all
  functionally do the same thing, just for their specific operations.)
  Thanks to Jeff Hammond for reporting the bugs that led me to this
  change (via #359).
- Updated config/old/haswellbb subconfiguration (used to debug issues
  related to duplicating B during packing) to register: a packing
  kernel for single-precision real; gemmbb ukernels for s, c, and z;
  trsmbb ukernels for s, c, and z; gemmtrsmbb virtual ukernels for s, c
  and z; and to use non-default cache and register blocksizes for s, c,
  and z datatypes. Also declared prototypes for all of the gemmbb,
  trsmbb, and gemmtrsmbb ukernel functions within the
  bli_cntx_init_haswellbb() function. This should, once applied to the
  power9 configuration, fix the remaining issues in #359.
- Defined bli_spackm_6xk_bb4_ref(), which packs single reals with a
  duplication factor of 4. This function is defined in the same file as
  bli_dpackm_6xk_bb2_ref() (bli_packm_cxk_bb_ref.c).
2019-11-11 15:47:17 -06:00
Field G. Van Zee
0eb79ca850 Avoid unused variable warning in lread.c (#356).
Details:
- Replaced the line

    f = f;

  with

    ( void )f;

  for the unused variable 'f' in blastest/f2c/lread.c. (Hopefully)
  addresses issue #356, but since we don't use xlc who knows. Thanks
  to Jeff Hammond for reporting this.
2019-11-08 14:48:48 -06:00
Jérôme Duval
f377bb4485 Add Haiku to the known OS list (#361) 2019-11-07 16:39:29 -06:00
Field G. Van Zee
e29b1f9706 Fixed failing testsuite gemmtrsm_ukr for power9.
Details:
- Added code that fixes false failures in the gemmtrsm_ukr module of the
  testsuite. The tests were failing because the computation (bli_gemv())
  that performs the numerical check was not able to properly traverse
  the matrix operands bx1 and b11 that are views into the micropanel of
  B, which has duplicated/broadcast elements under the power9 subconfig.
  (For example, a micropanel of B with duplication factor of 2 needs to
  use a column stride of 2; previously, the column stride was being
  interpreted as 1.)
- Defined separate bli_obj_set_row_stride() and bli_obj_set_col_stride()
  static functions in bli_obj_macro_defs.h. (Previously, only the
  function bli_obj_set_strides() was defined. Amazing to think that we
  got this far without these former functions.)
- Updated/expounded upon comments.
2019-11-05 17:15:19 -06:00
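The stride issue described in e29b1f9706 can be reproduced with a toy panel. The code below is an illustration under stated assumptions, not BLIS code: pack_with_dup() and panel_at() are hypothetical helpers, and the panel layout (each element repeated dfac times contiguously, rows stored one after another) is assumed for the example.

```c
#include <stddef.h>

// Hypothetical packing routine: copies a k x n row-major matrix b (leading
// dimension ldb) into panel p, storing each element dfac times in a row.
// p must hold k * n * dfac doubles.
void pack_with_dup(size_t k, size_t n, size_t dfac,
                   const double *b, size_t ldb, double *p)
{
    for (size_t i = 0; i < k; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t d = 0; d < dfac; ++d)
                p[(i * n + j) * dfac + d] = b[i * ldb + j];
}

// Reads logical element (i, j) through explicit row and column strides.
double panel_at(const double *p, size_t rs, size_t cs, size_t i, size_t j)
{
    return p[i * rs + j * cs];
}
```

For a 2 x 3 matrix holding 1 through 6 packed with dfac = 2, reading element (1, 2) with a column stride of 2 returns 6 as expected, whereas a column stride of 1 lands on a duplicate of the neighboring element and returns 5.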
Field G. Van Zee
49177a6b9a Fixed latent testsuite ukr module bugs for power9.
Details:
- Fixed a latent bug in the testsuite ukernel modules (gemm, trsm, and
  gemmtrsm) that only manifested once we began running with parameters
  that mimic those of power9. The problem was rooted in the way those
  modules were creating objects (and thus allocating memory) for the
  micropanel operands to the microkernel being tested. Since power9
  duplicates/broadcasts elements of B in memory, we needed an easy way
  of asking for more than one storage element per logical element in
  the matrix. I incorrectly expressed this as:

    bli_obj_create( datatype, k, n, ldbp, 1, &bp );

  The problem here is that bli_obj_create() is exceedingly efficient
  at calculating the size it passes to malloc() and doesn't allocate a
  full leading dimension's worth of elements for the last column (or
  row, in this example). This would normally not bother anyone since
  you're not supposed to access that memory anyway. But here, my
  attempted "hack" for getting extra elements was insufficient, and
  needed to be changed to:

    bli_obj_create( datatype, k, ldbp, ldbp, 1, &bp );

  That is, the extra elements needed to be baked into the dimensions of
  the matrix object in order to have the intended effect on the number
  of elements actually allocated. Thanks to Jeff Hammond for reporting
  this bug.
- Fixed a typically harmless memory leak in the aforementioned test
  modules (the objects for the packed micropanels were not being freed).
- Updated/expanded a common comment across all three ukr test modules.
2019-11-04 18:09:37 -06:00
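The allocation shortfall described in 49177a6b9a follows from simple arithmetic. The sketch below assumes (this formula is an assumption for illustration, not taken from the BLIS source) that an m x n object with row stride rs and column stride cs is given just enough elements to reach its last logical element; min_elems() is a hypothetical name.

```c
#include <stddef.h>

// Assumed sizing rule: the smallest buffer that contains element
// (m-1, n-1) of a matrix with row stride rs and column stride cs.
size_t min_elems(size_t m, size_t n, size_t rs, size_t cs)
{
    if (m == 0 || n == 0) return 0;
    return (m - 1) * rs + (n - 1) * cs + 1;
}
```

With k = 4, n = 6, and ldbp = 12 (a duplication factor of 2), a k x n object with strides (ldbp, 1) needs only 3*12 + 5 + 1 = 42 elements under this rule, six short of the 48 that the duplicated panel occupies. Declaring the object as k x ldbp instead gives 3*12 + 11 + 1 = 48, which is why the extra elements had to be expressed in the dimensions.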
Field G. Van Zee
c84391314d Reverted minor temp/wspace changes from b426f9e.
Details:
- Added missing license header to bli_pwr9_asm_macros_12x6.h.
- Reverted temporary changes to various files in 'test' and 'testsuite'
  directories.
- Moved testsuite/jobscripts into testsuite/old.
- Minor whitespace/comment changes across various files.
2019-11-04 13:57:12 -06:00
Jeff Hammond
4870260f6b blacklist GCC 5 and older for POWER9 (#360) 2019-11-04 13:55:47 -06:00
Nicholai Tukanov
b426f9e04e POWER9 DGEMM (#355)
Implemented and registered power9 dgemm ukernel.

Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel
  assumes that elements of B have been duplicated/broadcast during the
  packing step. The microkernel uses a column orientation for its
  microtile vector registers and thus implements column storage and
  general stride IO cases. (A row storage IO case via in-register
  transposition may be added at a future date.) It should be noted that
  we recommend using this microkernel with gcc and *not* xlc, as issues
  with the latter cropped up during development, including but not
  limited to slightly incompatible vector register mnemonics in the GNU
  extended inline assembly clobber list.
2019-11-01 17:57:03 -05:00
Field G. Van Zee
58102aeaa2 Merge branch 'amd' 2019-10-28 17:58:31 -05:00
Field G. Van Zee
52059506b2 Added "How to Download BLIS" section to README.md.
Details:
- Added a new section to the README.md, just prior to the "Getting
  Started" section, titled "How to Download BLIS". This section details
  the user's options for obtaining BLIS and lays out four common ways
  of downloading the library. Thanks to Jeff Diamond for his feedback
  on this topic.
2019-10-23 15:26:42 -05:00
Field G. Van Zee
e6f0a96cc5 Updated README.md to ack Facebook as funder. 2019-10-14 17:05:39 -05:00
Field G. Van Zee
b9bc222bfc Call bli_syrk_small() before error checking.
Details:
- In bli_syrk_front(), moved the conditional call to bli_syrk_check()
  (if error checking is enabled) and the conditional scaling of C by
  beta (if alpha is zero) so that they occur after, instead of before,
  the call to bli_syrk_small(). This sequencing now matches that of
  bli_gemm_small() in bli_gemm_front() and bli_trsm_small() in
  bli_trsm_front().
2019-10-14 16:38:15 -05:00
Field G. Van Zee
f0959a81db When manual config is blacklisted, output error.
Details:
- Fixed and adjusted the logic in configure so that a more informative
  error message is output when a user runs './configure ... <conf>' and
  <conf> is present in the configuration blacklist. Previously, this
  particular set of conditions would result in the message:

    'user-specified configuration '' is NOT registered!

  That is, the error message mis-identified the targeted configuration
  as the empty string, and (more importantly) mis-identifies the
  problem. Thanks to Tze Meng Low for reporting this issue.
- Fixed a nearby error message somewhat unrelated to the issue above.
  Specifically, the wrong string was being printed when the error
  message was identifying an auto-detected configuration that did not
  appear to be registered.
2019-10-14 15:46:28 -05:00
Field G. Van Zee
6218ac95a5 Merge branch 'master' into amd 2019-10-11 11:53:51 -05:00
Field G. Van Zee
0016d541e6 Changed -march=znver2 to =znver1 for clang on zen2.
Details:
- In config/zen2/make_defs.mk, changed the -march= flag so that
  -march=znver1 is used instead of -march=znver2 when CC_VENDOR is
  clang. (The gcc branch attempts to differentiate between various
  versions, but the equivalent version cutoffs for clang are not
  yet known by us, so we have to use a single flag for all versions
  of clang. Hopefully -march=znver1 is new enough. If not, we'll
  fall back to -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp.)
  This issue was discovered thanks to AppVeyor.
2019-10-11 11:09:44 -05:00
Field G. Van Zee
e94a0530e5 Corrected zen NC that was non-multiple of NR.
Details:
- Updated an incorrectly set cache blocksize NC for single real within
  config/zen/bli_cntx_init_zen.c that was not a multiple of the
  corresponding value of NR. This issue, which was caught by Travis CI,
  was introduced in 29b0e1e.
2019-10-11 10:48:27 -05:00
Field G. Van Zee
a2ffac7520 Merge branch 'amd-master' into amd 2019-10-11 10:31:18 -05:00
Field G. Van Zee
29b0e1ef4e Code review + tweaks to AMD's AOCL 2.0 PR (#349).
Details:
- NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
  into 'amd-master' of flame/blis.
- Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
  inadvertently not incremented when the Zen2 subconfiguration was
  added.
- In bli_gemm_front(), added a missing conditional constraint around the
  call to bli_gemm_small() that ensures that the computation precision
  of C matches the storage precision of C.
- In bli_syrk_front(), reorganized and relocated the notrans/trans logic
  that existed around the call to bli_syrk_small() into bli_syrk_small()
  to minimize the calling code footprint and also to bring that code
  into stylistic harmony with similar code in bli_gemm_front() and
  bli_trsm_front(). Also, replaced direct accessing of obj_t fields with
  proper accessor static functions (e.g. 'a->dim[0]' becomes
  'bli_obj_length( a )').
- Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
  bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
  strictly speaking unnecessary, but it serves as a useful visual cue to
  those who may be reading the files.
- Removed cpp macro-protected small matrix debugging code from
  bli_trsm_front.c.
- Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
  version check for availability of -march=znver2, and added appropriate
  support to configure script.
- Cleanups to compiler flags common to recent AMD microarchitectures in
  config/zen/amd_config.mk, including: removal of -march=znver1 et al.
  from CKVECFLAGS (since the -march flag is added within make_defs.mk);
  setting CRVECFLAGS similarly to CKVECFLAGS.
- Cleanups to config/zen/bli_cntx_init_zen.c.
- Cleanups, added comments to config/zen/make_defs.mk.
- Cleanups to config/zen2/make_defs.mk, including making use of newly-
  added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
  set of compiler flags based on the version of gcc being used.
- Reverted downstream changes to test/test_gemm.c.
- Various whitespace/comment changes.
2019-10-11 10:24:24 -05:00
Field G. Van Zee
a617301f93 Updates to docs/CodingConventions.md. 2019-10-08 17:14:05 -05:00
Field G. Van Zee
171f100691 Merge remote-tracking branch 'loveshack/emacs' 2019-10-04 11:18:23 -05:00