Commit Graph

1589 Commits

Author SHA1 Message Date
Kiran Varaganti
f5ed95ecd7 Merged BLIS Release 1.3
Modified config/zen/make_defs.mk, now CKVECFLAGS     := -mavx2 -mfpmath=sse -mfma -march=znver1

Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81
2019-03-05 15:03:57 +05:30
praveeng
b06244d98c Merge branch 'ut-austin-amd' of ssh://git.amd.com:29418/cpulibraries/er/blis into ut-austin-amd 2019-02-21 12:56:15 +05:30
praveeng
e938ff08ce deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
2019-02-21 12:49:16 +05:30
mkv
ed13ad465d added test file for initial commit 2019-02-21 12:49:16 +05:30
praveeng
4c7e668083 deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
2019-02-21 12:44:38 +05:30
mkv
95e070581c added test file for initial commit 2019-02-21 01:04:16 -05:00
Field G. Van Zee
6b83273126 Generalized ref kernels' pragma omp simd usage.
Details:
- Replaced direct usage of _Pragma( "omp simd" ) in reference kernels
  with PRAGMA_SIMD, which is defined as a function of the compiler being
  used in a new bli_pragma_macro_defs.h file. That definition is cleared
  when BLIS detects that the -fopenmp-simd command line option is
  unsupported. Thanks to Devin Matthews and Jeff Hammond for suggestions
  that guided this commit.
- Updated configure and bli_config.h.in so that the appropriate anchor
  is substituted in (when the corresponding pragma omp simd support is
  present).
2019-02-12 16:01:28 -06:00
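The indirection described in the first bullet above can be sketched as follows. This is only an illustration: the feature macro SKETCH_OMP_SIMD_SUPPORTED and the function sketch_axpy() are hypothetical, and the actual contents of bli_pragma_macro_defs.h are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature macro: stands in for whatever configure substitutes
   when -fopenmp-simd is found to be supported. It is left undefined here,
   so PRAGMA_SIMD expands to nothing. */
#ifdef SKETCH_OMP_SIMD_SUPPORTED
  #define PRAGMA_SIMD _Pragma( "omp simd" )
#else
  #define PRAGMA_SIMD
#endif

/* Fixed (constant) loop bound, as in a reference kernel. */
enum { SKETCH_N = 8 };

/* y := y + alpha * x for vectors of length SKETCH_N. */
static void sketch_axpy( double alpha, const double* x, double* y )
{
    assert( x != NULL && y != NULL );

    /* The kernel source only ever mentions PRAGMA_SIMD, never the
       compiler-specific pragma itself. */
    PRAGMA_SIMD
    for ( int i = 0; i < SKETCH_N; ++i )
        y[ i ] += alpha * x[ i ];
}
```

Because the kernel body refers only to PRAGMA_SIMD, clearing the definition for an unsupported compiler leaves an ordinary scalar loop with the same result.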
Field G. Van Zee
b1f5ce8622 Minor updates to scripts in test/mixeddt/matlab. 2019-02-05 17:38:50 -06:00
Devangi N. Parikh
38203ecd15 Added thunderx2 system in the mixeddt test scripts
Details:
 - Added thunderx2 (tx2) as a system in the runme.sh in test/mixeddt
2019-02-04 15:28:28 -05:00
Devangi N. Parikh
dfc91843ea Fixed gcc flags for thunderx2 subconfiguration
Details:
- Fixed -march flag. Thunderx2 is an armv8.1a architecture not armv8a.
2019-02-04 15:23:40 -05:00
Field G. Van Zee
c665eb9b88 Minor updates to docs, Makefiles.
Details:
- Changed all occurrences of
    micro-kernel -> microkernel
    macro-kernel -> macrokernel
    micro-panel  -> micropanel
  in all markdown documents in 'docs' directory. This change is being
  made since we've reached the point in adoption and acceptance of
  BLIS's insights where words such as "microkernel" are no longer new,
  and therefore now merit being unhyphenated.
- Updated "Implementation Notes" sections of KernelsHowTo.md, which
  still contained references to nonexistent cpp macros such as
  BLIS_DEFAULT_MR_? and BLIS_PACKDIM_MR_?.
- Added 'run-fast' and 'check-fast' targets to testsuite/Makefile.
- Minor updates to Testsuite.md, including suggesting use of
  'make check' and 'make check-fast' when running from the local
  testsuite directory.
- Added a comment to top-level Makefile explaining the purpose behind
  the TESTSUITE_WRAPPER variable, which at first glance appears to serve
  no purpose.
2019-01-28 16:22:23 -06:00
M. Zhou
1aa280d052 Amend OS detection for kFreeBSD. (#295) 2019-01-27 15:40:48 -06:00
Field G. Van Zee
fffc23bb35 CREDITS file update. 2019-01-25 13:35:31 -06:00
Field G. Van Zee
26c5cf495c Fixed bug in skx subconfig related to bdd46f9.
Details:
- Fixed code in the skx subconfiguration that became a bug after
  committing bdd46f9. Specifically, the bli_cntx_init_skx() function
  was overwriting default blocksizes for the scomplex and dcomplex
  microkernels despite the fact that only single and double real
  microkernels were being registered. This was not a problem prior to
  bdd46f9 since all microkernels used dynamically-queried (at runtime)
  register blocksizes for loop bounds. However, post-bdd46f9, this
  became a bug because the reference ukernels for scomplex and dcomplex
  were written with their register blocksizes hard-coded as constant
  loop bounds, which conflicted with the erroneous scomplex and dcomplex
  values that bli_cntx_init_skx() was setting in the context. The
  lesson here is that going forward, all subconfigurations must not set
  any blocksizes for datatypes corresponding to default/reference
  microkernels. (Note that a blocksize is left unchanged by the
  bli_cntx_set_blkszs() function if it was set to -1.)
2019-01-24 18:49:31 -06:00
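The closing note above (a blocksize passed as -1 is left unchanged) can be illustrated with a minimal sketch. The type sketch_blksz_t and the function sketch_set_blksz() are hypothetical stand-ins and do not reproduce the real bli_cntx_set_blkszs() interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype indices: single, double, scomplex, dcomplex. */
enum { SKETCH_S, SKETCH_D, SKETCH_C, SKETCH_Z, SKETCH_NUM_DT };

/* One blocksize value per datatype. */
typedef struct
{
    long v[ SKETCH_NUM_DT ];
} sketch_blksz_t;

/* Overwrite only those entries whose new value is not -1. */
static void sketch_set_blksz( sketch_blksz_t* dst,
                              long s, long d, long c, long z )
{
    const long src[ SKETCH_NUM_DT ] = { s, d, c, z };

    assert( dst != NULL );

    for ( int dt = 0; dt < SKETCH_NUM_DT; ++dt )
        if ( src[ dt ] != -1 )
            dst->v[ dt ] = src[ dt ];
}
```

Under this convention, a subconfiguration that registers only real-domain microkernels would pass -1 for the scomplex and dcomplex entries, leaving the reference defaults in place.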
Field G. Van Zee
180f8e42e1 Fixed undefined behavior trsm ukr bug in bdd46f9.
Details:
- Fixed a bug that manifested anytime a configuration was used in which
  optimized microkernels were registered and the trsm operation (or
  kernel) was invoked. The bug resulted from the optimized microkernels'
  register blocksizes conflicting with the hard-coded values--expressed
  in the form of constant loop bounds--used in the new reference trsm
  ukernels that were introduced in bdd46f9. The fix was easy: reverting
  back to the implementation that uses variable-bound loops, which
  amounted to changing an #if 0 to #if 1 (since I preserved the older
  implementation in the file alongside the new code based on constant-
  bound loops). It should be noted that this fix must be permanent,
  since the trsm kernel code with constant-bound loops can never work
  with gemm ukernels that use different register blocksizes.
2019-01-24 18:01:15 -06:00
Field G. Van Zee
bdd46f9ee8 Rewrote reference kernels to use #pragma omp simd.
Details:
- Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
  indexing annotated by the #pragma omp simd directive, which a compiler
  can use to vectorize certain constant-bounded loops. (The new kernels
  actually use _Pragma("omp simd") since the kernels are defined via
  templatizing macros.) Modest speedup was observed in most cases using
  gcc 5.4.0, which may improve with newer versions. Thanks to Devin
  Matthews for suggesting this via issues #286 and #259.
- Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
  be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
  respectively, with a default row preference for the gemm ukernel. Also
  updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
  respectively, for all datatypes.
- Modified configure to verify that -fopenmp-simd is a valid compiler
  option (via a new detect/omp_simd/omp_simd_detect.c file).
- Added a new header in which prefetch macros are defined according to
  which compiler is detected (via macros such as __GNUC__). These
  prefetch macros are not yet employed anywhere, though.
- Updated the year in copyrights of template license headers in
  build/templates and removed AMD as a default copyright holder.
2019-01-24 17:23:18 -06:00
Field G. Van Zee
63de2b0090 Prevent redef of ftnlen in blastest f2c_types.h.
Details:
- Guard typedef of ftnlen in f2c_types.h with a #ifndef HAVE_BLIS_H
  directive to prevent the redefinition of that type. Thanks to Jeff
  Diamond for reporting this compiler warning (and apologies for the
  delay in committing a fix).
2019-01-23 12:16:27 -06:00
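A guard of the kind described above might look like the sketch below. Only the names ftnlen and HAVE_BLIS_H come from the commit message; the underlying type (int) and the helper sketch_ftn_strlen() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* If blis.h has already been included (signalled here by HAVE_BLIS_H), the
   type is assumed to exist already and the typedef is skipped. In this
   sketch the macro is undefined, so the typedef is compiled.
   The underlying type 'int' is an assumption. */
#ifndef HAVE_BLIS_H
typedef int ftnlen;
#endif

/* Hypothetical helper: length of a C string expressed as an ftnlen. */
static ftnlen sketch_ftn_strlen( const char* s )
{
    ftnlen n = 0;

    assert( s != NULL );

    while ( s[ n ] != '\0' ) ++n;
    return n;
}
```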
Field G. Van Zee
eec2e183a7 Added escaping to '/' in os_name in configure.
Details:
- Add os_name to the list of variables into which the '/' character is
  escaped. This is meant to address (or at least make progress toward
  addressing) #293. Thanks to Isuru Fernando for spotting this as the
  potential fix, and also thanks to M. Zhou for the original report.
2019-01-21 12:12:18 -06:00
Field G. Van Zee
adf5c17f08 Formally registered thunderx2 subconfiguration.
Details:
- Added a separate subconfiguration for thunderx2, which now uses
  different optimization flags than cortexa57/cortexa53.
2019-01-18 15:14:45 -06:00
M. Zhou
094cfdf7df Port BLIS to GNU Hurd OS. (#294)
Prevent blis.h from misidentifying Hurd as OSX.
2019-01-18 12:46:13 -06:00
Field G. Van Zee
5d7d616e8e README.md update re: mixeddt TOMS paper. 2019-01-15 20:52:51 -06:00
Field G. Van Zee
58c7fb4788 Added more matlab scripts for mixeddt paper.
Details:
- Added a variant set of matlab scripts geared to producing plots that
  reflect performance data gathered with and without extra memory
  optimizations enabled. These scripts reside (for now) in
  test/mixeddt/matlab/wawoxmem.
2019-01-08 17:00:27 -06:00
Field G. Van Zee
34286eb914 Minor update to docs/HardwareSupport.md. 2019-01-08 11:41:20 -06:00
Field G. Van Zee
108b04dc5b Regenerated symbols in build/libblis-symbols.def.
Details:
- Reran ./build/regen-symbols.sh after running
  'configure --enable-cblas auto' to reflect removal of
  bli_malloc_pool() and bli_free_pool().
2019-01-07 20:16:31 -06:00
Field G. Van Zee
706cbd9d56 Minor tweaks/cleanups to bli_malloc.c, _apool.c.
Details:
- Removed malloc_ft and free_ft function pointer arguments from the
  interface to bli_apool_init() after deciding that there is no need to
  specify the malloc()/free() for blocks within the apool. (The apool
  blocks are actually just array_t structs.) Instead, we simply call
  bli_malloc_intl()/_free_intl() directly. This has the added benefit
  of allowing additional output when memory tracing is enabled via
  --enable-mem-tracing. Also made corresponding changes elsewhere in
  the apool API.
- Changed the inner pools (elements of the array_t within the apool_t)
  to use BLIS_MALLOC_POOL and BLIS_FREE_POOL instead of BLIS_MALLOC_INTL
  and BLIS_FREE_INTL.
- Disabled definitions of bli_malloc_pool() and bli_free_pool() since
  there are no longer any consumers of these functions.
- Very minor comment / printf() updates.
2019-01-07 18:28:19 -06:00
Minh Quan Ho
579145039d Initialize error messages at compile time (#289)
* Initialize error messages at compile time

- Assigning strings directly to the bli_error_string array, instead of
snprintf() at execution-time.

* Retired bli_error_init(), _finalize().

Details:
- Removed functions obviated by changes in 80e8dc6: bli_error_init(),
  bli_error_finalize(), and bli_error_init_msgs(), as well as calls to
  the former two in bli_init.c.

* Regenerated symbols in build/libblis-symbols.def.

Details:
- Reran ./build/regen-symbols.sh after running
  'configure --enable-cblas auto'.
2019-01-07 16:00:15 -06:00
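The approach named in the commit above (assigning strings directly to the error-string array rather than filling it with snprintf() at run time) can be sketched with a statically initialized table. The error codes, messages, and function names below are hypothetical and do not reproduce BLIS's actual error table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes. */
enum
{
    SKETCH_SUCCESS = 0,
    SKETCH_INVALID_DIM,
    SKETCH_MALLOC_FAILED,
    SKETCH_NUM_ERRORS
};

/* The table is fully initialized at compile time, so no init or finalize
   function is needed to populate or tear it down. */
static const char* const sketch_error_string[ SKETCH_NUM_ERRORS ] =
{
    [ SKETCH_SUCCESS ]       = "No error.",
    [ SKETCH_INVALID_DIM ]   = "Invalid dimension.",
    [ SKETCH_MALLOC_FAILED ] = "Memory allocation failed."
};

/* Look up the message for a code, with a fallback for unknown codes. */
static const char* sketch_error_msg( int code )
{
    if ( code < 0 || code >= SKETCH_NUM_ERRORS )
        return "Unknown error code.";

    assert( sketch_error_string[ code ] != NULL );
    return sketch_error_string[ code ];
}
```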
Field G. Van Zee
aafbca086e Updated external package language in README.md.
Details:
- Updated/added comments about Fedora, OpenSUSE, and GNU Guix under the
  newly-renamed "External GNU/Linux packages" section. Thanks to Dave
  Love for providing these revisions.
2019-01-07 12:38:21 -06:00
Field G. Van Zee
daacfe6840 Allow running configure with python 3.4.
Details:
- Relax version blacklisting of python3 to allow 3.4 or later instead
  of 3.5 or later. Thanks to Dave Love for pointing out that 3.4 was
  sufficient for the purpose of BLIS's build system. (It should be
  noted that we're not sure which, if any, python3 versions prior to
  3.4 are insufficient, and that the only thing stopping us from
  determining this is the fact that these earlier versions of python3
  are not readily available for us to test with.)
- Updated docs/BuildSystem.md to be explicit about current python2 vs
  python3 version requirements.
2019-01-07 12:12:47 -06:00
Field G. Van Zee
ad8d9adb09 README.md, CREDITS update.
Details:
- Added "What's New" and "What People Are Saying About BLIS" sections to
  README.md.
- Added missing github handles to various individuals' entries in the
  CREDITS file.
2019-01-03 16:08:24 -06:00
Field G. Van Zee
7052fca5ae Apply f272c289 to bli_fmalloc_noalign().
Details:
- Perform the same check for NULL return values and error message output
  in bli_fmalloc_noalign() as is performed by bli_fmalloc_align(). (This
  change was intended for f272c289.)
2019-01-02 13:48:40 -06:00
Field G. Van Zee
528e3ad16a Merge branch 'amd' 2019-01-02 13:39:19 -06:00
Field G. Van Zee
3126c52ea7 Merge branch 'amd' 2019-01-02 13:37:37 -06:00
Field G. Van Zee
f272c2899a Add error message to malloc() check for NULL.
Details:
- Output an error message if and when the malloc()-equivalent called by
  bli_fmalloc_align() ever returns NULL. Everything was already in place
  for this to happen, including the error return code, the error string
  sprintf(), the error checking function bli_check_valid_malloc_buf()
  definition, and its prototype. Thanks to Minh Quan Ho for pointing out
  the missing error message.
- Increased the default block_ptrs_len for each inner pool stored in the
  small block allocator from 10 to 25. Under normal execution, each
  thread uses only 21 blocks, so this change will prevent the sba from
  needing to resize the block_ptrs array of any given inner pool as
  threads initially populate the pool with small blocks upon first
  execution of a level-3 operation.
- Nix stray newline echo in configure.
2019-01-02 12:34:15 -06:00
Field G. Van Zee
eb97f778a1 Added missing AMD copyrights to previous commit.
Details:
- Forgot to add AMD copyrights to several touched files that did not
  already have them in 2f31743.
2018-12-25 20:17:09 -06:00
Field G. Van Zee
2f3174330f Implemented a pool-based small block allocator.
Details:
- Implemented a sophisticated data structure and set of APIs that track
  the small blocks of memory (around 80-100 bytes each) used when
  creating nodes for control and thread trees (cntl_t and thrinfo_t) as
  well as thread communicators (thrcomm_t). The purpose of the small
  block allocator, or sba, is to allow the library to transition into a
  runtime state in which it does not perform any calls to malloc() or
  free() during normal execution of level-3 operations, regardless of
  the threading environment (potentially multiple application threads
  as well as multiple BLIS threads). The functionality relies on a new
  data structure, apool_t, which is (roughly speaking) a pool of
  arrays, where each array element is a pool of small blocks. The outer
  pool, which is protected by a mutex, provides separate arrays for each
  application thread while the arrays each handle multiple BLIS threads
  for any given application thread. The design minimizes the potential
  for lock contention, as only concurrent application threads would
  need to fight for the apool_t lock, and only if they happen to begin
  their level-3 operations at precisely the same time. Thanks to Kiran
  Varaganti and AMD for requesting this feature.
- Added a configure option to disable the sba pools, which are enabled
  by default; renamed the --[dis|en]able-packbuf-pools option to
  --[dis|en]able-pba-pools; and rewrote the --help text associated with
  this new option and consolidated it with the --help text for the
  option associated with the sba (--[dis|en]able-sba-pools).
- Moved the membrk field from the cntx_t to the rntm_t. We now pass in
  a rntm_t* to the bli_membrk_acquire() and _release() APIs, just as we
  do for bli_sba_acquire() and _release().
- Replaced all calls to bli_malloc_intl() and bli_free_intl() that are
  used for small blocks with calls to bli_sba_acquire(), which takes a
  rntm (in addition to the bytes requested), and bli_sba_release().
  These latter two functions reduce to the former two when the sba pools
  are disabled at configure-time.
- Added rntm_t* arguments to various cntl_t and thrinfo_t functions, as
  required by the new usage of bli_sba_acquire() and _release().
- Moved the freeing of "old" blocks (those allocated prior to a change
  in the block_size) from bli_membrk_acquire_m() to the implementation
  of the pool_t checkout function.
- Miscellaneous improvements to the pool_t API.
- Added a block_size field to the pblk_t.
- Harmonized the way that the trsm_ukr testsuite module performs packing
  relative to that of gemmtrsm_ukr, in part to avoid the need to create
  a packm control tree node, which now requires a rntm_t that has been
  initialized with an sba and membrk.
- Re-enabled explicit call to bli_finalize() in testsuite so that users who
  run the testsuite with memory tracing enabled can check for memory
  leaks.
- Manually imported the compact/minor changes from 61441b24 that cause
  the rntm to be copied locally when it is passed in via one of the
  expert APIs.
- Reordered parameters to various bli_thrcomm_*() functions so that the
  thrcomm_t* to the comm being modified is last, not first.
- Added more descriptive tracing for allocating/freeing small blocks and
  formalized via a new configure option: --[dis|en]able-mem-tracing.
- Moved some unused scalm code and headers into frame/1m/other.
- Whitespace changes to bli_pthread.c.
- Regenerated build/libblis-symbols.def.
2018-12-25 19:35:01 -06:00
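The goal stated in the first bullet above (no calls to malloc() or free() once the library has warmed up) can be illustrated with a heavily simplified sketch of a single pool of small blocks. It is single-threaded, omits the mutex-protected outer apool_t and the per-thread arrays, and every name in it (sketch_pool_t, sketch_pool_acquire(), and so on) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* A simplified pool of equally sized small blocks. */
typedef struct
{
    void** block_ptrs;     /* stack of blocks available for checkout */
    size_t block_ptrs_len; /* capacity of block_ptrs */
    size_t num_avail;      /* number of blocks currently in the stack */
    size_t block_size;     /* size of each block, in bytes */
    size_t num_mallocs;    /* number of blocks obtained via malloc() */
} sketch_pool_t;

/* Returns 0 on success, -1 if the pointer array cannot be allocated. */
static int sketch_pool_init( sketch_pool_t* pool,
                             size_t block_ptrs_len, size_t block_size )
{
    assert( pool != NULL && block_ptrs_len > 0 && block_size > 0 );

    pool->block_ptrs = malloc( block_ptrs_len * sizeof( void* ) );
    if ( pool->block_ptrs == NULL ) return -1;

    pool->block_ptrs_len = block_ptrs_len;
    pool->num_avail      = 0;
    pool->block_size     = block_size;
    pool->num_mallocs    = 0;
    return 0;
}

/* Check out a block: reuse an available one, else fall back to malloc(). */
static void* sketch_pool_acquire( sketch_pool_t* pool )
{
    if ( pool->num_avail > 0 )
        return pool->block_ptrs[ --pool->num_avail ];

    pool->num_mallocs++;
    return malloc( pool->block_size );
}

/* Check a block back in so that a later acquire can reuse it. */
static void sketch_pool_release( sketch_pool_t* pool, void* block )
{
    if ( pool->num_avail == pool->block_ptrs_len )
    {
        /* Grow the pointer array; on failure, simply free the block. */
        size_t new_len = 2 * pool->block_ptrs_len;
        void** p = realloc( pool->block_ptrs, new_len * sizeof( void* ) );
        if ( p == NULL ) { free( block ); return; }
        pool->block_ptrs     = p;
        pool->block_ptrs_len = new_len;
    }
    pool->block_ptrs[ pool->num_avail++ ] = block;
}

/* Free every block still held by the pool, then the pointer array. */
static void sketch_pool_finalize( sketch_pool_t* pool )
{
    for ( size_t i = 0; i < pool->num_avail; ++i )
        free( pool->block_ptrs[ i ] );
    free( pool->block_ptrs );
    pool->block_ptrs = NULL;
    pool->num_avail  = 0;
}
```

After the first round of acquire/release calls, later acquires are served from the stack and num_mallocs stops growing. A real multithreaded design would additionally need the locking and per-thread separation that the commit message describes.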
Field G. Van Zee
61441b24f3 Make local copy of user's rntm_t in level-3 ops.
Details:
- In the case that the caller passes in a non-NULL rntm_t pointer into
  one of the expert APIs for a level-3 operation (e.g. bli_gemm_ex()),
  make a local copy of the rntm_t and use the address of that local copy
  in all subsequent execution (which may change the contents of the
  rntm_t). This prevents a potentially confusing situation whereby a
  user-initialized rntm_t is used once (in, say, gemm), and then found
  by the user to be in a different state before it is used a second
  time.
2018-12-20 19:38:11 -06:00
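The copy-on-entry pattern described above can be sketched as follows. The struct sketch_rntm_t, its single field, the thread limit, and both functions are hypothetical; the sketch shows only why the caller's object stays untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime object supplied by the caller. */
typedef struct
{
    int num_threads;
} sketch_rntm_t;

/* Hypothetical limit applied by the library during execution. */
enum { SKETCH_MAX_THREADS = 4 };

/* Internal step that modifies the runtime object it is given. */
static void sketch_rntm_sanitize( sketch_rntm_t* rntm )
{
    if ( rntm->num_threads < 1 )
        rntm->num_threads = 1;
    if ( rntm->num_threads > SKETCH_MAX_THREADS )
        rntm->num_threads = SKETCH_MAX_THREADS;
}

/* Expert-style entry point: returns the thread count actually used. */
static int sketch_gemm_ex( const sketch_rntm_t* rntm )
{
    /* Start from defaults, then copy the caller's object if one was given. */
    sketch_rntm_t rntm_l = { .num_threads = 1 };
    if ( rntm != NULL ) rntm_l = *rntm;

    /* All later code sees only the address of the local copy. */
    sketch_rntm_sanitize( &rntm_l );
    return rntm_l.num_threads;
}
```

Calling sketch_gemm_ex() twice with the same user object therefore gives the same behavior both times, since the internal modifications never reach the caller's copy.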
Field G. Van Zee
e809b5d2f1 Merge branch 'master' into amd 2018-12-20 16:27:26 -06:00
Field G. Van Zee
0476f706b9 CHANGELOG update (0.5.1) 2018-12-18 14:56:20 -06:00
Field G. Van Zee
e0408c3ca3 Version file update (0.5.1) 0.5.1 2018-12-18 14:56:16 -06:00
Field G. Van Zee
3ab231afc9 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2018-12-18 14:53:37 -06:00
Field G. Van Zee
d1aa87164e README.md update (External packages section).
Details:
- Updated External packages section in anticipation of introducing BLIS
  into Debian package universe. Thanks to M. Zhou for sponsoring BLIS in
  Debian.
2018-12-18 14:52:40 -06:00
Field G. Van Zee
d2b2a0819a Removed stray sections from Multithreading.md.
Details:
- Removed unintended section headers from before table of contents.
2018-12-17 19:26:35 -06:00
Field G. Van Zee
93d56319f2 Added missing bli_init_once() in bli_thread API.
Details:
- Fixed an issue with specifying threading globally at runtime via
  bli_thread_set_num_threads() (the automatic way) or via
  bli_thread_set_ways() (the manual way), with bli_thread_init_rntm()
  also affected. These functions were not calling bli_init_once() prior
  to acting, and therefore their effects on the global rntm_t structure
  were being wiped out by the eventual call to bli_init_once(), by some
  other BLIS function. Thanks to Ali Emre Gülcü for reporting the
  behavior associated with this bug.
- Added additional content to docs/Multithreading.md covering topics of
  choosing between OpenMP and pthreads, and specifying affinity via
  OpenMP.
- CREDITS file update.
2018-12-17 19:17:30 -06:00
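The ordering problem described in the first bullet above can be reproduced in a small sketch. All names are hypothetical, and the plain static flag is not thread-safe; a real one-time initialization would need a proper once mechanism.

```c
#include <assert.h>
#include <stdbool.h>

static bool sketch_initialized        = false;
static int  sketch_global_num_threads = 1;

/* One-time initialization: resets global state to its defaults. */
static void sketch_init_once( void )
{
    if ( sketch_initialized ) return;

    sketch_global_num_threads = 1;
    sketch_initialized        = true;
}

/* Setter with the fix applied: initialize first, then store the value.
   Without the call to sketch_init_once(), the first later call into the
   library would run the initialization and overwrite the stored value. */
static void sketch_set_num_threads( int n )
{
    assert( n >= 1 );

    sketch_init_once();
    sketch_global_num_threads = n;
}

/* Any other library entry point also initializes before acting. */
static int sketch_get_num_threads( void )
{
    sketch_init_once();
    return sketch_global_num_threads;
}
```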
Field G. Van Zee
76016691e2 Improvements to bli_pool; malloc()/free() tracing.
Details:
- Added malloc_ft and free_ft fields to pool_t, which are provided when
  the pool is initialized, to allow bli_pool_alloc_block() and
  bli_pool_free_block() to call bli_fmalloc_align()/bli_ffree_align()
  with arbitrary align_size values (according to how the pool_t was
  initialized).
- Added a block_ptrs_len argument to bli_pool_init(), which allows the
  caller to specify an initial length for the block_ptrs array, which
  previously suffered the cost of being reallocated, copied, and freed
  each time a new block was added to the pool.
- Consolidated the "buf_sys" and "buf_align" pointer fields in pblk_t
  into a single "buf" field. Consolidated the bli_pblk API accordingly
  and also updated the bli_mem API implementation. This was done
  because I'd previously already implemented opaque alignment via
  bli_malloc_align(), which allocates extra space and stores the
  original pointer returned by malloc() one element before the element
  whose address is aligned.
- Tweaked bli_membrk_acquire_m() and bli_membrk_release() to call
  bli_fmalloc_align() and bli_ffree_align(), which required adding an
  align_size field to the membrk_t struct.
- Pass the pack schemas directly into bli_l3_cntl_create_if() rather
  than transmit them via objects for A and B.
- Simplified bli_l3_cntl_free_if() and renamed to bli_l3_cntl_free().
  The function had not been conditionally freeing control trees for
  quite some time. Also, removed obj_t* parameters since they aren't
  needed anymore (or never were).
- Spun-off OpenMP nesting code in bli_l3_thread_decorator() to a
  separate function, bli_l3_thread_decorator_thread_check().
- Renamed:
    bli_malloc_align()   -> bli_fmalloc_align()
    bli_free_align()     -> bli_ffree_align()
    bli_malloc_noalign() -> bli_fmalloc_noalign()
    bli_free_noalign()   -> bli_ffree_noalign()
  The 'f' is for "function" since they each take a malloc_ft or free_ft
  function pointer argument.
- Inserted various printf() calls for the purposes of tracing memory
  allocation and freeing, guarded by cpp macro ENABLE_MEM_DEBUG, which,
  for now, is intended to be a "hidden" feature rather than one hooked
  up to a configure-time option.
- Defined bli_rntm_equals(), which compares two rntm_t for equality.
  (There are no use cases for this function yet, but there may be soon.)
- Whitespace changes to function parameter lists in bli_pool.c, .h.
2018-12-13 17:23:09 -06:00
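The "opaque alignment" technique mentioned in the third bullet above (allocate extra space and store the original malloc() pointer one element before the aligned address) is a common pattern and can be sketched as follows. The function names are hypothetical, and the sketch assumes align_size is a power of two; it is not the BLIS implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return a pointer aligned to align_size bytes, or NULL on failure. */
static void* sketch_malloc_align( size_t size, size_t align_size )
{
    /* Assumption: align_size is a nonzero power of two. */
    assert( align_size != 0 && ( align_size & ( align_size - 1 ) ) == 0 );

    /* Request room for the payload, worst-case padding, and one pointer. */
    void* p_orig = malloc( size + align_size + sizeof( void* ) );
    if ( p_orig == NULL ) return NULL;

    /* Skip past the pointer slot, then round up to the alignment. */
    uintptr_t addr = ( uintptr_t )p_orig + sizeof( void* );
    addr = ( addr + align_size - 1 ) & ~( uintptr_t )( align_size - 1 );

    /* Store the original pointer one element before the aligned address. */
    void** p_aligned = ( void** )addr;
    p_aligned[ -1 ] = p_orig;

    return p_aligned;
}

/* Recover the original pointer and hand it back to free(). */
static void sketch_free_align( void* p )
{
    if ( p == NULL ) return;
    free( ( ( void** )p )[ -1 ] );
}
```

Since the original pointer travels with the aligned buffer, a caller needs to keep only one pointer per block. This is consistent with consolidating the separate "buf_sys" and "buf_align" fields into a single "buf" field.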
Field G. Van Zee
f808d829c5 Handle edge cases, zero-filling in packm kernels.
Details:
- Updated the API and semantics of packm kernels such that they must now
  handle edge cases, meaning that a c-by-k packm kernel must be able to
  pack edge cases that are fewer than c rows/columns and be able to
  zero-fill the remaining elements. They must also be able to zero-fill
  the equivalent region when copying fewer than k columns/rows (which is
  needed by trsm). The new packm kernel API is generally:

    void packm_kernel
         (
           conj_t           conja,
           dim_t            cdim,
           dim_t            n,
           dim_t            n_max,
           ctype*  restrict kappa,
           ctype*  restrict a, inc_t inca, inc_t lda,
           ctype*  restrict p,             inc_t ldp,
           cntx_t* restrict cntx
         );

  where cdim and n are the dimensions (short and long, respectively) of
  the submatrix being copied from the source matrix A, and n_max is the
  "full" long dimension (corresponding to the k dimension in gemm) of
  the micropanel. The "full" short dimension (corresponding to the
  register blocksize MR or NR) is not part of the API because it is
  known intrinsically by the packm kernel implementation. Thanks to
  Devin Matthews for prompting us to make this change (#282).
- Updated all reference packm kernels in ref_kernels/1m according to
  above changes, as well as all optimized packm kernels (which only
  consisted of those for knl).
- Bumped the major soname version number in 'so_version' to 2. At first
  I was considering leaving it unchanged, but I couldn't escape the
  reality that the packm kernel API is much closer to an expert API
  than it is some obscure helper function interface within the framework
  that nobody would ever notice.
- Removed reference packm kernels for mr/nr = 30. The only sub-config
  that would have been using those kernels is knc, which is likely no
  longer being used by very many people (if any). (This also mostly
  offset the larger object code footprint incurred by moving the edge-
  case handling into the individual packm kernels.)
- Fixed an obscure race condition for 3mh and 4mh induced methods in
  which those implementations were modifying the contexts stored in the
  gks rather than a local copy.
- Fixed a minor bug in the testsuite that prevented non-1m-based induced
  method implementations of trsm from executing.
2018-12-12 15:22:59 -06:00
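The edge-case and zero-filling semantics described in the first bullet above can be sketched for a real double-precision kernel. This is a simplified illustration, not one of the BLIS kernels: it drops the conja and cntx arguments, takes kappa by value, uses size_t in place of dim_t and inc_t, and assumes a hypothetical register blocksize of 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "full" short dimension (register blocksize) known
   intrinsically by this kernel. */
enum { SKETCH_MR = 4 };

/* Pack a cdim-by-n submatrix of A into a SKETCH_MR-by-n_max micropanel P,
   scaling by kappa and zero-filling everything outside the submatrix. */
static void sketch_packm_4xk
     (
       size_t        cdim,
       size_t        n,
       size_t        n_max,
       double        kappa,
       const double* a, size_t inca, size_t lda,
       double*       p,              size_t ldp
     )
{
    assert( a != NULL && p != NULL );
    assert( cdim <= SKETCH_MR && n <= n_max && ldp >= SKETCH_MR );

    for ( size_t j = 0; j < n_max; ++j )
    {
        for ( size_t i = 0; i < SKETCH_MR; ++i )
        {
            /* Elements with i >= cdim (short edge) or j >= n (long edge)
               are never read from A; they are set to zero in P. */
            const int in_bounds = ( i < cdim && j < n );

            p[ i + j*ldp ] = in_bounds ? kappa * a[ i*inca + j*lda ] : 0.0;
        }
    }
}
```

For example, packing a 3-by-2 column-major block with n_max = 3 yields three packed columns of four elements each: the fourth row and the third column are all zeros.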
Field G. Van Zee
02ec0be3ba Merge branch 'master' into amd 2018-12-05 19:33:53 -06:00
Field G. Van Zee
c534da62c0 Disabled ARM configuration families in registry.
Details:
- Disabled (commented out) the arm32 and arm64 configuration families
  in the config_registry file. Having a configuration family registered
  only makes sense if BLIS is currently outfitted with runtime hardware
  detection logic to choose the appropriate sub-configuration. That
  logic is currently missing for ARM architectures, and thus having the
  ARM configuration families in the configuration registry only serves
  to confuse people. Thanks to Devangi Parikh for suggesting this
  change.
2018-12-05 15:51:05 -06:00
Field G. Van Zee
6885051a16 Generalizations/cleanup to mixeddt matlab scripts.
Details:
- Parameterized, reorganized, and added comments to matlab scripts in
  test/mixeddt/matlab.
- Reordered some lines of code and added comments to plot_l3_perf.m in
  test/3m4m/matlab.
2018-12-05 14:45:39 -06:00
Field G. Van Zee
cbdb0566bf Updates to 3m4m, mixeddt test driver files.
Details:
- Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to
  port recent changes to the former to the latter.
- Disabled (for now) code in 3m4m/test_*.c files that disables all
  induced methods except for the one that is requested from the
  Makefile via the IND macro. This is done because usually, we want to
  test whatever method is enabled automatically for complex datatypes.
  (That is, when native complex microkernels are missing, we usually
  want to test performance of 1m.)
2018-12-05 20:06:32 +00:00
Field G. Van Zee
0645f239fb Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
  third clause of the license comment blocks of all relevant files and
  replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
  comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
  with format of all other comment blocks.
2018-12-04 14:31:06 -06:00