886 Commits

Author SHA1 Message Date
Field G. Van Zee
244a6f4e66 Fixed POSIX sed non-compliance in flatten-header.sh.
Details:
- Changed GNU usage of 'i' and 'a' sed commands used in flatten-header.sh
  to POSIX-compliant usage that will work on OS X's sed.
2017-11-28 17:48:48 -06:00
Field G. Van Zee
4507862167 Generate/compile with/install monolithic blis.h.
Details:
- Rewrote monolithify-header.sh (and renamed it to flatten-header.sh) so
  that headers are inserted recursively. This makes header generation
  3-4x faster.
- Modified configure to create an 'include/<configname>' directory in which
  make can create a monolithic header.
- Modified the top-level Makefile so that a monolithic header is generated
  unconditionally prior to compilation (stored in include/<configname>) and
  so that the single header is installed instead of the 450 or so header
  files that reside throughout the framework source tree.
- Added "include/*/*.h" to .gitignore file.
- Removed some pnacl/emscripten leftovers that I intended to include in
  a1caeba (mostly in testsuite/Makefile).
- Trivial comment changes to frame/include/bli_f2c.h.
2017-11-28 15:16:22 -06:00
Field G. Van Zee
1f30b1301b Added missing framework support for x86_64 family.
Details:
- Added support for the x86_64 configuration family to bli_arch.c and
  bli_arch_config.h. Thanks to Johannes Dieterich for reporting this
  issue.
- Bumped the default value for BLIS_SIMD_NUM_REGISTERS from 16 to 32 and
  the default value for BLIS_SIMD_SIZE from 32 to 64. This will support
  configuration families that include Skylake and newer processors without
  any support needed in the bli_family_*.h file. The semantics of these
  values have always been "maximum" and not exact values; comments in
  bli_kernel_macro_defs.h and the github wiki have been adjusted
  accordingly.
2017-11-25 16:54:26 -06:00
Field G. Van Zee
9f39806c4e Fixed a bug in e31f0b3/b131b9a.
Details:
- Erroneously placed the "don't overwrite existing blocksize" logic in
  bli_blksz_init*() rather than in bli_cntx_set_blkszs(). It belongs in
  the latter because that function copies blocksizes as-is from the
  blksz_t function argument to the appropriate field in the cntx_t. If
  the blksz_t was previously initialized selectively, based on the sign
  of the blocksize value passed into bli_blksz_init*(), that just leaves
  some fields possibly uninitialized (with garbage values), which
  definitely will not work.
- The aforementioned logic has been moved to bli_cntx_set_blkszs() via
  a new function bli_blksz_copy_if_pos(), which selectively copies only
  the blocksizes that are greater than zero.
2017-11-21 16:03:56 -06:00
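The selective copy performed by bli_blksz_copy_if_pos() can be illustrated with a short sketch. The struct below is a simplified, hypothetical stand-in for blksz_t that holds one value per datatype (the real object carries more fields), and blksz_copy_if_pos() is an illustrative name, not the BLIS function.

```c
#include <stddef.h>

/* Hypothetical datatype indices. */
enum { DT_FLOAT, DT_DOUBLE, DT_SCOMPLEX, DT_DCOMPLEX, NUM_DT };

/* Hypothetical, simplified stand-in for a blksz_t: one blocksize per datatype. */
typedef struct { long v[NUM_DT]; } blksz_sketch_t;

/* Copy only the blocksizes in src that are greater than zero. A zero or
   negative entry means "leave the value already in dst untouched". */
static void blksz_copy_if_pos(const blksz_sketch_t *src, blksz_sketch_t *dst)
{
    for (size_t dt = 0; dt < NUM_DT; ++dt)
        if (src->v[dt] > 0)
            dst->v[dt] = src->v[dt];
}
```

With this rule, a sub-configuration that registers only a single-precision microkernel can pass 0 for the other datatypes, and whatever reference defaults are already in the destination survive the copy.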
Field G. Van Zee
b131b9a025 Updated configs to omit setting some blocksizes.
Details:
- Employ the new semantics of bli_blksz_init*() in e31f0b3 in various
  sub-configurations' bli_cntx_init_*() functions by passing in 0 for
  register and cache blocksizes that correspond to gemm microkernel
  datatypes that were not registered, allowing the default values
  set by the bli_cntx_init_*_ref() function call to remain.
2017-11-21 14:30:26 -06:00
Field G. Van Zee
499a4c002f Merge branch 'rt' of github.com:flame/blis into rt 2017-11-21 14:25:08 -06:00
Field G. Van Zee
e31f0b3e2d Subtle update to bli_blksz_init*() API.
Details:
- Updated the semantics of bli_blksz_init() and bli_blksz_init_ed() so
  that non-positive blocksize values are ignored entirely. This provides
  an easy way to indicate that certain existing values should not be
  touched by the update. Thanks to Devangi Parikh for feedback that led
  to these changes.
2017-11-21 14:21:25 -06:00
Field G. Van Zee
6c3ba502a1 Added 'x86_64' sub-config directory.
Details:
- Added missing x86_64 configuration directory, which was intended to be
  part of b7ca580.
- Added -Wfatal-errors compiler warning flag to all configurations so that
  compilation stops after the first error.
- Changed the vectorization flags for intel64 configuration to be compatible
  with 'penryn', the oldest sub-config included in that family.
- Changed the vectorization flags for penryn to target the 'core2'
  microarchitecture and ssse3.
2017-11-21 13:50:53 -06:00
Field G. Van Zee
25eee3cc49 Added a dummy file to kernels/generic.
Details:
- Added a dummy file to kernels/generic, which was previously empty, so
  that git would begin tracking the otherwise-empty directory. This
  directory's existence is necessary for proper execution of configure
  for any configuration family that contains the 'generic'
  sub-configuration. Thanks to Johannes Dieterich for reporting the
  issue that led to this fix.
2017-11-21 12:34:20 -06:00
Field G. Van Zee
ef024ce4ca More tweaks to monolithify-header.sh
Details:
- Further fixes to the monolithify-header.sh script.
- Removed unnecessary #include "blis.h" from frame/3/bli_l3_packm.h.
2017-11-20 18:08:29 -06:00
Field G. Van Zee
5028e7dec2 Second attempt to implement travis_wait.
Details:
- Corrected accidental misplacement of the travis_wait prefix (on the
  wrong line of the .travis.yml file) in commit 13e5d91.
2017-11-20 17:00:37 -06:00
Field G. Van Zee
13e5d9107b Added travis_wait prefix to testsuite via Travis.
Details:
- It appears that Travis CI has implemented a new policy that results in
  a test failing if it does not produce any output for more than 10
  minutes. (Two test instances are now failing in Travis despite the most
  recent commit not affecting the library or testsuite.) This issue can
  be worked around by executing the test run via travis_wait, which takes
  an optional time parameter. This commit attempts to use 'travis_wait 30'
  in the .travis.yml file to prevent the early failure at 10 minutes.
2017-11-20 15:57:06 -06:00
Field G. Van Zee
a1caeba0ea Removed pnacl, emscripten support from Makefile. 2017-11-20 13:31:20 -06:00
Field G. Van Zee
9df6dda9ec Improvements, bugfixes to monolithify-header.sh. 2017-11-18 19:03:26 -06:00
Field G. Van Zee
21d26201f9 Merge branch 'rt' of github.com:flame/blis into rt 2017-11-18 14:16:53 -06:00
Field G. Van Zee
43baa3b327 Removed unnecessary flags for generic config.
Details:
- Removed -D_POSIX_C_SOURCE=200112L and -m64 flags from make_defs.mk file
  of generic sub-configuration. These flags are generally not necessary,
  and particularly not desirable for the generic configuration since they
  unnecessarily restrict the environments in which the configuration can
  be built.
2017-11-18 14:14:44 -06:00
iotamudelta
b7ca580618 [WIP] Add x86 and x86_64 processor families. (#154)
* Add x86 and x86_64 processor families.
* Use generic config as fallback for more families.

After discussion with fgvanzee, (a) it's "generic" and (b) use it for all the families as a fallback. The goal is that if a specific CPU is not yet supported by a family (say, a new Intel microarchitecture on x86_64), it'll fall through and still work with the slower "generic" kernels.
2017-11-18 13:56:05 -06:00
Field G. Van Zee
870597d166 Added bash script for creating monolithic headers.
Details:
- Added a new script, monolithify-header.sh, to the 'build' directory.
  This script recursively replaces all #include directives in a selected
  file with the contents of the header files referenced by each directive.
  The idea is to "flatten" a tree of .h files into a single file, with
  the script acting as a C preprocessor that only processes #include
  directives.
2017-11-17 17:06:42 -06:00
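The recursive #include replacement described above can be illustrated with a small C sketch. The real script is written in bash and this is not a port of it: the in-memory header table, the function names, and the depth limit of 64 are all assumptions made to keep the example self-contained. Only lines that begin exactly with `#include "` and name a known header are expanded; `#include <...>` lines and unknown headers pass through unchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory "filesystem": header name -> header contents. */
typedef struct { const char *name; const char *text; } header_t;

/* Look up a header whose name is the first `len` bytes of `name`. */
static const char *find_header(const header_t *hs, size_t n,
                               const char *name, size_t len)
{
    for (size_t i = 0; i < n; ++i)
        if (strlen(hs[i].name) == len && strncmp(hs[i].name, name, len) == 0)
            return hs[i].text;
    return NULL;
}

/* Append `len` bytes of `s` to the NUL-terminated buffer `out`. */
static int append(char *out, size_t cap, const char *s, size_t len)
{
    size_t used = strlen(out);
    if (used + len + 1 > cap) return -1;   /* would overflow */
    memcpy(out + used, s, len);
    out[used + len] = '\0';
    return 0;
}

/* Copy `text` into `out`, replacing each line of the form #include "name"
   with the recursively flattened contents of that header when it is known.
   `out` must hold an empty string before the first call. */
static int flatten(const header_t *hs, size_t n, const char *text,
                   char *out, size_t cap, int depth)
{
    static const char prefix[] = "#include \"";
    const size_t plen = sizeof prefix - 1;

    assert(out != NULL && cap > 0);
    if (depth > 64) return -1;             /* crude guard against include cycles */

    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) + 1 : strlen(text);
        const char *body = NULL;

        if (len > plen && strncmp(text, prefix, plen) == 0) {
            const char *name = text + plen;
            const char *quote = memchr(name, '"', len - plen);
            if (quote != NULL)
                body = find_header(hs, n, name, (size_t)(quote - name));
        }
        if (body != NULL) {
            if (flatten(hs, n, body, out, cap, depth + 1) != 0) return -1;
        } else if (append(out, cap, text, len) != 0) {
            return -1;
        }
        text += len;
    }
    return 0;
}
```

For example, flattening a file containing `#include "a.h"`, where a.h itself includes b.h, emits the body of b.h followed by the rest of a.h, with no #include "..." directives left for headers found in the table.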
Field G. Van Zee
c76f77f4cc Removed unnecessary #include "blis.h" from header.
Details:
- Removed an errant #include "blis.h" directive from bli_cntx_ind_stage.h.
  The general policy is that no header file in BLIS should include
  blis.h. This will be important in the near future when using a tool to
  recursively create a monolithic blis.h file from its constituent
  headers.
2017-11-17 15:10:52 -06:00
Field G. Van Zee
2bb9bc6e95 Miscellaneous tweaks to gks, rt functionality.
Details:
- Updated bli_cpuid_query_id() so that BLIS_ARCH_GENERIC is always returned
  if the hardware fails to test positive for any supported sub-configuration.
- Defined bli_gks_init_ref_cntx(), which will call the context initialization
  function bli_cntx_init_configname() for the sub-configuration 'configname'
  associated with the arch_t id returned by bli_arch_query_id(). This makes
  initializing a reference context easy for experts who wish to construct
  those contexts.
2017-11-17 13:50:14 -06:00
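The "always fall back to generic" rule for bli_cpuid_query_id() can be sketched as a table lookup over detected CPU features. The feature bits, the requirement sets, and the names below are illustrative assumptions only; they are not the checks BLIS actually performs via cpuid.

```c
#include <stddef.h>

/* Hypothetical feature bits that a cpuid-based probe might report. */
enum { FEAT_SSSE3 = 1 << 0, FEAT_AVX = 1 << 1, FEAT_FMA3 = 1 << 2, FEAT_AVX2 = 1 << 3 };

typedef enum { ARCH_HASWELL, ARCH_SANDYBRIDGE, ARCH_PENRYN, ARCH_GENERIC } arch_sketch_t;

typedef struct { arch_sketch_t id; unsigned required; } arch_req_t;

/* Illustrative requirement sets, listed newest-first so the most capable match wins. */
static const arch_req_t candidates[] = {
    { ARCH_HASWELL,     FEAT_AVX2 | FEAT_FMA3 | FEAT_AVX },
    { ARCH_SANDYBRIDGE, FEAT_AVX },
    { ARCH_PENRYN,      FEAT_SSSE3 },
};

static arch_sketch_t select_arch(unsigned features)
{
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i)
        if ((features & candidates[i].required) == candidates[i].required)
            return candidates[i].id;
    /* No supported sub-configuration tested positive: fall back to generic. */
    return ARCH_GENERIC;
}
```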
Field G. Van Zee
d5bf79e50b Miscellaneous tweaks and fixes.
Details:
- Fixed incorrect calling sequence in bli_cntx_init_knl.c--an instance of
  bli_blksz_init_easy() that should have been bli_blksz_init().
- Fixed a bug in code that is supposed to output the list of sub-directories
  in the 'config' directory when the configure script is run with no arguments.
- Expanded the output of "make showconfig" to include more info from config.mk.
- Minor changes to build/auto-detect/cpuid_x86.c, mostly in preparation for
  someone to add excavator and zen support.
- Added a link to the ConfigurationHowTo wiki to config_registry.
- Other minor tweaks to configure.
2017-11-13 14:24:29 -06:00
Field G. Van Zee
673e518403 Merge branch 'rt' of github.com:flame/blis into rt 2017-11-01 17:37:42 -05:00
Field G. Van Zee
2c51356a8b Implemented runtime hardware detection via cpuid.
Details:
- Added runtime support for selecting an appropriate arch_t value based
  on the results of the cpuid instruction (for x86_64). This allows
  deferral of choosing a context (kernels, blocksizes, etc.) until
  runtime, which allows BLIS to be built with support for multiple
  microarchitectures. Currently, only amd64 and intel64 configurations
  are registered in the config_registry; however, one could create
  custom configuration families to support arbitrary sets of x86_64
  microarchitectures.
- Current Intel microarchitectures supported via cpuid are knl, haswell,
  sandybridge, and penryn.
- Current AMD microarchitectures supported via cpuid are: zen, excavator,
  steamroller, piledriver, and bulldozer.
2017-11-01 17:37:02 -05:00
Field G. Van Zee
8f150f28a6 Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
  bli_family_bulldozer.h. Not sure where this value came from, but it
  would seem to allow for insufficient starting address alignment for
  any matrices created via bli_malloc_user(), such as via
  bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
  led us to this bug.
2017-11-01 11:41:45 -05:00
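Why the alignment value matters can be seen from a generic over-allocate-and-round-up allocator. This is a common technique shown under stated assumptions, not the implementation of bli_malloc_user(); the function names are hypothetical, and `align` is assumed to be a power of two no smaller than sizeof(void *).

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes whose starting address is a multiple of `align`.
   The pointer returned by malloc() is stored just before the aligned block
   so that aligned_free_sketch() can recover it. */
static void *aligned_malloc_sketch(size_t size, size_t align)
{
    void *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL) return NULL;

    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    ((void **)aligned)[-1] = raw;          /* remember the original pointer */
    return (void *)aligned;
}

static void aligned_free_sketch(void *p)
{
    if (p != NULL) free(((void **)p)[-1]);
}
```

If `align` is set smaller than what the SIMD kernels assume, the returned addresses are still valid memory but may not satisfy the kernels' alignment expectations.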
Field G. Van Zee
e3f10557ca Use perl for some substitution for OS X compatibility.
Details:
- Discovered that sed commands where the replacement string contains '\n'
  are problematic with the version of sed present in OS X. For these
  cases in the configure script, we instead use 'perl -pe' for
  search-and-replace functionality.
- Various other minor comment/whitespace tweaks to configure.
- Removed remaining lines of code related to setting/checking variables to
  track "unregistered" configurations.
2017-10-30 13:37:54 -05:00
Field G. Van Zee
dd45cfdfc3 Merge branch 'master' into rt 2017-10-30 12:23:05 -05:00
Devin Matthews
f60c827ba9 Fix CVECFLAGS for bulldozer config. 2017-10-30 10:04:42 -05:00
Field G. Van Zee
3e4f42a4d2 Typecast l1mkr_t enum value prior to comparison.
Details:
- Typecast l1mkr_t enum value in bli_cntx.h to guint_t before testing for
  out-of-range value. This is an attempt to pacify a strange warning from
  clang on OS X that is seemingly the result of the following compiler
  warning flag:
    -Wtautological-constant-out-of-range-compare
2017-10-27 11:41:37 -05:00
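The effect of casting an enum to an unsigned integer before a range check can be sketched as follows. The enum is a hypothetical stand-in for l1mkr_t, and guint_t is assumed here to be an unsigned 64-bit type, so uint64_t is used in its place; whether this matches the exact check in bli_cntx.h is not confirmed.

```c
#include <stdint.h>

/* Hypothetical stand-in for l1mkr_t; the last member counts the kernels. */
typedef enum { L1MKR_A, L1MKR_B, L1MKR_C, NUM_L1MKRS } l1mkr_sketch_t;

/* After the cast, a single unsigned comparison rejects both values at or
   above the bound and negative values, which wrap to very large numbers. */
static int l1mkr_is_valid(l1mkr_sketch_t k)
{
    return (uint64_t)k < (uint64_t)NUM_L1MKRS;
}
```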
Field G. Van Zee
aec6e038d9 Removed associative arrays from configure.
Details:
- Implemented a replacement for associative arrays in the configure script
  that does not utilize arrays, and therefore works in pre-4.0 versions of
  bash. (It appears that Mac OS X will be stuck with version 3.2 indefinitely
  due to bash switching to the GPL 3.0 license starting with version 4.0.)
2017-10-26 16:12:36 -05:00
Field G. Van Zee
07c352188b Added "generic" configuration.
Details:
- Added a "generic" configuration that leaves the default blocksizes and
  kernels unchanged. This replaces the older "reference" configuration.
  Updated auto-detect script and code accordingly.
- Added support for generic configuration to arch_t (bli_type_defs.h),
  bli_gks_init() (bli_gks.c), and bli_arch_config.h.
- Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
- Whitespace changes to configurations' make_defs.mk files.
2017-10-23 16:59:22 -05:00
Field G. Van Zee
c1a98d6f70 Minor update to .travis.yml file. 2017-10-23 14:24:41 -05:00
Field G. Van Zee
75b9383f01 Minor header renaming ahead of bli_arch.c.
Details:
- Renamed the various configurations' "bli_arch_<configname>.h" header files
  (replacing "arch" with "family") to free up the 'bli_arch' namespace for a
  different purpose (hardware detection).
- Renamed "bli_arch.h" and "bli_arch_pre_macro_defs.h" in frame/include to
  "bli_arch_config.h" and "bli_arch_config_pre.h", respectively.
2017-10-20 16:41:22 -05:00
Field G. Van Zee
482af51add Fixed 'make test' target from top-level Makefile.
Details:
- Updated the top-level Makefile's build rule for testsuite object files to
  properly obtain CFLAGS via get-frame-cflags-for() function instead of
  simply using the $(CFLAGS) variable (which is empty). This means that
  'make test' should now work as expected.
2017-10-20 15:44:26 -05:00
Field G. Van Zee
3c269f700d Makefile updates for test drivers, testsuite.
Details:
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
  as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
  sub-directories.
- Factored out much of the top-level Makefile into common.mk. A Makefile
  needs only to set DIST_PATH to the relative path to the top level of the
  BLIS source distribution before including common.mk in order to acquire
  all of the definitions typically needed in a Makefile that tests BLIS.
2017-10-20 13:57:21 -05:00
Field G. Van Zee
0557189d46 Minor updates to .travis.yml, configure script. 2017-10-18 15:05:27 -05:00
Field G. Van Zee
2553734d1d Merge branch 'master' into rt 2017-10-18 13:46:50 -05:00
Field G. Van Zee
375342799c Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
  installation for the 'knl' configuration. Thanks to Victor Eijkhout
  for reporting this issue.
2017-10-18 13:41:25 -05:00
Field G. Van Zee
453deb2906 Implemented runtime kernel management.
Details:
- Reworked the build system around a configuration registry file, named
  'config_registry', that identifies valid configuration targets, their
  constituent sub-configurations, and the kernel sets that are needed by
  those sub-configurations. The build system now facilitates the building
  of a single library that can contain kernels and cache/register
  blocksizes for multiple configurations (microarchitectures). Reference
  kernels are also built on a per-configuration basis.
- Updated the Makefile to use new variables set by configure via the
  config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
  in determining which sub-configurations (CONFIG_LIST) and kernel sets
  (KERNEL_LIST) are included in the library, and which make_defs.mk files'
  CFLAGS (KCONFIG_MAP) are used when compiling kernels.
- Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
  functions into a standard format that includes the kernel set name
  (e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
  kernels sub-directory. These files exist to provide prototypes for the
  kernels present in those directories.
- Reorganized reference kernels into a top-level 'ref_kernels' directory.
  This directory includes a new source file, bli_cntx_ref.c (compiled on
  a per-configuration basis), that defines the code needed to initialize
  a reference context and a context for induced methods for the
  microarchitecture in question.
- Rewrote make_defs.mk files in each configuration so that the compiler
  variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
  basis.
- Modified bli_config.h.in template so that bli_config.h is generated with
  #defines for the config (family) name, the sub-configurations that are
  associated with the family, and the kernel sets needed by those
  sub-configurations.
- Deprecated all kernel-related information in bli_kernel.h and transferred
  what remains to new header files named "bli_arch_<configname>.h", which
  are conditionally #included from a new header bli_arch.h. These files
  are still needed to set library-wide parameters such as custom
  malloc()/free() functions or SIMD alignment values.
- Added bli_cntx_init_<configname>.c files to each configuration directory.
  The files contain a function, named the same as the file, that initializes
  a "native" context for a particular configuration (microarchitecture). The
  idea is that optimized kernels, if available, will be initialized into
  these contexts. Other fields will retain pointers to reference functions,
  which will be compiled on a per-configuration basis. These bli_cntx_init_*()
  functions will be called during the initialization of the global kernel
  structure. They are thought of as initializing for "native" execution, but
  they also form the basis for contexts that use induced methods. These
  functions are prototyped, along with their _ref() and _ind() brethren, by
  prototype-generating macros in bli_arch.h.
- Added a new typedef enum in bli_type_defs.h to define an arch_t, which
  identifies the various sub-configurations.
- Redesigned the global kernel structure (gks) around a 2D array of cntx_t
  structures (pointers to cntx_t, actually). The first dimension is indexed
  over arch_t and the inner dimension is the ind_t (induced method) for
  each microarchitecture. When a microarchitecture (configuration) is
  "registered" at init-time, the inner array for that configuration in the
  2D array is initialized (and allocated, if it hasn't been already). The
  cntx_t slot for BLIS_NAT is initialized immediately and those for other
  induced method types are initialized and cached on-demand, as needed. At
  cntx_t registration, we also store function pointers to cntx_init functions
  that will initialize (a) "reference" contexts and (b) contexts for use with
  induced methods. We don't cache the full contexts for reference contexts
  since they are rarely needed. The functions that initialize these two kinds
  of contexts are generated automatically for each targeted sub-configuration
  from cpp-templatized code at compile-time. Induced method contexts that
  need "stage" adjustments can still obtain them via functions in
  bli_cntx_ind_stage.c.
- Added new functions and functionality to bli_cntx.c, such as for setting
  the level-1f, level-1v, and packm kernels, and for converting a native
  context into one for executing an induced method.
- Moved the checking of register/cache blocksize consistency from being cpp
  macros in bli_kernel_macro_defs.h to being runtime checks defined in
  bli_check.c and called from bli_gks_register_cntx() at the time that the
  global kernel structure's internal context is initialized for a given
  microarchitecture/configuration.
- Deprecated all of the old per-operation bli_*_cntx.c files and removed
  the previous operation-level cntx_t_init()/_finalize() invocations.
  Instead, we now query the gks for a suitable context, usually via
  bli_gks_query_cntx().
- Deprecated support for the 3m2 and 3m3 induced methods. (They required
  hackery that I was no longer willing to support.)
- Consolidated the 1e and 1r packm kernels for any given register blocksize
  into a single kernel that will branch on the schema and support packing
  to both formats.
- Added the cntx_t* argument to all packm kernel signatures.
- Deprecated the local function pointer array in all bli_packm_cxk*.c files
  and instead obtain the packm kernel from the cntx_t.
- Added bli_calloc_intl(), which serves as the calloc-equivalent to
  bli_malloc_intl(). Useful when we wish to allocate and initialize to
  zero/NULL.
- Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
  bli_cntx.h into static functions.
2017-10-18 13:29:32 -05:00
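The two-level gks layout described in the commit above can be sketched in C. Everything here is hypothetical and simplified: arch_id_t, ind_id_t, and context_t are stand-in types with only a couple of members, gks_register() and gks_query() are illustrative names rather than BLIS functions, and no locking is done. The sketch shows only the indexing scheme: the outer dimension is the architecture, the inner array is allocated at registration, the native slot is filled immediately, and other induced-method slots are built from the native context and cached on first query.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ARCH_GENERIC, ARCH_HASWELL, NUM_ARCHS } arch_id_t;
typedef enum { IND_NATIVE, IND_1M, NUM_INDS } ind_id_t;

/* Stand-in context: a real one would hold kernel pointers and blocksizes. */
typedef struct { arch_id_t arch; ind_id_t ind; } context_t;

/* gks[arch] stays NULL until that arch is registered; afterwards it points
   to an array of NUM_INDS context pointers, one per induced method. */
static context_t **gks[NUM_ARCHS];

static int gks_register(arch_id_t arch)
{
    assert(arch < NUM_ARCHS);
    if (gks[arch] == NULL) {
        gks[arch] = calloc(NUM_INDS, sizeof(context_t *));
        if (gks[arch] == NULL) return -1;
    }
    if (gks[arch][IND_NATIVE] == NULL) {
        context_t *c = malloc(sizeof *c);
        if (c == NULL) return -1;
        c->arch = arch;
        c->ind = IND_NATIVE;
        gks[arch][IND_NATIVE] = c;         /* native slot filled at registration */
    }
    return 0;
}

static context_t *gks_query(arch_id_t arch, ind_id_t ind)
{
    assert(arch < NUM_ARCHS && ind < NUM_INDS);
    if (gks[arch] == NULL) return NULL;    /* arch was never registered */
    if (gks[arch][ind] == NULL) {
        /* Derive the induced-method context from the native one, then cache it. */
        context_t *c = malloc(sizeof *c);
        if (c == NULL) return NULL;
        *c = *gks[arch][IND_NATIVE];
        c->ind = ind;
        gks[arch][ind] = c;
    }
    return gks[arch][ind];
}
```

A second query for the same (arch, method) pair returns the cached pointer, so the derivation cost is paid once per pair.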
Field G. Van Zee
b882648af8 Merge branch 'master' into rt 2017-10-11 16:32:21 -05:00
Field G. Van Zee
e02d3cb841 Fixed a pthread typo in previous commit.
Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
2017-09-26 19:02:53 -05:00
Field G. Van Zee
f5962a1aae Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
  into a k x k triangular matrix for the purposes of obtaining an mr x k
  micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
  very large k (depending on the product of mr x kc on that architecture).
  The bug arose from the fact that the test module was triggering the
  allocation of blocks from the internal memory pools, which are limited in
  size. This allocation imposes an implicit assumption that the
  micropanel being tested with will fit inside, and this assumption is violated
  for large values of k. Arbitrarily large k may now be tested for both
  operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
  statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
  issues.
2017-09-26 17:00:04 -05:00
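The critical sections around the induced-method status table, and the effect of declaring the mutex 'static', can be sketched with plain pthreads. The table shape, its dimensions, and the function names are hypothetical; the actual table in bli_l3_ind.c and its OpenMP variant are not reproduced here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical status table: one enabled flag per (induced method, datatype). */
enum { NUM_METHODS = 3, NUM_DTYPES = 4 };
static bool ind_enabled[NUM_METHODS][NUM_DTYPES];

/* 'static' gives the mutex internal linkage, so it is private to this file. */
static pthread_mutex_t ind_lock = PTHREAD_MUTEX_INITIALIZER;

static void ind_set_status(int method, int dt, bool enabled)
{
    assert(method >= 0 && method < NUM_METHODS && dt >= 0 && dt < NUM_DTYPES);
    pthread_mutex_lock(&ind_lock);         /* critical section: write */
    ind_enabled[method][dt] = enabled;
    pthread_mutex_unlock(&ind_lock);
}

static bool ind_get_status(int method, int dt)
{
    assert(method >= 0 && method < NUM_METHODS && dt >= 0 && dt < NUM_DTYPES);
    pthread_mutex_lock(&ind_lock);         /* critical section: read */
    bool enabled = ind_enabled[method][dt];
    pthread_mutex_unlock(&ind_lock);
    return enabled;
}
```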
Field G. Van Zee
8e917b256c Updated bibtex info for BLIS5 (3m4m) article. 2017-09-09 14:10:15 -05:00
Devin Matthews
adafe974b4 Merge pull request #150 from devinamatthews/vzeroupper
Add vzeroupper to Intel AVX kernels.
2017-08-15 15:17:21 -05:00
Devin Matthews
7dc78b49f9 Add vzeroupper to Intel AVX kernels. 2017-08-15 10:02:25 -05:00
Field G. Van Zee
f86ce54d6f Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
  Erling Andersen for pointing out this inconsistency and suggesting
  the change.
2017-08-10 16:24:28 -05:00
Field G. Van Zee
60a1eeb231 Added edge handling to _determine_blocksize_b().
Details:
- Added explicit handling of situations where i == dim to
  bli_determine_blocksize_b_sub(). This isn't actually needed by any
  current use case within BLIS, but handling the situation is nonetheless
  prudent. Thanks to Minh Quan for reporting this issue and requesting
  the fix.
2017-08-05 13:04:31 -05:00
Field G. Van Zee
b01c808299 Fixed a minor bug in level-3 packm management.
Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
  entries to be released and then re-acquired unnecessarily. (In essence,
  the operands of the "<" comparison in the conditional that guards the
  release-and-reacquire code block simply needed to be swapped.) The bug
  should have only affected performance (rather than the computed result).
  Thanks to Minh Quan for identifying and reporting the bug.
2017-08-04 14:17:44 -05:00
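A minimal sketch of a release-and-reacquire guard, assuming the cached block should be replaced only when it is too small for the current request. The type and function names are hypothetical, malloc()/free() stand in for the BLIS memory pools, and the exact condition used in bli_l3_packm() is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached packing buffer, loosely modeled on a cntl_t-cached mem_t. */
typedef struct { void *buf; size_t size; unsigned acquisitions; } cached_mem_t;

/* Ensure the cached block holds at least `needed` bytes. A block that is
   already large enough is reused as-is, so repeated calls with equal or
   smaller requests do not release and re-acquire memory. */
static int ensure_packbuf(cached_mem_t *mem, size_t needed)
{
    assert(mem != NULL);
    if (mem->buf == NULL || mem->size < needed) {
        free(mem->buf);
        mem->buf = malloc(needed);
        if (mem->buf == NULL) { mem->size = 0; return -1; }
        mem->size = needed;
        mem->acquisitions++;
    }
    return 0;
}
```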
Field G. Van Zee
8b379069fc Merge branch 'master' into rt 2017-08-01 15:30:40 -05:00
Devin Matthews
05925dd5d3 Merge pull request #146 from devinamatthews/master
Change lsame_ signature to match lapacke.
2017-08-01 09:31:01 -05:00
Devin Matthews
cecdc05d28 Change lsame_ signature to match lapacke. 2017-07-31 15:19:51 -05:00