Commit Graph

1886 Commits

Author SHA1 Message Date
Field G. Van Zee
25eee3cc49 Added a dummy file to kernels/generic.
Details:
- Added a dummy file to kernels/generic, which was previously empty, so
  that git would begin tracking the otherwise-empty directory. This
  directory's existence is necessary for proper execution of configure
  for any configuration family that contains the 'generic'
  sub-configuration. Thanks to Johannes Dieterich for reporting the
  issue that led to this fix.
2017-11-21 12:34:20 -06:00
Field G. Van Zee
ef024ce4ca More tweaks to monolithify-header.sh
Details:
- Further fixes to the monolithify-header.sh script.
- Removed unnecessary #include "blis.h" from frame/3/bli_l3_packm.h.
2017-11-20 18:08:29 -06:00
Field G. Van Zee
5028e7dec2 Second attempt to implement travis_wait.
Details:
- Corrected accidental misplacement of the travis_wait prefix (on the
  wrong line of the .travis.yml file) in commit 13e5d91.
2017-11-20 17:00:37 -06:00
Field G. Van Zee
13e5d9107b Added travis_wait prefix to testsuite via Travis.
Details:
- It appears that Travis CI has implemented a new policy that results in
  a test failing if it does not produce any output for more than 10
  minutes. (Two test instances are now failing in Travis despite the most
  recent commit not affecting the library or testsuite.) This issue can
  be worked around by executing the test run via travis_wait, which takes
  an optional time parameter. This commit attempts to use 'travis_wait 30'
  in the .travis.yml file to prevent the early failure at 10 minutes.
2017-11-20 15:57:06 -06:00
Field G. Van Zee
a1caeba0ea Removed pnacl, emscripten support from Makefile. 2017-11-20 13:31:20 -06:00
praveeng
78199c539b Merge master code till 01-Nov-2017 to amd-staging
Change-Id: I40b53f876db84c8b947b3f2385c9b882245c6603
2017-11-20 15:52:00 +05:30
Field G. Van Zee
9df6dda9ec Improvements, bugfixes to monolithify-header.sh. 2017-11-18 19:03:26 -06:00
Field G. Van Zee
21d26201f9 Merge branch 'rt' of github.com:flame/blis into rt 2017-11-18 14:16:53 -06:00
Field G. Van Zee
43baa3b327 Removed unnecessary flags for generic config.
Details:
- Removed -D_POSIX_C_SOURCE=200112L and -m64 flags from make_defs.mk file
  of generic sub-configuration. These flags are generally not necessary,
  and particularly not desirable for the generic configuration since they
  unnecessarily restrict the environments in which the configuration can
  be built.
2017-11-18 14:14:44 -06:00
iotamudelta
b7ca580618 [WIP] Add x86 and x86_64 processor families. (#154)
* Add x86 and x86_64 processor families.
* Use generic config as fallback for more families.

After discussion with fgvanzee: a) it's named "generic", and b) it is used as a fallback for all the families. The goal is that if a specific CPU is not yet supported by a family (say, a new Intel microarchitecture on x86_64), it will fall through and still work with the slower "generic" kernels.
2017-11-18 13:56:05 -06:00
Field G. Van Zee
870597d166 Added bash script for creating monolithic headers.
Details:
- Added a new script, monolithify-header.sh, to the 'build' directory.
  This script recursively replaces all #include directives in a selected
  file with the contents of the header files referenced by each directive.
  The idea is to "flatten" a tree of .h files into a single file, with
  the script acting as a C preprocessor that only processes #include
  directives.
2017-11-17 17:06:42 -06:00
Field G. Van Zee
c76f77f4cc Removed unnecessary #include "blis.h" from header.
Details:
- Removed an errant #include "blis.h" directive from bli_cntx_ind_stage.h.
  The general policy is that no header file in BLIS should include
  blis.h. This will be important in the near future when using a tool to
  recursively create a monolithic blis.h file from its constituent
  headers.
2017-11-17 15:10:52 -06:00
Field G. Van Zee
2bb9bc6e95 Miscellaneous tweaks to gks, rt functionality.
Details:
- Updated bli_cpuid_query_id() so that BLIS_ARCH_GENERIC is always returned
  if the hardware fails to test positive for any supported sub-configuration.
- Defined bli_gks_init_ref_cntx(), which will call the context initialization
  function bli_cntx_init_configname() for the sub-configuration 'configname'
  associated with the arch_t id returned by bli_arch_query_id(). This makes
  initializing a reference context easy for experts who wish to construct
  those contexts.
2017-11-17 13:50:14 -06:00
Santanu Thangaraj
b3d8ab2ea0 Merge "Added AMD copyright line to the changed files in last 3 commits" into amd-staging 2017-11-15 01:33:12 -05:00
Nisanth M P
fe71c06e42 Added AMD copyright line to the changed files in last 3 commits
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
2017-11-15 11:11:17 +05:30
Field G. Van Zee
d5bf79e50b Miscellaneous tweaks and fixes.
Details:
- Fixed incorrect calling sequence in bli_cntx_init_knl.c--an instance of
  bli_blksz_init_easy() that should have been bli_blksz_init().
- Fixed a bug in code that is supposed to output the list of sub-directories
  in the 'config' directory when the configure script is run with no arguments.
- Expanded the output of "make showconfig" to include more info from config.mk.
- Minor changes to build/auto-detect/cpuid_x86.c, mostly in preparation for
  someone to add excavator and zen support.
- Added a link to the ConfigurationHowTo wiki to config_registry.
- Other minor tweaks to configure.
2017-11-13 14:24:29 -06:00
Field G. Van Zee
673e518403 Merge branch 'rt' of github.com:flame/blis into rt 2017-11-01 17:37:42 -05:00
Field G. Van Zee
2c51356a8b Implemented runtime hardware detection via cpuid.
Details:
- Added runtime support for selecting an appropriate arch_t value based
  on the results of the cpuid instruction (for x86_64). This allows
  deferral of choosing a context (kernels, blocksizes, etc.) until
  runtime, which allows BLIS to be built with support for multiple
  microarchitectures. Currently, only amd64 and intel64 configurations
  are registered in the config_registry; however, one could create
  custom configuration families to support arbitrary sets of x86_64
  microarchitectures.
- Current Intel microarchitectures supported via cpuid are knl, haswell,
  sandybridge, and penryn.
- Current AMD microarchitectures supported via cpuid are: zen, excavator,
  steamroller, piledriver, and bulldozer.
2017-11-01 17:37:02 -05:00
Field G. Van Zee
ab57b97904 Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
  config/bulldozer/bli_kernel.h. Not sure where this value came from, but
  it would seem to allow for insufficient starting address alignment for
  any matrices created via bli_malloc_user(), such as via
  bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
  led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
  in 8f150f2.
2017-11-01 11:51:41 -05:00
Field G. Van Zee
8f150f28a6 Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
  bli_family_bulldozer.h. Not sure where this value came from, but it
  would seem to allow for insufficient starting address alignment for
  any matrices created via bli_malloc_user(), such as via
  bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
  led us to this bug.
2017-11-01 11:41:45 -05:00
Field G. Van Zee
e3f10557ca Use perl for some substitution for OS X compatibility.
Details:
- Discovered that sed commands where the replacement string contains '\n'
  are problematic with the version of sed present in OS X. For these
  cases in the configure script, we instead use 'perl -pe' for
  search-and-replace functionality.
- Various other minor comment/whitespace tweaks to configure.
- Removed remaining lines of code related to setting/checking variables to
  track "unregistered" configurations.
2017-10-30 13:37:54 -05:00
Field G. Van Zee
dd45cfdfc3 Merge branch 'master' into rt 2017-10-30 12:23:05 -05:00
Devin Matthews
f60c827ba9 Fix CVECFLAGS for bulldozer config. 2017-10-30 10:04:42 -05:00
Field G. Van Zee
3e4f42a4d2 Typecast l1mkr_t enum value prior to comparison.
Details:
- Typecast l1mkr_t enum value in bli_cntx.h to guint_t before testing for
  out-of-range value. This is an attempt to pacify a strange warning from
  clang on OS X that is seemingly the result of the following compiler
  warning flag:
    -Wtautological-constant-out-of-range-compare
2017-10-27 11:41:37 -05:00
Field G. Van Zee
aec6e038d9 Removed associative arrays from configure.
Details:
- Implemented a replacement for associative arrays in the configure script
  that does not utilize arrays, and therefore works in pre-4.0 versions of
  bash. (It appears that Mac OS X will be stuck with version 3.2 indefinitely
  due to bash switching to the GPL 3.0 license starting with version 4.0.)
2017-10-26 16:12:36 -05:00
Santanu Thangaraj
189ffbb0d3 Merge changes Ie115b206,I7ce6cfa2,Iff59b6f4 into amd-staging
* changes:
  Adding __attribute__((constructor/destructor)) for CLANG case.
  Thread Safety: Move bli_init() before and bli_finalize() after main()
  Thread safety: Make the global induced method status array local to thread
2017-10-25 02:00:30 -04:00
Nisanth M P
3eb44f6761 Adding __attribute__((constructor/destructor)) for CLANG case.
CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.

Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
2017-10-24 16:36:36 +05:30
Field G. Van Zee
07c352188b Added "generic" configuration.
Details:
- Added a "generic" configuration that leaves the default blocksizes and
  kernels unchanged. This replaces the older "reference" configuration.
  Updated auto-detect script and code accordingly.
- Added support for generic configuration to arch_t (bli_type_defs.h),
  bli_gks_init() (bli_gks.c), and bli_arch_config.h
- Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
- Whitespace changes to configurations' make_defs.mk files.
2017-10-23 16:59:22 -05:00
Field G. Van Zee
c1a98d6f70 Minor update to .travis.yml file. 2017-10-23 14:24:41 -05:00
Field G. Van Zee
75b9383f01 Minor header renaming ahead of bli_arch.c.
Details:
- Renamed the various configurations' "bli_arch_<configname>.h" header files
  (replacing "arch" with "family") to free up the 'bli_arch' namespace for a
  different purpose (hardware detection).
- Renamed "bli_arch.h" and "bli_arch_pre_macro_defs.h" in frame/include to
  "bli_arch_config.h" and "bli_arch_config_pre.h", respectively.
2017-10-20 16:41:22 -05:00
Field G. Van Zee
482af51add Fixed 'make test' target from top-level Makefile.
Details:
- Updated the top-level Makefile's build rule for testsuite object files to
  properly obtain CFLAGS via get-frame-cflags-for() function instead of
  simply using the $(CFLAGS) variable (which is empty). This means that
  'make test' should now work as expected.
2017-10-20 15:44:26 -05:00
Field G. Van Zee
3c269f700d Makefile updates for test drivers, testsuite.
Details:
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
  as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
  sub-directories.
- Factored out much of the top-level Makefile into common.mk. A Makefile
  needs only set DIST_PATH to the relative path to the top level of the
  BLIS source distribution before including common.mk in order to acquire
  all of the definitions typically needed in a Makefile that tests BLIS.
2017-10-20 13:57:21 -05:00
Field G. Van Zee
0557189d46 Minor updates to .travis.yml, configure script. 2017-10-18 15:05:27 -05:00
Field G. Van Zee
2553734d1d Merge branch 'master' into rt 2017-10-18 13:46:50 -05:00
Field G. Van Zee
375342799c Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
  installation for the 'knl' configuration. Thanks to Victor Eijkhout
  for reporting this issue.
2017-10-18 13:41:25 -05:00
Field G. Van Zee
453deb2906 Implemented runtime kernel management.
Details:
- Reworked the build system around a configuration registry file, named
  'config_registry', that identifies valid configuration targets, their
  constituent sub-configurations, and the kernel sets that are needed by
  those sub-configurations. The build system now facilitates the building
  of a single library that can contain kernels and cache/register
  blocksizes for multiple configurations (microarchitectures). Reference
  kernels are also built on a per-configuration basis.
- Updated the Makefile to use new variables set by configure via the
  config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
  in determining which sub-configurations (CONFIG_LIST) and kernel sets
  (KERNEL_LIST) are included in the library, and which make_defs.mk files'
  CFLAGS (KCONFIG_MAP) are used when compiling kernels.
- Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
  functions into a standard format that includes the kernel set name
  (e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
  kernels sub-directory. These files exist to provide prototypes for the
  kernels present in those directories.
- Reorganized reference kernels into a top-level 'ref_kernels' directory.
  This directory includes a new source file, bli_cntx_ref.c (compiled on
  a per-configuration basis), that defines the code needed to initialize
  a reference context and a context for induced methods for the
  microarchitecture in question.
- Rewrote make_defs.mk files in each configuration so that the compiler
  variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
  basis.
- Modified bli_config.h.in template so that bli_config.h is generated with
  #defines for the config (family) name, the sub-configurations that are
  associated with the family, and the kernel sets needed by those
  sub-configurations.
- Deprecated all kernel-related information in bli_kernel.h and transferred
  what remains to new header files named "bli_arch_<configname>.h", which
  are conditionally #included from a new header bli_arch.h. These files
  are still needed to set library-wide parameters such as custom
  malloc()/free() functions or SIMD alignment values.
- Added bli_cntx_init_<configname>.c files to each configuration directory.
  The files contain a function, named the same as the file, that initializes
  a "native" context for a particular configuration (microarchitecture). The
  idea is that optimized kernels, if available, will be initialized into
  these contexts. Other fields will retain pointers to reference functions,
  which will be compiled on a per-configuration basis. These bli_cntx_init_*()
  functions will be called during the initialization of the global kernel
  structure. They are thought of as initializing for "native" execution, but
  they also form the basis for contexts that use induced methods. These
  functions are prototyped, along with their _ref() and _ind() brethren, by
  prototype-generating macros in bli_arch.h.
- Added a new typedef enum in bli_type_defs.h to define an arch_t, which
  identifies the various sub-configurations.
- Redesigned the global kernel structure (gks) around a 2D array of cntx_t
  structures (pointers to cntx_t, actually). The first dimension is indexed
  over arch_t and the inner dimension is the ind_t (induced method) for
  each microarchitecture. When a microarchitecture (configuration) is
  "registered" at init-time, the inner array for that configuration in the
  2D array is initialized (and allocated, if it hasn't been already). The
  cntx_t slot for BLIS_NAT is initialized immediately and those for other
  induced method types are initialized and cached on-demand, as needed. At
  cntx_t registration, we also store function pointers to cntx_init functions
  that will initialize (a) "reference" contexts and (b) contexts for use with
  induced methods. We don't cache the full contexts for reference contexts
  since they are rarely needed. The functions that initialize these two kinds
  of contexts are generated automatically for each targeted sub-configuration
  from cpp-templatized code at compile-time. Induced method contexts that
  need "stage" adjustments can still obtain them via functions in
  bli_cntx_ind_stage.c.
- Added new functions and functionality to bli_cntx.c, such as for setting
  the level-1f, level-1v, and packm kernels, and for converting a native
  context into one for executing an induced method.
- Moved the checking of register/cache blocksize consistency from being cpp
  macros in bli_kernel_macro_defs.h to being runtime checks defined in
  bli_check.c and called from bli_gks_register_cntx() at the time that the
  global kernel structure's internal context is initialized for a given
  microarchitecture/configuration.
- Deprecated all of the old per-operation bli_*_cntx.c files and removed
  the previous operation-level cntx_t_init()/_finalize() invocations.
  Instead, we now query the gks for a suitable context, usually via
  bli_gks_query_cntx().
- Deprecated support for the 3m2 and 3m3 induced methods. (They required
  hackery that I was no longer willing to support.)
- Consolidated the 1e and 1r packm kernels for any given register blocksize
  into a single kernel that will branch on the schema and support packing
  to both formats.
- Added the cntx_t* argument to all packm kernel signatures.
- Deprecated the local function pointer array in all bli_packm_cxk*.c files
  and instead obtain the packm kernel from the cntx_t.
- Added bli_calloc_intl(), which serves as the calloc-equivalent to
  bli_malloc_intl(). Useful when we wish to allocate and initialize to
  zero/NULL.
- Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
  bli_cntx.h into static functions.
2017-10-18 13:29:32 -05:00
Nisanth M P
4607aac297 Thread Safety: Move bli_init() before and bli_finalize() after main()
BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS, while other threads
in the application are still using BLIS.

This issue can be solved by removing bli_finalize() from API.
One way to do this is by getting bli_finalize() to execute by default
after application exits from main().

GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that needs to be executed after main exits.

Similarly bli_init() can be made to run before application enters main()
so that application need not call it.

Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
2017-10-16 22:06:57 +05:30
Nisanth M P
0f5ce26fc5 Thread safety: Make the global induced method status array local to thread
BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.

This patch solves this issue by making the induced method status array
local to threads.

Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
2017-10-16 21:34:03 +05:30
Field G. Van Zee
b882648af8 Merge branch 'master' into rt 2017-10-11 16:32:21 -05:00
sthangar
06e0e6351a The inner loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default
Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5
2017-09-28 12:15:36 +05:30
Field G. Van Zee
e02d3cb841 Fixed a pthread typo in previous commit.
Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
2017-09-26 19:02:53 -05:00
Field G. Van Zee
f5962a1aae Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
  into a k x k triangular matrix for the purposes of obtaining an mr x k
  micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
  very large k (depending on the product of mr x kc on that architecture).
  The bug arose from the fact that the test module was triggering the
  allocation of blocks from the internal memory pools, which are limited in
  size. This allocation imposes an implicit assumption that the micro-
  panel being tested with will fit inside, and this assumption is violated
  for large values of k. Arbitrarily large k may now be tested for both
  operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
  statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
  issues.
2017-09-26 17:00:04 -05:00
Field G. Van Zee
8e917b256c Updated bibtex info for BLIS5 (3m4m) article. 2017-09-09 14:10:15 -05:00
Nisanth M P
7be8870573 Merging "Adding auto hardware detection for Zen"
Change-Id: Id450fb0c4f91a5cd5cbdc06970f4f9ed28dd8520
2017-09-07 19:49:00 +05:30
sthangar
e056d810d1 Bug fix for the testsuite build failing
Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77
2017-08-29 12:50:40 +05:30
Kiran Varaganti
83796b7caf Merge "Adding auto hardware detection for Zen" into amd-staging 2017-08-28 05:23:28 -04:00
sthangar
d1ee776202 Adding auto hardware detection for Zen
Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf
2017-08-28 14:11:29 +05:30
praveeng
8176f4e438 resolving conflicts bli_gemm_front.c and LICENCE
Change-Id: Id24ce53896d4c1c7ceccc3e004014a0ecceb5474
2017-08-28 12:21:16 +05:30
Nisanth M P
57e1e5cd51 Merge AMD authored changes 2017-08-22 17:07:44 +05:30
Devin Matthews
adafe974b4 Merge pull request #150 from devinamatthews/vzeroupper
Add vzeroupper to Intel AVX kernels.
2017-08-15 15:17:21 -05:00