Commit Graph

623 Commits

Author SHA1 Message Date
Field G. Van Zee
4136553f0d Clear level-3 cntx_t's via memset() before use.
Details:
- In all level-3 operations' _cntx_init() functions, replaced calls to
  bli_cntx_obj_init() with calls to bli_cntx_obj_clear(), and in all
  level-3 operations' _cntx_finalize() functions, removed calls to
  bli_cntx_obj_finalize(), leaving those function definitions empty.
- Changed the definition of bli_cntx_obj_clear() so that the clearing
  occurs via a single call to memset().
2016-04-22 11:53:53 -05:00
Field G. Van Zee
dd0ab1d93f Converted some bli_cntx query functions to macros.
Details:
- Commented out several datatype-aware query functions (those ending in
  _dt) from bli_cntx.c, as well as their prototypes in bli_cntx.h, and
  added equivalent cpp query macros to bli_cntx.h.
- Added 'bli_config.h' to .gitignore.
2016-04-20 14:38:23 -05:00
Field G. Van Zee
a30ccbc4c6 Merge pull request #66 from devinamatthews/blas-configure
Add configure options and generate bli_config.h automatically.
2016-04-19 15:04:33 -05:00
Field G. Van Zee
eb2f18e484 More compile-time fixes to bgq gemm ukernel code. 2016-04-19 12:50:32 -05:00
Devin Matthews
0e1a9821d8 Add configure options and generate bli_config.h automatically.
Options to configure have been added for:
- Setting the internal BLIS and BLAS/CBLAS integer sizes.
- Enabling and disabling the BLAS and CBLAS layers.

Additionally, configure options which require defining macros (the above plus the threading model), write their macros to the automatically-generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size and alignment etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.

Lastly, support for OMP in clang has been added (closes #56).
2016-04-19 11:44:37 -05:00
Field G. Van Zee
ff84469a45 Applied various compilation fixes to bgq kernels. 2016-04-18 12:29:09 -05:00
Tyler Michael Smith
cbcd0b739d Changing ifdef for OSX pthread barriers 2016-04-18 03:12:57 -05:00
Field G. Van Zee
dd62080cea Compile-time fix to bgq l1f kernels.
Details:
- Fixed an old reference to bli_daxpyf_fusefac, which no longer exists,
  by replacing it with the axpyf fusing factor (8), and cleaned up the
  relevant section of config/bgq/bli_kernel.h.
- Removed most of the details of the level-3 kernels from the template
  kernel code in config/template/kernels/3 and replaced it with a
  reference to the relevant kernel wiki maintained on the BLIS github
  website.
2016-04-15 11:15:41 -05:00
Field G. Van Zee
d5a915dd8d Merge branch 'master' of github.com:flame/blis 2016-04-14 12:56:36 -05:00
Field G. Van Zee
4320b725a1 Use kernel CFLAGS on "ukernels" directories.
Details:
- Updated the top-level Makefile so that the CFLAGS variable designated
  for kernel source code is applied not only to source code in
  directories named "kernels" but source code in any directory that
  contains the substring "kernels", such as "ukernels".
- Formally disabled some code in gen-make-frag.sh script that was already
  effectively disabled. The code was related to handling "noopt" and
  "kernel" directories, which is now handled independently within the
  top-level Makefile without needing to place these source files into
  a spearate makefile variable.
2016-04-14 12:51:29 -05:00
Tyler Smith
41694675e4 pthreads bugfixes
Getting pthreads to work on my Mac
Implemented a pthread barrier when _POSIX_BARRIER isn't defined
Now spawn n-1 threads instead of n threads so that master thread isn't just spinning the whole time
Add -lpthread instead of -pthread to LDFLAGS (for clang)
2016-04-13 15:51:08 -05:00
Field G. Van Zee
f756dbfa0d Removed stale #include from bgq configuration.
Details:
- Removed an old #include statement ("bli_gemm_8x8.h") from the
  bli_kernel.h file in the bgq configuration. It turns out this
  file was no longer needed even prior to 537a1f4.
2016-04-13 11:25:33 -05:00
Field G. Van Zee
0bd4169ea7 Fixed context-broken dunnington/penryn kernels.
Details:
- Added missing context parameters to several instances where simpler
  kernels, or reference kernels, are called instead of executing the
  main body code contained in the kernel function in question.
- Renamed axpyv and dotv kernel files to use "opt" instead of "int"
  substring, for consistency with level-1f kernels.
2016-04-11 18:08:32 -05:00
Field G. Van Zee
7912af5db4 CHANGELOG update (0.2.0) 2016-04-11 17:32:13 -05:00
Field G. Van Zee
898614a555 Version file update (0.2.0) 0.2.0 2016-04-11 17:32:09 -05:00
Field G. Van Zee
537a1f4f85 Implemented runtime contexts and reorganized code.
Details:
- Retrofitted a new data structure, known as a context, into virtually
  all internal APIs for computational operations in BLIS. The structure
  is now present within the type-aware APIs, as well as many supporting
  utility functions that require information stored in the context. User-
  level object APIs were unaffected and continue to be "context-free,"
  however, these APIs were duplicated/mirrored so that "context-aware"
  APIs now also exist, differentiated with an "_ex" suffix (for "expert").
  These new context-aware object APIs (along with the lower-level, type-
  aware, BLAS-like APIs) contain the the address of a context as a last
  parameter, after all other operands. Contexts, or specifically, cntx_t
  object pointers, are passed all the way down the function stack into
  the kernels and allow the code at any level to query information about
  the runtime, such as kernel addresses and blocksizes, in a thread-
  friendly manner--that is, one that allows thread-safety, even if the
  original source of the information stored in the context changes at
  run-time; see next bullet for more on this "original source" of info).
  (Special thanks go to Lee Killough for suggesting the use of this kind
  of data structure in discussions that transpired during the early
  planning stages of BLIS, and also for suggesting such a perfectly
  appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
  structure" (gks). This data structure and API will allow the caller to
  initialize a context with the kernel addresses, blocksizes, and other
  information associated with the currently active kernel configuration.
  The currently active kernel configuration within the gks cannot be
  changed (for now), and is initialized with the traditional cpp macros
  that define kernel function names, blocksizes, and the like. However,
  in the future, the gks API will be expanded to allow runtime management
  of kernels and runtime parameters. The most obvious application of this
  new infrastructure is the runtime detection of hardware (and the
  implied selection of appropriate kernels). With contexts in place,
  kernels may even be "hot swapped" at runtime within the gks. Once
  execution enters a level-3 _front() function, the memory allocator will
  be reinitialized on-the-fly, if necessary, to accommodate the new
  kernels' blocksizes. If another application thread is executing with
  another (previously loaded) kernel, it will finish in a deterministic
  fashion because its kernel information was loaded into its context
  before computation began, and also because the blocks it checked out
  from the internal memory pools will be unaffected by the newer threads'
  reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
  the code enabling use of induced methods for complex domain matrix
  multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
  those APIs' functionality is now mostly subsumed within the global
  kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
  that will reinitialize a memory pool if the necessary pool block size
  has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
  bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
  usage of contexts where appropriate to communicate cache and register
  blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
  the context and/or the global kernel structure:
  - Removed blocksize object pointers (blksz_t*) fields from all control
    tree node definitions and replaced them with blocksize id (bszid_t)
    values instead, which may be passed into a context query routine in
    order to extract the corresponding blocksize from the given context.
  - Removed micro-kernel function pointers (func_t*) fields from all
    control tree node definitions. Now, any code that needs these function
    pointers can query them from the local context, as identified by a
    level-3 micro-kernel id (l3ukr_t), level-1f kernel id, (l1fkr_t), or
    level-1v kernel id (l1vkr_t).
  - Removed blksz_t object creation and initialization, as well as kernel
    function object creation and initialization, from all operation-
    specific control tree initialization files (bli_*_cntl.c), since this
    information will now live in the gks and, secondarily, in the context.
- Removed blocksize multiples from blksz_t objects. Now, we track
  blocksize multiples for each blocksize id (bszid_t) in the context
  object.
- Removed the bool_t's that were required when a func_t was initialized.
  These bools are meant to allow one to track the micro-kernel's storage
  preferences (by rows or columns). This preference is now tracked
  separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into single
  files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
  util directories, but has the most obvious effect of allowing BLIS
  to compile noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
  in an attempt to reduce overhead for memory-bound operations. This
  includes removal of default use of object-based variants for level-2
  operations. Now, by default, level-2 operations will directly call a
  low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in blk_blksz.c (renamed from
  bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
  respective header files.
- Defined bli_mbool.c API to create and query "multi-bools", or
  heterogeneous bool_t's (one for each floating-point datatype), in the
  same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
  and BLIS_SIMD_SIZE. These values are needed in order to compute a third
  new parameter, which may be set indirectly via the aforementioned
  macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
  statically allocate memory in macro-kernels and the induced methods'
  virtual kernels to be used as temporary space to hold a single
  micro-tile. These values are now output by the testsuite. The default
  value of BLIS_STACK_BUF_MAX_SIZE is computed as
  "2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up top-level 'kernels' directory (for example, renaming the
  embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
  and "haswell," respectively, and gave more consistent and meaningful
  names to many kernel files (as well as updating their interfaces to
  conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
  context for test modules that need those values: axpyf, dotxf,
  dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that will
  more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (ie: those used to inline double loops
  for level-1m-like operations on small matrices) in frame/include/level0
  to use more obscure local variable names in an effort to avoid variable
  shaddowing. (Thanks to Devin Matthews for pointing these gcc warnings,
  which are only output using -Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors that
  of scalm. The semantic meaning of the conj argument is to optionally
  allow implicit conjugation of the scalar prior to being populated into
  the object.
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
  that this does not preclude supporting mixed types via the object APIs,
  where it produces absolutely zero API code bloat.
2016-04-11 17:21:28 -05:00
Field G. Van Zee
d1f8e5d9b2 Merge pull request #60 from esauvage/master
sgemm µkernel for bulldozer : bug correction for k%4 != 0
2016-04-05 12:21:27 -05:00
Etienne Sauvage
c11d28eed8 cgemm µkernel for bulldozer : bug correction for k%4 != 0 2016-04-02 21:15:48 +02:00
Field G. Van Zee
20af937b57 Merge pull request #59 from devinamatthews/fix_testsuite_makefile
Fix testsuite makefile
2016-03-31 14:37:30 -05:00
Devin Matthews
fc61a1143e Fix formatting in configure. 2016-03-31 10:53:01 -05:00
Devin Matthews
26379b14de Adjust paths in common.mk to support building from testsuite dir. 2016-03-31 10:45:48 -05:00
Field G. Van Zee
36c3abb05f Merge pull request #58 from esauvage/master
cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer confi…
2016-03-31 10:26:17 -05:00
Devin Matthews
356d854fc9 Make symlink to common.mk in build directory. 2016-03-30 16:33:15 -05:00
Devin Matthews
edbb847004 Refactor out some definitions which moved from make_defs.mk to Makefile for use in testsuite Makefile. 2016-03-30 16:27:11 -05:00
Etienne Sauvage
917ce75482 cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel 2016-03-30 22:03:09 +02:00
Field G. Van Zee
64b41fa554 Merge pull request #54 from devinamatthews/more_config_opts
More config opts
2016-03-29 15:19:41 -05:00
Field G. Van Zee
1b09e343df Updated gcc version from 4.8 to 4.9 in .travis.yml. 2016-03-29 12:55:28 -05:00
Devin Matthews
0171ad5899 Add icc and clang support for Intel architectures, fixes #47. 2bd036f fixes #49 BTW. 2016-03-28 13:55:06 -05:00
Field G. Van Zee
3090fff64c Merge pull request #44 from esauvage/master
sgemm micro-kernel for FMA4 instruction set
2016-03-28 12:36:25 -05:00
Devin Matthews
e6e566426a Merge branch 'master' into more_config_opts 2016-03-26 14:10:15 -05:00
Field G. Van Zee
8624e36543 Merge pull request #50 from devinamatthews/fix_noopt_avx
Fix configuration issue where instruction set flags are not specified for debug builds.
2016-03-26 13:56:28 -05:00
Devin Matthews
469429ec34 Fix LD_FLAGS -> LDFLAGS. 2016-03-25 20:45:41 -05:00
Devin Matthews
8442d65c9e Replace -march=native with specific architecture flags to support cross-compiling, and add icc support for Intel architectures. 2016-03-25 20:06:48 -05:00
Devin Matthews
76099f20be Add threading option to configure. 2016-03-25 17:22:58 -05:00
Devin Matthews
ad43eab4c7 Merge branch 'fix_noopt_avx' into more_config_opts 2016-03-25 15:00:02 -05:00
Devin Matthews
9452bdb3af Add options for verbose make output and static/shared linking to configure. 2016-03-25 14:59:50 -05:00
Devin Matthews
2bd036f1f9 Fix configuration issue where instruction set flags are not specified for debug builds. 2016-03-25 12:16:49 -05:00
Field G. Van Zee
a315833f06 Merge pull request #48 from figual/master
Updated and improved ARMv8 micro-kernels.
2016-03-24 12:30:21 -05:00
figual
af92773f4f Updated and improved ARMv8 micro-kernels. 2016-03-23 22:07:02 +01:00
Field G. Van Zee
1d1a426d18 Merge pull request #46 from devinamatthews/new-config-opts
Add several changes to the build system.
2016-03-07 15:17:53 -06:00
Devin Matthews
d226dfa051 Add several changes to the build system.
1) Add -- options.
2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
4) Add make V=[0,1] option to control build verbosity.
2016-03-05 16:18:14 -06:00
Field G. Van Zee
5a978fffdb Merge pull request #45 from devinamatthews/high_prec_timers
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday
2016-03-04 17:26:58 -06:00
Devin Matthews
63e2642390 Make sure that -lrt is linked on Linux. 2016-03-04 13:17:50 -06:00
Devin Matthews
44fddd48dc Add missing \. 2016-03-04 12:36:38 -06:00
Devin Matthews
7cabd2131f Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday. 2016-03-03 11:43:07 -06:00
Tyler Smith
adb2b4e096 Fixing guard for non implemented partitioning through packed matrices 2016-03-02 14:48:12 -06:00
Etienne Sauvage
4ca5d5b1fd sgemm micro-kernel for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel 2016-03-01 21:33:01 +01:00
Etienne Sauvage
627d59b5ba symbolic link for bulldozer configuration to kernels baseline 2016-02-29 21:53:12 +01:00
Field G. Van Zee
2dc5c0ae03 Merge pull request #40 from tkelman/bulldozer-symlink
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
2016-02-29 12:22:51 -06:00
Field G. Van Zee
f2809fc5f7 Merge pull request #39 from devinamatthews/fix_f2c_conflicts
Devin's f2c type namespace update.

Details:
- Added "bla_" prefix to f2c type names to prevent conflicts with external user code.
- Removed most of the body of bli_f2c.h, which was unused.
2016-02-27 13:06:03 -06:00