Commit Graph

580 Commits

Author SHA1 Message Date
Field G. Van Zee
5a978fffdb Merge pull request #45 from devinamatthews/high_prec_timers
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday
2016-03-04 17:26:58 -06:00
Devin Matthews
63e2642390 Make sure that -lrt is linked on Linux. 2016-03-04 13:17:50 -06:00
Devin Matthews
44fddd48dc Add missing \. 2016-03-04 12:36:38 -06:00
Devin Matthews
7cabd2131f Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday. 2016-03-03 11:43:07 -06:00
Tyler Smith
adb2b4e096 Fixing guard for non implemented partitioning through packed matrices 2016-03-02 14:48:12 -06:00
Field G. Van Zee
2dc5c0ae03 Merge pull request #40 from tkelman/bulldozer-symlink
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
2016-02-29 12:22:51 -06:00
Field G. Van Zee
f2809fc5f7 Merge pull request #39 from devinamatthews/fix_f2c_conflicts
Devin's f2c type namespace update.

Details:
- Added "bla_" prefix to f2c type names to prevent conflicts with external user code.
- Removed most of the body of bli_f2c.h, which was unused.
2016-02-27 13:06:03 -06:00
Tony Kelman
3d0fae810d Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
to fix linking issue mentioned in #37 and https://groups.google.com/forum/#!topic/blis-devel/iypwljcaeEI
2016-02-25 23:24:03 -08:00
Devin Matthews
8624a33ccc Fix remaining f2c conflicts. 2016-02-25 13:51:26 -06:00
Devin Matthews
372eef0b6c Fixed most conflicts after hack-n-slash ofr bli_f2c.h, cleanup in
progress.
2016-02-25 12:01:58 -06:00
Field G. Van Zee
f86b94f206 Included missing blas2blis integer def to CBLAS.
Details:
- Added #include "bli_config_macro_defs" to all cblas_*.c files in
  compat/cblas/src. This has the effect of defining
  BLIS_BLAS2BLIS_INT_TYPE_SIZE to the default value if bli_config.h does
  not define it. Thanks to Tony Kelman for reporting this bug.
- In cblas_i?amax.c, changed the type of the variable 'iamax' from 'int'
  to 'f77_int'. This eliminates a compiler warning and a potential
  runtime bug and/or crash when the size of an int differs from the size
  of f77_int (as determined by BLIS_BLAS2BLIS_INT_TYPE_SIZE).
2016-02-23 18:12:34 -06:00
Field G. Van Zee
0b126de134 Consolidated packm_blk_var1 and packm_blk_var2.
Details:
- Consolidated the two blocked variants for packm into a single
  implementation (packm_blk_var1) and removed the other variant.
- Updated all induced method _cntl_init() functions in frame/cntl/ind/
  to use the new blocked variant 1.
- Defined two new macros, bli_is_ind_packed() and bli_is_nat_packed(),
  to detect pack_t schemas for induced methods and native execution,
  respectively.
2015-11-13 16:29:12 -06:00
Field G. Van Zee
30e5eb29e0 Minor changes to treatment of rs, cs in bli_obj.c.
Details:
- Applied a patch submitted by Devin Matthews that:
  - implements subtle changes to handling of somewhat unusual cases of
    row and column strides to accommodate certail tensor cases, which
    includes adding dimension parameters to _is_col_tilted() and
    _is_row_tilted() macros,
  - simplifies how buffers are sized when requested BLIS-allocated
    objects,
  - re-consolidates bli_adjust_strides_*() into one function, and
  - defines 'restrict' keyword as a "nothing" macro for C++ and pre-C99
    environments.
2015-11-13 12:14:19 -06:00
Field G. Van Zee
f0a4f41b5a Fixed unimplemented case in core2 sgemm ukernel.
Details:
- Implemented the "beta == 0" case for general stride output for the
  dunnington sgemm micro-kernel. This case had been, up until now,
  identical to the "beta != 0" case, which does not work when the
  output matrix has nan's and inf's. It had manifested as nan residuals
  in the test suite for right-side tests of ctrsm4m1a. Thanks to Devin
  Matthews for reporting this bug.
2015-11-12 15:22:50 -06:00
Field G. Van Zee
42810bbfa0 Fixed minor bugs for uncommon obj_create cases.
Details:
- Separated bli_adjust_strides() into _alloc() and _attach() flavors so
  that the latter can avoid a test performed by the former, in which the
  rs and cs are overridden and set to zero if either matrix dimension is
  zero. Actually, we also disable this overridding behavior, even for the
  _alloc() case, since keeping the original strides (probably) does not
  hurt anything. The original code has been kept commented-out, though,
  in case an unintended consequence is later discovered.
- Fixed a typo in an error check for general stride cases where rs == cs.
2015-11-12 12:07:46 -06:00
Field G. Van Zee
3e6dd11467 Minor re-expression in quadratic partitioning code.
Details:
- Minor change to quadratic equation solution code that avoids
  recomputation of the sqrt() parameter when the compiler is not
  smart enough to perform this optimization automatically.
2015-11-03 10:30:08 -06:00
Field G. Van Zee
0694b722f7 Merge branch 'master' of github.com:flame/blis 2015-11-02 17:24:25 -06:00
Field G. Van Zee
3e116f0a29 Fixed imaginary bug in quadratic partitioning code.
Details:
- Fixed a bug in the relatively new quadratic partitioning code that,
  under the right conditions, would perform sqrt() on a negative value.
  If the solution is imaginary, we discard it and use an alternate
  partition width that assumes no diagonal intersection. That alternate
  width is actually already computed, so, the fix was quite simple.
  Thanks to Devangi Parikh for reporting this bug.
2015-11-02 17:18:23 -06:00
Jeff Hammond
33557eccca add Travis CI build status icon to the README 2015-11-02 12:18:43 -08:00
Field G. Van Zee
4a502fbe77 Laid groundwork for runtime memory pool resizing.
Details:
- Changed bli_pool_finalize() so that the freeing begins with the block
  at top_index instead of block 0. This allows us to use the function
  for terminal finalization as well as temporary cleanup prior to
  reinitialization. Also, clear the pool_t struct upon _pool_finalize()
  in case it is called in the terminal case with some blocks still
  checked out to threads (in which case the threads will see the new
  block size as 0 and thus release the block as intended).
- Added bli_pool_reinit(), which calls _pool_finalize() followed by
  _pool_init() with new parameters.
- Added bli_mem_reinit(), which is based on bli_pool_reinit().
- Added new wrapper, _mem_compute_pool_block_sizes(), which calls
  _mem_compute_pool_block_sizes_dt().
- Updated bli_mem_release() so that the pblk_t is freed, via
  _pool_free_block(), if the block size recorded in the mem_t at the
  time the pblk_t was acquired is now different from the value in the
  pool_t.
2015-11-02 13:28:34 -06:00
Field G. Van Zee
37e55ca39b Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.
Details:
- Fixed a family of bugs in the triangular level-3 operations for
  certain complex implementations (3m1 and 4m1a) that only manifest if
  one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
  - Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
    for the triangular case.
  - Fixed the incorrect computation of imaginary stride, as stored in
    the auxinfo_t struct in trmm and trsm macro-kernels.
  - Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
    cases where the the register blocksize for the triangular matrix is
    odd. Introduced a new byte-granular pointer arithmetic macro,
    bli_ptr_add(), that computes the correct value.
- Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
  terms of __typeof__, which is used by bli_ptr_add() macro.
- Disabled the row- vs. column-storage optimization in bli_trmm_front()
  for singleton problems because the inherent ambiguity of whether a
  scalar is row-stored or column-stored causes the wrong parameter
  combination code to be executed (by dumb luck of our checking for
  row storage first).
- Added commented-out debugging lines to 3m1/4m1a and reference
  micro-kernels, and trsm_ll macro-kernel.
2015-10-30 18:25:04 -05:00
Field G. Van Zee
46294d80e5 Merge pull request #35 from figual/master
Fixed incomplete code in the double precision ARMv8 microkernel.
2015-10-27 12:41:23 -05:00
Francisco Igual
a0a7b85ac3 Fixed incomplete code in the double precision ARMv8 microkernel. 2015-10-27 08:59:15 +00:00
Field G. Van Zee
d3159c5740 Merge branch 'master' of github.com:flame/blis 2015-10-21 14:54:00 -05:00
Field G. Van Zee
b489152e11 Use vzeroall in haswell micro-kernels. 2015-10-21 14:53:17 -05:00
Field G. Van Zee
7e03e45bfe Merge pull request #33 from xianyi/master
Enable Travis CI
2015-10-14 13:26:07 -05:00
Zhang Xianyi
4f88c29f9e Detect Intel Broadwell (using Haswell config). 2015-10-14 12:57:50 -05:00
Zhang Xianyi
4b0ac1a998 Merge branch 'upstream_master' 2015-10-14 12:51:05 -05:00
Field G. Van Zee
77ddb0b1d3 Removed flop-counting mechanism.
Details:
- Removed the optional flop-counting feature introduced in commit
  7574c994.
2015-10-13 12:53:06 -05:00
Field G. Van Zee
276da36618 Minor formatting change to README.md. 2015-10-12 11:43:03 -05:00
Field G. Van Zee
d17057446f Added "Getting Started" section to README.md.
Details:
- Added section to README.md file containing links to wikis with brief
  descriptions.
2015-10-12 11:39:49 -05:00
Field G. Van Zee
e7e1f2f7b6 Minor updates to CREDITS, README files. 2015-10-02 16:51:52 -05:00
Field G. Van Zee
55329906ec Minor edits to README.md, testsuite.
Details:
- Fixed typos in README.md.
- Fixed column heading alignment for testsuite when matlab output is
  enabled.
- Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.
2015-09-26 20:47:19 -05:00
Field G. Van Zee
bbebdb5793 Replaced README with README.md.
Details:
- Replaced the old (and short) README file with a much more comprehensive
  version written in github-flavored markdown. The new file is based on
  content taken from the old Google Code homepage.
2015-09-25 14:47:27 -05:00
Field G. Van Zee
e2e9d64a63 Load balance thread ranges for arbitrary diagonals.
Details:
- Expanded/updated interface for bli_get_range_weighted() and
  bli_get_range() so that the direction of movement is specified in the
  function name (e.g. bli_get_range_l2r(), bli_get_range_weighted_t2b())
  and also so that the object being partitioned is passed instead of an
  uplo parameter. Updated invocations in level-3 blocked variants, as
  appropriate.
- (Re)implemented bli_get_range_*() and bli_get_range_weighted_*() to
  carefully take into account the location of the diagonal when computing
  ranges so that the area of each subpartition (which, in all present
  level-3 operations, is proportional to the amount of computation
  engendered) is as equal as possible.
- Added calls to a new class of routines to all non-gemm level-3 blocked
  variants:
    bli_<oper>_prune_unref_mparts_[mnk]()
  where <oper> is herk, trmm, or trsm and [mnk] is chosen based on which
  dimension is being partitioned. These routines call a more basic
  routine, bli_prune_unref_mparts(), to prune unreferenced/unstored
  regions from matrices and simultaneously adjust other matrices which
  share the same dimension accordingly.
- Simplified herk_blk_var2f, trmm_blk_var1f/b as a result of more the
  new pruning routines.
- Fixed incorrect blocking factors passed into bli_get_range_*() in
  bli_trsm_blk_var[12][fb].c
- Added a new test driver in test/thread_ranges that can exercise the new
  bli_get_range_*() and bli_get_range_weighted_*() under a range of
  conditions.
- Reimplemented m and n fields of obj_t as elements in a "dim"
  array field so that dimensions could be queried via index constant
  (e.g. BLIS_M, BLIS_N). Adjusted/added query and modification
  macros accordingly.
- Defined mdim_t type to enumerate BLIS_M and BLIS_N indexing values.
- Added bli_round() macro, which calls C math library function round(),
  and bli_round_to_mult(), which rounds a value to the nearest multiple
  of some other value.
- Added miscellaneous pruning- and mdim_t-related macros.
- Renamed bli_obj_row_offset(), bli_obj_col_offset() macros to
  bli_obj_row_off(), bli_obj_col_off().
2015-09-24 12:14:03 -05:00
Zhang Xianyi
fe3e355c9c Merge branch 'upstream_master' 2015-08-21 14:38:36 -05:00
Zhang Xianyi
efa641e36b Try to fix the compiling bug on travis. 2015-08-22 03:15:50 +08:00
Field G. Van Zee
4dd9dd3e1d Fixed minor alignment ambiguity bug in bli_pool.c.
Details:
- Fixed a typecasting ambiguity in bli_pool_alloc_block() in which
  pointer arithmetic was performed on a void* as if it were a byte
  pointer (such as char*). Some compilers may have already been
  interpreting this situation as intended, despite the sloppiness.
  Thanks to Aleksei Rechinskii for reporting this issue.
- Redefined pointer alignment macros to typecast to uintptr_t instead of
  siz_t.
2015-08-21 11:52:37 -05:00
Zhang Xianyi
12ffd568b0 Add Travis CI. 2015-08-22 00:24:28 +08:00
Field G. Van Zee
ecc3ebb749 CHANGELOG update (0.1.8) 2015-07-29 13:31:12 -05:00
Field G. Van Zee
47caa33485 Version file update (0.1.8) 0.1.8 2015-07-29 13:31:09 -05:00
Field G. Van Zee
ef0fbbbdb6 Merge branch 'master' of github.com:flame/blis 2015-07-09 13:54:54 -05:00
Field G. Van Zee
fdfe14f1e1 Added support for Intel Haswell/Broadwell.
Details:
- Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors
  and FMA instructions. (Complex support is currently provided by default
  induced method, 4m1a.)
- Added a 'haswell' configuration, which uses the aforementioned kernels.
- Inserted auto-detection support for haswell configuration in
  build/auto-detect/cpuid_x86.c.
- Modified configure script to explicitly echo when automatic or manual
  configuration is in progress.
- Changed beta scalar in test_gemm.c module of test suite to -1.0 to 0.9.
2015-07-09 13:52:39 -05:00
Field G. Van Zee
d4b891369c Added 'carrizo' configuration.
Details:
- Added a new configuration for AMD Excavator-based hardware also known
  as Carrizo when referring to the entire APU. This configuration uses
  the same micro-kernels as the piledriver, but with different
  cache blocksizes.
2015-07-07 10:06:53 -05:00
Field G. Van Zee
0b7255a642 CHANGELOG update (0.1.7) 2015-06-19 12:01:50 -05:00
Field G. Van Zee
267253de8a Version file update (0.1.7) 0.1.7 2015-06-19 12:01:49 -05:00
Field G. Van Zee
7cd01b71b5 Implemented dynamic allocation for packing buffers.
Details:
- Replaced the old memory allocator, which was based on statically-
  allocated arrays, with one based on a new internal pool_t type, which,
  combined with a new bli_pool_*() API, provides a new abstract data
  type that implements the same memory pool functionality but with blocks
  from the heap (ie: malloc() or equivalent). Hiding the details of the
  pool in a separate API also allows for a much simpler bli_mem.c family
  of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
  sane defaults for the values previously found in bli_config. Those
  values can be overridden by #defining them in bli_config.h the same
  way kernel defaults can be overridden in bli_kernel.h. This file most
  resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
  defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
  blocks in the memory pool. Also added a corresponding query routine to
  the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
  reflection, it seems that the goal of more predictable L1 cache
  replacement behavior is outweighed by the harm caused by non-contiguous
  micro-panels when k % kc != 0. I honestly don't think anyone will even
  miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
  bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
  given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
  which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
  modules. Thanks to Devangi Parikh for pointing out these
  miscalculations.
- Comment, whitespace changes.
2015-06-19 11:31:53 -05:00
Field G. Van Zee
9848f255a3 Added early return to API-level _init() routines.
Details:
- Added conditional code that returns early from the API-level _init()
  routines if the API is already initialized. Actually meant for this to
  be included in 5f93cbe8.
2015-06-11 19:14:22 -05:00
Field G. Van Zee
5f93cbe870 Introduced API-level initialization.
Details:
- Added API-level initialization state to _const, _error, _mem, _thread,
  _ind, and _cntl APIs. While this functionality will mostly go unused,
  adding miniscule overhead at init-time, there will be at least once
  instance in the near future where, in order to avoid an infinite loop,
  a certain portion of the initialization will call a query function that
  itself attempts to call bli_init(). API-level initialization will allow
  this later stage to verify that an earlier stage of initialization has
  completed, even if the overall call to bli_init() has not yet returned.
- Added _is_initialized() functions for each API, setting the underlying
  bool_t during _init() and unsetting it during _finalize().
- Comment, whitespace changes.
2015-06-11 18:52:12 -05:00
Field G. Van Zee
ee129c6b02 Fixed bugs in _get_range(), _get_range_weighted().
Details:
- Fixed some bugs that only manifested in multithreaded instances of
  some (non-gemm) level-3 operations. The bugs were related to invalid
  allocation of "edge" cases to thread subpartitions. (Here, we define
  an "edge" case to be one where the dimension being partitioned for
  parallelism is not a whole multiple of whatever register blocksize
  is needed in that dimension.) In BLIS, we always require edge cases
  to be part of the bottom, right, or bottom-right subpartitions.
  (This is so that zero-padding only has to happen at the bottom, right,
  or bottom-right edges of micro-panels.) The previous implementations
  of bli_get_range() and _get_range_weighted() did not adhere to this
  implicit policy and thus produced bad ranges for some combinations of
  operation, parameter cases, problem sizes, and n-way parallelism.
- As part of the above fix, the functions bli_get_range() and
  _get_range_weighted() have been renamed to use _l2r, _r2l, _t2b,
  and _b2t suffixes, similar to the partitioning functions. This is
  an easy way to make sure that the variants are calling the right
  version of each function. The function signatures have also been
  changed slightly.
- Comment/whitespace updates.
- Removed unnecessary '/' from macros in bli_obj_macro_defs.h.
2015-06-10 12:53:28 -05:00