Commit Graph

97 Commits

Author SHA1 Message Date
Field G. Van Zee
c31b1e7b9d Relax alignment restrictions for sandybridge ukrs.
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
  in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
  instead of vmovaps/vmovapd. These changes mimic those made to the haswell
  microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
  directory to use DBL_MAX as the initial time candidate. Thanks to Devin
  Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
2016-07-27 15:58:07 -05:00
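The DBL_MAX change in c31b1e7b9d concerns how a "best of several runs" timing loop is seeded. The following is a minimal sketch of that idea, not the testsuite's actual code; the function name best_time is hypothetical.

```c
#include <float.h>   /* DBL_MAX */
#include <stddef.h>  /* size_t */

/* Return the smallest of n measured times (hypothetical helper).
   Seeding the candidate with DBL_MAX guarantees that any finite
   measurement replaces it, however slow the run was. */
double best_time(const double *times, size_t n)
{
    double best = DBL_MAX;
    for (size_t i = 0; i < n; ++i) {
        if (times[i] < best)
            best = times[i];
    }
    return best;
}
```

With no measurements at all, the function returns DBL_MAX, which makes an empty or skipped timing loop easy to spot.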
Field G. Van Zee
a89555d160 Added randn[vm] operations, support in testsuite.
Details:
- Defined a new randomization operation, randn, on vectors and matrices.
  The randnv and randnm operations randomize each element of the target
  object with values from a narrow range of values. Presently, those
  values are all integer powers of two, but they do not need to be powers
  of two in order to achieve the primary goal, which is to initialize
  objects that can be operated on with plenty of precision "slack"
  available to allow computations that avoid roundoff. Using this method
  of randomization makes it much more likely that testsuite residuals of
  properly-functioning operations are close to zero, if not exactly zero.
- Updated existing randomization operations randv and randm to skip
  special diagonal handling and normalization for matrices with structure.
  This is now handled by the testsuite modules by explicitly calling a
  testsuite function that loads the diagonal (and scales off-diagonal
  elements).
- Added support for randnv and randnm in the testsuite with a new switch
  in input.general that universally toggles between use of the classic
  randv/randm, which use real values on the interval [-1,1], and
  randnv/randnm, which use only values from a narrow range. Currently,
  the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
  well as 0.0.
- Updated testsuite modules so that a testsuite wrapper function is called
  instead of directly calling the randomization operations (such as
  bli_randv() and bli_randm()). This wrapper also takes a bool_t that
  indicates whether the object's elements should be normalized. (NOTE: As
  alluded to above, in the test modules of triangular solve operations such
  as trsv and trsm, we perform the extra step of loading the diagonal.)
- Defined a new level-0 operation, invertsc, which inverts a scalar.
- Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
  but possible divide-by-zero.
- Updated function signature and prototype formatting in testsuite.
2016-06-17 14:08:35 -05:00
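The narrow range described in a89555d160 can be illustrated with a small generator. This is a sketch under stated assumptions, not the randnv/randnm implementation: the name randn_narrow is hypothetical, and drawing the 15 values with (roughly) equal probability via rand() is an assumption of this example.

```c
#include <stdlib.h>  /* rand */

/* Draw one value from {0.0} together with +/-{2^0, 2^-1, ..., 2^-6}.
   Hypothetical helper; the near-uniform choice among the 15 values is
   an assumption of this sketch. */
double randn_narrow(void)
{
    int r = rand() % 15;            /* 0 => zero, 1..7 => positive, 8..14 => negative */
    if (r == 0)
        return 0.0;

    int k = (r - 1) % 7;            /* exponent 0..6 */
    double v = 1.0 / (double)(1 << k);  /* exactly 2^-k in binary floating point */

    return (r <= 7) ? v : -v;
}
```

Because every nonzero value is a power of two with a small exponent, products and moderately long sums of such values are representable exactly, which is the precision "slack" the commit message refers to.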
Field G. Van Zee
d309f20b73 Added alignment switch to testsuite.
Details:
- Added a new input parameter to input.general that globally toggles
  whether testsuite tests are performed on objects whose buffers and
  leading dimensions have been aligned, and changed the implementation
  of libblis_test_mobj_create() to employ alignment (or not) regardless
  of whether row, column, or general storage is being tested.
- Updated configure script's "--help" text to indicate default behavior
  for internal integer type size and BLAS/CBLAS integer type size
  options.
2016-05-18 15:13:53 -05:00
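As a rough illustration of what aligning a buffer and a leading dimension means in d309f20b73, the sketch below rounds a leading dimension up to an alignment boundary and checks a pointer's alignment. Both helper names are hypothetical, and the rounding rule (leading dimension times element size is a multiple of the alignment) is an assumption, not a description of libblis_test_mobj_create().

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */

/* Round a leading dimension (in elements) up so that one row or column
   spans a multiple of `align` bytes. Assumes `align` is a nonzero
   multiple of `elem_size`. Hypothetical helper. */
size_t align_ldim(size_t ldim, size_t elem_size, size_t align)
{
    size_t per = align / elem_size;        /* elements per alignment unit */
    return ((ldim + per - 1) / per) * per; /* round up to a multiple of per */
}

/* Return nonzero if pointer p sits on an `align`-byte boundary. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

For example, a double-precision matrix with 10 rows and 32-byte alignment would get a leading dimension of 12 under this rule.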
Field G. Van Zee
c3a4d39d03 Updates to haswell gemm micro-kernels.
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
  one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
  by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.
2016-05-04 17:22:56 -05:00
Field G. Van Zee
bbb8569b2a Use 'restrict' in all kernel APIs; wspace changes.
Details:
- Updated level-1v, level-1f kernel function types (bli_l1?_ft.h) and
  generic kernel prototypes (bli_l1?_ker.h) to use 'restrict' for all
  numerical operand pointers (ie: all pointers except the cntx_t).
- Updated level-1f reference kernel definitions to use 'restrict' for
  all numerical operand pointers. (Level-1v reference kernel definitions
  were already updated in bdbda6e.)
- Rewrote the level-1v and level-1f reference kernel prototypes in
  bli_l1v_ref.h and bli_l1f_ref.h, respectively, to simply #include
  bli_l1v_ker.h and bli_l1f_ker.h with redefined function base names
  (as was already being done for the level-3 micro-kernel prototypes
  in bli_l3_ref.h), rather than duplicate the signatures from the
  _ker.h files.
- Added definitions to frame/include/bli_kernel_prototypes.h for axpbyv
  and xpbyv, which were probably meant for inclusion in bdbda6e.
- Converted a number of instances of four spaces, as introduced in
  bdbda6e, to tabs.
2016-04-27 14:13:46 -05:00
Devin Matthews
bdbda6e6ac Give the level1v operations some love:
- Add missing axpby and xpby operations (plus test cases).
- Add special case for scal2v with alpha=1.
- Add restrict qualifiers.
- Add special-case algorithms for incx=incy=1.
2016-04-25 11:05:57 -05:00
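The axpby operation, the restrict qualifiers, and the unit-stride special case from bdbda6e6ac can be sketched together as follows. This is an illustrative real-valued version only; the name daxpbyv_sketch and its signature are hypothetical and do not match the BLIS kernel API, and the sketch assumes positive strides.

```c
/* y := beta * y + alpha * x for n real elements (hypothetical signature).
   'restrict' promises the compiler that x and y do not overlap. */
void daxpbyv_sketch(int n, double alpha,
                    const double *restrict x, int incx,
                    double beta,
                    double *restrict y, int incy)
{
    if (incx == 1 && incy == 1) {
        /* Unit-stride special case: a contiguous loop that is easy to vectorize. */
        for (int i = 0; i < n; ++i)
            y[i] = beta * y[i] + alpha * x[i];
    } else {
        /* General (positive) strides. */
        for (int i = 0; i < n; ++i)
            y[i * incy] = beta * y[i * incy] + alpha * x[i * incx];
    }
}
```

Setting alpha to 1 gives the xpby form (y := x + beta * y).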
Devin Matthews
0e1a9821d8 Add configure options and generate bli_config.h automatically.
Options to configure have been added for:
- Setting the internal BLIS and BLAS/CBLAS integer sizes.
- Enabling and disabling the BLAS and CBLAS layers.

Additionally, configure options that require defining macros (the above plus the threading model) write their macros to the automatically generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size, alignment, etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.

Lastly, support for OMP in clang has been added (closes #56).
2016-04-19 11:44:37 -05:00
Field G. Van Zee
537a1f4f85 Implemented runtime contexts and reorganized code.
Details:
- Retrofitted a new data structure, known as a context, into virtually
  all internal APIs for computational operations in BLIS. The structure
  is now present within the type-aware APIs, as well as many supporting
  utility functions that require information stored in the context. User-
  level object APIs were unaffected and continue to be "context-free,"
  however, these APIs were duplicated/mirrored so that "context-aware"
  APIs now also exist, differentiated with an "_ex" suffix (for "expert").
  These new context-aware object APIs (along with the lower-level, type-
  aware, BLAS-like APIs) contain the address of a context as a last
  parameter, after all other operands. Contexts, or specifically, cntx_t
  object pointers, are passed all the way down the function stack into
  the kernels and allow the code at any level to query information about
  the runtime, such as kernel addresses and blocksizes, in a thread-
  friendly manner--that is, one that allows thread-safety, even if the
  original source of the information stored in the context changes at
  run-time (see next bullet for more on this "original source" of info).
  (Special thanks go to Lee Killough for suggesting the use of this kind
  of data structure in discussions that transpired during the early
  planning stages of BLIS, and also for suggesting such a perfectly
  appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
  structure" (gks). This data structure and API will allow the caller to
  initialize a context with the kernel addresses, blocksizes, and other
  information associated with the currently active kernel configuration.
  The currently active kernel configuration within the gks cannot be
  changed (for now), and is initialized with the traditional cpp macros
  that define kernel function names, blocksizes, and the like. However,
  in the future, the gks API will be expanded to allow runtime management
  of kernels and runtime parameters. The most obvious application of this
  new infrastructure is the runtime detection of hardware (and the
  implied selection of appropriate kernels). With contexts in place,
  kernels may even be "hot swapped" at runtime within the gks. Once
  execution enters a level-3 _front() function, the memory allocator will
  be reinitialized on-the-fly, if necessary, to accommodate the new
  kernels' blocksizes. If another application thread is executing with
  another (previously loaded) kernel, it will finish in a deterministic
  fashion because its kernel information was loaded into its context
  before computation began, and also because the blocks it checked out
  from the internal memory pools will be unaffected by the newer threads'
  reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
  the code enabling use of induced methods for complex domain matrix
  multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
  those APIs' functionality is now mostly subsumed within the global
  kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
  that will reinitialize a memory pool if the necessary pool block size
  has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
  bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
  usage of contexts where appropriate to communicate cache and register
  blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
  the context and/or the global kernel structure:
  - Removed blocksize object pointers (blksz_t*) fields from all control
    tree node definitions and replaced them with blocksize id (bszid_t)
    values instead, which may be passed into a context query routine in
    order to extract the corresponding blocksize from the given context.
  - Removed micro-kernel function pointers (func_t*) fields from all
    control tree node definitions. Now, any code that needs these function
    pointers can query them from the local context, as identified by a
    level-3 micro-kernel id (l3ukr_t), level-1f kernel id, (l1fkr_t), or
    level-1v kernel id (l1vkr_t).
  - Removed blksz_t object creation and initialization, as well as kernel
    function object creation and initialization, from all operation-
    specific control tree initialization files (bli_*_cntl.c), since this
    information will now live in the gks and, secondarily, in the context.
- Removed blocksize multiples from blksz_t objects. Now, we track
  blocksize multiples for each blocksize id (bszid_t) in the context
  object.
- Removed the bool_t's that were required when a func_t was initialized.
  These bools are meant to allow one to track the micro-kernel's storage
  preferences (by rows or columns). This preference is now tracked
  separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into single
  files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
  util directories, but has the most obvious effect of allowing BLIS
  to compile noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
  in an attempt to reduce overhead for memory-bound operations. This
  includes removal of default use of object-based variants for level-2
  operations. Now, by default, level-2 operations will directly call a
  low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in bli_blksz.c (renamed from
  bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
  respective header files.
- Defined bli_mbool.c API to create and query "multi-bools", or
  heterogeneous bool_t's (one for each floating-point datatype), in the
  same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
  and BLIS_SIMD_SIZE. These values are needed in order to compute a third
  new parameter, which may be set indirectly via the aforementioned
  macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
  statically allocate memory in macro-kernels and the induced methods'
  virtual kernels to be used as temporary space to hold a single
  micro-tile. These values are now output by the testsuite. The default
  value of BLIS_STACK_BUF_MAX_SIZE is computed as
  "2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up top-level 'kernels' directory (for example, renaming the
  embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
  and "haswell," respectively), and gave more consistent and meaningful
  names to many kernel files (as well as updating their interfaces to
  conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
  context for test modules that need those values: axpyf, dotxf,
  dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that will
  more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (ie: those used to inline double loops
  for level-1m-like operations on small matrices) in frame/include/level0
  to use more obscure local variable names in an effort to avoid variable
  shadowing. (Thanks to Devin Matthews for pointing out these gcc warnings,
  which are only output using -Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors that
  of scalm. The semantic meaning of the conj argument is to optionally
  allow implicit conjugation of the scalar prior to being populated into
  the object.
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
  that this does not preclude supporting mixed types via the object APIs,
  where it produces absolutely zero API code bloat.
2016-04-11 17:21:28 -05:00
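The relationship between the global kernel structure and a context, as described in 537a1f4f85, can be pictured with a toy model: the context takes a private snapshot of the blocksizes, and callers query it by blocksize id. Every type and function name below is hypothetical and is only loosely modeled on the bszid_t and cntx_t ideas; it is not the BLIS API.

```c
#include <string.h>  /* memcpy */

/* Hypothetical blocksize ids, in the spirit of bszid_t. */
typedef enum { BSZ_MC, BSZ_KC, BSZ_NC, BSZ_MR, BSZ_NR, BSZ_COUNT } bszid_sketch_t;

/* Toy stand-in for the global kernel structure (gks). */
typedef struct { int blksz[BSZ_COUNT]; } gks_sketch_t;

/* Toy context: a private copy of what the gks held at initialization. */
typedef struct { int blksz[BSZ_COUNT]; } cntx_sketch_t;

/* Snapshot the currently active blocksizes into the caller's context. */
void cntx_sketch_init(cntx_sketch_t *cntx, const gks_sketch_t *gks)
{
    memcpy(cntx->blksz, gks->blksz, sizeof cntx->blksz);
}

/* Query a blocksize by id from the local context, not from global state. */
int cntx_sketch_get_blksz(const cntx_sketch_t *cntx, bszid_sketch_t id)
{
    return cntx->blksz[id];
}
```

Because the context holds its own copy, a later change to the global structure does not affect a computation that has already initialized its context, which is the thread-friendly property the commit message describes.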
Devin Matthews
26379b14de Adjust paths in common.mk to support building from testsuite dir. 2016-03-31 10:45:48 -05:00
Devin Matthews
edbb847004 Refactor out some definitions which moved from make_defs.mk to Makefile for use in testsuite Makefile. 2016-03-30 16:27:11 -05:00
Field G. Van Zee
55329906ec Minor edits to README.md, testsuite.
Details:
- Fixed typos in README.md.
- Fixed column heading alignment for testsuite when matlab output is
  enabled.
- Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.
2015-09-26 20:47:19 -05:00
Field G. Van Zee
fdfe14f1e1 Added support for Intel Haswell/Broadwell.
Details:
- Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors
  and FMA instructions. (Complex support is currently provided by default
  induced method, 4m1a.)
- Added a 'haswell' configuration, which uses the aforementioned kernels.
- Inserted auto-detection support for haswell configuration in
  build/auto-detect/cpuid_x86.c.
- Modified configure script to explicitly echo when automatic or manual
  configuration is in progress.
- Changed beta scalar in test_gemm.c module of test suite from -1.0 to 0.9.
2015-07-09 13:52:39 -05:00
Field G. Van Zee
7cd01b71b5 Implemented dynamic allocation for packing buffers.
Details:
- Replaced the old memory allocator, which was based on statically-
  allocated arrays, with one based on a new internal pool_t type, which,
  combined with a new bli_pool_*() API, provides a new abstract data
  type that implements the same memory pool functionality but with blocks
  from the heap (ie: malloc() or equivalent). Hiding the details of the
  pool in a separate API also allows for a much simpler bli_mem.c family
  of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
  sane defaults for the values previously found in bli_config. Those
  values can be overridden by #defining them in bli_config.h the same
  way kernel defaults can be overridden in bli_kernel.h. This file most
  resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
  defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
  blocks in the memory pool. Also added a corresponding query routine to
  the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
  reflection, it seems that the goal of more predictable L1 cache
  replacement behavior is outweighed by the harm caused by non-contiguous
  micro-panels when k % kc != 0. I honestly don't think anyone will even
  miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
  bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
  given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
  which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
  modules. Thanks to Devangi Parikh for pointing out these
  miscalculations.
- Comment, whitespace changes.
2015-06-19 11:31:53 -05:00
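The heap-backed memory pool introduced in 7cd01b71b5 can be approximated by a stack of equally sized malloc() blocks that callers check out and check back in. The sketch below is a simplified, single-threaded illustration; the pool_sketch_* names are hypothetical and do not correspond to the real pool_t type or the bli_pool_*() functions.

```c
#include <stdlib.h>  /* malloc, free */

/* Toy pool: a stack of equally sized heap blocks (not BLIS's pool_t). */
typedef struct {
    void  **blocks;     /* blocks currently available for checkout */
    size_t  top;        /* number of available blocks */
    size_t  block_size; /* size of each block in bytes */
} pool_sketch_t;

/* Free all checked-in blocks and the bookkeeping array. */
void pool_sketch_finalize(pool_sketch_t *p)
{
    while (p->top > 0)
        free(p->blocks[--p->top]);
    free(p->blocks);
    p->blocks = NULL;
}

/* Allocate num_blocks blocks from the heap; returns 0 on success. */
int pool_sketch_init(pool_sketch_t *p, size_t num_blocks, size_t block_size)
{
    p->top = 0;
    p->block_size = block_size;
    p->blocks = malloc(num_blocks * sizeof *p->blocks);
    if (p->blocks == NULL)
        return -1;
    for (size_t i = 0; i < num_blocks; ++i) {
        void *b = malloc(block_size);
        if (b == NULL) {
            pool_sketch_finalize(p);  /* release what was allocated so far */
            return -1;
        }
        p->blocks[p->top++] = b;
    }
    return 0;
}

/* Hand out one block, or NULL if the pool is exhausted. */
void *pool_sketch_checkout(pool_sketch_t *p)
{
    return (p->top > 0) ? p->blocks[--p->top] : NULL;
}

/* Return a previously checked-out block to the pool. */
void pool_sketch_checkin(pool_sketch_t *p, void *block)
{
    p->blocks[p->top++] = block;
}
```

A real allocator would also need locking for multithreaded use and aligned block addresses; both are omitted here.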
Field G. Van Zee
26a4b8f6f9 Implemented 3m2, 3m3 induced algorithms (gemm only).
Details:
- Defined a new "3ms" (separated 3m) pack schema and added appropriate
  support in packm_init(), packm_blk_var2().
- Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
  as an argument instead of computing it locally. Exception: for trmm,
  is_p must be computed locally, since it changes for triangular
  packed matrices. Also exposed is_p in interface to dt-specific
  packm_blk_var2 (and _var1, even though it does not use imaginary
  stride).
- Renamed many functions/variables from _3mi to _3mis to indicate that
  they work for either interleaved or separated 3m pack schemas.
- Generalized gemm and herk macro-kernels to pass in imaginary stride
  rather than compute them locally.
- Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
  and 3m3-specific virtual micro-kernels.
- Added special gemm macro-kernels to support 3m2 and 3m3.
- Added support for 3m2 and 3m3 to testsuite.
- Corrected the type of the panel dimension (pd_) in various macro-
  kernels from inc_t to dim_t.
- Renamed many functions defined in bli_blocksize.c.
- Moved most induced-related macro defs from frame/include to
  frame/ind/include.
- Updated the _ukernel.c files so that the micro-kernel function pointers
  are obtained from the func_t objects rather than the cpp macros that
  define the function names.
- Updated test/3m4m driver, Makefile, and run script.
2015-04-01 10:44:54 -05:00
Field G. Van Zee
f1a6b7d028 Reorganized code for induced complex methods.
Details:
- Consolidated most of the code relating to induced complex methods
  (e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods
  are now enabled on a per-operation basis. The current "available"
  (enabled and implemented) implementation can then be queried on
  an operation basis. Micro-kernel func_t objects as well as blksz_t
  objects can also be queried in a similar manner.
- Redefined several micro-kernel and operation-related functions in
  bli_info_*() API, in accordance with above changes.
- Added mr and nr fields to blksz_t object, which point to the mr
  and nr blksz_t objects for each cache blocksize (and are NULL for
  register blocksizes). Renamed the sub-blocksize field "sub" to
  "mult" since it is really expressing a blocksize multiple.
- Updated bli_*_determine_kc_[fb]() for gemm/hemm/symm, trmm, and
  trsm to correctly query mr and nr (for purposes of nudging kc).
- Introduced an enumerated opid_t in bli_type_defs.h that uniquely
  identifies an operation. For now, only level-3 id values are defined,
  along with a generic, catch-all BLIS_NOID value.
- Reworked testsuite so that all induced methods that are enabled
  are tested (one at a time) rather than only testing the first
  available method.
- Reformatted summary at the beginning of testsuite output so that
  blocksize and micro-kernel info is shown for each induced method
  that was requested (as well as native execution).
- Reduced the number of columns needed to display non-matlab
  testsuite output (from approx. 90 to 80).
2015-03-18 15:37:10 -05:00
Field G. Van Zee
c0acca0f51 Clarified comments in testsuite input.operations. 2015-03-03 10:56:22 -06:00
Field G. Van Zee
a86db60ee2 Extensive renaming of 3m/4m-related files, symbols.
Details:
- Renamed all remaining 3m/4m packing files and symbols to 3mi/4mi
  ('i' for "interleaved"). Similar changes to 3M/4M macros.
- Renamed all 3m/4m files and functions to 3m1/4m1.
- Whitespace changes.
2015-02-23 18:42:39 -06:00
Field G. Van Zee
59613f1d55 Added separate micro-panel alignment for A and B.
Details:
- Changed the recently-added micro-panel alignment macros so that we now
  have two sets--one for micro-panels of matrix A and one for micro-
  panels of matrix B: BLIS_UPANEL_[AB]_ALIGN_SIZE_?.
- Store each set of alignment values into a separate blksz_t object in
  bli_gemm_cntl_init().
- Adjusted packm_init() to use the separate alignment values.
- Added query routines for the new alignment values to bli_info.c.
- Modified test suite output accordingly.
2014-10-23 17:21:37 -05:00
Field G. Van Zee
e64dba5633 Re-implemented micro-panel alignment.
Details:
- This commit re-implements a feature that was removed in commit
  c2b2ab62. It was removed because, at the time, I wasn't sure how the
  micro-panel alignment feature would interact with the 4m method (when
  applied at the micro-kernel level), and so it seemed safer to disable
  the feature entirely rather than allow possible breakage. This commit
  revisits the issue and safely re-implements the feature in a way that
  is compatible with 4m, 3m, 4mh, and 3mh (and native execution).
- Modified the static memory pool to account for micro-panel alignment
  space.
- Modified packm_init and blocked variants to align whole micro-panels
  by a datatype-specific alignment value that may be set by the
  configuration. (If it is not set by the configuration, it will default
  to BLIS_SIZEOF_?.)
- Modified macro-kernels so that:
  - storage stride is handled properly given the new micro-panel
    alignment behavior;
  - indexing through 3m/4m/rih-type sub-panels, as is done by trmm and
    trsm, is more robust (e.g. will work if the applicable packing
    register blocksize is odd);
  - imaginary strides are computed and stored within auxinfo_t structs,
    which allows the virtual micro-kernels to more easily determine how
    to index into the micro-panel operands.
- Modified virtual 3m and 4m micro-kernels to use the imaginary strides
  within the auxinfo_t structs instead of panel strides.
- Deprecated the panel stride fields from the auxinfo_t structs.
- Updated test suite to print out the micro-panel alignment values.
2014-10-20 19:23:06 -05:00
Field G. Van Zee
0d954087b2 Minor changes and fixes.
Details:
- Redefined bli_is_last_iter() to take thread_id and num_thread
  arguments, which allows the macro to correctly compute whether a
  given iteration is the last that the thread will compute in that
  particular loop. The new definition, however, remains disabled
  (commented out) until someone can look at this more closely, as
  the new definition seems to actually hurt performance slightly.
- Whitespace and related updates to level-3 macro-kernels.
- Updated test suite so that performance results in the hundreds of
  gigaflops do not disrupt the column alignment of the output.
2014-10-17 11:19:34 -05:00
Field G. Van Zee
99fd9a3971 Fixed two minor bugs.
Details:
- Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
  modules whereby the uplo bits of some packed matrix objects were not
  being set properly, resulting in false FAILURE results for those
  tests. Thanks to Tyler Smith for bringing this issue to my attention.
- Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
  "not yet implemented" abort() when creating a 1x1 object with non-unit
  strides.
2014-10-09 16:38:04 -05:00
Field G. Van Zee
96302d4fc8 Renamed bli_info_get_*_ukr_type() functions.
Details:
- Added _string() suffix to bli_info_get_*_ukr_type() function names.
  This makes them consistent with the bli_info_get_*_impl_string()
  functions.
2014-09-18 09:43:40 -05:00
Field G. Van Zee
e9899be090 Added high-level implementations of 4m, 3m.
Details:
- Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at
  high levels, respectively. APIs for trmm and trsm were NOT added due
  to the fact that these approaches are inherently incompatible with
  implementing 4m or 3m at high levels (because the input right-hand
  side matrix is overwritten).
- Added 4mh, 3mh virtual micro-kernels, and updated the existing 4m and
  3m so that all are stylistically consistent.
- Added new "rih" packing kernels (both low-level and structure-aware)
  to support both 4mh and 3mh.
- Defined new pack_t schemas to support real-only, imaginary-only, and
  real+imaginary packing formats.
- Added various level0 scalar macros to support the rih packm kernels.
- Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
- Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted
  level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in
  that order) and execute the first one that is enabled, or the native
  implementation if none are enabled.
- Added implementation query functions for each level-3 operation so
  that the user can query a string that describes the implementation
  that is currently enabled.
- Updated test suite to output implementation types for each level-3
  operation, as well as micro-kernel types for each of the five micro-
  kernels.
- Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
- Fixed an obscure bug when packing Hermitian matrices (regular packing
  type) whereby the diagonal elements of the packed micro-panels could
  get tainted if the source matrix's imaginary diagonal part contained
  garbage.
2014-09-16 18:19:32 -05:00
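The names 4m and 3m in e9899be090 refer to the number of real multiplications used to form one complex product. The scalar sketch below shows the two arithmetic identities; the induced methods apply the same idea to matrix products built from real-domain computation, which this example does not attempt to reproduce. The type and function names are hypothetical.

```c
/* Real and imaginary parts of a complex number (hypothetical type). */
typedef struct { double r, i; } cplx_sketch_t;

/* 4m-style product: four real multiplications. */
cplx_sketch_t mul_4m(cplx_sketch_t a, cplx_sketch_t b)
{
    cplx_sketch_t c;
    c.r = a.r * b.r - a.i * b.i;
    c.i = a.r * b.i + a.i * b.r;
    return c;
}

/* 3m-style product: three real multiplications at the cost of extra
   additions and subtractions. */
cplx_sketch_t mul_3m(cplx_sketch_t a, cplx_sketch_t b)
{
    double t1 = a.r * b.r;
    double t2 = a.i * b.i;
    double t3 = (a.r + a.i) * (b.r + b.i);

    cplx_sketch_t c;
    c.r = t1 - t2;
    c.i = t3 - t1 - t2;   /* equals a.r*b.i + a.i*b.r */
    return c;
}
```

The 3m form trades one multiplication for additional additions, so in floating point its imaginary part may differ slightly from the 4m result even though the two are algebraically equal.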
Field G. Van Zee
cf5efdde05 Pass pack_t schemas into ukernels via auxinfo_t.
Details:
- Modified macro-kernels to pass the pack_t schema values for matrices
  A and B into the datatype-specific functions, where they are now
  inserted into a newly-expanded auxinfo_t struct. This gives the
  micro-kernels access to the pack_t schema values embedded in the
  control trees, which determine the precise format into which the
  matrix elements are packed.
- Updated a call to bli_packm_init_pack() in src/test_libblis.c to
  remove densify argument. Meant to include this in commit c472993b.
2014-09-11 11:47:56 -05:00
Field G. Van Zee
af521ee6f2 Changed semantics of blocksize extensions.
Details:
- Changed semantics of cache and register blocksize extensions so that
  the extended values are tracked, rather than just the marginal
  extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
  that these "max" query routines grab the maximum value for cache
  blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.
2014-09-01 14:06:46 -05:00
Field G. Van Zee
45692e3ad4 Reverted some accidental changes.
Details:
- Reverted some changes that were unintentionally included in the
  previous commit (9526ce98). Thanks to Tony Kelman for pointing
  this out. (Note: a few select changes were not reverted.)
2014-08-07 13:21:15 -05:00
Field G. Van Zee
9526ce9881 Updated copyright headers of emscripten configuration files. 2014-08-06 14:15:34 -05:00
Field G. Van Zee
7ed415824d Updated copyright headers (continued).
Details:
- Inserted "at Austin" into third clause of license declarations.
  Meant to include this change in previous commit.
2014-07-14 16:14:33 -05:00
Field G. Van Zee
5c2c6c8561 Updated copyright headers to contain "at Austin".
Details:
- Updated copyright headers to include "at Austin" in the name of the
  University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
  2012).
2014-07-14 16:05:03 -05:00
Field G. Van Zee
26cd819906 Added bli_info_*() query functions.
Details:
- Added a new API family, bli_info_*(), which can be used to query
  information about how BLIS was configured. Most of these values are
  returned as gint_t, with the exception of the version string which
  is char*.
- Changed how the testsuite driver queries information about how BLIS
  was configured (from using macro constants directly to using the
  new bli_info API).
- Removed bli_version.c and its header file.
- Added STRINGIFY_INT() macro to bli_macro_defs.h
- Renamed info_t type in bli_type_defs.h to objbits_t (not because of
  an actual naming conflict, but because the name 'info_t' would now be
  somewhat misleading in the presence of the new bli_info API, as the
  two are unrelated).
2014-07-10 13:16:07 -05:00
Field G. Van Zee
970b431416 Minor bugfixes to BLAS compatibility layer.
Details:
- Changed bla_amax.c so that i?amax() routines now correctly return 0
  if ( n < 1 || incx <= 0 ).
- Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
  f2c's abs() macro for float and double cases.
- Thanks to Murtaza Ali for suggesting the two fixes above.
- Updated label of fnormv to normfv in testsuite/input.operations.
2014-07-10 09:30:00 -05:00
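The i?amax fix in 970b431416 concerns the early-return case. A minimal sketch of the BLAS-style behavior, assuming the usual 1-based return index, might look like this; idamax_sketch is a hypothetical name and is not the code in bla_amax.c.

```c
/* BLAS-style index of the first element with the largest absolute value.
   Returns a 1-based index, or 0 when n < 1 or incx <= 0.
   Hypothetical illustration only. */
int idamax_sketch(int n, const double *x, int incx)
{
    if (n < 1 || incx <= 0)
        return 0;

    int best = 1;
    double best_abs = (x[0] < 0.0) ? -x[0] : x[0];

    for (int i = 1; i < n; ++i) {
        double v = x[i * incx];
        double a = (v < 0.0) ? -v : v;
        if (a > best_abs) {       /* strict '>' keeps the first maximum */
            best_abs = a;
            best = i + 1;
        }
    }
    return best;
}
```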
Field G. Van Zee
4702350278 Defined _ukernel_void() wrappers to micro-kernels.
Details:
- Added wrappers for micro-kernels so that users may invoke the
  micro-kernels without knowing what the function names actually are.
  This is useful when an application wishes to call the micro-kernel
  from a shared library instance of BLIS, where the application may not
  necessarily have the luxury of grabbing the micro-kernel name(s) from
  C preprocessor macros at compile-time. Also, since the wrappers use
  void* pointers, one's environment does not need to be aware of some
  BLIS types such as scomplex and dcomplex. These wrappers now join the
  level-1 and level-1f kernel wrappers, which pre-dated this commit.
- Removed the wrapper definitions and prototypes from the micro-kernel
  test suite modules, and replaced calls to them with calls to the new
  wrappers mentioned above.
2014-07-03 11:48:23 -05:00
Marat Dukhan
f064711a5e SGEMM and DGEMM kernels for PNaCl 2014-06-15 06:27:37 -04:00
Tyler Smith
23d9eab354 Merge https://github.com/flame/blis 2014-03-20 16:54:35 -05:00
Field G. Van Zee
fd3e32a5f4 Refined INSERT_GENTFUNC macro usage.
Details:
- Defined new INSERT_GENTFUNC macros so that the macro always takes
  exactly the number of arguments needed for the particular operation or
  variant being defined. Many operations were using INSERT_GENTFUNC
  macros that expected one auxiliary argument even though none were
  needed. Those instances have now been updated. Most of these instances
  were in the level-0 and -1v operations, as well as some operations
  defined in frame/util.
2014-03-20 13:59:48 -05:00
Field G. Van Zee
a3902750b9 Reorganized norm operations.
Details:
- Completely reorganized norm operations:
  - Renames:
    - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
    - absumv -> norm1v (vector 1-norm)
  - New operations:
    - norm1m (matrix 1-norm)
    - normiv, normim (infinity-norm)
    - amaxv (BLAS-like absolute maximum value index)
    - asumv (BLAS-like absolute sum)
- Deprecated absumm, as it did not correspond to any actual norm.
  (However, an inlined version now exists in the testsuite module for
  randm.)
2014-03-19 12:35:17 -05:00
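For readers unfamiliar with the norms named in a3902750b9, the sketch below computes the vector 1-norm, the vector infinity-norm, and the matrix 1-norm for real data. The function names carry a _sketch suffix to make clear they are hypothetical and not the BLIS operations themselves; column-major storage is an assumption of the matrix example.

```c
/* Absolute value without pulling in the math library. */
static double abs_sketch(double v) { return (v < 0.0) ? -v : v; }

/* Vector 1-norm: sum of absolute values. */
double norm1v_sketch(int n, const double *x)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += abs_sketch(x[i]);
    return sum;
}

/* Vector infinity-norm: largest absolute value. */
double normiv_sketch(int n, const double *x)
{
    double best = 0.0;
    for (int i = 0; i < n; ++i) {
        double a = abs_sketch(x[i]);
        if (a > best)
            best = a;
    }
    return best;
}

/* Matrix 1-norm of an m x n column-major matrix with leading dimension
   lda: the largest column sum of absolute values. */
double norm1m_sketch(int m, int n, const double *a, int lda)
{
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double colsum = norm1v_sketch(m, a + (long)j * lda);
        if (colsum > best)
            best = colsum;
    }
    return best;
}
```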
Tyler Smith
92233cf642 Some fixes to gemm thread info tree creation.
Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
instead of BLIS_SINGLE_THREADED.
2014-03-11 14:16:08 -05:00
Tyler Smith
2c158fb885 Merge https://github.com/flame/blis
Conflicts:
	frame/1m/packm/bli_packm_blk_var1.c
2014-02-27 16:46:23 -06:00
Tyler Smith
01b125e815 First pass at adding parallelism to BLIS.
Added a multithreading infrastructure that should be independent of the multithreading implementation in the future.
Currently, gemm blocked variants 1f and 2f and packm blocked variant 1 are parallelized.
2014-02-27 11:55:45 -06:00
Field G. Van Zee
c2b2ab6270 Deprecated panel stride alignment in bli_config.h.
Details:
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
  configurations. It was already going unused in packm_init() since the
  recent 4m/3m commit. This setting was rarely, if ever, useful, and its
  existence only posed a potential risk for 4m/3m-based implementations.
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
  micro-kernels.
2014-02-26 12:46:45 -06:00
Field G. Van Zee
fde5f1fdec Added extensive support for configuration defaults.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
  macro constants. Examples:
    BLIS_SAXPYV_KERNEL_REF
    BLIS_DDOTXF_KERNEL_REF
    BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
  with a common base name; [sdcz] datatype flavors of each kernel or
  micro-kernel (level-1v, -1f, or 3) may now be named independently.
  This means you can now, if you wish, encode the datatype-specific
  register blocksizes in the name of the micro-kernel functions.
- Any datatype instance of any kernel (1v, 1f, or 3) that is left
  undefined in bli_kernel.h will default to the corresponding reference
  implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
  it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
  datatype chars to match the number of types the kernel WOULD take in
  a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
  sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
  your level-1v/-1f kernels. The framework now provides a _kernel()
  function which serves as the obj_t wrapper for whatever kernels are
  specified (or defaulted to) via bli_kernel.h.
- Developers no longer need to prototype their kernels, and thus no
  longer need to include any prototyping headers from within
  bli_kernel.h. The framework now generates kernel prototypes, with the
  proper type signature, based on the kernel names defined (or defaulted
  to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
  kernel is left undefined by bli_kernel.h, but its same-precision real
  domain equivalent IS defined, BLIS will use a 4m-based implementation
  for the datatype x implementations of all level-3 operations, using
  only the real gemm micro-kernel.
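
The defaulting and prototype-generation behavior described above can be sketched with plain preprocessor logic. This is an illustration under assumptions, not the framework's code: the DEMO_ macro names, the _demo function names, and the trivial int(void) signature are all hypothetical stand-ins.

```c
/* Standard names for the reference kernels (hypothetical DEMO_ names). */
#define DEMO_SGEMM_UKERNEL_REF sgemm_ukernel_ref_demo
#define DEMO_DGEMM_UKERNEL_REF dgemm_ukernel_ref_demo

/* What a configuration's kernel header might contain: only the single
   precision micro-kernel is named; double precision is left undefined. */
#define DEMO_SGEMM_UKERNEL sgemm_ukernel_opt_demo

/* Framework side: anything left undefined falls back to the reference. */
#ifndef DEMO_SGEMM_UKERNEL
#define DEMO_SGEMM_UKERNEL DEMO_SGEMM_UKERNEL_REF
#endif
#ifndef DEMO_DGEMM_UKERNEL
#define DEMO_DGEMM_UKERNEL DEMO_DGEMM_UKERNEL_REF
#endif

/* Prototypes are generated from whichever names were selected, so the
   configuration does not need to supply its own prototyping headers. */
int DEMO_SGEMM_UKERNEL(void);
int DEMO_DGEMM_UKERNEL(void);

/* Stand-in kernel bodies: return 1 for "optimized", 0 for "reference". */
int sgemm_ukernel_opt_demo(void) { return 1; }
int dgemm_ukernel_ref_demo(void) { return 0; }
```

Here DEMO_SGEMM_UKERNEL() resolves to the configuration's kernel, while DEMO_DGEMM_UKERNEL() resolves to the reference kernel because the configuration never defined it.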
2014-02-25 13:34:56 -06:00
Field G. Van Zee
6363a9f658 Added level-3 support for complex via 4m-/3m.
Details:
- Added the ability to induce complex domain level-3 operations via new
  virtual complex micro-kernels which are implemented via only real
  domain micro-kernels. Two new implementations are provided: 4m and 3m.
  4m implements complex matrix multiplication in terms of four real
  matrix multiplications, whereas 3m uses only three and thus is
  capable of even higher (than peak) performance. However, the 3m method
  has somewhat weaker numerical properties, making it less desirable
  in general.
- Further refined packing routines, which were recently revamped, and
  added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
  into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
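
The arithmetic behind the two methods can be sketched as follows, with a complex matrix held as separate real and imaginary arrays. This only illustrates the identities; it does not reflect how the virtual micro-kernels or packing routines are organized, and all names (suffixed _demo) are hypothetical. Matrices are n-by-n and row-major, and the 3m routine assumes a caller-supplied workspace of 3*n*n doubles.

```c
#include <stddef.h>

/* C := C + sign * A * B for n-by-n row-major real matrices. */
static void rgemm_demo(size_t n, double sign, const double *a,
                       const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t p = 0; p < n; ++p)
                c[i * n + j] += sign * a[i * n + p] * b[p * n + j];
}

/* 4m: four real products.
   Cr += Ar*Br - Ai*Bi;  Ci += Ar*Bi + Ai*Br. */
void cgemm_4m_demo(size_t n, const double *ar, const double *ai,
                   const double *br, const double *bi,
                   double *cr, double *ci)
{
    rgemm_demo(n,  1.0, ar, br, cr);
    rgemm_demo(n, -1.0, ai, bi, cr);
    rgemm_demo(n,  1.0, ar, bi, ci);
    rgemm_demo(n,  1.0, ai, br, ci);
}

/* T := A * B, then Cr += wr * T and Ci += wi * T. */
static void prod_apply_demo(size_t n, const double *a, const double *b,
                            double *t, double wr, double wi,
                            double *cr, double *ci)
{
    for (size_t k = 0; k < n * n; ++k) t[k] = 0.0;
    rgemm_demo(n, 1.0, a, b, t);
    for (size_t k = 0; k < n * n; ++k) {
        cr[k] += wr * t[k];
        ci[k] += wi * t[k];
    }
}

/* 3m: three real products, at the cost of extra additions.
   T1 = Ar*Br, T2 = Ai*Bi, T3 = (Ar+Ai)*(Br+Bi);
   Cr += T1 - T2;  Ci += T3 - T1 - T2. */
void cgemm_3m_demo(size_t n, const double *ar, const double *ai,
                   const double *br, const double *bi,
                   double *cr, double *ci, double *work)
{
    double *sa = work, *sb = work + n * n, *t = work + 2 * n * n;
    for (size_t k = 0; k < n * n; ++k) {
        sa[k] = ar[k] + ai[k];
        sb[k] = br[k] + bi[k];
    }
    prod_apply_demo(n, ar, br, t,  1.0, -1.0, cr, ci);  /* T1 */
    prod_apply_demo(n, ai, bi, t, -1.0, -1.0, cr, ci);  /* T2 */
    prod_apply_demo(n, sa, sb, t,  0.0,  1.0, cr, ci);  /* T3 */
}
```

In 3m the imaginary part is recovered by subtracting T1 and T2 from T3, and that subtraction can cancel significant digits. This is one way to see why the commit describes 3m's numerical properties as somewhat weaker.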
2014-02-19 17:00:52 -06:00
Field G. Van Zee
b7da57b282 Updated calls to packm_blk_var2() in testsuite.
Details:
- In ukernel testsuite modules, replaced calls to packm_blk_var2() with
  _var1(). Meant to include this in previous commit.
2014-02-11 10:28:23 -06:00
Field G. Van Zee
c255a293e2 Consolidated packm_blk_var2 and var3.
Details:
- Consolidated the functionality previously supported by packm_blk_var2()
  and packm_blk_var3() into a new variant, packm_blk_var1().
- Updates to packm_gen_cxk(), packm_herm_cxk.c(), and packm_tri_cxk()
  to accommodate above changes.
- Removed packm_blk_var3() and retired packm_blk_var2() to
  frame/1m/packm/old.
- Updated all level-3 _cntl_init() functions so that the new, more
  versatile packm_blk_var1 is used for all level-3 matrix packing.
2014-02-10 14:31:24 -06:00
Field G. Van Zee
6c80670287 Renamed enumerated type in testsuite and modules.
Details:
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
  renamed all corresponding "impl" variables to "iface".
2014-02-07 11:27:15 -06:00
Field G. Van Zee
32cae66326 Fixed some instances of sloppy 'restrict' usage.
Details:
- Fixed technically incorrect usage of the 'restrict' keyword in the
  reference trsm micro-kernels.
- Tweak to testsuite/Makefile that causes rebuild if libblis was
  touched.
2014-02-06 18:06:42 -06:00
Field G. Van Zee
f8f67d7251 Typecast bli_getopt() return value in testsuite.
Details:
- In the test suite driver, inserted an explicit typecast of the return
  value of bli_getopt() prior to parsing. The lack of a typecast caused a
  problem on at least one system whereby a return value of -1 was
  interpreted as a garbage character. Thanks to Francisco Igual for
  finding and submitting this fix.
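
The commit does not say what the affected system did, but one common way a -1 sentinel turns into a garbage character is when the value passes through an unsigned character type. The sketch below shows that general hazard only; it is an assumption about the failure mode, not a reconstruction of the actual bug, and both function names are hypothetical.

```c
/* Returns 1 if the end-of-options sentinel (-1) is still recognizable
   after the parser's return value has been stored in an unsigned char.
   On platforms where plain char is unsigned, storing into a char
   behaves the same way. */
int end_detected_via_uchar_demo(int ret)
{
    unsigned char ch = (unsigned char)ret;  /* -1 becomes 255 */
    int widened = ch;                       /* stays 255, not -1 */
    return widened == -1;
}

/* Same check with the value kept as an int throughout. */
int end_detected_via_int_demo(int ret)
{
    int opt = ret;
    return opt == -1;
}
```

Calling both with -1 shows the difference: the unsigned char path never sees the sentinel and would go on to treat 255 as an option character, while the int path detects the end of the options.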
2014-01-10 09:06:11 -06:00
Field G. Van Zee
89c76a8a51 Allow building outside source distribution.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
  a user can build a BLIS library outside of the top-level directory of
  the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
  which will compile, link, and run the testsuite binary. This works even
  if the build directory is externally located, thanks to the test suite
  binary's new -g and -o command-line options. Also, when creating the
  test suite via the top-level Makefile, the linking is against the
  local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
  locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
2014-01-09 12:08:37 -06:00
Field G. Van Zee
12fa82ec12 Implemented bli_getopt().
Details:
- Added bli_getopt.c and .h files to frame/base. These files implement
  a custom version of getopt(), which may be used to parse command line
  options passed into a program via argc/argv. I am implementing this
  function myself, as opposed to using the version available via unistd.h,
  for portability reasons, as the only requirement is string.h (which
  is available via the standard C library).
- Modified test suite to allow the user to specify the file name (and/or
  path) to the parameters and operations input files: -g may be used to
  specify the general input file and -o to specify the operations input
  file. If -g or -o or both are not given, default filenames are assumed
  (as well as their existence in the current directory).
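
A getopt-style parser that depends only on string.h can be quite small. The sketch below is not bli_getopt(): the type, the function name, and the decision to keep parser state in a caller-owned struct are all assumptions made for illustration, and grouped short options (such as -ab) are not handled.

```c
#include <string.h>

/* Parser state owned by the caller (hypothetical type). */
typedef struct {
    const char *optarg;  /* argument of the most recent option, or NULL */
    int optind;          /* index of the next argv element to examine */
} getopt_demo_t;

/* Returns the option character, '?' for an unknown option or a missing
   argument, or -1 once the next argv element is not an option.
   In optstring, a ':' after a character means "takes an argument". */
int getopt_demo(int argc, char *const argv[], const char *optstring,
                getopt_demo_t *st)
{
    st->optarg = NULL;
    if (st->optind >= argc) return -1;

    const char *tok = argv[st->optind];
    if (tok[0] != '-' || tok[1] == '\0') return -1;  /* not an option */

    const char *spec = strchr(optstring, tok[1]);
    st->optind++;
    if (tok[1] == ':' || spec == NULL) return '?';

    if (spec[1] == ':') {
        if (tok[2] != '\0')
            st->optarg = tok + 2;                /* form: -gfile  */
        else if (st->optind < argc)
            st->optarg = argv[st->optind++];     /* form: -g file */
        else
            return '?';                          /* argument missing */
    }
    return tok[1];
}
```

A driver would initialize the state with optind set to 1, call getopt_demo() in a loop with an optstring such as "g:o:" until it returns -1, and fall back to default file names for any option that never appeared.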
2014-01-08 16:09:26 -06:00
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00