Commit Graph

687 Commits

Author SHA1 Message Date
Field G. Van Zee
87fddeab3c Merge branch 'compose' 2016-10-05 13:35:01 -05:00
Field G. Van Zee
6f71cd3449 Merge pull request #94 from flame/distcomm
Implemented distributed thrinfo_t management.
2016-10-04 15:53:46 -05:00
Field G. Van Zee
86969873b5 Reclassified amaxv operation as a level-1v kernel.
Details:
- Moved amaxv from being a utility operation to being a level-1v operation.
  This includes the establishment of a new amaxv kernel to live beside all
  of the other level-1v kernels.
- Added two new functions to bli_part.c:
    bli_acquire_mij()
    bli_acquire_vi()
  The first acquires a scalar object for the (i,j) element of a matrix,
  and the second acquires a scalar object for the ith element of a vector.
- Added integer support to bli_getsc level-0 operation. This involved
  adding integer support to the bli_*gets level-0 scalar macros.
- Added a new test module to test amaxv as a level-1v operation. The test
  module works by comparing the value identified by bli_amaxv() to the
  value found from a reference-like code local to the test module
  source file. In other words, it (intentionally) does not guarantee the
  same index is found; only the same value. This allows for different
  implementations in the case where a vector contains two or more elements
  containing exactly the same floating point value (or values, in the case
  of the complex domain).
- Removed the directory frame/include/old/.
2016-10-04 14:24:59 -05:00
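Note on 86969873b5: the value-based (rather than index-based) check described for the amaxv test module can be sketched as follows. This is a minimal illustration, not the testsuite's code; ref_amaxv(), abs_d(), and amaxv_value_matches() are hypothetical names, and only the real double-precision case is shown.

```c
#include <stddef.h>

/* Absolute value helper (hypothetical; avoids depending on libm). */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Reference-like search: index of the first element with the largest
   absolute value. Hypothetical helper, not part of BLIS. */
static size_t ref_amaxv(size_t n, const double *x)
{
    size_t imax = 0;
    for (size_t i = 1; i < n; ++i)
        if (abs_d(x[i]) > abs_d(x[imax]))
            imax = i;
    return imax;
}

/* Returns 1 if the index under test identifies the same absolute value
   as the reference search, even when the two indices differ. */
static int amaxv_value_matches(size_t n, const double *x, size_t i_test)
{
    if (n == 0 || i_test >= n) return 0;
    return abs_d(x[i_test]) == abs_d(x[ref_amaxv(n, x)]);
}
```

With x = { 1.0, -3.0, 3.0, 2.0 }, an implementation returning index 1 and one returning index 2 both pass, since both elements have absolute value 3.0.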
Field G. Van Zee
8d55033c96 Implemented distributed thrinfo_t management.
Details:
- Implemented Ricardo Magana's distributed thread info/communicator
  management. Rather than fully constructing the thrinfo_t structures, from
  root to leaf, prior to spawning threads, the threads individually
  construct their thrinfo_t trees (or, chains), and do so incrementally,
  as needed, reusing the same structure nodes during subsequent blocked
  variant iterations. This required moving the initial creation of the
  thrinfo_t structure (now, the root nodes) from the _front() functions
  to the bli_l3_thread_decorator(). The incremental "growing" of the tree
  is performed in the internal back-end (i.e., _int()) function, and so is
  mostly invisible. Also, the incremental growth of the thrinfo_t tree is
  done as a function of the current and parent control tree nodes (as well
  as the parent thrinfo_t node), further reinforcing the parallel
  relationship between the two data structures.
- Removed the "inner" communicator from thrinfo_t structure definition,
  as well as its id. Changed all APIs accordingly. Renamed
  bli_thrinfo_needs_free_comms() to bli_thrinfo_needs_free_comm().
- Defined bli_l3_thrinfo_print_paths(), which prints the information
  in an array of thrinfo_t* structure pointers. (Used only as a
  debugging/verification tool.)
- Deprecated the following thrinfo_t creation functions:
    bli_packm_thrinfo_create()
    bli_l3_thrinfo_create()
  because they are no longer used. bli_thrinfo_create() is now called
  directly when creating thrinfo_t nodes.
2016-09-27 15:20:58 -05:00
Field G. Van Zee
fd04869ae4 Changed configure's 'omp' threading to 'openmp'.
Details:
- Changed the configure script so that the expected string argument to the
  -t (or --enable-threading=) option that enables OpenMP multithreading is
  'openmp'. The previous expected string, 'omp', is still supported but
  should be considered deprecated.
2016-09-27 14:14:11 -05:00
Field G. Van Zee
9424af8720 Merge branch 'compose' 2016-09-27 12:51:08 -05:00
Field G. Van Zee
efa7341df0 Merge pull request #92 from ShadenSmith/readme_fix
Fixes broken URL in README.md
2016-09-16 11:01:57 -05:00
Shaden Smith
e1453f68f6 Fixes broken URL in README.md 2016-09-16 09:29:28 -05:00
Field G. Van Zee
c0630c4024 Added debugging printf()'s to bli_l3_thrinfo.c.
Details:
- Added optional printf() statements to print out thread communicator
  info as the thrinfo_t structure is built in bli_l3_thrinfo.c.
- Minor changes to frame/thread/bli_thrinfo.h.
2016-09-12 13:59:02 -05:00
Field G. Van Zee
7b3bf1ffcd Merge branch 'master' into compose 2016-09-06 15:47:13 -05:00
Field G. Van Zee
121c39d455 Added complex gemm micro-kernels for haswell.
Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
  architectures. As with their real domain brethren, these kernels prefer
  row storage (though this doesn't affect most users due to high-level
  optimizations in most level-3 operations that induce a transpose to
  whatever storage preference the kernel may have).
2016-09-05 13:11:42 -05:00
Field G. Van Zee
35509818cb Added, moved some thread barriers.
Details:
- Removed thread barriers from the end of the loop bodies of
  bli_gemm_blk_var1(), bli_gemm_blk_var2(), bli_trsm_blk_var1(),
  and bli_trsm_blk_var2().
- Moved the thread barrier at the end of bli_packm_int() to the
  end of bli_l3_packm(), and added missing barriers to that function.
- Removed the no longer necessary (and now incorrect) ochief guard
  in bli_gemm3m3_packa() on the bli_obj_scalar_reset() on C.
- Thanks to Tyler Smith for help with these changes.
2016-08-31 17:34:15 -05:00
Field G. Van Zee
abd61f9fa7 Updated BLIS4 TOMS citation in README.md. 2016-08-30 12:34:19 -05:00
Field G. Van Zee
701b9aa3ff Redesigned control tree infrastructure.
Details:
- Altered control tree node struct definitions so that all nodes have the
  same struct definition, whose primary fields consist of a blocksize id,
  a variant function pointer, a pointer to an optional parameter struct,
  and a pointer to a (single) sub-node. This unified control tree type is
  now named cntl_t.
- Changed the way control tree nodes are connected, and what computation
  they represent, such that, for example, packing operations are now
  associated with nodes that are "inline" in the tree, rather than off-shoot
  branches. The original tree for the classic Goto gemm algorithm was
  expressed (roughly) as:

    blk_var2 -> blk_var3 -> blk_var1 -> ker_var2
                         |           |
                         -> packb    -> packa

  and now, the same tree would look like:

    blk_var2 -> blk_var3 -> packb -> blk_var1 -> packa -> ker_var2

  Specifically, the packb and packa nodes perform their respective packing
  operations and then recurse (without any loop) to a subproblem. This means
  there are now two kinds of level-3 control tree nodes: partitioning and
  non-partitioning. The blocked variants are members of the former, because
  they iteratively partition off submatrices and perform suboperations on
  those partitions, while the packing variants belong to the latter group.
  (This change has the effect of allowing greatly simplified initialization
  of the nodes, which previously involved setting many unused node fields to
  NULL.)
- Changed the way thrinfo_t tree nodes are arranged to mirror the new
  connective structure of control trees. That is, packm nodes are no longer
  off-shoot branches of the main algorithmic nodes, but rather connected
  "inline".
- Simplified control tree creation functions. Partitioning nodes are created
  concisely with just a few fields needing initialization. By contrast, the
  packing nodes require additional parameters, which are stored in a
  packm-specific struct that is tracked via the optional parameters pointer
  within the control tree struct. (This parameter struct must always begin
  with a uint64_t that contains the byte size of the struct. This allows
  us to use a generic function to recursively copy control trees.) gemm,
  herk, and trmm control tree creation continues to be consolidated into
  a single function, with the operation family being used to select
  among the parameter-agnostic macro-kernel wrappers. A single routine,
  bli_cntl_free(), is provided to free control trees recursively, whereby
  the chief thread within a group releases the blocks associated with
  mem_t entries back to the memory broker from which they were acquired.
- Updated internal back-ends, e.g. bli_gemm_int(), to query and call the
  function pointer stored in the current control tree node (rather than
  index into a local function pointer array). Before being invoked, these
  function pointers are first cast to a gemm_voft (for gemm, herk, or trmm
  families) or trsm_voft (for trsm family) type, which is defined in
  frame/3/bli_l3_var_oft.h.
- Retired herk and trmm internal back-ends, since all execution now flows
  through gemm or trsm blocked variants.
- Merged forwards- and backwards-moving variants by querying the direction
  from routines as a function of the variant's matrix operands. gemm and
  herk always move forward, while trmm and trsm move in a direction that
  is dependent on which operand (a or b) is triangular.
- Added functions bli_thread_get_range_mdim(), bli_thread_get_range_ndim(),
  each of which takes additional arguments and hides complexity in managing
  the difference between the way ranges are computed for the four families
  of operations.
- Simplified level-3 blocked variants according to the above changes, so that
  the only steps taken are:
  1. Query partitioning direction (forwards or backwards).
  2. Prune unreferenced regions, if they exist.
  3. Determine the thread partitioning sub-ranges.
  <begin loop>
    4. Determine the partitioning blocksize (passing in the partitioning
       direction)
    5. Acquire the current iteration's partitions for the matrices affected
       by the current variant's partitioning dimension (m, k, n).
    6. Call the subproblem.
  <end loop>
- Instantiate control trees once per thread, per operation invocation.
  (This is a change from the previous regime in which control trees were
  treated as stateless objects, initialized with the library, and shared
  as read-only objects between threads.) This once-per-thread allocation
  is done primarily to allow threads to use the control tree as a place
  to cache certain data for use in subsequent loop iterations. Presently,
  the only application of this caching is a mem_t entry for the packing
  blocks checked out from the memory broker (allocator). If a non-NULL
  control tree is passed in by the (expert) user, then the tree is copied
  by each thread. This is done in bli_l3_thread_decorator(), in
  bli_thrcomm_*.c.
- Added a new field to the context, an opid_t, which tracks the "family"
  of the operation being executed. For example, gemm, hemm, and symm are
  all part of the gemm family, while herk, syrk, her2k, and syr2k are
  all part of the herk family. Knowing the operation's family is necessary
  when conditionally executing the internal (beta) scalar reset on
  C in blocked variant 3, which is needed for gemm and herk families,
  but must not be performed for the trmm family (because beta has only
  been applied to the current row-panel of C after the first rank-kc
  iteration).
- Reexpressed 3m3 induced method blocked variant in frame/3/gemm/ind
  to conform with the new control tree design, and renamed the macro-
  kernel codes corresponding to 3m2 and 4m1b.
- Renamed bli_mem.c (and its APIs) to bli_memsys.c, and renamed/relocated
  bli_mem_macro_defs.h from frame/include to frame/base/bli_mem.h.
- Renamed/relocated bli_auxinfo_macro_defs.h from frame/include to
  frame/base/bli_auxinfo.h.
- Fixed a minor bug whereby the storage-to-ukr-preference matching
  optimization in the various level-3 front-ends was not being applied
  properly when the context indicated that execution would be via an
  induced method. (Before, we always checked the native micro-kernel
  corresponding to the datatype being executed, whereas now we check
  the native micro-kernel corresponding to the datatype's real projection,
  since that is the micro-kernel that is actually used by induced methods.)
- Added an option to the testsuite to skip the testing of native level-3
  complex implementations. Previously, it was always tested, provided that
  the c/z datatypes were enabled. However, some configurations use
  reference micro-kernels for complex datatypes, and testing these
  implementations can slow down the testsuite considerably.
2016-08-26 19:04:45 -05:00
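Note on 701b9aa3ff: the unified node layout and the size-prefixed parameter struct described above can be sketched as follows. This is an illustration under assumptions, not the BLIS definition of cntl_t; node_t, pack_params_t, variant_fp, node_copy(), and node_free() are hypothetical names, and the pack_schema field is invented for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*variant_fp)(void);  /* stand-in for a variant function pointer */

/* One struct definition shared by every node in the tree. */
typedef struct node_s
{
    int            bszid;     /* blocksize id */
    variant_fp     var_func;  /* variant function pointer */
    void          *params;    /* optional parameter struct, or NULL */
    struct node_s *sub_node;  /* the single sub-node, or NULL at the leaf */
} node_t;

/* Every parameter struct begins with its own size in bytes, so a generic
   routine can copy it without knowing its concrete type. */
typedef struct
{
    uint64_t size;
    int      pack_schema;  /* hypothetical packing parameter */
} pack_params_t;

/* Free a chain of nodes along with any parameter structs they own. */
void node_free(node_t *n)
{
    while (n != NULL)
    {
        node_t *next = n->sub_node;
        free(n->params);
        free(n);
        n = next;
    }
}

/* Recursively copy a tree; returns NULL on allocation failure. */
node_t *node_copy(const node_t *src)
{
    if (src == NULL) return NULL;

    node_t *dst = malloc(sizeof(*dst));
    if (dst == NULL) return NULL;
    *dst = *src;
    dst->params   = NULL;
    dst->sub_node = NULL;

    if (src->params != NULL)
    {
        uint64_t size;
        memcpy(&size, src->params, sizeof(size));  /* read the leading size */
        dst->params = malloc((size_t)size);
        if (dst->params == NULL) { free(dst); return NULL; }
        memcpy(dst->params, src->params, (size_t)size);
    }

    if (src->sub_node != NULL)
    {
        dst->sub_node = node_copy(src->sub_node);
        if (dst->sub_node == NULL) { node_free(dst); return NULL; }
    }
    return dst;
}
```

Because the copy routine only reads the leading uint64_t, new parameter struct types can be added without touching it, which is the property the commit message attributes to the size-prefix convention.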
Field G. Van Zee
73517f522b Merge branch 'master' into compose 2016-08-23 13:46:59 -05:00
Field G. Van Zee
50293da38d Avoid compiling BLAS/CBLAS files when disabled.
Details:
- Updated the top-level Makefile, build/config.mk.in template, and
  configure script so that object files corresponding to source files
  belonging to the BLAS compatibility layer are not compiled (or archived)
  when the compatibility layer is disabled. (Same for CBLAS.) Thanks
  to Devin Matthews for suggesting this optimization.
- Slight change to the way configure handles internal variables. Instead
  of converting (overwriting) some, such as enable_blas2blis and
  enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
  now stored in new variables that live alongside the originals (with the
  suffix "_01").  This is convenient since some values need to be
  sed-substituted into the config.mk.in template, which requires "yes" or
  "no", while some need to be written to the bli_config.h.in template,
  which requires "0" or "1".
2016-08-23 13:38:36 -05:00
Field G. Van Zee
c6f5c215ee Merge branch 'master' into compose 2016-08-22 17:33:02 -05:00
Field G. Van Zee
16a4c7a823 Fixed bugs in bli_mutex_init() and friends.
Details:
- Fixed a couple of bugs that affected OpenMP and POSIX threads
  configurations that resulted in compiler errors and warnings due
  to type mismatch, and in the case of pthreads, a missing function
  argument. The bugs are fairly recent, introduced in a017062.
2016-08-19 11:38:36 -05:00
Field G. Van Zee
d52cb76715 Merge branch 'master' into compose 2016-07-27 16:04:55 -05:00
Field G. Van Zee
c31b1e7b9d Relax alignment restrictions for sandybridge ukrs.
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
  in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
  instead of vmovaps/vmovapd. These changes mimic those made to the haswell
  microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
  directory to use DBL_MAX as the initial time candidate. Thanks to Devin
  Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
2016-07-27 15:58:07 -05:00
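Note on c31b1e7b9d: using DBL_MAX as the initial time candidate means the first measured time always replaces it. A minimal sketch, assuming the drivers keep the smallest time over several repeats; best_time() is a hypothetical name and not a BLIS function.

```c
#include <float.h>
#include <stddef.h>

/* Return the smallest of n measured times. Starting from DBL_MAX
   guarantees that any finite measurement becomes the new candidate. */
double best_time(size_t n, const double *times)
{
    double best = DBL_MAX;
    for (size_t i = 0; i < n; ++i)
        if (times[i] < best)
            best = times[i];
    return best;
}
```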
Field G. Van Zee
95abea46f8 Merge branch 'master' into compose 2016-07-23 15:38:33 -05:00
Field G. Van Zee
a017062fdf Integrated "memory broker" (membrk_t) abstraction.
Details:
- Integrated a patch originally authored and submitted by Ricardo Magana
  of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
  (memory broker) that allows multiple sets of memory pools on, for example,
  separate NUMA nodes, each of which has a separate memory space.
- Added membrk field to cntx_t and defined corresponding accessor macros.
- Added membrk field to mem_t object and defined corresponding accessor macros.
- Created new bli_membrk.c file, which contains the new memory broker API,
  including:
    bli_membrk_init(), bli_membrk_finalize()
    bli_membrk_acquire_[mv](), bli_membrk_release(),
    bli_membrk_init_pools(), bli_membrk_reinit_pools(),
    bli_membrk_finalize_pools(),
    bli_membrk_pool_size()
- In bli_mem.c, changed function calls to
    bli_mem_init_pools()     -> bli_membrk_init()
    bli_mem_reinit_pools()   -> bli_membrk_reinit()
    bli_mem_finalize_pools() -> bli_membrk_finalize()
- In bli_packv_init.c, bli_packm_init.c, changed function calls to:
    bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
    bli_mem_release()      -> bli_membrk_release()
- Added bli_mutex.c and related files to frame/thread. These files define
  abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
  single-threaded execution. This new API is employed within functions
  such as bli_membrk_acquire_[mv]() and bli_membrk_release().
2016-07-22 17:02:59 -05:00
Field G. Van Zee
ce59f81108 Merge pull request #88 from devinamatthews/32bit-dim_t
Handle 32-bit dim_t in 64-bit microkernels.
2016-07-22 14:48:14 -05:00
Devin Matthews
707a2b7fac Somehow forgot the most important microkernel. 2016-07-22 13:49:44 -05:00
Devin Matthews
47ec045056 Merge remote-tracking branch 'upstream/master' into 32bit-dim_t 2016-07-22 13:45:23 -05:00
Devin Matthews
08f1d6b6fa Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit. 2016-07-22 13:44:37 -05:00
Field G. Van Zee
ff41153f4e Merge pull request #86 from devinamatthews/haswell-vmovups
Remove alignment restrictions on C in haswell kernel.
2016-07-22 13:21:03 -05:00
Devin Matthews
e0d2fa0d83 Relax alignment restrictions for haswell sgemm. 2016-07-22 12:56:51 -05:00
Field G. Van Zee
f9214ced97 Merge pull request #85 from devinamatthews/qopenmp
Change -openmp to -fopenmp for icc.
2016-07-22 12:16:39 -05:00
Devin Matthews
ee2c139df6 Remove alignment restrictions on C in haswell kernel. 2016-07-22 12:06:03 -05:00
Devin Matthews
08666eaa20 Change -openmp to -fopenmp for icc. 2016-07-22 11:07:34 -05:00
Field G. Van Zee
d0dfe5b537 Merge branch 'master' into compose 2016-07-14 11:01:06 -05:00
Field G. Van Zee
413d62aca2 README update (use official ACM TOMS links). 2016-07-12 15:02:52 -05:00
Field G. Van Zee
dfa431f696 README update (BLIS2 TOMS article now in-print). 2016-07-12 14:21:19 -05:00
Field G. Van Zee
31def12e26 First phase of control tree redesign.
Details:
- These changes constitute the first set of changes in preparation to
  revamping the structure and use of control trees in BLIS. Modifications
  in this commit don't affect the control tree code yet, but rather lay
  the groundwork.
- Defined wrappers for the following functions, where the wrappers
  each take a direction parameter of a new enumerated type (BLIS_BWD or
  BLIS_FWD), dir_t, and execute the correct underlying function.
  - bli_acquire_mpart_*() and _vpart_*()
  - bli_*_determine_kc_[fb]()
  - bli_thread_get_range_*() and bli_thread_get_range_weighted_*()
- Consolidated all 'f' (forwards-moving) and 'b' (backwards-moving)
  blocked variants for trmm and trsm, and renamed gemm and herk variants
  accordingly. The direction is now queried via routines such as
  bli_trmm_direct(), which determines the direction from the implied side
  and uplo parameters. For gemm and herk, it is unconditionally BLIS_FWD.
- Defined wrappers to parameter-specific macrokernels for herk, trmm, and
  trsm, e.g. bli_trmm_xx_ker_var2(), that execute the correct underlying
  macrokernel based on the implied parameters. The same logic used to
  choose the dir_t in _direct() functions is used here.
- Simplified the function pointer arrays in _int() functions given the
  consolidation and dir_t querying mentioned above.
- Function signature (whitespace) reformatting for various functions.
- Removed old code in various 'old' directories.
2016-06-30 15:19:20 -05:00
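Note on 31def12e26: a direction-taking wrapper of the kind described above can be sketched for blocksize and offset computation. This is an illustration only; sk_dir_t, sk_blocksize(), and sk_offset() are hypothetical names, and the choice to handle the remainder block first when moving backwards is an assumption made for the example, not something the commit message states.

```c
#include <stddef.h>

typedef enum { SK_FWD, SK_BWD } sk_dir_t;  /* stand-in for dir_t */

/* Blocksize for the iteration that has already consumed i of n elements.
   Forwards: full blocks first, remainder last.
   Backwards (assumed): remainder first, then full blocks. */
size_t sk_blocksize(sk_dir_t dir, size_t i, size_t n, size_t b_alg)
{
    size_t left = n - i;
    if (dir == SK_BWD)
    {
        size_t rem = left % b_alg;
        if (rem != 0) return rem;
    }
    return left < b_alg ? left : b_alg;
}

/* Offset of the current block of size b within the dimension of length n.
   Forwards counts from the start; backwards counts from the end. */
size_t sk_offset(sk_dir_t dir, size_t i, size_t n, size_t b)
{
    return dir == SK_FWD ? i : n - i - b;
}
```

A blocked loop would then read the same in both directions: for each i starting at 0, compute b = sk_blocksize(dir, i, n, b_alg), operate on the block at sk_offset(dir, i, n, b), and advance i by b.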
Field G. Van Zee
232754feec Fixed compiler warning in rand[vm], randn[vm].
Details:
- Fixed compiler warnings about unused variables related to the disabling
  of normalization in the structured cases of the rand[vm] and randn[vm]
  operations.
2016-06-21 14:25:39 -05:00
Field G. Van Zee
a89555d160 Added randn[vm] operations, support in testsuite.
Details:
- Defined a new randomization operation, randn, on vectors and matrices.
  The randnv and randnm operations randomize each element of the target
  object with values from a narrow range of values. Presently, those
  values are all integer powers of two, but they do not need to be powers
  of two in order to achieve the primary goal, which is to initialize
  objects that can be operated on with plenty of precision "slack"
  available to allow computations that avoid roundoff. Using this method
  of randomization makes it much more likely that testsuite residuals of
  properly-functioning operations are close to zero, if not exactly zero.
- Updated existing randomization operations randv and randm to skip
  special diagonal handling and normalization for matrices with structure.
  This is now handled by the testsuite modules by explicitly calling a
  testsuite function that loads the diagonal (and scales off-diagonal
  elements).
- Added support for randnv and randnm in the testsuite with a new switch
  in input.general that universally toggles between use of the classic
  randv/randm, which use real values on the interval [-1,1], and
  randnv/randnm, which use only values from a narrow range. Currently,
  the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
  well as 0.0.
- Updated testsuite modules so that a testsuite wrapper function is called
  instead of directly calling the randomization operations (such as
  bli_randv() and bli_randm()). This wrapper also takes a bool_t that
  indicates whether the object's elements should be normalized. (NOTE: As
  alluded to above, in the test modules of triangular solve operations such
  as trsv and trsm, we perform the extra step of loading the diagonal.)
- Defined a new level-0 operation, invertsc, which inverts a scalar.
- Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
  but possible divide-by-zero.
- Updated function signature and prototype formatting in testsuite.
2016-06-17 14:08:35 -05:00
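Note on a89555d160: the narrow-range randomization can be sketched as drawing from +/-{2^0, ..., 2^-6} and 0.0. This is not the BLIS implementation; randn_narrow() and randnv_sketch() are hypothetical names, and the use of rand() with roughly equal selection probabilities is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* Draw one value from +/-{2^0, 2^-1, ..., 2^-6} or 0.0.
   The selection probabilities here are an assumption. */
double randn_narrow(void)
{
    int e = rand() % 8;                  /* 0..6 select an exponent, 7 selects 0.0 */
    if (e == 7) return 0.0;
    double v = 1.0 / (double)(1u << e);  /* exactly 2^-e */
    return (rand() % 2) ? v : -v;
}

/* Fill a vector of length n with narrow-range values. */
void randnv_sketch(size_t n, double *x)
{
    for (size_t i = 0; i < n; ++i)
        x[i] = randn_narrow();
}
```

Every value is an integer multiple of 2^-6, so sums of a moderate number of such values are represented exactly in double precision, which is the precision "slack" the commit message refers to.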
Field G. Van Zee
096895c5d5 Reorganized code, APIs related to multithreading.
Details:
- Reorganized code and renamed files defining APIs related to multithreading.
  All code that is not specific to a particular operation is now located in a
  new directory: frame/thread. Code is now organized, roughly, by the
  namespace to which it belongs (see below).
- Consolidated all operation-specific *_thrinfo_t object types into a single
  thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
  also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
  functions (aside from a few general purpose bli_thrinfo_*() functions).
- Renamed thread_comm_t object type to thrcomm_t.
- Renamed many of the routines and functions (and macros) for multithreading.
  We now have the following API namespaces:
  - bli_thrinfo_*(): functions related to thrinfo_t objects
  - bli_thrcomm_*(): functions related to thrcomm_t objects.
  - bli_thread_*(): general-purpose functions, such as initialization,
    finalization, and computing ranges. (For now, some macros, such as
    bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
    bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
    appropriate.)
- Renamed thread-related macros so that they use a bli_ prefix.
- Renamed control tree-related macros so that they use a bli_ prefix (to be
  consistent with the thread-related macros that were also renamed).
- Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
  #undef was a temporary fix to some macro defaults which were being applied
  in the wrong order, which was recently fixed.
2016-06-06 13:32:04 -05:00
Tyler Michael Smith
232530e88f Merge commit 'refs/pull/81/head' of https://github.com/flame/blis
Conflicts:
	frame/base/bli_threading_pthreads.c
	frame/base/bli_threading_pthreads.h
2016-06-01 15:14:10 -05:00
Tyler Michael Smith
4bcabd1bf6 Use spin locks instead of pthread barriers 2016-06-01 13:27:28 -05:00
Jeff Hammond
eef37f8b4d use GCC intrinsic instead of pthread_mutex for atomic increment and fetch 2016-05-29 22:28:13 -07:00
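Note on eef37f8b4d: an atomic increment-and-fetch without a mutex can be sketched with a GCC builtin. The commit message does not say which intrinsic was used; __sync_add_and_fetch() is chosen here as an assumption, and counter_inc_fetch() is a hypothetical name.

```c
/* Atomically add 1 to *c and return the new value.
   Relies on the GCC/Clang __sync_add_and_fetch builtin. */
static inline int counter_inc_fetch(volatile int *c)
{
    return __sync_add_and_fetch(c, 1);
}
```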
Field G. Van Zee
9dcd6f05c4 Implemented developer-configurable malloc()/free().
Details:
- Replaced all instances of bli_malloc() and bli_free() with one of:
  - bli_malloc_pool()/bli_free_pool()
  - bli_malloc_user()/bli_free_user()
  - bli_malloc_intl()/bli_free_intl()
  each of which can be configured to call malloc()/free() substitutes,
  so long as the substitute functions have the same function type
  signatures as malloc() and free() defined by C's stdlib.h. The _pool()
  function is called when allocating blocks for the memory pools (used
  for packing buffers, primarily), the _user() function is called when
  obj_t's are created (via bli_obj_create() and friends), and the _intl()
  function is called for internal use by BLIS, such as when creating
  control tree nodes or temporary buffers for manipulating internal data
  structures. Substitutes for any of the three types of bli_malloc() may
  be specified by #defining the following pairs of cpp macros in
  bli_kernel.h:
  - BLIS_MALLOC_POOL/BLIS_FREE_POOL
  - BLIS_MALLOC_USER/BLIS_FREE_USER
  - BLIS_MALLOC_INTL/BLIS_FREE_INTL
  to be the name of the substitute functions. (Obviously, the object
  code that contains these functions must be provided at link-time.)
  These macros default to malloc() and free(). Substitute functions are
  also automatically prototyped by BLIS (in bli_malloc_prototypes.h).
- Removed definitions for bli_malloc() and bli_free().
- Note that bli_malloc_pool() and bli_malloc_user() are now defined in
  terms of a new function, bli_malloc_align(), which aligns memory to an
  arbitrary (power of two) alignment boundary, but does so manually,
  whereas before alignment was performed behind the scenes by
  posix_memalign(). Currently, bli_malloc_intl() is defined in terms
  of bli_malloc_noalign(), which serves as a simple wrapper to the
  designated function that is passed in (e.g. BLIS_MALLOC_INTL).
  Similarly, there are bli_free_align() and bli_free_noalign(), which
  are used in concert with their bli_malloc_*() counterparts.
2016-05-24 13:15:32 -05:00
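Note on 9dcd6f05c4: manual alignment on top of a malloc()-compatible function can be sketched as below. This is one common technique (over-allocate, round up, stash the original pointer just before the aligned address) and is not necessarily how bli_malloc_align() is written; malloc_align_sketch(), free_align_sketch(), malloc_ft, and free_ft are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*malloc_ft)(size_t);  /* same signature as malloc() */
typedef void  (*free_ft)(void *);    /* same signature as free() */

/* Allocate size bytes aligned to align (a nonzero power of two) using the
   supplied malloc()-compatible function. Returns NULL on failure. */
void *malloc_align_sketch(malloc_ft f, size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0) return NULL;
    if (size > SIZE_MAX - align - sizeof(void *)) return NULL;

    /* Extra room for alignment slack plus the saved original pointer. */
    unsigned char *raw = f(size + align + sizeof(void *));
    if (raw == NULL) return NULL;

    /* Round up to the next aligned address past the saved-pointer slot. */
    uintptr_t addr = (uintptr_t)(raw + sizeof(void *));
    addr = (addr + align - 1) & ~(uintptr_t)(align - 1);

    /* Store the original pointer immediately before the aligned block.
       memcpy avoids a misaligned store when align < sizeof(void *). */
    void *orig = raw;
    memcpy((unsigned char *)addr - sizeof(void *), &orig, sizeof(void *));
    return (void *)addr;
}

/* Release a block obtained from malloc_align_sketch() using the supplied
   free()-compatible function. */
void free_align_sketch(free_ft f, void *p)
{
    if (p == NULL) return;
    void *orig;
    memcpy(&orig, (unsigned char *)p - sizeof(void *), sizeof(void *));
    f(orig);
}
```

Passing the allocator as a function pointer mirrors the commit's description of substitutes that share the type signatures of malloc() and free(); for example, malloc_align_sketch(malloc, 100, 64) paired with free_align_sketch(free, p).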
Jeff Hammond
9dd440109a fix 404 link to BuildSystem
Google Code is dead.  Long live GitHub!
2016-05-21 15:21:58 -07:00
Field G. Van Zee
d309f20b73 Added alignment switch to testsuite.
Details:
- Added a new input parameter to input.general that globally toggles
  whether testsuite tests are performed on objects whose buffers and
  leading dimensions have been aligned, and changed the implementation
  of libblis_test_mobj_create() to employ alignment (or not) regardless
  of whether row, column, or general storage is being tested.
- Updated configure script's "--help" text to indicate default behavior
  for internal integer type size and BLAS/CBLAS integer type size
  options.
2016-05-18 15:13:53 -05:00
Field G. Van Zee
32db0adc21 Generate prototypes for user-defined packm kernels.
Details:
- Created template prototypes for packm kernels (in bli_l1m_ker.h), and
  then redefined reference packm kernels' prototyping headers in terms of
  this template, as is already done for level-1v, -1f, and -3 kernels.
- Automatically generate prototypes for user-defined packm kernels in
  bli_kernel_prototypes.h (using the new template prototypes in
  bli_l1m_ker.h).
- Defined packm kernel function types in bli_l1m_ft.h, including for
  packm kernels specific to induced methods, which are now used in
  bli_packm_cxk.c and friends rather than using a locally-defined
  function type.
- In bli_packm_cxk.c, extended the function pointer array for packm kernels
  out to index 31 (from the previous maximum of 17). This allows us to
  store the unrolled 30xk kernel in the array for use (on knc, for
  example). Note: This should have been done a long time ago.
2016-05-17 15:20:16 -05:00
Field G. Van Zee
4bcf1b35ab Fixed bli_get_range_*() bugs in trsm variants.
Details:
- Fixed incorrect calls to bli_get_range_*() from within trsm blocked
  variants 1f, 2b, and 2f. The bug somehow went undetected since the
  big commit (537a1f4), and, strangely, did not manifest via the BLIS
  testsuite. The bug finally came to our attention when running the
  libflame test suite while linking to BLIS. Thanks to Kiran Varaganti
  for submitting the initial report that led to this bug.
2016-05-11 16:09:49 -05:00
Field G. Van Zee
9cfa33023f Minor updates to bli_f2c.h.
Details:
- Added #undef guards to certain #define statements in bli_f2c.h,
  and renamed the file guard to BLIS_F2C_H. This helps when
  #including "blis.h" from an application or library that already
  #includes an "f2c.h" header.
2016-05-11 16:02:30 -05:00
Tyler Michael Smith
a09a2e23ea Merge pull request #76 from devinamatthews/move_simd_defs
Move default SIMD-related definitions to bli_kernel_macro_defs.h
2016-05-11 10:47:11 -05:00
Tyler Smith
4dcd37eb1b fixing knc simd align size 2016-05-10 16:28:59 -05:00
Devin Matthews
7c604e1cbc Move default SIMD-related definitions to bli_kernel_macro_defs.h. Otherwise, configurations which customize these fail as these are now defined in bli_kernel.h. 2016-05-10 12:11:55 -05:00