Commit Graph

707 Commits

Author SHA1 Message Date
Devin Matthews
8945a1512d This version gets ~1550 GFLOPs on KNL with 16x4. 2016-08-03 11:28:24 -05:00
Devin Matthews
6ce4c022eb Switch back to 24x8. I could only squeeze 24.5GFLOP out of 8x24, and scalability is not improved. 2016-07-27 16:26:36 -05:00
Devin Matthews
b8f2b55532 Try an 8x24 kernel for the hell of it. 2016-07-27 15:22:55 -05:00
Devin Matthews
7ede5863ae Allocate pack buffer on MCDRAM for KNL. 2016-07-27 13:42:32 -06:00
Devin Matthews
ad89ed2e82 Merge branch 'knl' of github.com:devinamatthews/blis into knl 2016-07-27 11:45:40 -05:00
Devin Matthews
2c9de740ed This version gets ~26GF on one core. 2016-07-27 11:44:54 -05:00
Devin Matthews
81e2b05f31 Add optimized packing kernels for KNL. 2016-07-27 11:39:05 -05:00
Devin Matthews
a7d8ca97b8 All fixed. 2016-07-25 15:15:13 -05:00
Devin Matthews
963d0393b0 Add 24xk pack kernel. 2016-07-25 14:40:53 -05:00
Devin Matthews
117b76739a In the midst of debugging. 2016-07-25 13:53:07 -05:00
Devin Matthews
8c0a4fd1d3 Fix some row/column confusion. 2016-07-25 13:09:24 -05:00
Devin Matthews
c44f9f9693 Simplify displacements -- clang assembler was badly botching EVEX compressed displacements giving false alarms for instruction length. 2016-07-25 12:02:24 -05:00
Devin Matthews
e0cce177cc Minor fixes for 8x24 KNL kernel. 2016-07-25 10:02:25 -05:00
Devin Matthews
65735bbedf Switch to 24x8 kernel, unrolled by 16. 2016-07-24 21:50:32 -05:00
Devin Matthews
45d5dc9717 Add 24x8 "KNC-style" kernel for KNL. 2016-07-24 14:25:26 -05:00
Devin Matthews
8ff2e069c4 Add 4x unrolled variant for KNL microkernel. 2016-07-22 16:22:26 -05:00
Devin Matthews
9cb2ed9b0c Get rid of one RBX update. 2016-07-22 16:10:30 -05:00
Devin Matthews
451bde076f Add some more knobs to twiddle for KNL microkernel. 2016-07-22 15:43:00 -05:00
Devin Matthews
8c6e621c09 Make knl conform to new kernel dir structure. 2016-07-22 15:05:15 -05:00
Devin Matthews
ce7214c661 Merge remote-tracking branch 'origin/master' into knl 2016-07-22 14:59:53 -05:00
Field G. Van Zee
ce59f81108 Merge pull request #88 from devinamatthews/32bit-dim_t
Handle 32-bit dim_t in 64-bit microkernels.
2016-07-22 14:48:14 -05:00
Devin Matthews
707a2b7fac Somehow forgot the most important microkernel. 2016-07-22 13:49:44 -05:00
Devin Matthews
47ec045056 Merge remote-tracking branch 'upstream/master' into 32bit-dim_t 2016-07-22 13:45:23 -05:00
Devin Matthews
08f1d6b6fa Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit. 2016-07-22 13:44:37 -05:00
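A minimal sketch of the idea behind this commit, not the actual microkernel code: when a kernel reads its loop counts with 64-bit loads but dim_t is only 32 bits wide, the upper half of the loaded value is undefined unless the count is first widened into a 64-bit variable. The names kernel_iterations and load_count_64, and the unroll factor of 4, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t dim_t;  /* assumption: a build where dim_t is 32 bits wide */

/* Hypothetical stand-in for a microkernel that fetches its loop counts with
   64-bit loads, as inline assembly on a 64-bit architecture typically does. */
static uint64_t load_count_64(const uint64_t *p)
{
    return *p;
}

/* Widen k into 64-bit intermediates before any 64-bit load touches it, so
   that all 64 bits of each loaded value are well defined. */
uint64_t kernel_iterations(dim_t k)
{
    assert(k >= 0);

    uint64_t k_iter = (uint64_t)k / 4;  /* iterations of the unrolled loop */
    uint64_t k_left = (uint64_t)k % 4;  /* remaining edge iterations */

    return load_count_64(&k_iter) * 4 + load_count_64(&k_left);
}
```

Passing the address of a 32-bit k directly to a 64-bit load would read four bytes past the variable; the intermediates avoid that without changing the kernel's interface.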
Field G. Van Zee
ff41153f4e Merge pull request #86 from devinamatthews/haswell-vmovups
Remove alignment restrictions on C in haswell kernel.
2016-07-22 13:21:03 -05:00
Devin Matthews
e0d2fa0d83 Relax alignment restrictions for haswell sgemm. 2016-07-22 12:56:51 -05:00
Field G. Van Zee
f9214ced97 Merge pull request #85 from devinamatthews/qopenmp
Change -openmp to -fopenmp for icc.
2016-07-22 12:16:39 -05:00
Devin Matthews
ee2c139df6 Remove alignment restrictions on C in haswell kernel. 2016-07-22 12:06:03 -05:00
Devin Matthews
08666eaa20 Change -openmp to -fopenmp for icc. 2016-07-22 11:07:34 -05:00
Devin Matthews
119d039942 Add 8x24 KNL kernel. 2016-07-22 10:23:31 -05:00
Devin Matthews
b58cda9eba Merge remote-tracking branch 'origin/master' into knl
# Conflicts:
#	frame/base/bli_threading.h
#	frame/include/blis.h
#	frame/thread/bli_thread.c
2016-07-19 14:09:09 -05:00
Field G. Van Zee
413d62aca2 README update (use official ACM TOMS links). 2016-07-12 15:02:52 -05:00
Field G. Van Zee
dfa431f696 README update (BLIS2 TOMS article now in-print). 2016-07-12 14:21:19 -05:00
Field G. Van Zee
232754feec Fixed compiler warning in rand[vm], randn[vm].
Details:
- Fixed compiler warnings about unused variables related to the disabling
  of normalization in the structured cases of the rand[vm] and randn[vm]
  operations.
2016-06-21 14:25:39 -05:00
Field G. Van Zee
a89555d160 Added randn[vm] operations, support in testsuite.
Details:
- Defined a new randomization operation, randn, on vectors and matrices.
  The randnv and randnm operations randomize each element of the target
  object with values from a narrow range of values. Presently, those
  values are all integer powers of two, but they do not need to be powers
  of two in order to achieve the primary goal, which is to initialize
  objects that can be operated on with plenty of precision "slack"
  available to allow computations that avoid roundoff. Using this method
  of randomization makes it much more likely that testsuite residuals of
  properly-functioning operations are close to zero, if not exactly zero.
- Updated existing randomization operations randv and randm to skip
  special diagonal handling and normalization for matrices with structure.
  This is now handled by the testsuite modules by explicitly calling a
  testsuite function that loads the diagonal (and scales off-diagonal
  elements).
- Added support for randnv and randnm in the testsuite with a new switch
  in input.general that universally toggles between use of the classic
  randv/randm, which use real values on the interval [-1,1], and
  randnv/randnm, which use only values from a narrow range. Currently,
  the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
  well as 0.0.
- Updated testsuite modules so that a testsuite wrapper function is called
  instead of directly calling the randomization operations (such as
  bli_randv() and bli_randm()). This wrapper also takes a bool_t that
  indicates whether the object's elements should be normalized. (NOTE: As
  alluded to above, in the test modules of triangular solve operations such
  as trsv and trsm, we perform the extra step of loading the diagonal.)
- Defined a new level-0 operation, invertsc, which inverts a scalar.
- Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
  but possible divide-by-zero.
- Updated function signature and prototype formatting in testsuite.
2016-06-17 14:08:35 -05:00
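The narrow-range randomization described in the commit above can be sketched as follows. This is an illustration only, not the BLIS implementation of randnv/randnm: the function names, the use of rand(), and the roughly uniform choice among the 15 candidate values are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Return one value from the set { 0.0, +/-2^0, +/-2^-1, ..., +/-2^-6 }.
   Assumption: each of the 15 values is about equally likely. */
double sketch_randn_value(void)
{
    int pick = rand() % 15;           /* 0 selects zero; 1..14 select +/-2^-e */
    if (pick == 0)
        return 0.0;

    int e = (pick - 1) % 7;           /* exponent magnitude, 0..6 */
    double v = 1.0 / (double)(1 << e);  /* exact: a power of two */

    return ((pick - 1) / 7 == 0) ? v : -v;
}

/* Fill a vector of length n with narrow-range values. */
void sketch_randn_fill(double *x, size_t n)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        x[i] = sketch_randn_value();
}
```

Because every value is zero or a signed power of two with a small exponent, products and short sums of such values are usually exactly representable, which is why residuals of correct operations tend to come out at or near zero.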
Devin Matthews
318f063dcb Add new KNL microkernel derived from Haswell. 2016-06-08 17:46:50 -05:00
Field G. Van Zee
096895c5d5 Reorganized code, APIs related to multithreading.
Details:
- Reorganized code and renamed files defining APIs related to multithreading.
  All code that is not specific to a particular operation is now located in a
  new directory: frame/thread. Code is now organized, roughly, by the
  namespace to which it belongs (see below).
- Consolidated all operation-specific *_thrinfo_t object types into a single
  thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
  also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
  functions (aside from a few general purpose bli_thrinfo_*() functions).
- Renamed thread_comm_t object type to thrcomm_t.
- Renamed many of the routines and functions (and macros) for multithreading.
  We now have the following API namespaces:
  - bli_thrinfo_*(): functions related to thrinfo_t objects
  - bli_thrcomm_*(): functions related to thrcomm_t objects.
  - bli_thread_*(): general-purpose functions, such as initialization,
    finalization, and computing ranges. (For now, some macros, such as
    bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
    bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
    appropriate.)
- Renamed thread-related macros so that they use a bli_ prefix.
- Renamed control tree-related macros so that they use a bli_ prefix (to be
  consistent with the thread-related macros that were also renamed).
- Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
  #undef was a temporary fix to some macro defaults which were being applied
  in the wrong order, which was recently fixed.
2016-06-06 13:32:04 -05:00
Tyler Michael Smith
232530e88f Merge commit 'refs/pull/81/head' of https://github.com/flame/blis
Conflicts:
	frame/base/bli_threading_pthreads.c
	frame/base/bli_threading_pthreads.h
2016-06-01 15:14:10 -05:00
Tyler Michael Smith
4bcabd1bf6 Use spin locks instead of pthread barriers 2016-06-01 13:27:28 -05:00
Jeff Hammond
eef37f8b4d use GCC intrinsic instead of pthread_mutex for atomic increment and fetch 2016-05-29 22:28:13 -07:00
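For illustration, a lock-free "increment and fetch" using a GCC builtin might look like the sketch below. The commit does not say which intrinsic it uses; __sync_add_and_fetch is assumed here, and the function name is hypothetical.

```c
#include <assert.h>

/* Atomically add 1 to *counter and return the new value, with no mutex.
   Assumption: a GCC-compatible compiler providing the __sync builtins. */
unsigned increment_and_fetch(volatile unsigned *counter)
{
    assert(counter != 0);
    return __sync_add_and_fetch(counter, 1u);
}
```

Compared with guarding the counter by a pthread_mutex, this avoids a lock/unlock pair on every update, which matters when many threads hit the same barrier counter.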
Field G. Van Zee
9dcd6f05c4 Implemented developer-configurable malloc()/free().
Details:
- Replaced all instances of bli_malloc() and bli_free() with one of:
  - bli_malloc_pool()/bli_free_pool()
  - bli_malloc_user()/bli_free_user()
  - bli_malloc_intl()/bli_free_intl()
  each of which can be configured to call malloc()/free() substitutes,
  so long as the substitute functions have the same function type
  signatures as malloc() and free() defined by C's stdlib.h. The _pool()
  function is called when allocating blocks for the memory pools (used
  for packing buffers, primarily), the _user() function is called when
  obj_t's are created (via bli_obj_create() and friends), and the _intl()
  function is called for internal use by BLIS, such as when creating
  control tree nodes or temporary buffers for manipulating internal data
  structures. Substitutes for any of the three types of bli_malloc() may
  be specified by #defining the following pairs of cpp macros in
  bli_kernel.h:
  - BLIS_MALLOC_POOL/BLIS_FREE_POOL
  - BLIS_MALLOC_USER/BLIS_FREE_USER
  - BLIS_MALLOC_INTL/BLIS_FREE_INTL
  to be the name of the substitute functions. (Obviously, the object
  code that contains these functions must be provided at link-time.)
  These macros default to malloc() and free(). Substitute functions are
  also automatically prototyped by BLIS (in bli_malloc_prototypes.h).
- Removed definitions for bli_malloc() and bli_free().
- Note that bli_malloc_pool() and bli_malloc_user() are now defined in
  terms of a new function, bli_malloc_align(), which aligns memory to an
  arbitrary (power of two) alignment boundary, but does so manually,
  whereas before alignment was performed behind the scenes by
  posix_memalign(). Currently, bli_malloc_intl() is defined in terms
  of bli_malloc_noalign(), which serves as a simple wrapper to the
  designated function that is passed in (e.g. BLIS_MALLOC_INTL).
  Similarly, there are bli_free_align() and bli_free_noalign(), which
  are used in concert with their bli_malloc_*() counterparts.
2016-05-24 13:15:32 -05:00
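The manual alignment mentioned in the commit above can be sketched roughly as follows. This is not the code of bli_malloc_align(); the sketch_* names are hypothetical, and the layout (stashing the original pointer just below the aligned address) is one common technique assumed for illustration. The allocator and deallocator are passed in, mirroring the idea of configurable malloc()/free() substitutes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate size bytes aligned to a power-of-two boundary using any
   malloc()-compatible function f. The pointer returned by f is saved
   immediately below the aligned address so it can be recovered later. */
void *sketch_malloc_align(size_t size, size_t align, void *(*f)(size_t))
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Worst-case padding plus room for the saved pointer. */
    size_t extra = align - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;

    unsigned char *raw = f(size + extra);
    if (raw == NULL)
        return NULL;

    /* Round up past the pointer slot to the next aligned address. */
    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* memcpy avoids assuming the slot itself is pointer-aligned. */
    void *saved = raw;
    memcpy((unsigned char *)aligned - sizeof(void *), &saved, sizeof saved);

    return (void *)aligned;
}

/* Release a block obtained from sketch_malloc_align() using the matching
   free()-compatible function f. */
void sketch_free_align(void *p, void (*f)(void *))
{
    if (p == NULL)
        return;

    void *saved;
    memcpy(&saved, (unsigned char *)p - sizeof(void *), sizeof saved);
    f(saved);
}
```

A call such as sketch_malloc_align(n, 64, malloc) paired with sketch_free_align(p, free) shows the default case; a substitute allocator only needs to match the signatures of malloc() and free().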
Jeff Hammond
9dd440109a fix 404 link to BuildSystem
Google Code is dead.  Long live GitHub!
2016-05-21 15:21:58 -07:00
Field G. Van Zee
d309f20b73 Added alignment switch to testsuite.
Details:
- Added a new input parameter to input.general that globally toggles
  whether testsuite tests are performed on objects whose buffers and
  leading dimensions have been aligned, and changed the implementation
  of libblis_test_mobj_create() to employ alignment (or not) regardless
  of whether row, column, or general storage is being tested.
- Updated configure script's "--help" text to indicate default behavior
  for internal integer type size and BLAS/CBLAS integer type size
  options.
2016-05-18 15:13:53 -05:00
Field G. Van Zee
32db0adc21 Generate prototypes for user-defined packm kernels.
Details:
- Created template prototypes for packm kernels (in bli_l1m_ker.h), and
  then redefined reference packm kernels' prototyping headers in terms of
  this template, as is already done for level-1v, -1f, and -3 kernels.
- Automatically generate prototypes for user-defined packm kernels in
  bli_kernel_prototypes.h (using the new template prototypes in
  bli_l1m_ker.h).
- Defined packm kernel function types in bli_l1m_ft.h, including for
  packm kernels specific to induced methods, which are now used in
  bli_packm_cxk.c and friends rather than using a locally-defined
  function type.
- In bli_packm_cxk.c, extended the function pointer array for packm kernels
  out to index 31 (from previous maximum of 17). This allows us to
  store the unrolled 30xk kernel in the array for use (on knc, for
  example). Note: This should have been done a long time ago.
2016-05-17 15:20:16 -05:00
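The last item in the commit above refers to an array of kernel function pointers indexed by panel dimension. A minimal sketch of that dispatch pattern is given below; it is not the code in bli_packm_cxk.c, and every sketch_* name, the kernel signature, and the column-major layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PANEL_DIM 32  /* valid indices are 0..31 */

/* Hypothetical kernel type: pack k columns of a micro-panel of a (column
   stride lda) into the contiguous buffer p. */
typedef void (*sketch_packm_ker_t)(size_t k, const double *a, size_t lda,
                                   double *p);

/* Hypothetical specialized kernel for panels with exactly 4 rows. */
static void sketch_packm_4xk(size_t k, const double *a, size_t lda, double *p)
{
    for (size_t j = 0; j < k; ++j) {
        p[4 * j + 0] = a[j * lda + 0];
        p[4 * j + 1] = a[j * lda + 1];
        p[4 * j + 2] = a[j * lda + 2];
        p[4 * j + 3] = a[j * lda + 3];
    }
}

/* Table indexed by panel dimension; unset entries are NULL. */
static const sketch_packm_ker_t sketch_kers[SKETCH_MAX_PANEL_DIM] = {
    [4] = sketch_packm_4xk,
};

/* Pack using a specialized kernel if one is registered for panel_dim,
   otherwise fall back to a generic loop. Returns 1 if a specialized
   kernel was used and 0 if the fallback ran. */
int sketch_packm_cxk(size_t panel_dim, size_t k, const double *a, size_t lda,
                     double *p)
{
    assert(panel_dim < SKETCH_MAX_PANEL_DIM);

    sketch_packm_ker_t ker = sketch_kers[panel_dim];
    if (ker != NULL) {
        ker(k, a, lda, p);
        return 1;
    }

    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < panel_dim; ++i)
            p[j * panel_dim + i] = a[j * lda + i];
    return 0;
}
```

With a table that stops at index 17, a 30xk kernel has no slot to register in; sizing the table to cover index 31 lets such a kernel be looked up the same way as the smaller ones.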
Devin Matthews
e3bd5ca64a Fix SIMD definitions in KNL config, and a couple of fixes to C update. 2016-05-12 20:54:13 -05:00
Devin Matthews
4fe02e3d49 Move bli_kernel.h before bli_threading.h in order of inclusion in blis.h. 2016-05-12 20:53:58 -05:00
Field G. Van Zee
4bcf1b35ab Fixed bli_get_range_*() bugs in trsm variants.
Details:
- Fixed incorrect calls to bli_get_range_*() from within trsm blocked
  variants 1f, 2b, and 2f. The bug somehow went undetected since the
  big commit (537a1f4), and, strangely, did not manifest via the BLIS
  testsuite. The bug finally came to our attention when running the
  libflame test suite while linking to BLIS. Thanks to Kiran Varaganti
  for submitting the initial report that led us to this bug.
2016-05-11 16:09:49 -05:00
Field G. Van Zee
9cfa33023f Minor updates to bli_f2c.h.
Details:
- Added #undef guards to certain #define statements in bli_f2c.h,
  and renamed the file guard to BLIS_F2C_H. This helps when
  #including "blis.h" from an application or library that already
  #includes an "f2c.h" header.
2016-05-11 16:02:30 -05:00
Tyler Michael Smith
a09a2e23ea Merge pull request #76 from devinamatthews/move_simd_defs
Move default SIMD-related definitions to bli_kernel_macro_defs.h
2016-05-11 10:47:11 -05:00
Tyler Smith
4dcd37eb1b fixing knc simd align size 2016-05-10 16:28:59 -05:00