Commit Graph

676 Commits

Author SHA1 Message Date
praveeng
50a2f2efcb Merge master code as on 2016_07_25 to amd-staging branch by praveeng
Change-Id: I84886ae241db2aac0bef6b7ef399f04aa8bca16d
2016-07-25 17:07:38 +05:30
praveeng
cfd46c88d5 Merge remote-tracking branch 'publicrepo/master' 2016-07-25 15:38:13 +05:30
praveeng
f493bf4d70 removed changes from readme file which are giving confilcts
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
2016-07-25 14:14:00 +05:30
Field G. Van Zee
a017062fdf Integrated "memory broker" (membrk_t) abstraction.
Details:
- Integrated a patch originally authored and submitted by Ricardo Magana
  of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
  (memory broker) that allows multiple sets of memory pools on, for example,
  separate NUMA nodes, each of which has a separate memory space.
- Added membrk field to cntx_t and defined corresponding accessor macros.
- Added membrk field to mem_t object and defined corresponding accessor macros.
- Created new bli_membrk.c file, which contains the new memory broker API,
  including:
    bli_membrk_init(), bli_membrk_finalize()
    bli_membrk_acquire_[mv](), bli_membrk_release(),
    bli_membrk_init_pools(), bli_membrk_reinit_pools(),
    bli_membrk_finalize_pools(),
    bli_membrk_pool_size()
- In bli_mem.c, changed function calls to
    bli_mem_init_pools()     -> bli_membrk_init()
    bli_mem_reinit_pools()   -> bli_membrk_reinit()
    bli_mem_finalize_pools() -> bli_membrk_finalize()
- In bli_packv_init.c, bli_packm_init.c, changed function calls to:
    bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
    bli_mem_release()      -> bli_membrk_release()
- Added bli_mutex.c and related files to frame/thread. These files define
  abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
  single-threaded execution. This new API is employed within functions
  such as bli_membrk_acquire_[mv]() and bli_membrk_release().
2016-07-22 17:02:59 -05:00
Field G. Van Zee
ce59f81108 Merge pull request #88 from devinamatthews/32bit-dim_t
Handle 32-bit dim_t in 64-bit microkernels.
2016-07-22 14:48:14 -05:00
Devin Matthews
707a2b7fac Somehow forgot the most important microkernel. 2016-07-22 13:49:44 -05:00
Devin Matthews
47ec045056 Merge remote-tracking branch 'upstream/master' into 32bit-dim_t 2016-07-22 13:45:23 -05:00
Devin Matthews
08f1d6b6fa Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit. 2016-07-22 13:44:37 -05:00
Field G. Van Zee
ff41153f4e Merge pull request #86 from devinamatthews/haswell-vmovups
Remove alignment restrictions on C in haswell kernel.
2016-07-22 13:21:03 -05:00
Devin Matthews
e0d2fa0d83 Relax alignment restrictions for haswell sgemm. 2016-07-22 12:56:51 -05:00
Field G. Van Zee
f9214ced97 Merge pull request #85 from devinamatthews/qopenmp
Change -openmp to -fopenmp for icc.
2016-07-22 12:16:39 -05:00
Devin Matthews
ee2c139df6 Remove alignment restrictions on C in haswell kernel. 2016-07-22 12:06:03 -05:00
Devin Matthews
08666eaa20 Change -openmp to -fopenmp for icc. 2016-07-22 11:07:34 -05:00
praveeng
1aa77dfc1d Merge master code as on 2016_07_21 to amd-staging branch by praveeng
Change-Id: Ic7d0a21101358f08147736e7f1884e7409937344
2016-07-21 14:23:41 +05:30
praveeng
ec9f59836b Merge branch 'master' of https://github.com/clMathLibraries/blis-amd 2016-07-18 12:56:25 +05:30
praveeng
197e182fcb first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
2016-07-18 12:55:10 +05:30
praveeng
41fb327110 small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
2016-07-18 12:55:10 +05:30
sthangar
9101a9c880 Checked in optimized 1V kernels along with benchmark codes. Also incorporated review comments for 1F kernels
Change-Id: I035c0d39e6b0bed28e6e2041242186c49f6ed55b
2016-07-13 16:51:14 +05:30
praveeng
763babe488 Merge remote-tracking branch 'publirepo/master' 2016-07-13 11:57:19 +05:30
Field G. Van Zee
413d62aca2 README update (use official ACM TOMS links). 2016-07-12 15:02:52 -05:00
Field G. Van Zee
dfa431f696 README update (BLIS2 TOMS article now in-print). 2016-07-12 14:21:19 -05:00
praveeng
357c990bdd first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
2016-07-05 16:51:23 +05:30
praveeng
8aee306300 small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
2016-07-05 15:00:31 +05:30
sthangar
405c9d4634 Check-in the fused kernels optimized for Zen
Change-Id: I7b2f467b960e7b9a285f06e47be87de122e5fa24
2016-06-22 12:18:54 +05:30
Field G. Van Zee
232754feec Fixed compiler warning in rand[vm], randn[vm].
Details:
- Fixed compiler warnings about unused variables related to the disabling
  of normalization in the structured cases of the rand[vm] and randn[vm]
  operations.
2016-06-21 14:25:39 -05:00
Field G. Van Zee
a89555d160 Added randn[vm] operations, support in testsuite.
Details:
- Defined a new randomization operation, randn, on vectors and matrices.
  The randnv and randnm operations randomize each element of the target
  object with values from a narrow range of values. Presently, those
  values are all integer powers of two, but they do not need to be powers
  of two in order to achieve the primary goal, which is to initialize
  objects that can be operated on with plenty of precision "slack"
  available to allow computations that avoid roundoff. Using this method
  of randomization makes it much more likely that testsuite residuals of
  properly-functioning operations are close to zero, if not exactly zero.
- Updated existing randomization operations randv and randm to skip
  special diagonal handling and normalization for matrices with structure.
  This is now handled by the testsuite modules by explicitly calling a
  testsuite function that loads the diagonal (and scales off-diagonal
  elements).
- Added support for randnv and randnm in the testsuite with a new switch
  in input.general that universally toggles between use of the classic
  randv/randm, which use real values on the interval [-1,1], and
  randnv/randnm, which use only values from a narrow range. Currently,
  the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
  well as 0.0.
- Updated testsuite modules so that a testsutie wrapper function is called
  instead of directly calling the randomization operations (such as
  bli_randv() and bli_randm()). This wrapper also takes a bool_t that
  indicates whether the object's elements should be normalized. (NOTE: As
  alluded to above, in the test modules of triangular solve operations such
  as trsv and trsm, we perform the extra step of loading the diagonal.)
- Defined a new level-0 operation, invertsc, which inverts a scalar.
- Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
  but possible divide-by-zero.
- Updated function signature and prototype formatting in testsuite.
2016-06-17 14:08:35 -05:00
Field G. Van Zee
096895c5d5 Reorganized code, APIs related to multithreading.
Details:
- Reorganized code and renamed files defining APIs related to multithreading.
  All code that is not specific to a particular operation is now located in a
  new directory: frame/thread. Code is now organized, roughly, by the
  namespace to which it belongs (see below).
- Consolidated all operation-specific *_thrinfo_t object types into a single
  thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
  also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
  functions (aside from a few general purpose bli_thrinfo_*() functions).
- Renamed thread_comm_t object type to thrcomm_t.
- Renamed many of the routines and functions (and macros) for multithreading.
  We now have the following API namespaces:
  - bli_thrinfo_*(): functions related to thrinfo_t objects
  - bli_thrcomm_*(): functions related to thrcomm_t objects.
  - bli_thread_*(): general-purpose functions, such as initialization,
    finalization, and computing ranges. (For now, some macros, such as
    bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
    bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
    appropriate.)
- Renamed thread-related macros so that they use a bli_ prefix.
- Renamed control tree-related macros so that they use a bli_ prefix (to be
  consistent with the thread-related macros that were also renamed).
- Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
  #undef was a temporary fix to some macro defaults which were being applied
  in the wrong order, which was recently fixed.
2016-06-06 13:32:04 -05:00
Tyler Michael Smith
232530e88f Merge commit 'refs/pull/81/head' of https://github.com/flame/blis
Conflicts:
	frame/base/bli_threading_pthreads.c
	frame/base/bli_threading_pthreads.h
2016-06-01 15:14:10 -05:00
Tyler Michael Smith
4bcabd1bf6 Use spin locks instead of pthread barriers 2016-06-01 13:27:28 -05:00
Jeff Hammond
eef37f8b4d use GCC intrinsic instead of pthread_mutex for atomic increment and fetch 2016-05-29 22:28:13 -07:00
Field G. Van Zee
9dcd6f05c4 Implemented developer-configurable malloc()/free().
Details:
- Replaced all instances of bli_malloc() and bli_free() with one of:
  - bli_malloc_pool()/bli_free_pool()
  - bli_malloc_user()/bli_free_user()
  - bli_malloc_intl()/bli_free_intl()
  each of which can be configured to call malloc()/free() substitutes,
  so long as the substitute functions have the same function type
  signatures as malloc() and free() defined by C's stdlib.h. The _pool()
  function is called when allocating blocks for the memory pools (used
  for packing buffers, primarily), the _user() function is called when
  obj_t's are created (via bli_obj_create() and friends), and the _intl()
  function is called for internal use by BLIS, such as when creating
  control tree nodes or temporary buffers for manipulating internal data
  structures. Substitutes for any of the three types of bli_malloc() may
  be specified by #defining the following pairs of cpp macros in
  bli_kernel.h:
  - BLIS_MALLOC_POOL/BLIS_FREE_POOL
  - BLIS_MALLOC_USER/BLIS_FREE_USER
  - BLIS_MALLOC_INTL/BLIS_FREE_INTL
  to be the name of the substitute functions. (Obviously, the object
  code that contains these functions must be provided at link-time.)
  These macros default to malloc() and free(). Subsitute functions are
  also automatically prototyped by BLIS (in bli_malloc_prototypes.h).
- Removed definitions for bli_malloc() and bli_free().
- Note that bli_malloc_pool() and bli_malloc_user() are now defined in
  terms of a new function, bli_malloc_align(), which aligns memory to an
  arbitrary (power of two) alignment boundary, but does so manually,
  whereas before alignment was performed behind the scenes by
  posix_memalign(). Currently, bli_malloc_intl() is defined in terms
  of bli_malloc_noalign(), which serves as a simple wrapper to the
  designated function that is passed in (e.g. BLIS_MALLOC_INTL).
  Similarly, there are bli_free_align() and bli_free_noalign(), which
  are used in concert with their bli_malloc_*() counterparts.
2016-05-24 13:15:32 -05:00
Jeff Hammond
9dd440109a fix 404 link to BuildSystem
Google Code is dead.  Long live GitHub!
2016-05-21 15:21:58 -07:00
Field G. Van Zee
d309f20b73 Added alignment switch to testsuite.
Details:
- Added a new input parameter to input.general that globally toggles
  whether testsuite tests are performed on objects whose buffers and
  leading dimensions have been aligned, and changed the implementation
  of libblis_test_mobj_create() to employ alignment (or not) regardless
  of whether row, column, or general storage is being tested.
- Updated configure script's "--help" text to indicate default behavior
  for internal integer type size and BLAS/CBLAS integer type size
  options.
2016-05-18 15:13:53 -05:00
Field G. Van Zee
32db0adc21 Generate prototypes for user-defined packm kernels.
Details:
- Created template prototypes for packm kernels (in bli_l1m_ker.h), and
  then redefined reference packm kernels' prototyping headers in terms of
  this template, as is already done for level-1v, -1f, and -3 kernels.
- Automatically generate prototypes for user-defined packm kernels in
  bli_kernel_prototypes.h (using the new template prototypes in
  bli_l1m_ker.h).
- Defined packm kernel function types in bli_l1m_ft.h, including for
  packm kernels specific to induced methods, which are now used in
  bli_packm_cxk.c and friends rather than using a locally-defined
  function type.
- In bli_packm_cxk.c, extended function pointer for packm kernels array
  from out to index 31 (from previous maximum of 17). This allows us to
  store the unrolled 30xk kernel in the array for use (on knc, for
  example). Note: This should have been done a long time ago.
2016-05-17 15:20:16 -05:00
Field G. Van Zee
4bcf1b35ab Fixed bli_get_range_*() bugs in trsm variants.
Details:
- Fixed incorrect calls to bli_get_range_*() from within trsm blocked
  variants 1f, 2b, and 2f. The bug somehow went undetected since the
  big commit (537a1f4), and, strangely, did not manifest via the BLIS
  testsuite. The bug finally came to our attention when running thei
  libflame test suite while linking to BLIS. Thanks to Kiran Varaganti
  for submitting the initial report that led to this bug.
2016-05-11 16:09:49 -05:00
Field G. Van Zee
9cfa33023f Minor updates to bli_f2c.h.
Details:
- Added #undef guards to certain #define statements in bli_f2c.h,
  and renamed the file guard to BLIS_F2C_H. This helps when
  #including "blis.h" from an application or library that already
  #includes an "f2c.h" header.
2016-05-11 16:02:30 -05:00
Tyler Michael Smith
a09a2e23ea Merge pull request #76 from devinamatthews/move_simd_defs
Move default SIMD-related definitions to bli_kernel_macro_defs.h
2016-05-11 10:47:11 -05:00
Tyler Smith
4dcd37eb1b fixing knc simd align size 2016-05-10 16:28:59 -05:00
Devin Matthews
7c604e1cbc Move default SIMD-related definitions to bli_kernel_macro_defs.h. Otherwise, configurations which customize these fail as these are now defined in bli_kernel.h. 2016-05-10 12:11:55 -05:00
Field G. Van Zee
a7be2d28e8 Merge pull request #74 from devinamatthews/fix_common_symbols
Default-initialize all extern global variables to avoid generating common symbols.
2016-05-10 11:48:51 -05:00
Devin Matthews
4b1e55edbf Default-initialize all extern global variables to avoid generating common symbols. Fixes #73. 2016-05-10 10:08:47 -05:00
Field G. Van Zee
97b512ef62 Include headers from cblas.h to pull in f77_int.
Details:
- Added #include statements for certain key BLIS headers so that the
  definition of f77_int is pulled in when a user compiles application
  code with only #include "cblas.h" (and no other BLIS header). This
  is necessary since f77_int is now used within the cblas API.
2016-05-06 10:24:30 -05:00
Field G. Van Zee
c3a4d39d03 Updates to haswell gemm micro-kernels.
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
  one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
  by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.
2016-05-04 17:22:56 -05:00
Field G. Van Zee
0b01d355ae Miscellaneous cleanups, fixes to recent commits.
Details:
- Fixed a typo in bli_l1f_ref.h, introduced into bbb8569, that only
  manifested when non-reference level-1f kernels were used.
- Added an #undef BLIS_SIMD_ALIGN_SIZE to bli_kernel.h of dunnington
  configuration to prevent a compile-time warning until I can figure out
  the proper permanent fix.
- Moved frame/1f/kernels/bli_dotxaxpyf_ref_var1.c out of the compilation
  path (into 'other' directory). _ref_var2 is used by default, which is
  the variant that is built on axpyf and dotxf instead of dotaxpyv.
- Removed section of frame/include/bli_config_macro_defs.h pertaining to
  mixed datatype support.
2016-04-27 15:21:10 -05:00
Field G. Van Zee
ed7326c836 Added 'restrict' to l1v/l1f code in 'kernels' dir.
Details:
- Added 'restrict' keyword to existing kernel definitions in 'kernels'
  directory. These changes were meant for inclusion in bbb8569.
2016-04-27 14:57:40 -05:00
Field G. Van Zee
bbb8569b2a Use 'restrict' in all kernel APIs; wspace changes.
Details:
- Updated level-1v, level-1f kernel function types (bli_l1?_ft.h) and
  generic kernel prototypes (bli_l1?_ker.h) to use 'restrict' for all
  numerical operand pointers (ie: all pointers except the cntx_t).
- Updated level-1f reference kernel definitions to use 'restrict' for
  all numerical operand pointers. (Level-1v reference kernel definitions
  were already updated in bdbda6e.)
- Rewrote the level-1v and level-1f reference kernel prototypes in
  bli_l1v_ref.h and bli_l1f_ref.h, respectively, to simply #include
  bli_l1v_ker.h and bli_l1f_ker.h with redefined function base names
  (as was already being done for the level-3 micro-kernel prototypes
  in bli_l3_ref.h), rather than duplicate the signatures from the
  _ker.h files.
- Added definitions to frame/include/bli_kernel_prototypes.h for axpbyv
  and xpbyv, which were probably meant for inclusion in bdbda6e.
- Converted a number of instances of four spaces, as introduced in
  bdbda6e, to tabs.
2016-04-27 14:13:46 -05:00
Field G. Van Zee
4ea419c72c Merge pull request #70 from devinamatthews/daxpby
Give the level1v operations some love
2016-04-26 12:50:45 -05:00
Devin Matthews
bdbda6e6ac Give the level1v operations some love:
- Add missing axpby and xpby operations (plus test cases).
- Add special case for scal2v with alpha=1.
- Add restrict qualifiers.
- Add special-case algorithms for incx=incy=1.
2016-04-25 11:05:57 -05:00
Field G. Van Zee
f1e9be2aba Minor tweak to test/Makefile.
Details:
- Just committing a minor change to test/Makefile that has been lingering
  in my local working copy for longer than I can remember.
2016-04-22 15:34:02 -05:00
Field G. Van Zee
aa0bceec27 Merge branch 'master' of github.com:flame/blis 2016-04-22 12:01:31 -05:00