Commit Graph

828 Commits

Author SHA1 Message Date
sthangar
d6863e851a checked-in DNRM2 optimizations
Change-Id: I3b31d768bd7f4fbf43042aa5a0762995c73c4522
2016-11-21 11:30:30 +05:30
sthangar
9772218cae Added optimized DAMAX routines for Zen
Change-Id: I499c0c8f0f4ce6c19235c47b86d5608db6ba50f8
2016-11-16 15:19:19 +05:30
Santanu Thangaraj
9c448e3017 Merge "Added new optimized micro-kernel for dotxv routine" into amd-staging 2016-11-16 04:18:57 -05:00
praveeng
998d824044 Merge master code till devinamatthews/omp_num_thrds 2016_11_16 to amd-staging
Change-Id: I601ff1d3ec8a680e1be039ffc7b299744e8a27c5
2016-11-16 14:24:15 +05:30
Field G. Van Zee
6b5a4032d2 Merge pull request #109 from devinamatthews/omp_num_threads
Add automatic loop thread assignment.
2016-11-10 15:28:24 -06:00
Devin Matthews
a8220e3a86 - Fix typo in bli_cntx.c
- Bump BLIS_DEFAULT_NR_THREAD_MAX to 4
2016-11-10 14:19:34 -06:00
Kiran Varaganti
e35d3c23f2 Added new optimized micro-kernel for dotxv routine
Change-Id: I2c544e9b25a454d971ad690353502a55cd668391
2016-11-10 14:30:53 +05:30
praveeng
0d13e9a4f6 bli_kernel.h
Change-Id: I425d089f79497a0de7d1622e829c3ca9edf7f091
2016-11-07 14:40:41 +05:30
Devin Matthews
c05b3862f6 Add automatic loop thread assignment.
- Number of threads is determined by BLIS_NUM_THREADS or OMP_NUM_THREADS, but can be overridden by BLIS_XX_NT as before.
- Threads are assigned to loops (ic, jc, ir, and jc) automatically by weighted partitioning and heuristics, both of which are tunable via bli_kernel.h.
- All level-3 BLAS covered.
2016-11-04 15:48:02 -05:00
Field G. Van Zee
3b524a08e3 Consolidated 3m1/4m1 gemmtrsm, trsm ukernel code.
Details:
- Consolidated the macros that define the lower and upper versions of the
  gemmtrsm microkernels into a single macro that is instantiated twice.
  Did this for both 3m1 and 4m1 microkernels.
- Consolidated lower and upper versions of the trsm microkernels for 3m1
  and 4m1 into single files (each).
2016-11-02 17:45:18 -05:00
Field G. Van Zee
ead231aca6 Merge pull request #108 from devinamatthews/patch-2
Update .travis.yml with additional tests
2016-11-02 13:03:50 -05:00
Devin Matthews
62987f60a6 Allow KNL to fail 2016-11-02 11:20:37 -05:00
Devin Matthews
8f9010542c Fix some problems with OSX builds:
- Update CPU detection for Intel archs (esp. Skylake)
- Allow clang for the reference config
2016-11-02 11:18:32 -05:00
Field G. Van Zee
d25e6f8b63 Can disable trsm_r-specific blocksize constraints.
Details:
- Added cpp guards around the constraints in bli_kernel_macro_defs.h
  that enforce MC % NR = 0 and NC % MR = 0. These constraints are ONLY
  needed when handling right-side trsm by allowing the matrix on the
  right (matrix B) to be triangular, because it involves swapping
  register, but not cache, blocksizes (packing A by NR and B by MR)
  and then swapping the operands to gemmtrsm just before that kernel
  is called. It may be useful to disable these constraints if, for
  example, the developer wishes to test the configuration with
  a different set of cache blocksizes where only MC % MR = 0 and
  NC % NR = 0 are enforced.
- In summary, #defining BLIS_RELAX_MCNR_NCMR_CONSTRAINTS will bypass
  the enforcement of MC % NR = 0 and NC % MR = 0.
2016-11-01 14:35:15 -05:00
Devin Matthews
1a67e3688e Bogus commit
Need to trigger another Travis build.
2016-11-01 13:53:18 -05:00
Devin Matthews
2cd82d67b3 Some fixes for .travis.yml
- Switch to gcc-5 to support knl
- Don't run tests in parallel -- it is super slow.
- Use clang on OSX since gcc is only a zombie husk.
2016-11-01 13:25:50 -05:00
Devin Matthews
a3db4e6bdf Update .travis.yml with additional tests
- Test knl configuration (without running of course).
- Test openmp and pthreads threading for auto configuration with 4 threads.
- Test auto configuration with and without pthreads on OSX.
- Also, run make in parallel.

I don't know how the `addons:` section works on OSX; hopefully it is just ignored.
2016-11-01 10:33:18 -05:00
Field G. Van Zee
8a11a2174a Updates to non-default haswell microkernels.
Details:
- Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
  constraints.
- Added missing c and z microkernels, which are based on the corresponding
  kernels in the d6x8 set.
- This completes the d8x6 set (which may be used for situations when it
  is desirable to have a microkernel with a column preference).
2016-10-31 19:07:55 -05:00
Field G. Van Zee
618f4331eb Align strides of ct in macrokernels to that of c.
Details:
- Previously, rs_ct and cs_ct, the strides of the temporary microtile used
  primarily in the macrokernels' edge case handling, were unconditionally
  set to 1 and MR, respectively. However, Devin Matthews noted that this
  ought to be changed so that the strides of ct were in agreement with the
  strides of C. (That is, if C was row-stored, then ct should be accessed
  as by rows as well.) The implicit assumption is that the strides of C
  have already been adjusted, via induced transposition, if the storage
  preference of the microkernel is at odds with the storage of C. So, if
  the microkernel prefers row storage, the macrokernel's interior cases
  would present row-stored (ideal) microkernel subproblems to the
  microkernel, but for edge cases, it would still see column-stored
  subproblems (not ideal). This commit fixes this issue. Thanks to Devin
  for his suggestion.
2016-10-31 14:40:51 -05:00
Field G. Van Zee
6303910023 Merge pull request #105 from devinamatthews/knl
Support for Intel Knight's Landing.
2016-10-25 19:34:51 -05:00
Devin Matthews
216206c1d3 Fix up for merge to master. 2016-10-25 13:56:18 -05:00
Devin Matthews
11eb7957ab Merge branch 'master' into knl
# Conflicts:
#	frame/thread/bli_thread.h
2016-10-25 13:51:07 -05:00
Devin Matthews
cd5b668183 Don't use %rbp in KNL packing kernels. 2016-10-25 13:49:27 -05:00
Field G. Van Zee
956b3edf8e Merge pull request #104 from devinamatthews/misspellings
Add flexible options for thread model (pthread/posix for pthreads etc.).
2016-10-25 13:02:57 -05:00
Devin Matthews
0662a3c1b1 Add flexible options for thread model (pthread/posix for pthreads etc.). 2016-10-25 12:42:44 -05:00
Kiran Varaganti
e044fa6240 Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault
Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a
2016-10-25 13:03:05 +05:30
Field G. Van Zee
b7e41d71b0 Merge pull request #103 from devinamatthews/patch-1
Change .align to .p2align in Bulldozer ukernels.
2016-10-24 16:47:46 -05:00
Devin Matthews
5117d444f7 Change .align to .p2align in Bulldozer ukernels
Apparently OSX doesn't allow .align directives for >16B, so I've changed these to their .p2align counterparts.
2016-10-24 16:20:47 -05:00
Field G. Van Zee
4bd905bd45 Merge pull request #93 from ShadenSmith/config_check
Adds sanity check to configuration choice.
2016-10-21 14:48:44 -05:00
Field G. Van Zee
936d5fdc26 Fixed multithreading compilation bug in 970745a.
Details:
- Moved the definition of the cpp macro BLIS_ENABLE_MULTITHREADING
  from bli_thread.h to bli_config_macro_defs.h. Also moved the
  sanity check that OpenMP and POSIX threads are not both enabled.
- Thanks to Krzysztof Drewniak for reporting this bug.
2016-10-21 14:34:27 -05:00
Kiran Varaganti
d250e6a3af Merged TRSM and scalv routines into zen folder
Change-Id: Ice897bc83e8fb70b90f23cc3ce892c39883aceb9
2016-10-20 14:34:39 +05:30
Field G. Van Zee
8feb0f85a6 Removed auto-prototyping of malloc()/free() substitutes.
Details:
- Removed the header file, bli_malloc_prototypes.h, which automatically
  generated prototypes for the functions specified by the following
  cpp macros:
    BLIS_MALLOC_INTL
    BLIS_FREE_INTL
    BLIS_MALLOC_POOL
    BLIS_FREE_POOL
    BLIS_MALLOC_USER
    BLIS_FREE_USER
  These prototypes were originally provided primarily as a convenience
  to those developers who specified their own malloc()/free() substitutes
  for one or more of the following. However, we generated these prototypes
  regardless, even when the default values (malloc and free) of the
  macros above were used. A problem arose under certain circumstances
  (e.g., gcc in C++ mode on Linux with glibc) when including blis.h that
  stemmed from the "throw" specification which was added to the glibc's
  malloc() prototype, resulting in a prototype mismatch. Therefore, going
  forward, developers who specify their own custom malloc()/free()
  substitutes must also prototype those substitutes via bli_kernel.h.
  Thanks to Krzysztof Drewniak for reporting this bug, and Devin Matthews
  for researching the nature and potential solutions.
2016-10-19 16:05:41 -05:00
Field G. Van Zee
970745a5fc Reorganized typedefs to avoid compiler warnings.
Details:
- Relocated membrk_t definition from bli_membrk.h to bli_type_defs.h.
- Moved #include of bli_malloc.h from blis.h to bli_type_defs.h.
- Removed standalone mtx_t and mutex_t typedefs in bli_type_defs.h.
- Moved #include of bli_mutex.h from bli_thread.h to bli_typedefs.h.
- The redundant typedefs of membrk_t and mtx_t caused a warning on some C
  compilers. Thanks to Tyler Smith for reporting this issue.
2016-10-19 15:58:03 -05:00
sthangar
1c2f7b57d5 Removed symlinks to zen kernels from haswell kernel folder and also modified the bli_kernel.h file accordingly
Change-Id: Ib3736af48e851c8243bbe10d937fb942c49ad048
2016-10-18 15:06:35 +05:30
praveeng
d864ea9f4f Merge master code 2016_10_14 till Added disabled code thrinfo_t structures
Change-Id: If7db98d286c1471fcd30f00757abee9b253ef987
2016-10-14 17:01:31 +05:30
Field G. Van Zee
28b2af8a71 Added disabled code to print thrinfo_t structures.
Details:
- Added cpp-guarded code to bli_thrcomm_openmp.c that allows a curious
  developer to print the contents of the thrinfo_t structures of each
  thread, for verification purposes or just to study the way thread
  information and communicators are used in BLIS.
- Enabled some previously-disabled code in bli_l3_thrinfo.c for freeing
  an array of thrinfo_t* values that is used in the new, cpp-guarde code
  mentioned above.
- Removed some old commented lines from bli_gemm_front.c.
2016-10-13 14:50:08 -05:00
Field G. Van Zee
11eed3f683 Fixed a configure -t omp/openmp bug from fd04869.
Details:
- Forgot to update certain occurrences of "omp" in common.mk during
  commit fd04869, which changed the preferred configure option string
  for enabling OpenMP from "omp" to "openmp".
2016-10-13 14:23:23 -05:00
praveeng
7045fcbf0b Merge master code 2016_10_13 Removed previously renamed/old files
Change-Id: I8106d371afaa0af474a8967388d44481b05de923
2016-10-13 12:03:24 +05:30
sthangar
7e04490002 Checked in the SAMAX optimizations
Change-Id: I7faf8c3adf52ff01432188ad3b9866ee4b9a9dfd
2016-10-13 10:07:51 +05:30
Field G. Van Zee
9cda6057ea Removed previously renamed/old files.
Details:
- Removed frame/base/bli_mem.c and frame/include/bli_auxinfo_macro_defs.h,
  both of which were renamed/removed in 701b9aa. For some reason, these
  files survived when the compose branch was merged back into master.
  (Clearly, git's merging algorithm is not perfect.)
- Removed frame/base/bli_mem.c.prev (an artifact of the long-ago changed
  memory allocator that I was keeping around for no particular reason).
2016-10-11 13:21:26 -05:00
Field G. Van Zee
22377abd84 Fixed bli_gemm() segfault on empty C matrices.
Details:
- Fixed a bug that would manifest in the form of a segmentation fault
  in bli_cntl_free() when calling any level-3 operation on an empty
  output matrix (ie: m = n = 0). Specifically, the code previously
  assumed that the entire control tree was built prior to it being
  freed. However, if the level-3 operation performs an early exit, the
  control tree will be incomplete, and this scenario is now handled.
  Thanks to Elmar Peise for reporting this bug.
2016-10-10 13:43:56 -05:00
Field G. Van Zee
0b571cd94d Fixed segfault in bli_free_align() for NULL ptrs.
Details:
- Fixed a bug in bli_free_align() caused by failing to handle NULL pointers
  up-front, which led to performing pointer arithmetic on NULL pointers in
  order to free the address immediately before the pointer. Thanks to Devin
  Matthews for reporting this bug.
2016-10-06 14:48:15 -05:00
praveeng
cd84fb9518 syntax erros in configure file
Change-Id: Ibe8a6071aad97df550df64c009fec33a9d8f43a1
2016-10-06 15:08:21 +05:30
praveeng
f2e7ea113a conflicts merge for bli_kernel.h
Change-Id: I15d846bd34e11f86ebfd7ed091ff671a1f3366a0
2016-10-06 12:35:30 +05:30
sthangar
133983c36f code clean up in bli_kernel.h
Change-Id: I11d9cdf2af8e8199209eb084f6c3a7c910b83d5d
2016-10-06 11:26:22 +05:30
Field G. Van Zee
4fb9b4ef2e CHANGELOG update (0.2.1) 2016-10-05 14:41:35 -05:00
Field G. Van Zee
866b2dde3f Version file update (0.2.1) 0.2.1 2016-10-05 14:41:34 -05:00
Field G. Van Zee
87fddeab3c Merge branch 'compose' 2016-10-05 13:35:01 -05:00
Field G. Van Zee
6f71cd3449 Merge pull request #94 from flame/distcomm
Implemented distributed thrinfo_t management.
2016-10-04 15:53:46 -05:00
Field G. Van Zee
86969873b5 Reclassified amaxv operation as a level-1v kernel.
Details:
- Moved amaxv from being a utility operation to being a level-1v operation.
  This includes the establishment of a new amaxv kernel to live beside all
  of the other level-1v kernels.
- Added two new functions to bli_part.c:
    bli_acquire_mij()
    bli_acquire_vi()
  The first acquires a scalar object for the (i,j) element of a matrix,
  and the second acquires a scalar object for the ith element of a vector.
- Added integer support to bli_getsc level-0 operation. This involved
  adding integer support to the bli_*gets level-0 scalar macros.
- Added a new test module to test amaxv as a level-1v operation. The test
  module works by comparing the value identified by bli_amaxv() to the
  the value found from a reference-like code local to the test module
  source file. In other words, it (intentionally) does not guarantee the
  same index is found; only the same value. This allows for different
  implementations in the case where a vector contains two or more elements
  containing exactly the same floating point value (or values, in the case
  of the complex domain).
- Removed the directory frame/include/old/.
2016-10-04 14:24:59 -05:00