Commit Graph

119 Commits

Author SHA1 Message Date
kdevraje
13806ba3b0 This check in has changes w.r.t Copyright information, which is changed to (start year) - 2019
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
2019-05-27 16:24:43 +05:30
Field G. Van Zee
6b83273126 Generalized ref kernels' pragma omp simd usage.
Details:
- Replaced direct usage of _Pragma( "omp simd" ) in reference kernels
  with PRAGMA_SIMD, which is defined as a function of the compiler being
  used in a new bli_pragma_macro_defs.h file. That definition is cleared
  when BLIS detects that the -fopenmp-simd command line option is
  unsupported. Thanks to Devin Matthews and Jeff Hammond for suggestions
  that guided this commit.
- Updated configure and bli_config.h.in so that the appropriate anchor
  is substituted in (when the corresponding pragma omp simd support is
  present).
2019-02-12 16:01:28 -06:00
Field G. Van Zee
bdd46f9ee8 Rewrote reference kernels to use #pragma omp simd.
Details:
- Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
  indexing annotated by the #pragma omp simd directive, which a compiler
  can use to vectorize certain constant-bounded loops. (The new kernels
  actually use _Pragma("omp simd") since the kernels are defined via
  templatizing macros.) Modest speedup was observed in most cases using
  gcc 5.4.0, which may improve with newer versions. Thanks to Devin
  Matthews for suggesting this via issue #286 and #259.
- Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
  be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
  respectively, with a default row preference for the gemm ukernel. Also
  updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
  respectively, for all datatypes.
- Modified configure to verify that -fopenmp-simd is a valid compiler
  option (via a new detect/omp_simd/omp_simd_detect.c file).
- Added a new header in which prefetch macros are defined according to
  which compiler is detected (via macros such as __GNUC__). These
  prefetch macros are not yet employed anywhere, though.
- Updated the year in copyrights of template license headers in
  build/templates and removed AMD as a default copyright holder.
2019-01-24 17:23:18 -06:00
Field G. Van Zee
eec2e183a7 Added escaping to '/' in os_name in configure.
Details:
- Add os_name to the list of variables into which the '/' character is
  escaped. This is meant to address (or at least make progress toward
  addressing) #293. Thanks to Isuru Fernando for spotting this as the
  potential fix, and also thanks to M. Zhou for the original report.
2019-01-21 12:12:18 -06:00
Field G. Van Zee
daacfe6840 Allow running configure with python 3.4.
Details:
- Relax version blacklisting of python3 to allow 3.4 or later instead
  of 3.5 or later. Thanks to Dave Love for pointing out that 3.4 was
  sufficient for the purpose of BLIS's build system. (It should be
  noted that we're not sure which, if any, python3 versions prior to
  3.4 are insufficient, and that the only thing stopping us from
  determining this is the fact that these earlier versions of python3
  are not readily available for us to test with.)
- Updated docs/BuildSystem.md to be explicit about current python2 vs
  python3 version requirements.
2019-01-07 12:12:47 -06:00
Field G. Van Zee
f272c2899a Add error message to malloc() check for NULL.
Details:
- Output an error message if and when the malloc()-equivalent called by
  bli_fmalloc_align() ever returns NULL. Everything was already in place
  for this to happen, including the error return code, the error string
  sprintf(), the error checking function bli_check_valid_malloc_buf()
  definition, and its prototype. Thanks to Minh Quan Ho for pointing out
  the missing error message.
- Increased the default block_ptrs_len for each inner pool stored in the
  small block allocator from 10 to 25. Under normal execution, each
  thread uses only 21 blocks, so this change will prevent the sba from
  needing to resize the block_ptrs array of any given inner pool as
  threads initially populate the pool with small blocks upon first
  execution of a level-3 operation.
- Nix stray newline echo in configure.
2019-01-02 12:34:15 -06:00
Field G. Van Zee
2f3174330f Implemented a pool-based small block allocator.
Details:
- Implemented a sophisticated data structure and set of APIs that track
  the small blocks of memory (around 80-100 bytes each) used when
  creating nodes for control and thread trees (cntl_t and thrinfo_t) as
  well as thread communicators (thrcomm_t). The purpose of the small
  block allocator, or sba, is to allow the library to transition into a
  runtime state in which it does not perform any calls to malloc() or
  free() during normal execution of level-3 operations, regardless of
  the threading environment (potentially multiple application threads
  as well as multiple BLIS threads). The functionality relies on a new
  data structure, apool_t, which is (roughly speaking) a pool of
  arrays, where each array element is a pool of small blocks. The outer
  pool, which is protected by a mutex, provides separate arrays for each
  application thread while the arrays each handle multiple BLIS threads
  for any given application thread. The design minimizes the potential
  for lock contention, as only concurrent application threads would
  need to fight for the apool_t lock, and only if they happen to begin
  their level-3 operations at precisely the same time. Thanks to Kiran
  Varaganti and AMD for requesting this feature.
- Added a configure option to disable the sba pools, which are enabled
  by default; renamed the --[dis|en]able-packbuf-pools option to
  --[dis|en]able-pba-pools; and rewrote the --help text associated with
  this new option and consolidated it with the --help text for the
  option associated with the sba (--[dis|en]able-sba-pools).
- Moved the membrk field from the cntx_t to the rntm_t. We now pass in
  a rntm_t* to the bli_membrk_acquire() and _release() APIs, just as we
  do for bli_sba_acquire() and _release().
- Replaced all calls to bli_malloc_intl() and bli_free_intl() that are
  used for small blocks with calls to bli_sba_acquire(), which takes a
  rntm (in addition to the bytes requested), and bli_sba_release().
  These latter two functions reduce to the former two when the sba pools
  are disabled at configure-time.
- Added rntm_t* arguments to various cntl_t and thrinfo_t functions, as
  required by the new usage of bli_sba_acquire() and _release().
- Moved the freeing of "old" blocks (those allocated prior to a change
  in the block_size) from bli_membrk_acquire_m() to the implementation
  of the pool_t checkout function.
- Miscellaneous improvements to the pool_t API.
- Added a block_size field to the pblk_t.
- Harmonized the way that the trsm_ukr testsuite module performs packing
  relative to that of gemmtrsm_ukr, in part to avoid the need to create
  a packm control tree node, which now requires a rntm_t that has been
  initialized with an sba and membrk.
- Re-enable explicit call bli_finalize() in testsuite so that users who
  run the testsuite with memory tracing enabled can check for memory
  leaks.
- Manually imported the compact/minor changes from 61441b24 that cause
  the rntm to be copied locally when it is passed in via one of the
  expert APIs.
- Reordered parameters to various bli_thrcomm_*() functions so that the
  thrcomm_t* to the comm being modified is last, not first.
- Added more descriptive tracing for allocating/freeing small blocks and
  formalized via a new configure option: --[dis|en]able-mem-tracing.
- Moved some unused scalm code and headers into frame/1m/other.
- Whitespace changes to bli_pthread.c.
- Regenerated build/libblis-symbols.def.
2018-12-25 19:35:01 -06:00
Field G. Van Zee
0645f239fb Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
  third clause of the license comment blocks of all relevant files and
  replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
  comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
  with format of all other comment blocks.
2018-12-04 14:31:06 -06:00
Field G. Van Zee
f19c33af4c Disallow 64b BLAS integers + 32b BLIS integers.
Details:
- Print an error message from configure if the user attempts to
  explicitly configure BLIS for simultaneous use of 64-bit integers in
  the BLAS API with 32-bit integers in the BLIS API.
- Added cpp macro conditional to bli_type_defs.h to mandate that BLIS
  integers be 64 bits if the BLAS integers are 64 bits. This and the
  above item take care of issue #274. Thanks to Devin Matthews and
  Jeff Hammond for suggesting these safeguards.
- Slight reorganization and relabeling (for clarity) of BLAS/CBLAS
  sections and BLIS integer size line of the testsuite configuration
  output.
- Very minor edits to docs/MixedDatatypes.md.
2018-10-26 17:07:15 -05:00
Field G. Van Zee
06c23954e6 Defined unified bli_pthreads_*() API for all OSes.
Details:
- Expanded the bli_pthread_*() -> pthread_*() wrappers in
  frame/thread/bli_pthread.c to include cases for Windows taken from
  frame/base/bli_pthread_wrap.c. Now, bli_thread_*() is always defined
  and always used by BLIS and the BLIS testsuite (in lieu of calling
  pthreads directly, as before). The implementation used in this new
  API depends on whether we are building for Windows, and to a lesser
  extent, whether we are building on OS X. For the core API, Windows
  uses Windows threads, non-Windows (Linux, OS X) uses pthreads.
  OS X and Windows get barriers implemented in terms of other
  bli_pthread_*() functions, and Linux gets barriers implemented in
  terms of pthread_barrier*(). This commit addresses issue #273.
- Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(),
  which was erroneously calling pthread_mutex_lock().
- Minor changes to configure so that the auto-detection executable
  can be built given the above changes (most notably, turning on
  POSIX extensions via -D_GNU_SOURCE).
- Removed temporary play-test code for shiftd that accidentally got
  committed into test/3m4m/test_gemm.c.
2018-10-23 19:16:54 -05:00
Field G. Van Zee
73a222c0d9 Minor edits to 'configure --help' text. 2018-10-20 14:13:04 -05:00
Field G. Van Zee
090e4f08fc Merge branch 'master' into dev 2018-10-19 18:41:10 -05:00
Field G. Van Zee
343a2715eb Whitespace changes to configure, bli_pthread_wrap.
Details:
- Mostly whitespace changes (spaces to tabs) to configure and
  bli_pthread_wrap.c and .h.
2018-10-19 16:59:19 -05:00
Field G. Van Zee
3678a1cd51 Merge branch 'master' into win-pthreads 2018-10-19 16:11:31 -05:00
Field G. Van Zee
4e38a8d4ee Implemented python version checking in configure.
Details:
- Added python version checking to configure script. (Recall that python
  is needed to execute the flatten-headers.py script.) Minimum versions
  of python needed are currently as follows:
    python2: 2.7 or later
    python3: 3.5 or later
  The standard search order for python interpeters is:
    python python3 python2
  The PYTHON environment variable is also supported and will be checked
  before the standard search order list.
- Updated BuildSystem.md to include: a minimum make version; mention
  that the C compiler must actually be a C99 compiler; and the caveat
  that Windows builds do not require pthreads since BLIS can provide
  an implementation of pthreads internally.
2018-10-19 15:54:15 -05:00
Field G. Van Zee
49d3f9fcbb Merge branch 'master' into dev 2018-10-17 18:00:40 -05:00
Devin Matthews
29e6245816 Merge branch 'master' into win-pthreads 2018-10-16 10:12:25 -05:00
Devin Matthews
0b73209f6b Add missing argument to WaitForSingleObject and use $is_win in configure
to turn off pthreads.
2018-10-16 10:02:06 -05:00
Field G. Van Zee
5fec95b99f Implemented mixed-datatype support for gemm.
Details:
- Implemented support for gemm where A, B, and C may have different
  storage datatypes, as well as a computational precision (and implied
  computation domain) that may be different from the storage precision
  of either A or B. This results in 128 different combinations, all
  which are implemented within this commit. (For now, the mixed-datatype
  functionality is only supported via the object API.) If desired, the
  mixed-datatype support may be disabled at configure-time.
- Added a memory-intensive optimization to certain mixed-datatype cases
  that requires a single m-by-n matrix be allocated (temporarily) per
  call to gemm. This optimization aims to avoid the overhead involved in
  repeatedly updating C with general stride, or updating C after a
  typecast from the computation precision. This memory optimization may
  be disabled at configure-time (provided that the mixed-datatype
  support is enabled in the first place).
- Added support for testing mixed-datatype combinations to testsuite.
  The user may test gemm with mixed domains, precisions, both, or
  neither.
- Added a standalone test driver directory for building and running
  mixed-datatype performance experiments.
- Defined a new variation of castm, castnzm, which operates like castm
  except that imaginary values are not touched when casting a real
  operand to a complex operand. (By contrast, in these situations castm
  sets the imaginary components of the destination matrix to zero.)
- Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
  usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
  also simplified the implementation of bli_obj_imag_equals().
- Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
  when given BLIS_CONSTANT objects.
- Disabled dt_on_output field in auxinfo_t structure as well as all
  accessor functions. Also commented out all usage of accessor
  functions within macrokernels. (Typecasting in the microkernel is
  still feasible, though probably unrealistic for now given the
  additional complexity required.)
- Use void function pointer type (instead of void*) for storing function
  pointers in bli_l0_fpa.c.
- Added documentation for using gemm with mixed datatypes in
  docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
- Defined level-1d operation xpbyd and level-1m operation xpbym.
- Added xpbym test module to testsuite.
- Updated frame/include/bli_x86_asm_macros.h with additional macros
  (courtsey of Devin Matthews).
2018-10-15 16:37:39 -05:00
Field G. Van Zee
c244a716c9 Added missing -r option to configure --help output.
Details:
- Added inadvertantly-omitted mention of -r option-equivalent to
  --thread-part-jrir to the output for 'configure --help'. Also made
  minor edits to the same text.
2018-10-07 20:59:40 -05:00
Field G. Van Zee
c92762ecdc Added option of slab or rr partitioning in jr/ir.
Details:
- Updated existing macrokernel function names and definitions to
  explicitly use slab assignment of micropanels to threads, then created
  duplicate versions of macrokernels that explicitly use round-robin
  assignment instead of slab. NOTE: As in ac18949, trsm_r macrokernels
  were not substantially updated in this commit because they are
  currently disabled in bli_trsm_front.c.
- Updated existing packing function (in blk_packm_blk_var1.c) to
  explicitly use slab partitioning, and then duplicated for round-robin.
- Updated control tree initialization to use the appropriate macrokernel
  and packm function pointers depending on which method (slab or rr) was
  enabled at configure-time.
- Updated configure script to accept new --thread-part-jrir=[slab|rr]
  option (-m [slab|rr] for short), which allows the user to explicitly
  request either slab or round-robin assignment (partitioning) of
  micropanels to threads.
- Updated sandbox/ref99 according to above changes.
- Minor updates to build/add-copyright.py.
2018-10-07 20:30:32 -05:00
Field G. Van Zee
743a1a6dec Fixed misleading version query from gcc 7+.
Details:
- gcc 7 introduced new behavior to the -dumpversion option whereby only
  the major version component is output. However, as part of this
  change, gcc 7 also introduced a new option, -dumpfullversion, which is
  guaranteed to always output the major, minor, and revision numbers. If
  we are using gcc 7 or later, we re-query the version string with this
  new option and then re-parse the result so as to avoid misleading
  output from configure (e.g. using gcc 7.3.0 is reported as 7.7.7).
2018-10-03 14:40:10 -05:00
Devin Matthews
d33f130ea6 Some configure changes:
1) Allow environment variables to be set anywhere in the argument list.
2) Allow any environment variable to be set.
3) Allow LIBPHTREAD to be set to null without getting defaulted to -lpthread.
2018-10-02 11:45:43 -05:00
Field G. Van Zee
7d96fc437e Allow slashes ('/') in version tags.
Details:
- Updated the configure script to allow slashes in version string. This
  is needed so that downstream maintainers (such as those for Debian)
  can create local tags such as "upstream/0.4.1". Thanks to M. Zhou for
  reporting this issue via PR #256 and providing me the information
  needed to debug the problem.
2018-09-28 15:40:45 -05:00
Field G. Van Zee
807a654888 Fixed confusing configure message for libmemkind.
Details:
- Corrected feedback echoed to user by configure when libmemkind is
  found but not explicitly requested. In these cases, configure would
  echo a message that it had received an explicit request to enable
  libmemkind, which was not accurate, even if the end result was the
  same--that libmemkind is enabled by default when it is found. Thanks
  To Devangi Parikh for reporting this issue.
2018-09-20 15:41:05 -05:00
Field G. Van Zee
c03728f1f4 Various minor cleanups.
Details:
- Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
  unconditionally, but differently for Windows and non-Windows, but
  then disabled the definition of bli_setenv() entirely since BLIS
  no longer needs to set environment variables. Updated bli_winsys.h
  accordingly, and call bli_sleep() from within testsuite instead of
  sleep() directly.
- Use
    #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
  instead of
    #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
  when guarding against local definition of pthread barrier in
  testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
  should always be set to 200809L when barriers are supported, though I
  won't be surprised if we encounter a case in the future where it is
  set to something else such as 1 while still supported.)
- Removed old _VERS_CONF_INST definitions and installation rules in
  top-level Makefile. These are no longer needed because we no longer
  output libraries with the version and configuration name as
  substrings.
- Comment/whitespace updates in Makefile, config.mk.in, common.mk,
  configure, bli_extern_defs.h, and test_libblis.h.
- Added mention of 1m to README.md and other trivial tweaks.
2018-09-10 17:54:27 -05:00
Isuru Fernando
e93b01ff60 Windows DLL support (#246)
* Enable shared

* Enable rdp

* Add support for dll

* Use libblis-symbols.def

* Fix building dlls

* Fix libblis-symbols.def

* Fix soname

* Fix Makefile error

* Fix install target

* Fix missing symbols

* Add BLIS_MINUS_TWO

* Add path to dll

* Fix OSX soname

* Add declspec for dll

* Add -DBLIS_BUILD_DLL

* Replace @enable_shared@ in config

* switch to auto for now

* blis_ -> bli_

* Remove BLIS_BUILD_DLL in make check

* change auto->haswell

* enable_shared_01

* Add wno-macro-redefined

* print out.cblat3

* BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY

* Use V=1

* Remove fpic for windows

* Remember LIBPTHREAD

* Remove libm for windows

* Remember AR

* Fix remembering libpthread

* Add Wno-maybe-uninitialized in only gcc

* Don't do blastest for shared for now

* Fix install target

And remove unnecessary change

* test auto and x86_64

* Fix install target again

* Use IS_WIN variable

* Remove leading dot from LIBBLIS_SO_MAJ_EXT

* Make is_win yes/no

* Add comments for windows builds

* Change if else blocks location
2018-09-09 15:57:43 -05:00
Field G. Van Zee
b051ffb815 Merge branch 'dev' 2018-08-29 17:06:48 -05:00
Field G. Van Zee
10d07357af Better thread safety; added threading to testsuite.
Details:
- Replaced critical sections that were conditional upon multithreading
  being enabled (via pthreads or OpenMP) with unconditional use of
  pthreads mutexes. (Why pthreads? Because BLIS already requires it
  for its initialization mechanism: pthread_once().) This was done in
  bli_error.c, bli_gks.c, bli_l3_ind.c. Also, replaced usage of BLIS's
  mtx_t object and bli_mutex_*() API with pthread mutexes in
  bli_thread.c. The previous status quo could result in a race condition
  if the application called BLIS from more than one thread. The new
  pthread-based code should be completely agnostic to the application's
  threading configuration. Thanks to AMD for bringing to our attention
  the need for a thread-safety review.
- Added an option to the testsuite to simulate application-level
  multithreading. Specifically, each thread maintains a counter that is
  incremented after each experiment. The thread only executes the
  experiment if: counter % n_threads == thread_id. In other words, the
  threads simply take turns executing each problem experiment. Also,
  POSIX guarantees that fprintf() will not intermingle output, so
  output was switched to fprintf() instead of libblis_test_fprintf().
- Changed membrk_t objects to use pthread_mutex_t intead of mtx_t and
  replaced use of bli_mutex_init()/_finalize() in bli_membrk.c with
  wrappers to pthread_mutex_init()/_destroy().
- Changed the implementation of bli_l3_ind_oper_enable_only() to fix
  a race condition; specifically, two threads calling the function with
  the same parameters could lead to a non-deterministic outcome.
- Added #include <pthread.h> to bli_cpuid.c and moved the same in
  bli_arch.c.
- Added 'const' to declaration of OPT_MARKER in bli_getopt.c.
- Added #include <pthread.h> to bli_system.h.
- Added add-copyright.py script to automate adding new copyright lines
  to (and updating existing lines of) source files.
2018-08-26 20:34:30 -05:00
Field G. Van Zee
aaa549f4d1 Minor update to configure --help (--sharedir option).
Details:
- Fixed/tweaked description for --sharedir=SHAREDIR option.
2018-08-26 20:13:51 -05:00
Field G. Van Zee
62ea1d33d3 Fixed broken out-of-tree builds.
Details:
- Fixed stale filepaths to check-blastest.sh and check-blistest.sh in
  travis/do_testsuite.sh and travis/do_sde.sh.
- Create a symbolic link to the 'config' directory so that the top-level
  Makefile can find the configs' make_defs.mk files during out-of-tree
  builds.
- Added additional case handling to out-of-tree scenario to handle
  situations where files 'Makefile', 'common.mk', or 'config' exist but
  are not symbolic links. In such cases, configure warns the user and
  exits.
- Homogenized various error messages throughout configure.
- Belated thanks to Victor Eijkhout for requesting the feature added
  in 0f491e9 whereby lesser Makefiles can compile and link against
  an existing installation of BLIS.
2018-08-26 13:35:53 -05:00
Field G. Van Zee
0f491e994a Allow lesser Makefiles to reference installed BLIS.
Details:
- Updated the build system so that "lesser" Makefiles, such as those in
  belonging to example code or the testsuite, may be run even if the
  directory is orphaned from the original build tree. This allows a
  user to configure, compile, and install BLIS, delete the build tree
  (that is, the source distribution, or the build directory for out-
  of-tree builds) and then compile example or testsuite code and link
  against the installed copy of BLIS (provided the example or testsuite
  directory was preserved or obtained from another source). The only
  requirement is that make be invoked while setting the
  BLIS_INSTALL_PATH variable to the same installation prefix used when
  BLIS was configured. The easiest syntax is:

    make BLIS_INSTALL_PATH=/install/prefix

  though it's also permissible to set BLIS_INSTALL_PATH as an
  environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
  behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
  blastest and testsuite, respectively, so that if those directories are
  copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
  of building/linking against an installed copy of BLIS.
2018-08-25 20:12:36 -05:00
Field G. Van Zee
36ff92ce0d Missing C++ compiler no longer fatal to configure.
Details:
- Changed configure so that the absence of any C++ compiler from the
  pre-defined search list does not result in an exit. Instead, in this
  situation, the found_cxx variable is assigned 'c++notfound' and the
  error message is changed to remind the user that C++ will not be
  available in the sandbox. Thanks to Devangi Parikh for reporting this
  issue.
- Also tweaked the message when a C++ compiler *is* found to remind any
  would-be confused user that BLIS will only use C++ if it is needed by
  code in the sandbox.
2018-08-24 18:26:09 -05:00
Field G. Van Zee
ffb57242f3 Cosmetic output changes to configure.
Details:
- Disable sandbox-related obj directory creation, directory mirroring,
  and makefile fragment generation when a sandbox is not enabled.
- Prevent various duplicate actions by configure (such as those
  mentioned above for sandboxes above).
2018-08-22 18:22:41 -05:00
Field G. Van Zee
ac17454aae Merge branch 'master' into dev 2018-08-22 15:34:53 -05:00
Field G. Van Zee
a77bec766a Whitespace changes, minor renames in build system.
Details:
- Minor whitespace cleanup, mostly in the form of spaces -> tabs.
- Shortened certain variables' _FRAGMENT_ infixes to _FRAG_ in
  common.mk.
2018-08-22 15:31:29 -05:00
Devin Matthews
1b0f8d60d1 Generate makefile fragments in build tree (#240)
* Make src dir read-only in out-of-tree build test.

* Generate makefile fragments in the build tree.
2018-08-22 15:19:29 -05:00
Field G. Van Zee
65c9096c6e Fixed broken -p option to configure.
Details:
- Fixed some stale code that was preventing the -p option to configure
  from working as expected (though the --prefix option was unaffected).
  This bug was was most likely introduced in  7e5648c (May 7 2018).
  Thanks to Dave Love for reporting this issue.
2018-08-17 11:44:12 -05:00
Field G. Van Zee
2c7960c841 Implemented ARG_MAX hack in configure, Makefile.
Details:
- Added support for --enable-arg-max-hack to configure, which will
  change the behavior of make when building BLIS so that rather than
  invoke the archiver/linker with all of the object files as command
  line arguments, those object files are echoed to a temporary file
  and then the archiver/linker is fed that temporary file via the @
  notation. An example of this can be found in the GNU make docs at
  https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.
2018-07-05 14:38:33 -05:00
Field G. Van Zee
89e178ce38 Merge branch 'master' into dev 2018-07-04 17:51:16 -05:00
Isuru Fernando
14648e1376 Native windows support using clang (#227)
* Add appveyor file

* Build script

* Remove fPIC for now

* copy as

* set CC and CXX

* Change the order of immintrin.h

* Fix testsuite header

* Move testsuite defs to .c

* Fix appveyor file

* Remove fPIC again and fix strerror_r missing bug

* Remove appveyor script

* cd to blis directory

* Fix sleep implementation

* Add f2c_types_win.h

* Fix f2c compilation

* Remove rdp and rename appveyor.yml

* Remove setenv declaration in test header

* set CPICFLAGS to empty

* Fix another immintrin.h issue

* Escape CFLAGS and LDFLAGS

* Fix more ?mmintrin.h issues

* Build x86_64 in appveyor

* override LIBM LIBPTHREAD AR AS

* override pthreads in configure

* Move windows definitions to bli_winsys.h

* Fix LIBPTHREAD default value

* Build intel64 in appveyor for now
2018-07-04 17:48:42 -05:00
Field G. Van Zee
195480beb5 Merge branch 'master' into dev 2018-06-25 13:24:21 -05:00
Field G. Van Zee
3f387ca35e Fixed bugs in configure's select_cc() function.
Details:
- This commit fixes several bugs in configure relating to selecting a C
  compiler. By dumb luck, two of the two bugs sort of cancelled each
  other out in most use cases, which manifested as the expected behavior.
  Thanks to Mathieu Poumeyrol for bringing this issue to our attention,
  and to Devin Matthews for suggesting the more portable way of
  capturing both stdout and stderr and suggesting a return code check
  instead of testing stdout/stderr.
- The first bug: As the values of the compiler search list are iterated
  over, only stderr is captured when querying a compiler with --version
  rather than both stdout and stderr.
- The second bug: After each query, a conditional attempted to test
  whether the query resulted in anything being output. That conditional
  erroneously was using "-z" instead of "-n" for non-emptiness. Thus,
  most of the time, stderr was empty (because the --version info was
  being output on stdout), and since it was empty, the -z conditional
  (intended to execute only when a compiler was found to be responsive)
  executed.
- A third bug was also fixed in the way that the merged stdout/stderr
  output was tested for non-emptiness (moving the 'cat' invocation to
  another line and testing the contents of a variable instead).
- The three bugs above have been fixed as part of a partial rewrite of
  the select_cc() function in terms of a return code check, which
  obviated the need to save the output of stdout and stderr.
- The fourth bug involved a misnamed variable in the right-hand side
  of a statement intended to prepend CC to search_list when CC was
  non-empty. This typically did not manifest as a bug since usually CC
  (if it was set) was set to a value that was known to work.
2018-06-25 12:32:03 -05:00
Field G. Van Zee
f986396c2a Added 'configure --help' text for CFLAGS, LDFLAGS.
Details:
- Added mention of the new support for preset CFLAGS, LDFLAGS to the
  bottom of the text output by './configure --help'.
- Updated usage example to use 'haswell' instead of 'sandybridge'.
2018-06-22 18:12:40 -05:00
Field G. Van Zee
884175d9ff Added configure support for preset CFLAGS, LDFLAGS.
Details:
- Any preexisting values set to the CFLAGS environment variable (or the
  CFLAGS variable if given on the command line) are saved by configure
  for later inclusion (prepending, to be precise) along with the
  compiler flags automatically determined by the BLIS build system.
  LDFLAGS is treated in a similar manner.) Thanks to Dave Love for
  requesting this feature in issue #223 and Mathieu Poumeyrol for his
  support on this and a previous related issue.
- Comment updates to build/config.mk.in.
- Strip whitespace from return value of various cflags functions in
  common.mk.
2018-06-22 18:08:43 -05:00
Field G. Van Zee
3f48c38164 Cosmetic fix to configure output in config.mk.
Details:
- Fixed configure so that MK_ENABLE_MEMKIND is assigned "no" when the
  option is disabled due to libmemkind not being present. This wasn't
  affecting anything since the one use of the variable (in common.mk)
  was formulated as "ifeq ($(MK_ENABLE_MEMKIND),yes)". That is, the
  variable being empty was effectively equivalent to it being set to
  "no".
- Comment updates to build/config.mk.in, common.mk.
2018-06-05 16:52:35 -05:00
Field G. Van Zee
5df201260f Merge branch 'master' into dev 2018-06-05 16:14:19 -05:00
Field G. Van Zee
7a207e8f2c Disabled indirect blacklisting (issue #214).
Details:
- Return early from function, pass_config_kernel_registries(), that
  implements indirect blacklisting of subconfigurations (during pass 0).
  In short, I realized that indirect blacklisting is not needed in the
  situations I envisioned, and can actually cause problems under certain
  circumstances. Thanks to Tony Skjellum for reporting the issue (#214)
  that led to this commit, and to Devin Matthews for prompting me to
  realize that indirect blacklisting was unnecessary, at least as
  originally envisioned.
2018-06-03 18:04:27 -05:00
Field G. Van Zee
22deef2f54 Support alternative gemm implementation sandboxes.
Detail:
- configure:
  - add support for --enable-sandbox=NAME to configure script, where NAME
    is a subdirectory of a new 'sandbox' directory that contains an
    alternative implementation of gemm. (For now, only implementations of
    gemm may be provided via a sandbox.);
  - add support for C++ compiler. C++ compilers are handled in a manner
    similar to that of C compilers, in that a default search order is
    used, and that CXX is searched for first, if the variable is set. In
    practice, the C++ compiler that is selected should correspond to the
    selected C compiler. (Example: If gcc is selected for C, g++ should
    be selected for C++.) The result of the search is output to config.mk
    via build/config.mk.in. NOTE: The use of C++ in BLIS is still
    hypothetical, but may eventually move to being experimental. This
    support was intended only for use of C++ within a gemm sandbox.
- build/config.mk.in:
  - define SANDBOX variable containing sandbox subdirectory name.
- build/bli_config.in:
  - define either of the BLIS_ENABLE_SANDBOX or BLIS_DISABLE_SANDBOX
    macros in bli_config.h.
- common.mk:
  - include makefile fragments that were propagated into the specified
    sandbox subdirectory;
  - generate different CFLAGS for sandboxes, as well as a separate
    CXXFLAGS variable for sandboxes when C++ source files are compiled;
  - isolate into a single location lists of file suffixes for various
    purposes.
  - reorganized/clean up code related to identifying header files and
    paths.
- Makefile:
  - generate object filepaths for and compile source code files found in
    sandbox sub-directory;
  - remove makefile fragments placed in sandbox sub-directory (cleanmk);
  - various other cleanups.
- Added .cc, .cpp, and .cxx to list of suffixes of files to recognize in
  makefile fragments (via build/gen-make-frags/suffix_list).
- Updated blis.h to conditionally #include bli_sandbox.h (via a new file,
  bli_sbox.h), which each sandbox is assumed to use for any type
  definitions and function prototypes it wishes to export out to blis.h.
- Conditionally disable bli_gemmnat() implementation in frame/3 when
  BLIS_ENABLE_SANDBOX is defined.
2018-05-24 14:28:55 -05:00
Field G. Van Zee
10c9e8f952 Cache hardware's arch_t id after querying once.
Details:
- Added logic to bli_arch.c that will call what was previously the body
  of bli_arch_query_id() only once and then cache the value in a static
  variable local to the file. (Previously, the arch_t associated with
  the hardware/configuration was queried every time bli_arch_query_id()
  was called, which was at least once per level-3 function call. Thanks
  to Devin Matthews for suggesting this feature via issue #175.
- Added -lpthread to the compile/link command line of the compiler
  invocation that compiles build/detect/config/config_detect.c, which
  prints the string identifying the detected configuration, since it
  is now needed due to new pthread_once() logic in bli_arch.c.
- Implementation note: I chose to implement this arch_t caching feature
  via pthread_once(), using a separate pthread_once_t variable local to
  the file, rather than calling bli_init_once(). The reason is that I
  did not want to require bli_init() as a prerequisite to this function.
  bli_init() already calls several sub-components, some of which make use
  of bli_arch_query_id(), and therefore it would be easy to fall into a
  circular self-init situation (which usually causes pthreads to hang
  indefinitely).
2018-05-17 15:22:51 -05:00