Commit Graph

104 Commits

Author SHA1 Message Date
Field G. Van Zee
7a0ba4194f Added support for addons.
Details:
- Implemented a new feature called addons, which are similar to
  sandboxes except that there is no requirement to define gemm or any
  other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
  for requesting an addon be included within a BLIS build. configure now
  outputs the list of enabled addons into config.mk. It also outputs the
  corresponding #include directives for the addons' headers to a new
  companion to the bli_config.h header file named bli_addon.h. Because
  addons may wish to make use of existing BLIS types within their own
  definitions, the addons' headers must be included sometime after that
  of bli_config.h (which currently is #included before bli_type_defs.h).
  This is why the #include directives needed to go into a new top-level
  header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
  build with them, and what assumptions their authors should keep in
  mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
  as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
  functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.

Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717
2022-03-31 12:03:27 +05:30
Dipal M Zambare
3364c0e4eb Binary and dynamic dispatch configuration name change
-- Reverted changes made to include lp/ilp info in binary name
     This reverts commit c5e6f885f0.

  -- Included BLAS int size in 'make showconfig'

  -- Renamed amdepyc configuration to amdzen

Change-Id: Ie87ec1c03e105f606aef1eac397ba0d8338906a6
2021-11-12 08:58:56 +05:30
Dipal M Zambare
5d287fdba0 Include LP64/ILP64 in BLIS binary name
Binary name will be chosen based on multi-threading and BLAS
  integer size configuration as given below.

  libblis-[mt]-lp64 - when configured to use 32 bit integers
  libblis-[mt]-ilp64 - when configured to use 64 bit integers

AMD-Internal: [CPUPL-1879]
Change-Id: I865023c63235a0a72bdfce7057b2cfb8158b1d87
2021-11-12 08:58:51 +05:30
Dipal M Zambare
d1fb770f1c Removed duplicate cpp and testcpp folders.
The cpp and testcpp folder exists in root directory as well as vendor
directory. Only folder in vendor directory are needed.

Removed duplicate directories and updated makefiles to pick the
sources from vendor folder.

AMD-Internal: [CPUPL-1834]
Change-Id: I178043a09fd746660938b89ecce73c53d6c53409
2021-11-12 08:58:49 +05:30
Dipal M Zambare
333fe4ca8b Makefile cleanup
Removed unused function rm-dupls() from common.mk
    Removed code from patch-ld-so.py which is not needed for AMD codebase.

AMD-Internal: [CPUPL-1539]
Change-Id: If1812d5aa87c1e3a9d0c4706d571223d56f2fc20
2021-07-02 01:20:01 -04:00
lcpu
7401effc03 BLIS:merge:
Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch

Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)

Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.

Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.

Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)

Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)

Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.

Minor code consolidation in all level-3 _front() functions.

Reorganized Windows cpp branch of bli_pthreads.c.

Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.

Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.

Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.

AMD-internal-[CPUPL-1523]

Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
2021-04-27 11:09:48 +05:30
dzambare
48f2366b6f Updated BLIS version string to "AOCL BLIS X.x" format
AMD-Internal : [CPUPL-1394]

Change-Id: Ifebcb14d9eb064d231b831f5a1e151853ad5a009
2021-01-07 12:38:32 +05:30
Field G. Van Zee
9bb23e6c2a Added support for systemless build (no pthreads).
Details:
- Added a configure option, --[enable|disable]-system, which determines
  whether the modest operating system dependencies in BLIS are included.
  The most notable example of this on Linux and BSD/OSX is the use of
  POSIX threads to ensure thread safety for when application-level
  threads call BLIS. When --disable-system is given, the bli_pthreads
  implementation is dummied out entirely, allowing the calling code
  within BLIS to remain unchanged. Why would anyone want to build BLIS
  like this? The motivating example was submitted via #454 in which a
  user wanted to build BLIS for a simulator such as gem5 where thread
  safety may not be a concern (and where the operating system is largely
  absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
  the implementation of bli_clock() unconditionally returns 0.0 instead
  of the time elapsed since some fixed point in the past. The reasoning
  for this is that if the operating system is truly minimal, the system
  function call upon which bli_clock() would normally be implemented
  (e.g. clock_gettime()) may not be available.
- Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h
  to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
  from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
  and BLISTypedAPI.md, with a note that both are non-functional when
  BLIS is configured with --disable-system.
2020-11-16 15:55:45 -06:00
Field G. Van Zee
88ad841434 Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
  
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
  only the lower or upper triangle of a square matrix C. For now, only
  the conventional/large code path will be supported (in vanilla BLIS).
  This was accomplished by leveraging the existing variant logic for
  herk. However, some of the infrastructure to support a gemmtsup is
  included in this commit, including
  - A bli_gemmtsup() front-end, similar to bli_gemmsup().
  - A bli_gemmtsup_ref() reference handler function.
  - A bli_gemmtsup_int() variant chooser function (with variant calls
    commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatiblity layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
  wrapper to a set of polymorphic CBLAS-like function wrappers defined
  in another header (cblas.hh). These two headers are installed if
  running the 'install' target with INSTALL_HH is set to 'yes'. (Also
  added a set of unit tests that exercise blis.hh, although they are
  disabled for now because they aren't compatible with out-of-tree
  builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
  within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
  various minor updates to dotv and scalv kernels. Also added various
  sup kernels contributed by AMD to kernels/zen/3. However, these
  kernels are (for now) not yet used, in part because they caused
  AppVeyor clang failures, and also because I have not found time to
  review and vet them.
- Output the python found during configure into the definition of PYTHON
  in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
  to bli_gemm_front.c.
- Implemented explicit beta = 0 handling in for the sgemm ukernel in
  bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
  bug surfaced because the gemmt module verifies its computation using
  gemm with its beta parameter set to zero, which, on a cortexa15 system
  caused the gemm kernel code to unconditionally multiply the
  uninitialized C data by beta. The C matrix likely contained
  non-numeric values such as NaN, which then would have resulted in a
  false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
  in bli_l3_blocksize.c, was inadvertantly being defined in terms of
  helper functions meant for trmm. This bug was probably harmless since
  the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
  kernels/zen/3/bli_gemm_small.c since those macros are not used in
  vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
  accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
  Windows systems.
- Various whitespace changes.
2020-11-14 09:39:48 -06:00
phakumar
ccf0772d6e BLIS library porting on to Windows:
This library ported on Windows 10 using CMake scripts and Visual Studio 2019 with clang compiler
 AMD internal:[CPUPL-657]

Change-Id: Ie701f52ebc0e0585201ba703b6284ac94fc0feb9
2020-06-16 18:29:00 +05:30
Satish Balay
d560d105d2 OSX: specify the full path to the location of libblis.dylib (#390)
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime

Before this change:

Appication gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib

balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
        libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
	/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR

Co-Authored-By: Jed Brown <jed@jedbrown.org>

* CREDITS file update.

Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-05-21 11:54:54 +05:30
Field G. Van Zee
08709d4117 Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
  sort function. The purpose was not to sort the contents, but rather
  to remove duplicates. However, there is valid syntax in a string of
  linker flags that, when sorted, yields different/broken behavior.
  So I've removed the line in common.mk that sorts LDFLAGS. Also, for
  future use, I've added a new function, rm-dupls, that removes
  duplicates without sorting. (This function was based on code from a
  stackoverflow thread that is linked to in the comments for that
  code.) Thanks to Isuru Fernando for reporting this issue (#373).

Change-Id: Ie355cc111fd2c6669f0c3088e8fa5dc7c407a3b9
2020-05-21 11:45:48 +05:30
Meghana Vankadari
4fcc4e499d Optimized DGEMV kernel and changed BLAS interface call
Details:
- Optimized daxpyf kernel with fuse_factor=5 and iter_unroll=2.
- Modified framework files of dgemv to remove dependency on cntx variable.
- Updated cntx_init file of zen2 to choose optimized kernels.
- Modified BLAS interface call for DGEMV to reduce framework overhread.
- Currently these changes are applicable for zen2 configuration.
  They will be enabled for zen family processors in future.
- Changed naming convention for new BLAS macros to indicate their use.
- Added new optimized kernel for axpyf under zen2 folder.
- Implemented basic GEMV kernel without using axpyv or axpyf.
  This kernel is chosen for small sizes.

Change-Id: I4278d37e494854879c71499b8b9da8c5dbe3bf5b
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-885]
2020-05-19 06:40:44 -04:00
dzambare
d40edf7dac Execution and Debug trace support.
Added support add debug logging, execution trace and decode.

Change-Id: I024bf6165daa9e23a62423f2401c0f1c5de459ba
AMD-Internal: [CPUPL-806]
2020-04-07 08:48:59 +05:30
Satish Balay
da0c086f46 OSX: specify the full path to the location of libblis.dylib (#390)
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime

Before this change:

Appication gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib

balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
        libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
	/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR

Co-Authored-By: Jed Brown <jed@jedbrown.org>

* CREDITS file update.

Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-03-31 17:09:41 -05:00
Field G. Van Zee
d626112b8d Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
  sort function. The purpose was not to sort the contents, but rather
  to remove duplicates. However, there is valid syntax in a string of
  linker flags that, when sorted, yields different/broken behavior.
  So I've removed the line in common.mk that sorts LDFLAGS. Also, for
  future use, I've added a new function, rm-dupls, that removes
  duplicates without sorting. (This function was based on code from a
  stackoverflow thread that is linked to in the comments for that
  code.) Thanks to Isuru Fernando for reporting this issue (#373).
2020-01-15 13:27:02 -06:00
Kiran Varaganti
82ec21f1c7 Fix for CPUPL-541,When threading is enabled blis-mt library gets generated otherwise blis.a gets generated for sequential builds. However blis.h header file will differ for both type of libraries. The difference is about enable/disable of defining BLIS_ENABLE_OPENMP or BLIS_ENABLE_PTHREADS when threading is enabled. Appropriate header file needs to be included in the application.
Change-Id: I82a4ae2bf529eedd83868e059a43749714cbe246
2019-12-10 15:40:27 +05:30
prangana
3d20128aea Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I97a10ab7546d475474b0ff733bafb8248843c352
2019-11-21 00:54:16 +05:30
prangana
d63f9b7d7f checkcpp test rule in Makefile
Change-Id: If01fe55e258e563a96cd8da9ea93d21063b730c2
2019-11-21 00:52:47 +05:30
prangana
49c27040d1 Instll CPP Template headers
Change-Id: Ib15dc9bda08d1f3fdc68e31520daee90a287357c
2019-11-20 21:52:22 +05:30
Field G. Van Zee
6e9b1e0244 Adjust -fopenmp-simd for icc's preferred syntax.
Details:
- Use -qopenmp-simd instead of -fopenmp-simd when compiling with Intel
  icc. Recall that this option is used for SIMD auto-vectorization in
  reference kernels only. Support for the -f option has been completely
  deprecated and removed in newer versions of icc in favor of -q. Thanks
  to Victor Eijkhout for reporting this issue and suggesting the fix.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
c90a1a8dca Inadvertantly hidden xerbla_() in blastest (#313).
Details:
- Attempted a fix to issue #313, which reports that when building only
  a shared library (ie: static library build is disabled), running the
  BLAS test drivers can fail because those drivers provide their own
  local version of xerbla_() as a clever (albeit still rather hackish)
  way of checking the error codes that result from the individual tests.
  This local xerbla_() function is never found at link-time because the
  BLAS test drivers' Makefile imports BLIS compilation flags via the
  get-user-cflags-for() function, which currently conveys the
  -fvisibility=hidden flag, which hides symbols unless they are
  explicitly annotated for export. The -fvisibility=hidden flag was
  only ever intended for use when building BLIS (not for applications),
  and so the attempted solution here is to omit the symbol export
  flag(s) from get-user-cflags-for() by storing the symbol export
  flag(s) to a new BULID_SYMFLAGS variable instead of appending it
  to the subconfigurations' CMISCFLAGS variable (which is returned by
  every get-*-cflags-for() function). Thanks to M. Zhou for reporting
  this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
  created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
  text.
2019-08-23 14:18:08 +05:30
Isuru Fernando
aa9d8e4a81 Test with shared on windows (#306)
Export macros can't support both shared and static at the same time.
When blis is built with both shared and static, headers assume that
shared is used at link time and dllimports the symbols with __imp_
prefix.

To use the headers with static libraries a user can give
-DBLIS_EXPORT= to import the symbol without the __imp_ prefix
2019-08-23 14:18:08 +05:30
Field G. Van Zee
2d1cd32c0f Renamed --enable-export-all to --export-shared=[].
Details:
- Replaced the existing --enable-export-all / --disable-export-all
  configure option with --export-shared=[public|all], with the 'public'
  instance of the latter corresponding to --disable-export-all and the
  'all' instance corresponding to --enable-export-all. Nothing else
  semantically about the option, or its default, has changed.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
9ea223df2d Use -fvisibility=[...] with clang on Linux/BSD/OSX.
Details:
- Modified common.mk to use the -fvisibility=[hidden|default] option
  when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
  etc.). Thanks to Isuru Fernando for pointing out this option works
  with clang on these OSes.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
d8f6d17bce Support shared lib export of only public symbols.
Details:
- Introduced a new configure option, --enable-export-all, which will
  cause all shared library symbols to be exported by default, or,
  alternatively, --disable-export-all, which will cause all symbols to
  be hidden by default, with only those symbols that are annotated for
  visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
  symbols), to be exported. The default for this configure option is
  --disable-export-all. Thanks to Isuru Fernando for consulting on
  this commit.
- Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
  which was intended for 5a5f494.
- Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
  frame/include/bli_config_macro_defs.h.
- Provided appropriate logic within common.mk to implement variable
  symbol visibility for gcc, clang, and icc (to the extend that each of
  these compilers allow).
- Relocated --help text associated with debug option (-d) to configure
  slightly further down in the list.
2019-08-23 14:18:07 +05:30
Isuru Fernando
dcb0f50455 Export functions without def file (#303)
* Revert "restore bli_extern_defs exporting for now"

This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.

* Remove symbols not intended to be public

* No need of def file anymore

* Fix whitespace

* No need of configure option

* Remove export macro from definitions

* Remove blas export macro from definitions
2019-08-23 14:18:07 +05:30
Field G. Van Zee
50dc5d9576 Adjust -fopenmp-simd for icc's preferred syntax.
Details:
- Use -qopenmp-simd instead of -fopenmp-simd when compiling with Intel
  icc. Recall that this option is used for SIMD auto-vectorization in
  reference kernels only. Support for the -f option has been completely
  deprecated and removed in newer versions of icc in favor of -q. Thanks
  to Victor Eijkhout for reporting this issue and suggesting the fix.
2019-06-07 13:10:16 -05:00
Field G. Van Zee
abd8a9fa7d Inadvertantly hidden xerbla_() in blastest (#313).
Details:
- Attempted a fix to issue #313, which reports that when building only
  a shared library (ie: static library build is disabled), running the
  BLAS test drivers can fail because those drivers provide their own
  local version of xerbla_() as a clever (albeit still rather hackish)
  way of checking the error codes that result from the individual tests.
  This local xerbla_() function is never found at link-time because the
  BLAS test drivers' Makefile imports BLIS compilation flags via the
  get-user-cflags-for() function, which currently conveys the
  -fvisibility=hidden flag, which hides symbols unless they are
  explicitly annotated for export. The -fvisibility=hidden flag was
  only ever intended for use when building BLIS (not for applications),
  and so the attempted solution here is to omit the symbol export
  flag(s) from get-user-cflags-for() by storing the symbol export
  flag(s) to a new BULID_SYMFLAGS variable instead of appending it
  to the subconfigurations' CMISCFLAGS variable (which is returned by
  every get-*-cflags-for() function). Thanks to M. Zhou for reporting
  this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
  created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
  text.
2019-05-28 12:49:44 -05:00
Isuru Fernando
044df9506f Test with shared on windows (#306)
Export macros can't support both shared and static at the same time.
When blis is built with both shared and static, headers assume that
shared is used at link time and dllimports the symbols with __imp_
prefix.

To use the headers with static libraries a user can give
-DBLIS_EXPORT= to import the symbol without the __imp_ prefix
2019-03-27 12:39:31 -05:00
Field G. Van Zee
d202d008d5 Renamed --enable-export-all to --export-shared=[].
Details:
- Replaced the existing --enable-export-all / --disable-export-all
  configure option with --export-shared=[public|all], with the 'public'
  instance of the latter corresponding to --disable-export-all and the
  'all' instance corresponding to --enable-export-all. Nothing else
  semantically about the option, or its default, has changed.
2019-03-18 18:18:25 -05:00
Field G. Van Zee
6bfe3812e2 Use -fvisibility=[...] with clang on Linux/BSD/OSX.
Details:
- Modified common.mk to use the -fvisibility=[hidden|default] option
  when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
  etc.). Thanks to Isuru Fernando for pointing out this option works
  with clang on these OSes.
2019-03-15 13:57:49 -05:00
Field G. Van Zee
e095926c64 Support shared lib export of only public symbols.
Details:
- Introduced a new configure option, --enable-export-all, which will
  cause all shared library symbols to be exported by default, or,
  alternatively, --disable-export-all, which will cause all symbols to
  be hidden by default, with only those symbols that are annotated for
  visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
  symbols), to be exported. The default for this configure option is
  --disable-export-all. Thanks to Isuru Fernando for consulting on
  this commit.
- Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
  which was intended for 5a5f494.
- Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
  frame/include/bli_config_macro_defs.h.
- Provided appropriate logic within common.mk to implement variable
  symbol visibility for gcc, clang, and icc (to the extend that each of
  these compilers allow).
- Relocated --help text associated with debug option (-d) to configure
  slightly further down in the list.
2019-03-13 17:35:18 -05:00
Isuru Fernando
766769eeb9 Export functions without def file (#303)
* Revert "restore bli_extern_defs exporting for now"

This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.

* Remove symbols not intended to be public

* No need of def file anymore

* Fix whitespace

* No need of configure option

* Remove export macro from definitions

* Remove blas export macro from definitions
2019-03-11 19:05:32 -05:00
Field G. Van Zee
bdd46f9ee8 Rewrote reference kernels to use #pragma omp simd.
Details:
- Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
  indexing annotated by the #pragma omp simd directive, which a compiler
  can use to vectorize certain constant-bounded loops. (The new kernels
  actually use _Pragma("omp simd") since the kernels are defined via
  templatizing macros.) Modest speedup was observed in most cases using
  gcc 5.4.0, which may improve with newer versions. Thanks to Devin
  Matthews for suggesting this via issue #286 and #259.
- Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
  be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
  respectively, with a default row preference for the gemm ukernel. Also
  updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
  respectively, for all datatypes.
- Modified configure to verify that -fopenmp-simd is a valid compiler
  option (via a new detect/omp_simd/omp_simd_detect.c file).
- Added a new header in which prefetch macros are defined according to
  which compiler is detected (via macros such as __GNUC__). These
  prefetch macros are not yet employed anywhere, though.
- Updated the year in copyrights of template license headers in
  build/templates and removed AMD as a default copyright holder.
2019-01-24 17:23:18 -06:00
Field G. Van Zee
0645f239fb Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
  third clause of the license comment blocks of all relevant files and
  replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
  comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
  with format of all other comment blocks.
2018-12-04 14:31:06 -06:00
Isuru Fernando
2acd8dcd23 Fix install path of dll.a 2018-11-21 02:02:18 -06:00
Isuru Fernando
bafe521ed0 Fixes for mingw 2018-11-21 01:54:36 -06:00
Isuru Fernando
f6b924648c Don't use .def for gcc 2018-11-21 01:39:19 -06:00
Field G. Van Zee
6fbc456fb3 Added SALT testing to Travis CI.
Details:
- Modified .travis.yml to automatically employ the simulation of
  application-level threading within the testsuite, with supporting
  changes to common.mk, the top-level Makefile, and
  travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
  '.salt' suffix (similar to those with the '.fast' suffix) for
  testing application-level threading.
- Updated docs/BuildSystem.md to document the new make targets
  'testblis-salt' and 'checkblis-salt'.
2018-10-25 13:20:25 -05:00
Field G. Van Zee
4ee986f0a7 Added mixed-datatype testing to Travis CI (#271).
Details:
- Modified .travis.yml to automatically test the mixed-datatype support
  of the gemm operation, with supporting changes to common.mk, the
  top-level Makefile, and travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
  '.mixed' suffix (similar to those with the '.fast' suffix) for testing
  mixed-datatype gemm.
- Updated docs/BuildSystem.md to document the new make targets
  'testblis-md' and 'checkblis-md'.
2018-10-22 14:09:44 -05:00
Field G. Van Zee
ec67679990 Refreshed Windows symbol list; added regen script.
Details:
- Moved windows/build/libblis-symbols.def to build/libblis-symbols.def.
  Updated link commands in common.mk accordingly.
- Added a new script build/regen-symbols.sh that will regenerate the
  libblis-symbols.def file in its new location after building a
  haswell-targeted shared library. Thanks to Isuru Fernando for
  providing the symbol generation command.
- Ran the new script to refresh the symbols file.
2018-10-18 14:27:02 -05:00
Field G. Van Zee
5a1e461ffe Execute flatten-headers.py via $(PYTHON).
Details:
- Execute build/flatten-headers.py python script via $(PYTHON) in
  common.mk. This allows distributions that define the current/preferred
  python interpreter in the PYTHON environment variable to use that
  interpreter when executing flatten-headers.py. Thanks to Isuru
  Fernando for this suggestion, and for Dave Love for submitting the
  initial issue/request.
2018-10-16 14:21:45 -05:00
Field G. Van Zee
c03728f1f4 Various minor cleanups.
Details:
- Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
  unconditionally, but differently for Windows and non-Windows, but
  then disabled the definition of bli_setenv() entirely since BLIS
  no longer needs to set environment variables. Updated bli_winsys.h
  accordingly, and call bli_sleep() from within testsuite instead of
  sleep() directly.
- Use
    #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
  instead of
    #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
  when guarding against local definition of pthread barrier in
  testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
  should always be set to 200809L when barriers are supported, though I
  won't be surprised if we encounter a case in the future where it is
  set to something else such as 1 while still supported.)
- Removed old _VERS_CONF_INST definitions and installation rules in
  top-level Makefile. These are no longer needed because we no longer
  output libraries with the version and configuration name as
  substrings.
- Comment/whitespace updates in Makefile, config.mk.in, common.mk,
  configure, bli_extern_defs.h, and test_libblis.h.
- Added mention of 1m to README.md and other trivial tweaks.
2018-09-10 17:54:27 -05:00
Isuru Fernando
e93b01ff60 Windows DLL support (#246)
* Enable shared

* Enable rdp

* Add support for dll

* Use libblis-symbols.def

* Fix building dlls

* Fix libblis-symbols.def

* Fix soname

* Fix Makefile error

* Fix install target

* Fix missing symbols

* Add BLIS_MINUS_TWO

* Add path to dll

* Fix OSX soname

* Add declspec for dll

* Add -DBLIS_BUILD_DLL

* Replace @enable_shared@ in config

* switch to auto for now

* blis_ -> bli_

* Remove BLIS_BUILD_DLL in make check

* change auto->haswell

* enable_shared_01

* Add wno-macro-redefined

* print out.cblat3

* BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY

* Use V=1

* Remove fpic for windows

* Remember LIBPTHREAD

* Remove libm for windows

* Remember AR

* Fix remembering libpthread

* Add Wno-maybe-uninitialized in only gcc

* Don't do blastest for shared for now

* Fix install target

And remove unnecessary change

* test auto and x86_64

* Fix install target again

* Use IS_WIN variable

* Remove leading dot from LIBBLIS_SO_MAJ_EXT

* Make is_win yes/no

* Add comments for windows builds

* Change if else blocks location
2018-09-09 15:57:43 -05:00
Field G. Van Zee
4b5437ec7a Define a cpp macro specific to BLIS compilation.
Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
  macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
  itself is being built. This macro will not be defined when, for
  example, the testsuite or example code compiles code local to those
  applications. This was done in part by defining a new cflags function
  get-user-cflags-for(), which is now the designated function for
  application Makefiles if they wish to inherit a basic set of CFLAGS
  from BLIS. (The compiler flags returned are identical to that of
  get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
  omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
  instead of get-frame-cflags-for().
2018-09-07 17:24:32 -05:00
Field G. Van Zee
0f491e994a Allow lesser Makefiles to reference installed BLIS.
Details:
- Updated the build system so that "lesser" Makefiles, such as those in
  belonging to example code or the testsuite, may be run even if the
  directory is orphaned from the original build tree. This allows a
  user to configure, compile, and install BLIS, delete the build tree
  (that is, the source distribution, or the build directory for out-
  of-tree builds) and then compile example or testsuite code and link
  against the installed copy of BLIS (provided the example or testsuite
  directory was preserved or obtained from another source). The only
  requirement is that make be invoked while setting the
  BLIS_INSTALL_PATH variable to the same installation prefix used when
  BLIS was configured. The easiest syntax is:

    make BLIS_INSTALL_PATH=/install/prefix

  though it's also permissible to set BLIS_INSTALL_PATH as an
  environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
  behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
  blastest and testsuite, respectively, so that if those directories are
  copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
  of building/linking against an installed copy of BLIS.
2018-08-25 20:12:36 -05:00
Field G. Van Zee
ac17454aae Merge branch 'master' into dev 2018-08-22 15:34:53 -05:00
Field G. Van Zee
a77bec766a Whitespace changes, minor renames in build system.
Details:
- Minor whitespace cleanup, mostly in the form of spaces -> tabs.
- Shortened certain variables' _FRAGMENT_ infixes to _FRAG_ in
  common.mk.
2018-08-22 15:31:29 -05:00
Devin Matthews
1b0f8d60d1 Generate makefile fragments in build tree (#240)
* Make src dir read-only in out-of-tree build test.

* Generate makefile fragments in the build tree.
2018-08-22 15:19:29 -05:00