Commit Graph

112 Commits

Author SHA1 Message Date
Edward Smyth
c6f3340125 Merge commit '5013a6cb' into amd-main
* commit '5013a6cb':
  More edits and fixes to docs/FAQ.md.
  Fixed newly broken link to CREDITS in FAQ.md.
  More minor fixes to FAQ.md and Sandboxes.md.
  Updates to FAQ.md, Sandboxes.md, and README.md.
  Safelist 'master', 'dev', 'amd' branches.
  Re-enable and fix fb93d24.
  Reverted fb93d24.
  Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
  Removed last vestige of #define BLIS_NUM_ARCHS.
  Added new packm var3 to 'gemmlike'.
  Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
  Fix more copy-paste errors in the haswell gemmsup code.
  Do a fast test on OSX. [ci skip]
  Fix AArch64 tests and consolidate some other tests.
  Use C++ cross-compiler for ARM tests.
  Attempt to fix cxx-test for OOT builds.
  Updated travis-ci.org link in README.md to .com.
  Disabled (at least temporarily) commit 8e0c425.
  Define BLIS_OS_NONE when using --disable-system.
  Updated stale calls to malloc_intl() in gemmlike.
  Blacklist clang10/gcc9 and older for 'armsve'.
  Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.
  Moved lang defs from _macro_def.h to _lang_defs.h.
  Minor tweaks to gemmlike sandbox.
  Added local _check() code to gemmlike sandbox.
  README.md citation updates (e.g. BLIS7 bibtex).
  Tweaks to gemmlike to facilitate 3rd party mods.
  Whitespace tweaks.
  Add row- and column-strides for A/B in obj_ukr_fn_t.
  Clean up some warnings that show up on clang/OSX.
  Remove schema field on obj_t (redundant) and add new API functions.
  Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
  Disabled sanity check in bli_pool_finalize().
  Implement proposed new function pointer fields for obj_t.

AMD-Internal: [CPUPL-2698]
Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d
2023-11-10 13:05:12 -05:00
Edward Smyth
834bf604c1 Option to use shared library for BLIS tests
Current BLIS makefile always uses the static library on Linux for
all BLIS test programs. This commit adds the option to use the shared
library instead by specifying e.g.

make checkblis USE_SHARED=yes

Executables are generated in different sub-directories for static
and shared libraries.

AMD-Internal: [CPUPL-4107]
Change-Id: I3ab5d505cfbc5f6ef47aa28fcbb846c52d56c3f2
2023-10-27 11:32:00 -04:00
Edward Smyth
f5505be9f3 Merge commit 'e366665c' into amd-main
* commit 'e366665c':
  Fixed stale API calls to membrk API in gemmlike.
  Fixed bli_init.c compile-time error on OSX clang.
  Fixed configure breakage on OSX clang.
  Fixed one-time use property of bli_init() (#525).
  CREDITS file update.
  Added Graviton2 Neoverse N1 performance results.
  Remove unnecesary windows/zen2 directory.
  Add vzeroupper to Haswell microkernels. (#524)
  Fix Win64 AVX512 bug.
  Add comment about make checkblas on Windows
  CREDITS file update.
  Test installation in Travis CI
  Add symlink to blis.pc.in for out-of-tree builds
  Revert "Always run `make check`."
  Always run `make check`.
  Fixed configure script bug. Details: - Fixed kernel list string substitution error by adding function substitute_words in configure script.   if the string contains zen and zen2, and zen need to be replaced with another string, then zen2   also be incorrectly replaced.
  Update POWER10.md
  Rework POWER10 sandbox
  Skip clearing temp microtile in gemmlike sandbox.
  Fix asm warning
  Sandbox header edits trigger full library rebuild.
  Add vhsubpd/vhsubpd.
  Fixed bugs in cpackm kernels, gemmlike code.
  Armv8A Rename Regs for Safe Darwin Compile
  Armv8A Rename Regs for Clang Compile: FP32 Part
  Armv8A Rename Regs for Clang Compile: FP64 Part
  Asm Flag Mingling for Darwin_Aarch64
  Added a new 'gemmlike' sandbox.
  Updated Fugaku (a64fx) performance results.
  Add explicit compiler check for Windows.
  Remove `rm-dupls` function in common.mk.
  Travis CI Revert Unnecessary Extras from 91d3636
  Adjust TravisCI
  Travis Support Arm SVE
  Added 512b SVE-based a64fx subconfig + SVE kernels.
  Replace bli_dlamch with something less archaic (#498)
  Allow clang for ThunderX2 config

AMD-Internal: [CPUPL-2698]
Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4
2023-10-18 09:09:54 -04:00
mkadavil
f4702debb9 Zen4 compilation flag updates to support low precision gemm.
- BFloat16 flags added to zen4 make_defs in order to enable
compilation of low precision gemm by using zen4 config.
- Avoid -ftree-partial-pre optimization flag with gcc due to
non optimal code generation for intrinsics based kernels in
low precision gemm.
- Enable only Zen3 specific low precision gemm kernels (s16)
compilation when aocl_gemm addon is compiled on Zen3 machines.

AMD-Internal: [CPUPL-1545]
Change-Id: Id3be3410bfbf141bb6fc4b4e3391115a4e0bb79f
2022-09-29 08:19:40 -04:00
Field G. Van Zee
4c3dd93eba Use kernel CFLAGS for 'kernels' subdirs in addons. (#658)
Details:
- Updated Makefile and common.mk so that the targeted configuration's
  kernel CFLAGS are applied to source files that are found in a
  'kernels' subdirectory within an enabled addon. For now, this
  behavior only applies when the 'kernels' directory is at the top
  level of the addon directory structure. For example, if there is an
  addon named 'foobar', the source code must be located in
  addon/foobar/kernels/ in order for it to be compiled with the target
  configurations's kernel CFLAGS. Any other source code within
  addon/foobar/ will be compiled with general-purpose CFLAGS (the same
  ones that were used on all addon code prior to this commit). Thanks
  to AMD (esp. Mithun Mohan) for suggesting this change and catching an
  intermediate bug in the PR.
- Comment/whitespace updates.

(cherry picked from commit fd885cf98f)
Change-Id: I9a678f78bde90b23a6293ce90377004876f51067
2022-09-27 06:37:38 -04:00
Dipal M. Zambare
c85bbfdb50 Updated BLIS version string format
- Updated version string to match the recommended format
  “AOCL-BLIS 3.2.1 Build 20220727”.
- Fixed issues with include paths which was preventing compile time
  version sting definition passing via build commands.
- Removed version string determination based on git tag
  using ‘git describe’, version string will always be taken from
  the version file.

AMD-Internal: [CPUPL-2324]
Change-Id: Idc7edf1211f66d348ec3b5b43f2507c2b810f088
2022-08-12 05:53:35 +00:00
Field G. Van Zee
7a0ba4194f Added support for addons.
Details:
- Implemented a new feature called addons, which are similar to
  sandboxes except that there is no requirement to define gemm or any
  other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
  for requesting an addon be included within a BLIS build. configure now
  outputs the list of enabled addons into config.mk. It also outputs the
  corresponding #include directives for the addons' headers to a new
  companion to the bli_config.h header file named bli_addon.h. Because
  addons may wish to make use of existing BLIS types within their own
  definitions, the addons' headers must be included sometime after that
  of bli_config.h (which currently is #included before bli_type_defs.h).
  This is why the #include directives needed to go into a new top-level
  header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
  build with them, and what assumptions their authors should keep in
  mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
  as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
  functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.

Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717
2022-03-31 12:03:27 +05:30
Dipal M Zambare
3364c0e4eb Binary and dynamic dispatch configuration name change
-- Reverted changes made to include lp/ilp info in binary name
     This reverts commit c5e6f885f0.

  -- Included BLAS int size in 'make showconfig'

  -- Renamed amdepyc configuration to amdzen

Change-Id: Ie87ec1c03e105f606aef1eac397ba0d8338906a6
2021-11-12 08:58:56 +05:30
Dipal M Zambare
5d287fdba0 Include LP64/ILP64 in BLIS binary name
Binary name will be chosen based on multi-threading and BLAS
  integer size configuration as given below.

  libblis-[mt]-lp64 - when configured to use 32 bit integers
  libblis-[mt]-ilp64 - when configured to use 64 bit integers

AMD-Internal: [CPUPL-1879]
Change-Id: I865023c63235a0a72bdfce7057b2cfb8158b1d87
2021-11-12 08:58:51 +05:30
Dipal M Zambare
d1fb770f1c Removed duplicate cpp and testcpp folders.
The cpp and testcpp folder exists in root directory as well as vendor
directory. Only folder in vendor directory are needed.

Removed duplicate directories and updated makefiles to pick the
sources from vendor folder.

AMD-Internal: [CPUPL-1834]
Change-Id: I178043a09fd746660938b89ecce73c53d6c53409
2021-11-12 08:58:49 +05:30
Devin Matthews
4f70eb7913 Clean up some warnings that show up on clang/OSX. 2021-08-13 11:12:43 -05:00
Dipal M Zambare
333fe4ca8b Makefile cleanup
Removed unused function rm-dupls() from common.mk
    Removed code from patch-ld-so.py which is not needed for AMD codebase.

AMD-Internal: [CPUPL-1539]
Change-Id: If1812d5aa87c1e3a9d0c4706d571223d56f2fc20
2021-07-02 01:20:01 -04:00
Devin Matthews
859fb77a32 Remove rm-dupls function in common.mk.
AMD requested removal due to unclear licensing terms; original code was from stackoverflow. The function is unused but could easily be replaced by new implementation.
2021-05-23 18:15:23 -05:00
lcpu
7401effc03 BLIS:merge:
Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch

Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)

Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.

Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.

Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)

Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)

Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.

Minor code consolidation in all level-3 _front() functions.

Reorganized Windows cpp branch of bli_pthreads.c.

Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.

Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.

Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.

AMD-internal-[CPUPL-1523]

Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
2021-04-27 11:09:48 +05:30
dzambare
48f2366b6f Updated BLIS version string to "AOCL BLIS X.x" format
AMD-Internal : [CPUPL-1394]

Change-Id: Ifebcb14d9eb064d231b831f5a1e151853ad5a009
2021-01-07 12:38:32 +05:30
Field G. Van Zee
9bb23e6c2a Added support for systemless build (no pthreads).
Details:
- Added a configure option, --[enable|disable]-system, which determines
  whether the modest operating system dependencies in BLIS are included.
  The most notable example of this on Linux and BSD/OSX is the use of
  POSIX threads to ensure thread safety for when application-level
  threads call BLIS. When --disable-system is given, the bli_pthreads
  implementation is dummied out entirely, allowing the calling code
  within BLIS to remain unchanged. Why would anyone want to build BLIS
  like this? The motivating example was submitted via #454 in which a
  user wanted to build BLIS for a simulator such as gem5 where thread
  safety may not be a concern (and where the operating system is largely
  absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
  the implementation of bli_clock() unconditionally returns 0.0 instead
  of the time elapsed since some fixed point in the past. The reasoning
  for this is that if the operating system is truly minimal, the system
  function call upon which bli_clock() would normally be implemented
  (e.g. clock_gettime()) may not be available.
- Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h
  to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
  from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
  and BLISTypedAPI.md, with a note that both are non-functional when
  BLIS is configured with --disable-system.
2020-11-16 15:55:45 -06:00
Field G. Van Zee
88ad841434 Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
  
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
  only the lower or upper triangle of a square matrix C. For now, only
  the conventional/large code path will be supported (in vanilla BLIS).
  This was accomplished by leveraging the existing variant logic for
  herk. However, some of the infrastructure to support a gemmtsup is
  included in this commit, including
  - A bli_gemmtsup() front-end, similar to bli_gemmsup().
  - A bli_gemmtsup_ref() reference handler function.
  - A bli_gemmtsup_int() variant chooser function (with variant calls
    commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatiblity layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
  wrapper to a set of polymorphic CBLAS-like function wrappers defined
  in another header (cblas.hh). These two headers are installed if
  running the 'install' target with INSTALL_HH is set to 'yes'. (Also
  added a set of unit tests that exercise blis.hh, although they are
  disabled for now because they aren't compatible with out-of-tree
  builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
  within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
  various minor updates to dotv and scalv kernels. Also added various
  sup kernels contributed by AMD to kernels/zen/3. However, these
  kernels are (for now) not yet used, in part because they caused
  AppVeyor clang failures, and also because I have not found time to
  review and vet them.
- Output the python found during configure into the definition of PYTHON
  in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
  to bli_gemm_front.c.
- Implemented explicit beta = 0 handling in for the sgemm ukernel in
  bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
  bug surfaced because the gemmt module verifies its computation using
  gemm with its beta parameter set to zero, which, on a cortexa15 system
  caused the gemm kernel code to unconditionally multiply the
  uninitialized C data by beta. The C matrix likely contained
  non-numeric values such as NaN, which then would have resulted in a
  false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
  in bli_l3_blocksize.c, was inadvertantly being defined in terms of
  helper functions meant for trmm. This bug was probably harmless since
  the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
  kernels/zen/3/bli_gemm_small.c since those macros are not used in
  vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
  accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
  Windows systems.
- Various whitespace changes.
2020-11-14 09:39:48 -06:00
phakumar
ccf0772d6e BLIS library porting on to Windows:
This library ported on Windows 10 using CMake scripts and Visual Studio 2019 with clang compiler
 AMD internal:[CPUPL-657]

Change-Id: Ie701f52ebc0e0585201ba703b6284ac94fc0feb9
2020-06-16 18:29:00 +05:30
Satish Balay
d560d105d2 OSX: specify the full path to the location of libblis.dylib (#390)
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime

Before this change:

Appication gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib

balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
        libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
	/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR

Co-Authored-By: Jed Brown <jed@jedbrown.org>

* CREDITS file update.

Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-05-21 11:54:54 +05:30
Field G. Van Zee
08709d4117 Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
  sort function. The purpose was not to sort the contents, but rather
  to remove duplicates. However, there is valid syntax in a string of
  linker flags that, when sorted, yields different/broken behavior.
  So I've removed the line in common.mk that sorts LDFLAGS. Also, for
  future use, I've added a new function, rm-dupls, that removes
  duplicates without sorting. (This function was based on code from a
  stackoverflow thread that is linked to in the comments for that
  code.) Thanks to Isuru Fernando for reporting this issue (#373).

Change-Id: Ie355cc111fd2c6669f0c3088e8fa5dc7c407a3b9
2020-05-21 11:45:48 +05:30
Meghana Vankadari
4fcc4e499d Optimized DGEMV kernel and changed BLAS interface call
Details:
- Optimized daxpyf kernel with fuse_factor=5 and iter_unroll=2.
- Modified framework files of dgemv to remove dependency on cntx variable.
- Updated cntx_init file of zen2 to choose optimized kernels.
- Modified BLAS interface call for DGEMV to reduce framework overhread.
- Currently these changes are applicable for zen2 configuration.
  They will be enabled for zen family processors in future.
- Changed naming convention for new BLAS macros to indicate their use.
- Added new optimized kernel for axpyf under zen2 folder.
- Implemented basic GEMV kernel without using axpyv or axpyf.
  This kernel is chosen for small sizes.

Change-Id: I4278d37e494854879c71499b8b9da8c5dbe3bf5b
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-885]
2020-05-19 06:40:44 -04:00
dzambare
d40edf7dac Execution and Debug trace support.
Added support add debug logging, execution trace and decode.

Change-Id: I024bf6165daa9e23a62423f2401c0f1c5de459ba
AMD-Internal: [CPUPL-806]
2020-04-07 08:48:59 +05:30
Satish Balay
da0c086f46 OSX: specify the full path to the location of libblis.dylib (#390)
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime

Before this change:

Appication gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib

balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
        libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
	/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR

Co-Authored-By: Jed Brown <jed@jedbrown.org>

* CREDITS file update.

Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-03-31 17:09:41 -05:00
Field G. Van Zee
d626112b8d Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
  sort function. The purpose was not to sort the contents, but rather
  to remove duplicates. However, there is valid syntax in a string of
  linker flags that, when sorted, yields different/broken behavior.
  So I've removed the line in common.mk that sorts LDFLAGS. Also, for
  future use, I've added a new function, rm-dupls, that removes
  duplicates without sorting. (This function was based on code from a
  stackoverflow thread that is linked to in the comments for that
  code.) Thanks to Isuru Fernando for reporting this issue (#373).
2020-01-15 13:27:02 -06:00
Kiran Varaganti
82ec21f1c7 Fix for CPUPL-541,When threading is enabled blis-mt library gets generated otherwise blis.a gets generated for sequential builds. However blis.h header file will differ for both type of libraries. The difference is about enable/disable of defining BLIS_ENABLE_OPENMP or BLIS_ENABLE_PTHREADS when threading is enabled. Appropriate header file needs to be included in the application.
Change-Id: I82a4ae2bf529eedd83868e059a43749714cbe246
2019-12-10 15:40:27 +05:30
prangana
3d20128aea Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I97a10ab7546d475474b0ff733bafb8248843c352
2019-11-21 00:54:16 +05:30
prangana
d63f9b7d7f checkcpp test rule in Makefile
Change-Id: If01fe55e258e563a96cd8da9ea93d21063b730c2
2019-11-21 00:52:47 +05:30
prangana
49c27040d1 Instll CPP Template headers
Change-Id: Ib15dc9bda08d1f3fdc68e31520daee90a287357c
2019-11-20 21:52:22 +05:30
Field G. Van Zee
6e9b1e0244 Adjust -fopenmp-simd for icc's preferred syntax.
Details:
- Use -qopenmp-simd instead of -fopenmp-simd when compiling with Intel
  icc. Recall that this option is used for SIMD auto-vectorization in
  reference kernels only. Support for the -f option has been completely
  deprecated and removed in newer versions of icc in favor of -q. Thanks
  to Victor Eijkhout for reporting this issue and suggesting the fix.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
c90a1a8dca Inadvertantly hidden xerbla_() in blastest (#313).
Details:
- Attempted a fix to issue #313, which reports that when building only
  a shared library (ie: static library build is disabled), running the
  BLAS test drivers can fail because those drivers provide their own
  local version of xerbla_() as a clever (albeit still rather hackish)
  way of checking the error codes that result from the individual tests.
  This local xerbla_() function is never found at link-time because the
  BLAS test drivers' Makefile imports BLIS compilation flags via the
  get-user-cflags-for() function, which currently conveys the
  -fvisibility=hidden flag, which hides symbols unless they are
  explicitly annotated for export. The -fvisibility=hidden flag was
  only ever intended for use when building BLIS (not for applications),
  and so the attempted solution here is to omit the symbol export
  flag(s) from get-user-cflags-for() by storing the symbol export
  flag(s) to a new BULID_SYMFLAGS variable instead of appending it
  to the subconfigurations' CMISCFLAGS variable (which is returned by
  every get-*-cflags-for() function). Thanks to M. Zhou for reporting
  this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
  created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
  text.
2019-08-23 14:18:08 +05:30
Isuru Fernando
aa9d8e4a81 Test with shared on windows (#306)
Export macros can't support both shared and static at the same time.
When blis is built with both shared and static, headers assume that
shared is used at link time and dllimports the symbols with __imp_
prefix.

To use the headers with static libraries a user can give
-DBLIS_EXPORT= to import the symbol without the __imp_ prefix
2019-08-23 14:18:08 +05:30
Field G. Van Zee
2d1cd32c0f Renamed --enable-export-all to --export-shared=[].
Details:
- Replaced the existing --enable-export-all / --disable-export-all
  configure option with --export-shared=[public|all], with the 'public'
  instance of the latter corresponding to --disable-export-all and the
  'all' instance corresponding to --enable-export-all. Nothing else
  semantically about the option, or its default, has changed.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
9ea223df2d Use -fvisibility=[...] with clang on Linux/BSD/OSX.
Details:
- Modified common.mk to use the -fvisibility=[hidden|default] option
  when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
  etc.). Thanks to Isuru Fernando for pointing out this option works
  with clang on these OSes.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
d8f6d17bce Support shared lib export of only public symbols.
Details:
- Introduced a new configure option, --enable-export-all, which will
  cause all shared library symbols to be exported by default, or,
  alternatively, --disable-export-all, which will cause all symbols to
  be hidden by default, with only those symbols that are annotated for
  visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
  symbols), to be exported. The default for this configure option is
  --disable-export-all. Thanks to Isuru Fernando for consulting on
  this commit.
- Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
  which was intended for 5a5f494.
- Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
  frame/include/bli_config_macro_defs.h.
- Provided appropriate logic within common.mk to implement variable
  symbol visibility for gcc, clang, and icc (to the extend that each of
  these compilers allow).
- Relocated --help text associated with debug option (-d) to configure
  slightly further down in the list.
2019-08-23 14:18:07 +05:30
Isuru Fernando
dcb0f50455 Export functions without def file (#303)
* Revert "restore bli_extern_defs exporting for now"

This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.

* Remove symbols not intended to be public

* No need of def file anymore

* Fix whitespace

* No need of configure option

* Remove export macro from definitions

* Remove blas export macro from definitions
2019-08-23 14:18:07 +05:30
Field G. Van Zee
50dc5d9576 Adjust -fopenmp-simd for icc's preferred syntax.
Details:
- Use -qopenmp-simd instead of -fopenmp-simd when compiling with Intel
  icc. Recall that this option is used for SIMD auto-vectorization in
  reference kernels only. Support for the -f option has been completely
  deprecated and removed in newer versions of icc in favor of -q. Thanks
  to Victor Eijkhout for reporting this issue and suggesting the fix.
2019-06-07 13:10:16 -05:00
Field G. Van Zee
abd8a9fa7d Inadvertantly hidden xerbla_() in blastest (#313).
Details:
- Attempted a fix to issue #313, which reports that when building only
  a shared library (ie: static library build is disabled), running the
  BLAS test drivers can fail because those drivers provide their own
  local version of xerbla_() as a clever (albeit still rather hackish)
  way of checking the error codes that result from the individual tests.
  This local xerbla_() function is never found at link-time because the
  BLAS test drivers' Makefile imports BLIS compilation flags via the
  get-user-cflags-for() function, which currently conveys the
  -fvisibility=hidden flag, which hides symbols unless they are
  explicitly annotated for export. The -fvisibility=hidden flag was
  only ever intended for use when building BLIS (not for applications),
  and so the attempted solution here is to omit the symbol export
  flag(s) from get-user-cflags-for() by storing the symbol export
  flag(s) to a new BULID_SYMFLAGS variable instead of appending it
  to the subconfigurations' CMISCFLAGS variable (which is returned by
  every get-*-cflags-for() function). Thanks to M. Zhou for reporting
  this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
  created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
  text.
2019-05-28 12:49:44 -05:00
Isuru Fernando
044df9506f Test with shared on windows (#306)
Export macros can't support both shared and static at the same time.
When blis is built with both shared and static, headers assume that
shared is used at link time and dllimports the symbols with __imp_
prefix.

To use the headers with static libraries a user can give
-DBLIS_EXPORT= to import the symbol without the __imp_ prefix
2019-03-27 12:39:31 -05:00
Field G. Van Zee
d202d008d5 Renamed --enable-export-all to --export-shared=[].
Details:
- Replaced the existing --enable-export-all / --disable-export-all
  configure option with --export-shared=[public|all], with the 'public'
  instance of the latter corresponding to --disable-export-all and the
  'all' instance corresponding to --enable-export-all. Nothing else
  semantically about the option, or its default, has changed.
2019-03-18 18:18:25 -05:00
Field G. Van Zee
6bfe3812e2 Use -fvisibility=[...] with clang on Linux/BSD/OSX.
Details:
- Modified common.mk to use the -fvisibility=[hidden|default] option
  when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
  etc.). Thanks to Isuru Fernando for pointing out this option works
  with clang on these OSes.
2019-03-15 13:57:49 -05:00
Field G. Van Zee
e095926c64 Support shared lib export of only public symbols.
Details:
- Introduced a new configure option, --enable-export-all, which will
  cause all shared library symbols to be exported by default, or,
  alternatively, --disable-export-all, which will cause all symbols to
  be hidden by default, with only those symbols that are annotated for
  visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
  symbols), to be exported. The default for this configure option is
  --disable-export-all. Thanks to Isuru Fernando for consulting on
  this commit.
- Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
  which was intended for 5a5f494.
- Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
  frame/include/bli_config_macro_defs.h.
- Provided appropriate logic within common.mk to implement variable
  symbol visibility for gcc, clang, and icc (to the extend that each of
  these compilers allow).
- Relocated --help text associated with debug option (-d) to configure
  slightly further down in the list.
2019-03-13 17:35:18 -05:00
Isuru Fernando
766769eeb9 Export functions without def file (#303)
* Revert "restore bli_extern_defs exporting for now"

This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.

* Remove symbols not intended to be public

* No need of def file anymore

* Fix whitespace

* No need of configure option

* Remove export macro from definitions

* Remove blas export macro from definitions
2019-03-11 19:05:32 -05:00
Field G. Van Zee
bdd46f9ee8 Rewrote reference kernels to use #pragma omp simd.
Details:
- Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
  indexing annotated by the #pragma omp simd directive, which a compiler
  can use to vectorize certain constant-bounded loops. (The new kernels
  actually use _Pragma("omp simd") since the kernels are defined via
  templatizing macros.) Modest speedup was observed in most cases using
  gcc 5.4.0, which may improve with newer versions. Thanks to Devin
  Matthews for suggesting this via issue #286 and #259.
- Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
  be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
  respectively, with a default row preference for the gemm ukernel. Also
  updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
  respectively, for all datatypes.
- Modified configure to verify that -fopenmp-simd is a valid compiler
  option (via a new detect/omp_simd/omp_simd_detect.c file).
- Added a new header in which prefetch macros are defined according to
  which compiler is detected (via macros such as __GNUC__). These
  prefetch macros are not yet employed anywhere, though.
- Updated the year in copyrights of template license headers in
  build/templates and removed AMD as a default copyright holder.
2019-01-24 17:23:18 -06:00
Field G. Van Zee
0645f239fb Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
  third clause of the license comment blocks of all relevant files and
  replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
  comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
  with format of all other comment blocks.
2018-12-04 14:31:06 -06:00
Isuru Fernando
2acd8dcd23 Fix install path of dll.a 2018-11-21 02:02:18 -06:00
Isuru Fernando
bafe521ed0 Fixes for mingw 2018-11-21 01:54:36 -06:00
Isuru Fernando
f6b924648c Don't use .def for gcc 2018-11-21 01:39:19 -06:00
Field G. Van Zee
6fbc456fb3 Added SALT testing to Travis CI.
Details:
- Modified .travis.yml to automatically employ the simulation of
  application-level threading within the testsuite, with supporting
  changes to common.mk, the top-level Makefile, and
  travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
  '.salt' suffix (similar to those with the '.fast' suffix) for
  testing application-level threading.
- Updated docs/BuildSystem.md to document the new make targets
  'testblis-salt' and 'checkblis-salt'.
2018-10-25 13:20:25 -05:00
Field G. Van Zee
4ee986f0a7 Added mixed-datatype testing to Travis CI (#271).
Details:
- Modified .travis.yml to automatically test the mixed-datatype support
  of the gemm operation, with supporting changes to common.mk, the
  top-level Makefile, and travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
  '.mixed' suffix (similar to those with the '.fast' suffix) for testing
  mixed-datatype gemm.
- Updated docs/BuildSystem.md to document the new make targets
  'testblis-md' and 'checkblis-md'.
2018-10-22 14:09:44 -05:00
Field G. Van Zee
ec67679990 Refreshed Windows symbol list; added regen script.
Details:
- Moved windows/build/libblis-symbols.def to build/libblis-symbols.def.
  Updated link commands in common.mk accordingly.
- Added a new script build/regen-symbols.sh that will regenerate the
  libblis-symbols.def file in its new location after building a
  haswell-targeted shared library. Thanks to Isuru Fernando for
  providing the symbol generation command.
- Ran the new script to refresh the symbols file.
2018-10-18 14:27:02 -05:00