Commit Graph

270 Commits

Author SHA1 Message Date
Field G. Van Zee
3ee2bc0f7a Renamed files that distinguish basic/expert APIs.
Details:
- Renamed various files that were previously named according to a
  "with context" or "without context" convention. For example, the
  following files in frame/3 were renamed:

    frame/3/bli_l3_oapi_woc.c -> frame/3/bli_l3_oapi_ba.c
    frame/3/bli_l3_oapi_wc.c  -> frame/3/bli_l3_oapi_ex.c
    frame/3/bli_l3_tapi_woc.c -> frame/3/bli_l3_tapi_ba.c
    frame/3/bli_l3_tapi_wc.c  -> frame/3/bli_l3_tapi_ex.c

  Here, the "ba" is for "basic" and "ex" is for "expert". This new
  naming scheme will make more sense especially if/when additional
  expert parameters are added to the expert APIs (typed and object).
2018-07-07 16:02:16 -05:00
Field G. Van Zee
e88aedae73 Separated expert, non-expert typed APIs.
Details:
- Split existing typed APIs into two subsets of interfaces: one for use
  with expert parameters, such as the cntx_t*, and one without. This
  separation was already in place for the object APIs, and after this
  commit the typed and object APIs will have similar expert and non-
  expert APIs. The expert functions will be suffixed with "_ex" just as
  is the case for expert interfaces in the object APIs.
- Updated internal invocations of typed APIs (functions such as
  bli_?setm() and bli_?scalv()) throughout BLIS to reflect use of the
  new explictly expert APIs.
- Updated example code in examples/tapi to reflect the existence (and
  usage) of non-expert APIs.
- Bumped the major soname version number in 'so_version'. While code
  compiled against a previous version/commit will likely still work
  (since the old typed function symbol names still exist in the new API,
  just with one less function argument) the semantics of the function
  have changed if the cntx_t* parameter the application passes in is
  non-NULL. For example, calling bli_daxpyv() with a non-NULL context
  does not behave the same way now as it did before; before, the
  context would be used in the computation, and now the context would
  be ignored since the interace for that function no longer expects a
  context argument.
2018-07-06 19:14:02 -05:00
Isuru Fernando
331694e524 Fix windows build and enable x86_64 on appveyor (#230)
* Upload artifacts built on appveyor (#228)

* Upload artifacts

* Fix install in appveyor

* Remove windows.h in bli_winsys.c (#229)

Looks like it is unneeded.

* Implemented ARG_MAX hack in configure, Makefile.

Details:
- Added support for --enable-arg-max-hack to configure, which will
  change the behavior of make when building BLIS so that rather than
  invoke the archiver/linker with all of the object files as command
  line arguments, those object files are echoed to a temporary file
  and then the archiver/linker is fed that temporary file via the @
  notation. An example of this can be found in the GNU make docs at
  https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.

* Enable x86_64 and arg-max-hack on appveyor

* Use gas style assembly for clang on windows
2018-07-06 10:07:38 -05:00
Field G. Van Zee
89e178ce38 Merge branch 'master' into dev 2018-07-04 17:51:16 -05:00
Isuru Fernando
14648e1376 Native windows support using clang (#227)
* Add appveyor file

* Build script

* Remove fPIC for now

* copy as

* set CC and CXX

* Change the order of immintrin.h

* Fix testsuite header

* Move testsuite defs to .c

* Fix appveyor file

* Remove fPIC again and fix strerror_r missing bug

* Remove appveyor script

* cd to blis directory

* Fix sleep implementation

* Add f2c_types_win.h

* Fix f2c compilation

* Remove rdp and rename appveyor.yml

* Remove setenv declaration in test header

* set CPICFLAGS to empty

* Fix another immintrin.h issue

* Escape CFLAGS and LDFLAGS

* Fix more ?mmintrin.h issues

* Build x86_64 in appveyor

* override LIBM LIBPTHREAD AR AS

* override pthreads in configure

* Move windows definitions to bli_winsys.h

* Fix LIBPTHREAD default value

* Build intel64 in appveyor for now
2018-07-04 17:48:42 -05:00
Field G. Van Zee
52d80b5f09 Fixed static funcs related to target and exec dts.
Details:
- Fixed incorrect bit shifts in the following static functions:
    bli_obj_set_target_domain()
    bli_obj_set_target_prec()
    bli_obj_set_exec_domain()
    bli_obj_set_exec_prec()
- Fixed incorrect bitmask in bli_dt_proj_to_single_prec().
- Updated bli_obj_real_part() and bli_obj_imag_part() so that it updates
  the target and exec datatypes (in addition to the storage datatypes).
2018-06-29 12:30:44 -05:00
Field G. Van Zee
bd8c55fe26 Added dt_on_output field to auxinfo_t.
Details:
- Added a new field to the auxinfo_t struct that can be used, in theory,
  to request type conversion before the microkernel stores/accumulates
  its microtile back to memory.
- Added the appropriate get/set static functions to bli_type_defs.h.
2018-06-27 15:52:37 -05:00
Devin Matthews
a7166feb10 Finish macroization of assembly ukernels. 2018-06-25 12:09:18 -05:00
Devin Matthews
c81c6f23b9 Fix problem with inc and dec macros. 2018-06-20 15:20:44 -05:00
Devin Matthews
5a63971c82 Merge remote-tracking branch 'upstream/dev' into asm-macros 2018-06-20 14:07:49 -05:00
Devin Matthews
b4d94e54d4 Convert x86 microkernels to assembly macros. 2018-06-20 14:07:24 -05:00
Field G. Van Zee
17928b1c99 Added static funcs bli_dt_domain(), bli_dt_prec().
Details:
- Added definitions of static functions bli_dt_domain()/bli_dt_prec(),
  which extract a dom_t domain or prec_t precision value, respectively,
  from a num_t datatype.
- Changed the return types of bli_obj_domain() and bli_obj_prec() from
  objbits_t to dom_t and prec_t. (Not sure why they were ever set to
  return objbits_t.)
2018-06-19 17:59:03 -05:00
Field G. Van Zee
5f7fbb7115 Static funcs for projecting dt to single/double.
Details:
- Added static functions for projecting a datatype to single precision
  or double precision, both for obj_t's storage datatypes and standalone
  datatypes.
2018-06-19 15:38:55 -05:00
Field G. Van Zee
f317c2e31b Added get/set static funcs for exec dt/dom/prec.
Details:
- Added functions to bli_obj_macro_defs.h to get and set the target
  domain and target precision bits in the obj_t, and also added the
  appropriate support in bli_type_defs.h.
2018-06-19 12:21:23 -05:00
Field G. Van Zee
e88a5b8da8 Implemented castm, castv operations.
Details:
- Implemented castm and castv operations, which behave like copym and
  copyv except where the obj_t operands can be of different datatypes.
  These new operations, however, unlike copym/copyv, do not build upon
  existing level-1v kernels.
- Reorganized projm, projv into a 'proj' subdirectory of frame/base (to
  match the newly added frame/base/cast directory).
- Added new macros to bli_gentfunc_macro_defs.h, _gentprot_macro_defs.h
  that insert GENTFUNC2/GENTPROT2 macros for all non-homogeneous datatype
  combinations. Previously, one had to invoke two additional macros--one
  which mixed domains only and another that included all remaining
  cases--in order to get full type combination coverage.
- Defined a new static function, bli_set_dims_incs_2m(), to aid in the
  setting of various variables in the implementations of bli_??castm().
  This static function joins others like it in bli_param_macro_defs.h.
- Comment update to bli_copysc.h.
2018-06-18 15:56:26 -05:00
Field G. Van Zee
ed20392c50 Added get/set static funcs for exec dt/dom/prec.
Details:
- Added functions to bli_obj_macro_defs.h to get and set the execution
  domain and execution precision bits in the obj_t.
- Added/rearranged a few functions in bli_obj_macro_defs.h.
- Renamed some macros in bli_type_defs.h: EXECUTION -> EXEC.
2018-06-15 16:31:22 -05:00
Field G. Van Zee
87db5c048e Changed usage of virtual microkernel slots in cntx.
Details:
- Changed the way virtual microkernels are handled in the context.
  Previously, there were query routines such as bli_cntx_get_l3_ukr_dt()
  which returned the native ukernel for a datatype if the method was
  equal to BLIS_NAT, or the virtual ukernel for that datatype if the
  method was some other value. Going forward, the context native and
  virtual ukernel slots will both be initialized to native ukernel
  function pointers for native execution, and for non-native execution
  the virtual ukernel pointer will be something else. This allows us
  to always query the virtual ukernel slot (from within, say, the
  macrokernel) without needing any logic in the query routine to decide
  which function pointer (native or virtual) to return. (Essentially,
  the logic has been shifted to init-time instead of compute-time.)
  This scheme will also allow generalized virtual ukernels as a way
  to insert extra logic in between the macrokernel and the native
  microkernel.
- Initialize native contexts (in bli_cntx_ref.c) with native ukernel
  function addresses stored to the virtual ukernel slots pursuant to
  the above policy change.
- Renamed all static functions that were native/virtual-ambiguous, such
  as bli_cntx_get_l3_ukr_dt() or bli_cntx_l3_ukr_prefers_cols_dt()
  pursuant to the above polilcy change. Those routines now use the
  substring "get_l3_vir_ukr" in their name instead of "get_l3_ukr". All
  of these functions were static functions defined in bli_cntx.h, and
  most uses were in level-3 front-ends and macrokernels.
- Deprecated anti_pref bool_t in context, along with related functions
  such as bli_cntx_l3_ukr_eff_dislikes_storage_of(), now that 1m's
  panel-block execution is disabled.
2018-06-12 19:38:37 -05:00
Field G. Van Zee
22aa44ebec Merge branch 'dev' of github.com:flame/blis into dev 2018-06-07 17:42:59 -05:00
Field G. Van Zee
65fae95074 Implemented bli_setrm, _setim, _setrv, _setiv.
Details:
- Defined new wrappers to setm/setv operations in frame/base/bli_setri.c
  that will target only the real or only the imaginary parts of a
  matrix/vector object.
- Updated bli_obj_real_part() so that the complex-specific portions of
  the function are not executed if the object is real.
- Defined bli_obj_imag_part().
  - Caveat: If bli_obj_imag_part() is called on a real object, it does
    nothing, leaving the destination object untouched. The caller must
    take care to only call the function on complex objects.
- Reordered some of the static functions in bli_obj_macro_defs.h related
  to aliasing.
2018-06-07 17:41:09 -05:00
Field G. Van Zee
b65d0b841b Fixed bug in bli_dt_proj_to_complex().
Details:
- Fixed a bug identical to the one fixed in 0a4a27e, except this time in
  the bli_obj_param_defs.h header file. It looks like the only consumers
  of this static function were in bli_l0_oapi.c, and so this may not have
  been manifesting (yet).
2018-06-07 14:38:41 -05:00
Field G. Van Zee
0a4a27e1a4 Defined/implemented bli_projm().
Details:
- Defined a new operation in frame/base/bli_proj.c, bli_projm(), which
  behaves like bli_copym(), except that operands a and b are allowed to
  contain data of differing domains (e.g. a is real while b is complex,
  or vice versa). The file is named bli_proj.c, rather than bli_projm.c,
  with the intention that a 'v' vector version of the function may be
  added to the same file (at some point in the future).
- Added supporting bli_check_*() functions in bli_check.c to confirm
  consistent precisions between to datatypes/objects, as well as the
  appropriate error message in bli_error.c and a new error code in
  bli_type_defs.h.
- Wrote a bli_projm_check() function to go along with bli_projm().
- Defined static function bli_obj_real_part() in bli_obj_macro_defs.h,
  which will initialize an obj_t alias to the real part of the source
  object.
- Fixed a bug in the static function bli_dt_proj_to_complex(), found
  in bli_param_macro_defs.h. Thankfully, there were no calls to the
  function to produce buggy behavior.
2018-06-06 19:02:29 -05:00
Field G. Van Zee
22deef2f54 Support alternative gemm implementation sandboxes.
Detail:
- configure:
  - add support for --enable-sandbox=NAME to configure script, where NAME
    is a subdirectory of a new 'sandbox' directory that contains an
    alternative implementation of gemm. (For now, only implementations of
    gemm may be provided via a sandbox.);
  - add support for C++ compiler. C++ compilers are handled in a manner
    similar to that of C compilers, in that a default search order is
    used, and that CXX is searched for first, if the variable is set. In
    practice, the C++ compiler that is selected should correspond to the
    selected C compiler. (Example: If gcc is selected for C, g++ should
    be selected for C++.) The result of the search is output to config.mk
    via build/config.mk.in. NOTE: The use of C++ in BLIS is still
    hypothetical, but may eventually move to being experimental. This
    support was intended only for use of C++ within a gemm sandbox.
- build/config.mk.in:
  - define SANDBOX variable containing sandbox subdirectory name.
- build/bli_config.in:
  - define either of the BLIS_ENABLE_SANDBOX or BLIS_DISABLE_SANDBOX
    macros in bli_config.h.
- common.mk:
  - include makefile fragments that were propagated into the specified
    sandbox subdirectory;
  - generate different CFLAGS for sandboxes, as well as a separate
    CXXFLAGS variable for sandboxes when C++ source files are compiled;
  - isolate into a single location lists of file suffixes for various
    purposes.
  - reorganized/clean up code related to identifying header files and
    paths.
- Makefile:
  - generate object filepaths for and compile source code files found in
    sandbox sub-directory;
  - remove makefile fragments placed in sandbox sub-directory (cleanmk);
  - various other cleanups.
- Added .cc, .cpp, and .cxx to list of suffixes of files to recognize in
  makefile fragments (via build/gen-make-frags/suffix_list).
- Updated blis.h to conditionally #include bli_sandbox.h (via a new file,
  bli_sbox.h), which each sandbox is assumed to use for any type
  definitions and function prototypes it wishes to export out to blis.h.
- Conditionally disable bli_gemmnat() implementation in frame/3 when
  BLIS_ENABLE_SANDBOX is defined.
2018-05-24 14:28:55 -05:00
Field G. Van Zee
5140ee3424 Updated types of bli_is_[un]aligned_to() functions.
Details:
- Changed the void* arguments of the following static functions:
    bli_is_aligned_to()
    bli_is_unaligned_to()
    bli_offset_past_alignment()
  to siz_t, and the return type of bli_offset_past_alignment() from
  guint_t to siz_t. This allows for more versatile usage of these
  functions (e.g. when aligning both pointers and leading dimension).
- Updated all invocations of these functions, mostly in kernels/penryn
  but also in kernels/bgq, to include explicit typecasts to siz_t when
  pointer arguments are passed in.
- Thanks to Devin Matthews for pointing out this potential bug (via issue
  #211).
- Deleted a few trailing spaces in various penryn kernels.
- Removed duplicate instances of the words "derived" and "THEORY" from
  various kernel license headers, likely from a malformed recursive sed
  performed long ago.
2018-05-23 16:56:14 -05:00
Field G. Van Zee
962a706a6f Updated LICENSE file to mention HP Enterprise.
Details:
- Added HP Enterprise to the LICENSE file. Previously, only the source
  files touched by HPE contained the corresponding copyright notices.
  (This oversight was unintentional.)
- Updated file-level copyright notices to include a comma, to match
  the formatting used for UT and AMD copyrights.
2018-05-18 18:19:40 -05:00
Field G. Van Zee
12dfa95164 Fixed a bug in determining default integer size.
Details:
- Fixed a bug that would cause configurations to inadvertantly define
  their integers to be 32 bits when those environments actually call for
  64-bit integers. While either BLIS_ARCH_64 or BLIS_ARCH_32 is defined
  in bli_system.h (based on whether preprocessor macros such as __x86_64
  or __aarch64__ are defined by the environment), bli_system.h was being
  #included *after* bli_config_macro_defs.h, in which the BLIS_ARCH_64
  macro was used to choose an integer type size in the event that
  BLIS_INT_TYPE_SIZE was not already defined by configure via
  bli_config.h. And due to the structure of the cpp code in that file,
  the 32-bit integer case was being chosen. Thanks to Francisco Igual
  and Devangi Parikh for their help in isolating this bug.
- Moved the #include of hbwmalloc.h and related preprocessor code to
  bli_kernel_macro_defs.h to facilitate the reshuffling of the #include
  for bli_system.h in blis.h.
2018-05-16 12:46:57 -05:00
Field G. Van Zee
4fb353bd90 Merge branch 'master' into dev 2018-05-13 17:50:51 -05:00
Field G. Van Zee
bf03503059 Renamed (shortened) a few build system variables.
Details:
- Renamed the following variables in config.mk (via build/config.mk.in):
    BLIS_ENABLE_VERBOSE_MAKE_OUTPUT -> ENABLE_VERBOSE
    BLIS_ENABLE_STATIC_BUILD        -> MK_ENABLE_STATIC
    BLIS_ENABLE_SHARED_BUILD        -> MK_ENABLE_SHARED
    BLIS_ENABLE_BLAS2BLIS           -> MK_ENABLE_BLAS
    BLIS_ENABLE_CBLAS               -> MK_ENABLE_CBLAS
    BLIS_ENABLE_MEMKIND             -> MK_ENABLE_MEMKIND
  and also renamed all uses of these variables in makefiles and makefile
  fragments. Notice that we use the "MK_" prefix so that those variables
  can be easily differentiated (such as via grep) from their "BLIS_" C
  preprocessor macro counterparts.
- Other whitespace changes to build/config.mk.in.
- Renamed the following C preprocessor macros in bli_config.h (via
  build/bli_config.h.in):
    BLIS_ENABLE_BLAS2BLIS        -> BLIS_ENABLE_BLAS
    BLIS_DISABLE_BLAS2BLIS       -> BLIS_DISABLE_BLAS
    BLIS_BLAS2BLIS_INT_TYPE_SIZE -> BLIS_BLAS_INT_TYPE_SIZE
  and also renamed all relevant uses of these macros in BLIS source
  files.
- Renamed "blas2blis" variable occurrences in configure to "blas", as
  was done in build/config.mk.in and build/bli_config.h.in.
- Renamed the following functions in frame/base/bli_info.c:
    bli_info_get_enable_blas2blis() -> bli_info_get_enable_blas()
    bli_info_get_blas2blis_int_type_size()
                                    -> bli_info_get_blas_int_type_size()
- Remove bli_config.h during 'make cleanh' target of top-level Makefile.
2018-05-08 16:49:22 -05:00
Field G. Van Zee
4b36e85be9 Converted function-like macros to static functions.
Details:
- Converted most C preprocessor macros in bli_param_macro_defs.h and
  bli_obj_macro_defs.h to static functions.
- Reshuffled some functions/macros to bli_misc_macro_defs.h and also
  between bli_param_macro_defs.h and bli_obj_macro_defs.h.
- Changed obj_t-initializing macros in bli_type_defs.h to static
  functions.
- Removed some old references to BLIS_TWO and BLIS_MINUS_TWO from
  bli_constants.h.
- Whitespace changes in select files (four spaces to single tab).
2018-05-08 14:26:30 -05:00
Field G. Van Zee
75d0d1057d Renamed various datatype-related macros/functions.
Details:
- Renamed the following macros in bli_obj_macro_defs.h and
  bli_param_macro_defs.h:
  - bli_obj_datatype()                 -> bli_obj_dt()
  - bli_obj_target_datatype()          -> bli_obj_target_dt()
  - bli_obj_execution_datatype()       -> bli_obj_exec_dt()
  - bli_obj_set_datatype()             -> bli_obj_set_dt()
  - bli_obj_set_target_datatype()      -> bli_obj_set_target_dt()
  - bli_obj_set_execution_datatype()   -> bli_obj_set_exec_dt()
  - bli_obj_datatype_proj_to_real()    -> bli_obj_dt_proj_to_real()
  - bli_obj_datatype_proj_to_complex() -> bli_obj_dt_proj_to_complex()
  - bli_datatype_proj_to_real()        -> bli_dt_proj_to_real()
  - bli_datatype_proj_to_complex()     -> bli_dt_proj_to_complex()
- Renamed the following functions in bli_obj.c:
  - bli_datatype_size()                -> bli_dt_size()
  - bli_datatype_string()              -> bli_dt_string()
  - bli_datatype_union()               -> bli_dt_union()
- Removed a pair of old level-1f penryn intrinsics kernels that were no
  longer in use.
2018-04-30 14:57:33 -05:00
Field G. Van Zee
d6ab25a323 Add setijm, getijm operations.
Details:
- Added bli_setgetijm.c, which defines bli_setijm(), bli_getijm(), and
  related functions that can be used to read and write individual
  elements of an obj_t.
- Defined a new function, bli_obj_create_conf_to(), in bli_obj.c that will
  create a new object with dimensions conformal to an existing object.
  Transposition and conjugation states on the existing object are ignored,
  as are structure and uplo fields.
- Defined a new function, bli_datatype_string(), in bli_obj.c that returns
  a char* to a string representation of the name of each num_t datatype.
  For example, BLIS_DOUBLE is "double" and BLIS_DCOMPLEX is "dcomplex".
  BLIS_INT is included (as "int"), but BLIS_CONSTANT is not, and thus is
  not a valid input argument to bli_datatype_string().
- Added calls to bli_init_once() to various functions in bli_obj.c, the
  most important of which was bli_obj_create_without_buffer().
- Removed unintended/extra newline from the end of printv output.
- Whitespace changes to
  - frame/base/bli_machval.c
  - frame/base/bli_machval.h
  - frame/0/copysc/bli_copysc.c
- Trivial changes to README.md and common.mk.
2018-04-24 18:43:03 -05:00
Field G. Van Zee
60366a3fab Updates to knl kernels and related code.
Details:
- Imported the 24x16 knl sgemm microkernel (and its corresonding spackm
  kernel) from TBLIS and enabled its use in the knl sub-config. Also
  Added sgemm microkernel prototype to bli_kernels_knl.h.
- Updated dgemm and dpackm microkernels from TBLIS, which included an
  important change regarding the offsets array (changed from extern
  declaration to static declaration/definition).
- Activated use of level-1v and -1f zen kernels in skx and knl
  sub-configs.
- Removed some old macros no longer needed in bli_family_skx.h now that
  libmemkind support exists in configure.
- Moved bli_avx512_macros.h to frame/include and adjusted #includes in
  skx and knl kernels accordingly.
- Moved unused kernels in kernels/knl/3 to kernels/knl/3/other
  directory.
- Fixed a minor bug in the 'make' output per compile when verboseness
  is not turned on. The rule-generating function 'make-kernel-rule' was
  previously passing in the name of the config, rather than the name of
  the kernel set returned by get-config-for-kset, which could give
  misleading information to the user when the kconfig_map mapped a
  kernel set to a sub-configuration that did not share the same name.
  (This didn't affect the CFLAGS that were actually used.)
- Updated test/3m4m/Makefile, removing acml targets and renaming the
  remaining targets.
2018-04-16 18:46:21 -05:00
Field G. Van Zee
6a628184f6 Fixed a memkind-related compile-time bug on knl.
Details:
- Fixed a compile-time error that occurred due to the fact that
  BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined
  soon enough to be used in bli_system.h where it is needed to determine
  whether hbwmalloc.h should be #included. bli_system.h is now included
  after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love
  for reporting this issue.
- Tweaked the language used by configure to echo the status of the
  --with[out]-memkind option.
2018-03-26 14:48:16 -05:00
Field G. Van Zee
22289ad23c Added build system support for libmemkind.
Details:
- Added support for libmemkind to configure. configure attempts to
  detect the presence of libmemkind by compiling a small program
  containing #include <hbwmalloc.h> and a call to hbw_malloc(). If
  successful, it is assumed that libmemkind is present and available.
  If present, use of libmemkind is enabled by default, and otherwise
  use is disabled by default. If libmemkind is present, the user may
  explicitly disable use of the library by running configure with the
  --without-memkind option. Furthermore, a configuration may disable
  libmemkind, perhaps conditional on some aspect of the build system,
  by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS
  make variable and setting the BLIS_ENABLE_MEMKIND makefile variable,
  set in config.mk, to 'no'. (The knl configuration makes use of this
  latter feature; see below.)
- If enabled at configure-time, bli_system.h will #include <hbwmalloc.h>
  and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and
  BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively.
- Deprecated explicit use of BLIS_NO_HBWMALLOC in
  config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in
  config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides
  (#undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it
  would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile
  variable to 'no'.
- common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled.
2018-03-22 18:21:30 -05:00
Field G. Van Zee
cb7ed90752 Convert op names to uppercase before calling xerbla_().
Details:
- Defined a new function, bli_string_mkupper(), that calls toupper() on
  every non-NULL character in a string.
- Call bli_string_mkupper() prior to calling xerbla_() in the level-2/-3
  BLAS _check() macros. This prevents the BLAS testsuite from complaining
  that the operation name (e.g. "dgemm") does not match the expected
  value (e.g. "DGEMM"). Thanks to Dave Love for reporting this issue.
2018-03-16 13:05:56 -05:00
Field G. Van Zee
1a3031740f Updates to ARM hardware detection support.
Details:
- Updated/clarified the ARM preprocessor macro branch of bli_cpuid.c.
  Going forward, cortexa57 (64-bit), cortexa15, and cortexa9 (32-bit)
  sub-configurations are supported. However, the functions that detect
  features specific to a15 and a9 are identical, and since a15 is tested
  first, it will always be chosen for arm32 hardware (even if both
  sub-configurations were enabled at configure-time and the library is
  linked and run on an a9). Thus, more work needs to be done to
  distinguish these two.
- Added cpp guard around x86_64 portions of bli_cpuid.c. Now, either
  the x86_64 or ARM code will be compiled (or neither, if neither
  environment is detected).
- In bli_arch_query_id(), call bli_cpuid_query_id() when the
  BLIS_FAMILY_ARM64 or BLIS_FAMILY_ARM32 macros are defined.
- Added arm64 and arm32 configuration families to config_registry.
- Added a note to the arch_t typedef enum in bli_type_defs.h reminding
  the developer to update the string array in bli_arch.c whenever new
  enum values are added or existing values are reordered.
2018-03-13 16:04:40 -05:00
Field G. Van Zee
16813335bd Merge branch 'amd' into rt
Details:
- Merged contributions made by AMD via 'amd' branch (see summary below).
  Special thanks to AMD for their contributions to-date, especially with
  regard to intrinsic- and assembly-based kernels.
- Added column storage output cases to microkernels in
  bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
  the extra cost of transposing the microtile in registers, this is
  much faster than using the general storage case when the underlying
  matrix is column-stored.
- Added s and d assembly-based zen gemmtrsm_u microkernel (including
  column storage optimization mentioned above).
- Updated zen sub-configuration to reflect presence of new native
  kernels.
- Temporarily reverted zen sub-configuration's level-3 cache blocksizes
  to smaller haswell values.
- Temporarily disabled small matrix handling for zen configuration
  family in config/zen/bli_family_zen.h.
- Updated zen CFLAGS according to changes in 1e4365b.
- Updated haswell microkernels such that:
  - only one vzeroupper instruction is called prior to returning
  - movapd/movupd are used in leiu of movaps/movups for double-real
    microkernels. (Note that single-real microkernels still use
    movaps/movups.)
- Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
  now included via frame/include/bli_arch_config.h.
- Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
  in testsuite/src/test_amaxv.c).
- Added early return for alpha == 0 in bli_dotxv_ref.c.
- Integrated changes from f07b176, including a fix for undefined
  behavior when executing the 1m method under certain conditions.
- Updated config_registry; no longer need haswell kernels for zen
  sub-configuration.
- Tweaked marginal and pass thresholds for dotxf.
- Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
  comments.
- Updated LICENSE file to explicitly mention that parts are copyright
  UT-Austin and AMD.
- Added AMD copyright to header templates in build/templates.

Summary of previous changes from 'amd' branch.
- Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
  s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
- Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
  and scalv, with extra-unrolling variants for axpyv and scalv.
- Added a small matrix handler to bli_gemm_front(), with the handler
  implemented in kernels/zen/3/bli_gemm_small_matrix.c.
- Added additional logic to sumsqv that first attempts to compute the
  sum of the squares via dotv(). If there is a floating-point exception
  (FE_OVERFLOW), then the previous (numerically conservative) code is
  used; otherwise, the result of dotv() is square-rooted and stored as
  the result. This new implementation is only enabled when FE_OVERFLOW
  is #defined. If the macro is not #defined, then the previous
  implementation is used.
- Added axpyv and dotv standalone test drivers to test directory.
- Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
- Added thread-local and __attribute__-related macros to bli_macro_defs.h.
2018-02-21 17:43:32 -06:00
Field G. Van Zee
0ce5e19c31 Reimplemented configure-time hardware detection.
Details:
- Reimplemented the hardware detection functionality invoked when running
  "./configure auto". Previously, a standalone script in build/auto-detect
  that used CPUID was used. However, the script attempted to enumerate all
  models for each microarchitecture supported. The new approach recycles
  the same code used for runtime hardware detection introduced in 2c51356.
  This has two immediate benefits. First, it reduces and consolidates the
  code required to detect microarchitectures via the CPUID instruction.
  Second, it provides an indirect way of testing at configure-time the
  code that is used to detect hardware at runtime. This code is (a) only
  activated when targeting a configuration family (such as intel64 or
  amd64) at configure-time and (b) somewhat difficult to test in
  practice, since it relies on having access to older microarchitectures.
- The above change required placing conditional cpp macro blocks in
  bli_arch.c and bli_cpuid.c which either #include "blis.h" or #include
  a bare-bones set of headers that does not rely on the presence of a
  bli_config.h header. This is needed because bli_config.h has not been
  created yet when configure-time auto-detection takes places.
- Defined a new function in bli_arch.c, bli_arch_string(), which takes
  an arch_t id and returns a pointer to a string that contains the
  lowercase name of the corresponding microarchitecture. This function
  is used by the auto-detection script to printf() the name of the
  sub-configuration corresponding to the detected hardware.
2017-12-23 15:32:03 -06:00
Field G. Van Zee
9804adfd40 Added option to disable pack buffer memory pools.
Details:
- Added a new configure option, --[en|dis]able-packbuf-pools, which will
  enable or disable the use of internal memory pools for managing buffers
  used for packing. When disabled, the function specified by the cpp
  macro BLIS_MALLOC_POOL is called whenever a packing buffer is needed
  (and BLIS_FREE_POOL is called when the buffer is ready to be released,
  usually at the end of a loop). When enabled, which was the status quo
  prior to this commit, a memory pool data structure is created and
  managed to provide threads with packing buffers. The memory pool
  minimizes calls to bli_malloc_pool() (i.e., the wrapper that calls
  BLIS_MALLOC_POOL), but does so through a somewhat more complex
  mechanism that may incur additional overhead in some (but not all)
  situations. The new option defaults to --enable-packbuf-pools.
- Removed the reinitialization of the memory pools from the level-3
  front-ends and replaced it with automatic reinitialization within the
  pool API's implementation. This required an extra argument to
  bli_pool_checkout_block() in the form of a requested size, but hides
  the complexity entirely from BLIS. And since bli_pool_checkout_block()
  is only ever called within a critical section, this change fixes a
  potential race condition in which threads using contexts with different
  cache blocksizes--most likely a heterogeneous environment--can check
  out pool blocks that are too small for the submatrices it wishes to
  pack. Thanks to Nisanth Padinharepatt for reporting this potential
  issue.
- Removed several functions in light of the relocation of pool reinit,
  including bli_membrk_reinit_pools(), bli_memsys_reinit(),
  bli_pool_reinit_if(), and bli_check_requested_block_size_for_pool().
- Updated the testsuite to print whether the memory pools are enabled or
  disabled.
2017-12-21 19:22:57 -06:00
Field G. Van Zee
83316485ce Simplified/fixed self-initialization.
Details:
- Fixed a race condition in self-initialization whereby the bli_is_init
  static variable could be erroneously read as TRUE by thread 1 while
  thread 0 is still executing bli_init_apis(), thus allowing thread 1 to
  use the library before it is actually ready. Thanks to to Minh Quan Ho
  and Devin Matthews for pointing out this issue.
- Part of the solution to the aforementioned race condition was involved
  replacing the runtime initialization of the global scalar constants
  (e.g., BLIS_ONE, BLIS_ZERO, etc.) in bli_const.c with a static
  initialization of those same constants. This eliminates the need for
  bli_const_init() altogether. (The static initialization is made concise
  via preprocess macros.)
- Defined bli_gks_query_cntx_noinit(), which behaves just like
  bli_gks_query_cntx(), except that it does not call bli_init_once(). This
  function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and
  bli_memsys_init() so as to not result in any recursion into
  bli_init_once().
- Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants.
  They have no use in BLIS or its test products, and we have little reason
  to believe they are used by others.
- Removed testsuite/out file, which was accidentally committed as part
  of 70640a3.
2017-12-13 14:14:50 -06:00
Field G. Van Zee
513ef4d040 Various typecasting fixes, mis-typed enums, etc.
Details:
- Fixed implicit typecasting of conj_t to trans_t in bli_[un]packm_cxk.c.
- Properly typecast integer arguments to match format specifier in various
  calls to printf() in bli_l3_thrinfo.c, bli_cntx.c, bli_pool.c, and
  bli_util_oapi.c.
- Fixed "unsigned less-than-comparison with zero" checks in bli_check.c,
  bli_cntx.h.
- Fixed mis-typed enums in bli_cntx.c (e.g., l1mkr_t that should have been
  l1fkr_t or l1vkr_t).
- Fixed instances of opid_t value BLIS_GEMM that should have been l3ukr_t
  value BLIS_GEMM_UKR in bli_cntx_ref.c.
- NOTE: These issues were identified via compiler warnings when building
  BLIS with clang on a rather old installation of OS X:
    $ clang --version
    Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
    Target: x86_64-apple-darwin15.2.0
    Thread model: posix
2017-12-11 12:35:59 -06:00
Nisanth M P
3a44118398 Added AMD copyright line to the changed files in last 3 commits
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
2017-12-11 12:41:02 +05:30
Nisanth M P
c669716790 Adding __attribute__((constructor/destructor)) for CLANG case.
CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.

Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
2017-12-11 12:12:29 +05:30
Nisanth M P
9c0a3c4c02 Thread Safety: Move bli_init() before and bli_finalize() after main()
BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS, while other threads
in the application are stil using BLIS.

This issue can be solved by removing bli_finalize() from API.
One way to do this is by getting bli_finalize() to execute by default
after application exits from main().

GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that need to be executed after main exits.

Similarly bli_init() can be made to run before application enters main()
so that application need not call it.

Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
2017-12-11 12:12:29 +05:30
Nisanth M P
83f31253eb Thread safety: Make the global induced method status array local to thread
BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.

This patch solves this issue by making the induced method status array
local to threads.

Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
2017-12-11 12:12:29 +05:30
Field G. Van Zee
7628de3f76 Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
  Erling Andersen for pointing out this inconsistency and suggesting
  the change.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
95adc43d80 Moved 'family' field from cntx_t to cntl_t.
Details:
- Removed the family field inside the cntx_t struct and re-added it to the
  cntl_t struct. Updated all accessor functions/macros accordingly, as well
  as all consumers and intermediaries of the family parameter (such as
  bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
  change was motivated by the desire to keep the context limited, as much
  as possible, to information about the computing environment. (The family
  field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
  that operate only on a single struct to contain the "_node" suffix to
  differentiate with those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
  They weren't being used and probably never will be.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
8f739cc847 Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
  pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
    bli_thread_get_jc_nt()
    bli_thread_get_ic_nt()
    bli_thread_get_jr_nt()
    bli_thread_get_ir_nt()
    bli_thread_get_num_threads()
    bli_thread_set_jc_nt()
    bli_thread_set_ic_nt()
    bli_thread_set_jr_nt()
    bli_thread_set_ir_nt()
    bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
2017-12-11 12:08:58 +05:30
Marat Dukhan
1016383307 Fix Emscripten builds 2017-12-11 12:08:58 +05:30
Field G. Van Zee
abaeaa68ea Fixed a bug in norm1v, norm1m.
Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
  and matrices. This is one of the few operations in BLIS that does not
  have its own test module within the testsuite, hence why it went
  undetected for so long. The bad 1-norms were being used to normalize
  matrices in the testsuite after initialization, which led to some
  matrices containing a combination of "large" and "small" values. This
  tended to push the residuals computed after each test away from zero.
  In some cases, they were off *just* enough to the testsuite to label
  it a "failure". Many thanks to Jeff Hammond for reporting this bug.
  (Wonky details: the bug was due to improperly-defined level-0 scalar
  macros for abval2, an operation that computes the absolute square,
  or complex magnitude/modulus. Certain complex domain instances of
  abval2 were being incorrectly defined in terms of real-only solutions,
  leading to bad results. This level-0 operation forms the basis of
  norm1v/norm1m. absq2 was also affected, but almost nothing uses
  this operation.)
2017-12-11 12:05:22 +05:30
Field G. Van Zee
4f61528d56 Added 1m-specific APIs for bp, pb gemm algorithms.
Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
  body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
  bli_cntl_free() can check if the thread parameter is NULL, and if so,
  call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
  terms of bli_gemm1mxx_cntx_init(), which behaves the same as
  bli_gemm1m_cntx_init() did before, except that an extra bool parameter
  (is_pb) is used to support both bp and pb algorithms (including to
  support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
  when true, will toggle the boolean return value of routines such as
  bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
  causing BLIS to transpose the operation to achieve disagreement (rather
  than agreement) between the storage of C and the micro-kernel output
  preference. This disagreement is needed for panel-block implementations,
  since they induce a transposition of the suboperation immediately before
  the macro-kernel is called, which changes the apparent storage of C. For
  now, anti-preference is used only with the pb algorithm for 1m (and not
  with any other non-1m implementation).
- Defined new functions,
    bli_cntx_l3_ukr_eff_prefers_storage_of()
    bli_cntx_l3_ukr_eff_dislikes_storage_of()
    bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
    bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
  which are identical to their non-"eff" (effectively) counterparts except
  that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
  bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
  in terms of the existing block-panel macro-kernel _ker_var2(). This
  technique requires inducing transposes on all operands and swapping
  the A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
  also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
  specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
    bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
    bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
    bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
  and updated all instantiations. Also updated the field names in the
  cntx_t struct.
- Comment updates.
2017-12-11 11:58:33 +05:30