
138 Commits

Author SHA1 Message Date
Field G. Van Zee
2553734d1d Merge branch 'master' into rt 2017-10-18 13:46:50 -05:00
Field G. Van Zee
375342799c Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
  installation for the 'knl' configuration. Thanks to Victor Eijkhout
  for reporting this issue.
2017-10-18 13:41:25 -05:00
Field G. Van Zee
453deb2906 Implemented runtime kernel management.
Details:
- Reworked the build system around a configuration registry file, named
  'config_registry', that identifies valid configuration targets, their
  constituent sub-configurations, and the kernel sets that are needed by
  those sub-configurations. The build system now facilitates the building
  of a single library that can contain kernels and cache/register
  blocksizes for multiple configurations (microarchitectures). Reference
  kernels are also built on a per-configuration basis.
- Updated the Makefile to use new variables set by configure via the
  config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
  in determining which sub-configurations (CONFIG_LIST) and kernel sets
  (KERNEL_LIST) are included in the library, and which make_defs.mk files'
  CFLAGS (KCONFIG_MAP) are used when compiling kernels.
- Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
  functions into a standard format that includes the kernel set name
  (e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
  kernels sub-directory. These files exist to provide prototypes for the
  kernels present in those directories.
- Reorganized reference kernels into a top-level 'ref_kernels' directory.
  This directory includes a new source file, bli_cntx_ref.c (compiled on
  a per-configuration basis), that defines the code needed to initialize
  a reference context and a context for induced methods for the
  microarchitecture in question.
- Rewrote make_defs.mk files in each configuration so that the compiler
  variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
  basis.
- Modified bli_config.h.in template so that bli_config.h is generated with
  #defines for the config (family) name, the sub-configurations that are
  associated with the family, and the kernel sets needed by those
  sub-configurations.
- Deprecated all kernel-related information in bli_kernel.h and transferred
  what remains to new header files named "bli_arch_<configname>.h", which
  are conditionally #included from a new header bli_arch.h. These files
  are still needed to set library-wide parameters such as custom
  malloc()/free() functions or SIMD alignment values.
- Added bli_cntx_init_<configname>.c files to each configuration directory.
  The files contain a function, named the same as the file, that initializes
  a "native" context for a particular configuration (microarchitecture). The
  idea is that optimized kernels, if available, will be initialized into
  these contexts. Other fields will retain pointers to reference functions,
  which will be compiled on a per-configuration basis. These bli_cntx_init_*()
  functions will be called during the initialization of the global kernel
  structure. They are thought of as initializing for "native" execution, but
  they also form the basis for contexts that use induced methods. These
  functions are prototyped, along with their _ref() and _ind() brethren, by
  prototype-generating macros in bli_arch.h.
- Added a new typedef enum in bli_type_defs.h to define an arch_t, which
  identifies the various sub-configurations.
- Redesigned the global kernel structure (gks) around a 2D array of cntx_t
  structures (pointers to cntx_t, actually). The first dimension is indexed
  over arch_t and the inner dimension is the ind_t (induced method) for
  each microarchitecture. When a microarchitecture (configuration) is
  "registered" at init-time, the inner array for that configuration in the
  2D array is initialized (and allocated, if it hasn't been already). The
  cntx_t slot for BLIS_NAT is initialized immediately and those for other
  induced method types are initialized and cached on-demand, as needed. At
  cntx_t registration, we also store function pointers to cntx_init functions
  that will initialize (a) "reference" contexts and (b) contexts for use with
  induced methods. We don't cache the full contexts for reference contexts
  since they are rarely needed. The functions that initialize these two kinds
  of contexts are generated automatically for each targeted sub-configuration
  from cpp-templatized code at compile-time. Induced method contexts that
  need "stage" adjustments can still obtain them via functions in
  bli_cntx_ind_stage.c.
- Added new functions and functionality to bli_cntx.c, such as for setting
  the level-1f, level-1v, and packm kernels, and for converting a native
  context into one for executing an induced method.
- Moved the checking of register/cache blocksize consistency from being cpp
  macros in bli_kernel_macro_defs.h to being runtime checks defined in
  bli_check.c and called from bli_gks_register_cntx() at the time that the
  global kernel structure's internal context is initialized for a given
  microarchitecture/configuration.
- Deprecated all of the old per-operation bli_*_cntx.c files and removed
  the previous operation-level cntx_t_init()/_finalize() invocations.
  Instead, we now query the gks for a suitable context, usually via
  bli_gks_query_cntx().
- Deprecated support for the 3m2 and 3m3 induced methods. (They required
  hackery that I was no longer willing to support.)
- Consolidated the 1e and 1r packm kernels for any given register blocksize
  into a single kernel that will branch on the schema and support packing
  to both formats.
- Added the cntx_t* argument to all packm kernel signatures.
- Deprecated the local function pointer array in all bli_packm_cxk*.c files
  and instead obtain the packm kernel from the cntx_t.
- Added bli_calloc_intl(), which serves as the calloc-equivalent to
  bli_malloc_intl(). Useful when we wish to allocate and initialize to
  zero/NULL.
- Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
  bli_cntx.h into static functions.
2017-10-18 13:29:32 -05:00
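The global kernel structure described in 453deb2906 can be pictured with a minimal C sketch. This is not the BLIS implementation: the names arch_id, ind_id, context, gks_register(), and gks_query() are hypothetical stand-ins for arch_t, ind_t, cntx_t, and the bli_gks_*() functions, and the context contents are reduced to two fields. It only illustrates the 2D array of context pointers, with the native slot filled at registration and the other slots filled and cached on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for arch_t, ind_t, and cntx_t. */
typedef enum { ARCH_HASWELL, ARCH_KNL, NUM_ARCHS } arch_id;
typedef enum { IND_1M, IND_NAT, NUM_IND } ind_id;

typedef struct {
    arch_id arch;    /* which sub-configuration this context serves */
    ind_id  method;  /* native execution or an induced method */
} context;

/* Outer dimension: one entry per architecture. Each entry stays NULL until
   that architecture is registered, then points to an inner array of
   NUM_IND context pointers. */
static context **gks[NUM_ARCHS];

static context *make_context(arch_id a, ind_id m)
{
    context *c = malloc(sizeof *c);
    assert(c != NULL);
    c->arch = a;
    c->method = m;
    return c;
}

/* Register an architecture: allocate its inner array if needed (zeroed, so
   every slot starts out NULL) and initialize the native slot right away. */
void gks_register(arch_id a)
{
    assert((int)a >= 0 && (int)a < NUM_ARCHS);
    if (gks[a] == NULL) {
        gks[a] = calloc(NUM_IND, sizeof *gks[a]);
        assert(gks[a] != NULL);
    }
    if (gks[a][IND_NAT] == NULL)
        gks[a][IND_NAT] = make_context(a, IND_NAT);
}

/* Query a context. Returns NULL for an unregistered architecture; otherwise
   creates the requested context on first use and caches it. */
context *gks_query(arch_id a, ind_id m)
{
    assert((int)a >= 0 && (int)a < NUM_ARCHS);
    assert((int)m >= 0 && (int)m < NUM_IND);
    if (gks[a] == NULL)
        return NULL;
    if (gks[a][m] == NULL)
        gks[a][m] = make_context(a, m);
    return gks[a][m];
}
```

In this sketch, calling gks_register(ARCH_HASWELL) followed by gks_query(ARCH_HASWELL, IND_1M) twice returns the same pointer both times, since the induced-method context is built once and then reused.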
Devin Matthews
7dc78b49f9 Add vzeroupper to Intel AVX kernels. 2017-08-15 10:02:25 -05:00
Devin Matthews
7f41bb0a0b PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%. 2017-05-26 14:49:31 -04:00
Devin Matthews
d87614af3f Revert "Change PACKDIM_MR (double) for haswell to 8."
This reverts commit 681eec913d.
2017-05-26 14:47:36 -04:00
Devin Matthews
681eec913d Change PACKDIM_MR (double) for haswell to 8. 2017-05-26 12:28:09 -05:00
Field G. Van Zee
f484c6cd43 Whitespace reformatting to armv8a kernels file.
Details:
- Updated formatting of function signature/header in
  kernels/armv8a/3/bli_gemm_opt_4x4.c.
2017-03-17 12:07:27 -05:00
Devin Matthews
0e18f68cf1 Handle k=0 correctly in KNL dgemm ukernel. 2017-02-20 09:03:21 -06:00
Devin Matthews
7d42fc0796 Cast dim_t and inc_t parameters to 64-bit in KNL microkernels. 2017-02-19 21:10:55 -05:00
Francisco Igual
7f31a6307b Fixed missing cntx argument in ARMv8 microkernels. 2016-11-27 14:40:47 +01:00
Field G. Van Zee
b3e58ee303 Reimplemented 4x12 haswell ukernels (real only).
Details:
- Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
  defines 4x24 single real and 4x12 double real gemm microkernels, with
  broadcast-based implementations. (The previous microkernel file has been
  moved to an 'old' subdirectory.)
2016-11-23 17:58:26 -06:00
Field G. Van Zee
8a11a2174a Updates to non-default haswell microkernels.
Details:
- Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
  constraints.
- Added missing c and z microkernels, which are based on the corresponding
  kernels in the d6x8 set.
- This completes the d8x6 set (which may be used for situations when it
  is desirable to have a microkernel with a column preference).
2016-10-31 19:07:55 -05:00
Devin Matthews
11eb7957ab Merge branch 'master' into knl
# Conflicts:
#	frame/thread/bli_thread.h
2016-10-25 13:51:07 -05:00
Devin Matthews
cd5b668183 Don't use %rbp in KNL packing kernels. 2016-10-25 13:49:27 -05:00
Devin Matthews
5117d444f7 Change .align to .p2align in Bulldozer ukernels
Apparently OSX doesn't allow .align directives for >16B, so I've changed these to their .p2align counterparts.
2016-10-24 16:20:47 -05:00
Field G. Van Zee
121c39d455 Added complex gemm micro-kernels for haswell.
Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
  architectures. As with their real domain brethren, these kernels prefer
  row storage (though this doesn't affect most users due to high-level
  optimizations in most level-3 operations that induce a transpose to
  whatever storage preference the kernel may have).
2016-09-05 13:11:42 -05:00
Devin Matthews
c8e4ef9395 Add prefetchw to 30x8 kernel. 2016-08-03 16:13:03 -05:00
Devin Matthews
4b5a2f3d6e Merge remote-tracking branch 'origin/knl' into knl
# Conflicts:
#	kernels/x86_64/knl/3/bli_dgemm_opt_24x8.c
2016-08-03 16:09:51 -05:00
Devin Matthews
380736bfe9 Add (new) 30x8 KNL kernel and fix non-scatter prefetch bug. 2016-08-03 16:08:28 -05:00
Devin Matthews
9f52a587de Try prefetchw[t1] instead of regular prefetch for C. 2016-08-03 16:03:53 -05:00
Devin Matthews
8945a1512d This version gets ~1550 GFLOPs on KNL with 16x4. 2016-08-03 11:28:24 -05:00
Devin Matthews
6ce4c022eb Switch back to 24x8. I could only squeeze 24.5GFLOP out of 8x24, and scalability is not improved. 2016-07-27 16:26:36 -05:00
Field G. Van Zee
c31b1e7b9d Relax alignment restrictions for sandybridge ukrs.
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
  in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
  instead of vmovaps/vmovapd. These changes mimic those made to the haswell
  microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
  directory to use DBL_MAX as the initial time candidate. Thanks to Devin
  Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
2016-07-27 15:58:07 -05:00
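The DBL_MAX change in c31b1e7b9d concerns how a best-of-N timing loop is seeded. The following is a minimal sketch of that idea, not the test driver code itself; the function name best_time() and the array of pre-collected samples are assumptions made for illustration. Starting from DBL_MAX guarantees that the first measured time replaces the initial candidate, whatever its magnitude.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Return the smallest of n timing samples (in seconds). The running minimum
   starts at DBL_MAX, so any finite sample is accepted as an improvement.
   With n == 0 the function returns DBL_MAX unchanged. */
double best_time(const double *samples, size_t n)
{
    assert(samples != NULL || n == 0);

    double best = DBL_MAX;  /* initial time candidate */
    for (size_t i = 0; i < n; ++i) {
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}
```

A fixed seed such as a large constant would silently cap the reported time if a run ever took longer than the constant; DBL_MAX avoids having to pick such a bound.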
Devin Matthews
b8f2b55532 Try an 8x24 kernel for the hell of it. 2016-07-27 15:22:55 -05:00
Devin Matthews
ad89ed2e82 Merge branch 'knl' of github.com:devinamatthews/blis into knl 2016-07-27 11:45:40 -05:00
Devin Matthews
2c9de740ed This version gets ~26GF on one core. 2016-07-27 11:44:54 -05:00
Devin Matthews
81e2b05f31 Add optimized packing kernels for KNL. 2016-07-27 11:39:05 -05:00
Devin Matthews
a7d8ca97b8 All fixed. 2016-07-25 15:15:13 -05:00
Devin Matthews
963d0393b0 Add 24xk pack kernel. 2016-07-25 14:40:53 -05:00
Devin Matthews
117b76739a In the midst of debugging. 2016-07-25 13:53:07 -05:00
Devin Matthews
8c0a4fd1d3 Fix some row/column confusion. 2016-07-25 13:09:24 -05:00
Devin Matthews
c44f9f9693 Simplify displacements -- clang assembler was badly botching EVEX compressed displacements giving false alarms for instruction length. 2016-07-25 12:02:24 -05:00
Devin Matthews
e0cce177cc Minor fixes for 8x24 KNL kernel. 2016-07-25 10:02:25 -05:00
Devin Matthews
65735bbedf Switch to 24x8 kernel, unrolled by 16. 2016-07-24 21:50:32 -05:00
Devin Matthews
45d5dc9717 Add 24x8 "KNC-style" kernel for KNL. 2016-07-24 14:25:26 -05:00
Devin Matthews
8ff2e069c4 Add 4x unrolled variant for KNL microkernel. 2016-07-22 16:22:26 -05:00
Devin Matthews
9cb2ed9b0c Get rid of one RBX update. 2016-07-22 16:10:30 -05:00
Devin Matthews
451bde076f Add some more knobs to twiddle for KNL microkernel. 2016-07-22 15:43:00 -05:00
Devin Matthews
8c6e621c09 Make knl conform to new kernel dir structure. 2016-07-22 15:05:15 -05:00
Devin Matthews
707a2b7fac Somehow forgot the most important microkernel. 2016-07-22 13:49:44 -05:00
Devin Matthews
08f1d6b6fa Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit. 2016-07-22 13:44:37 -05:00
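The widening described in 08f1d6b6fa can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: dim32_t is a hypothetical 32-bit stand-in for dim_t, and widen_dim() is a made-up helper name. The likely motivation is that assembly which reads k with a 64-bit load would otherwise pick up whatever bytes follow a 32-bit variable, so the value is first copied into a 64-bit variable of known contents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit dimension type, standing in for a 32-bit dim_t. */
typedef int32_t dim32_t;

/* Copy a dimension into a 64-bit intermediate. A kernel would then hand the
   64-bit variable, rather than the original, to code that performs a 64-bit
   load. Dimensions are assumed to be non-negative. */
uint64_t widen_dim(dim32_t k)
{
    assert(k >= 0);
    uint64_t k64 = (uint64_t)k;  /* upper 32 bits are now well-defined zeros */
    return k64;
}
```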
Devin Matthews
e0d2fa0d83 Relax alignment restrictions for haswell sgemm. 2016-07-22 12:56:51 -05:00
Devin Matthews
ee2c139df6 Remove alignment restrictions on C in haswell kernel. 2016-07-22 12:06:03 -05:00
Field G. Van Zee
c3a4d39d03 Updates to haswell gemm micro-kernels.
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
  one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
  by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.
2016-05-04 17:22:56 -05:00
Field G. Van Zee
ed7326c836 Added 'restrict' to l1v/l1f code in 'kernels' dir.
Details:
- Added 'restrict' keyword to existing kernel definitions in 'kernels'
  directory. These changes were meant for inclusion in bbb8569.
2016-04-27 14:57:40 -05:00
Field G. Van Zee
eb2f18e484 More compile-time fixes to bgq gemm ukernel code. 2016-04-19 12:50:32 -05:00
Field G. Van Zee
ff84469a45 Applied various compilation fixes to bgq kernels. 2016-04-18 12:29:09 -05:00
Field G. Van Zee
dd62080cea Compile-time fix to bgq l1f kernels.
Details:
- Fixed an old reference to bli_daxpyf_fusefac, which no longer exists,
  by replacing it with the axpyf fusing factor (8), and cleaned up the
  relevant section of config/bgq/bli_kernel.h.
- Removed most of the details of the level-3 kernels from the template
  kernel code in config/template/kernels/3 and replaced it with a
  reference to the relevant kernel wiki maintained on the BLIS github
  website.
2016-04-15 11:15:41 -05:00
Field G. Van Zee
0bd4169ea7 Fixed context-broken dunnington/penryn kernels.
Details:
- Added missing context parameters to several instances where simpler
  kernels, or reference kernels, are called instead of executing the
  main body code contained in the kernel function in question.
- Renamed axpyv and dotv kernel files to use "opt" instead of "int"
  substring, for consistency with level-1f kernels.
2016-04-11 18:08:32 -05:00