2030 Commits

Author SHA1 Message Date
Satish Balay
d560d105d2 OSX: specify the full path to the location of libblis.dylib (#390)
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime

Before this change:

Application gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib

balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
        libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
	/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR

Co-Authored-By: Jed Brown <jed@jedbrown.org>

* CREDITS file update.

Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-05-21 11:54:54 +05:30
Field G. Van Zee
3597284b9d Updates, tweaks to runme.sh in test/1m4m.
Details:
- Made several updates to test/1m4m/runme.sh, including:
  - Added missing handling for 1m and 4m1a implementations when setting
    the BLIS_??_NT environment variables.
  - Added support for using numactl to run the test executables.
  - Several other cleanups.
2020-05-21 11:54:53 +05:30
Field G. Van Zee
b325f1ea62 Warn user when auto-detection returns 'generic'.
Details:
- Added logic to configure that causes the script to output a warning
  to the user if/when "./configure auto" is run and the underlying
  hardware feature detection code is unable to identify the hardware.
  In these cases, the auto-detect code will return 'generic', which
  is likely not what the user expected, and a flag will be set so that
  a message is printed at the end of the configure output. (Thankfully,
  we don't expect this scenario to play out very often.) Thanks to
  Devin Matthews for suggesting this fix (#384).
2020-05-21 11:54:53 +05:30
Field G. Van Zee
9e76059f15 Renamed bli_thread_obarrier(), _obroadcast().
Details:
- Renamed two bli_thread_*() APIs:
    bli_thread_obarrier()   -> bli_thread_barrier()
    bli_thread_obroadcast() -> bli_thread_broadcast()
  The 'o' was a leftover from when thrcomm_t objects tracked both
  "inner" and "outer" communicators. They have long since been
  simplified to only support the latter, and thus the 'o' is
  superfluous.

Change-Id: If9ec9a2383dfb02e1cfc74918f87a1fabddbd55b
2020-05-21 11:54:37 +05:30
Field G. Van Zee
6a957d7247 List Gentoo under supported external packages.
Details:
- Add mention of Gentoo Linux under the list of external packages in
  the README.md file. Thanks to M. Zhou for maintaining this package.
2020-05-21 11:50:37 +05:30
Field G. Van Zee
c7faae9442 Merged test/sup, test/supmt into test/sup.
Details:
- Updated the Makefile, test_gemm.c, and runme.sh in test/sup to be able
  to compile and run both single-threaded and multithreaded experiments.
  This should help with maintenance going forward.
- Created a test/sup/octave_st directory of scripts (based on the
  previous test/sup/octave scripts) as well as a test/sup/octave_mt
  directory (based on the previous test/supmt/octave scripts). The
  octave scripts are slightly different and not easily mergeable, and
  thus for now I'll maintain them separately.
- Preserved the previous test/sup directory as test/sup/old/supst and
  the previous test/supmt directory as test/sup/old/supmt.

Change-Id: Ia230fc65185fd9a34eec714721004aa9e0bd40ed
2020-05-21 11:50:19 +05:30
Field G. Van Zee
6d369532e3 Updated sup[mt] Makefiles for variable dim ranges.
Details:
- Updated test/sup/Makefile and test/supmt/Makefile to allow specifying
  different problem size ranges for the drivers where one, two, or three
  matrix dimensions is large. This will facilitate the generation of
  more meaningful graphs, particularly when two dimensions are tiny.
2020-05-21 11:46:36 +05:30
Field G. Van Zee
2096f41aa6 Updates to octave scripts in test/sup[mt]/octave.
Details:
- Optimized scripts in test/sup/octave and test/supmt/octave for use
  with octave 5.2.0 on Ubuntu 18.04.
- Fixed stray 'end' keywords in gen_opsupnames.m and plot_l3sup_perf.m,
  which were not only unnecessary but also causing issues with versions
  5.x.
2020-05-21 11:46:35 +05:30
Field G. Van Zee
08709d4117 Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
  sort function. The purpose was not to sort the contents, but rather
  to remove duplicates. However, there is valid syntax in a string of
  linker flags that, when sorted, yields different/broken behavior.
  So I've removed the line in common.mk that sorts LDFLAGS. Also, for
  future use, I've added a new function, rm-dupls, that removes
  duplicates without sorting. (This function was based on code from a
  stackoverflow thread that is linked to in the comments for that
  code.) Thanks to Isuru Fernando for reporting this issue (#373).

Change-Id: Ie355cc111fd2c6669f0c3088e8fa5dc7c407a3b9
2020-05-21 11:45:48 +05:30
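Note on 08709d4117: the distinction between sorting and order-preserving duplicate removal can be illustrated outside of make. The following is a minimal C sketch, not the rm-dupls function from common.mk; the function name and the flag values in the usage note are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of BLIS): removes duplicate tokens in
   place, keeping the first occurrence of each token and preserving the
   relative order of the survivors. Returns the number of tokens kept.
   Quadratic in the token count, which is fine for a short flag list. */
size_t remove_duplicates_keep_order(const char **tokens, size_t count)
{
    assert(tokens != NULL || count == 0);

    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        int seen = 0;
        /* Compare against the tokens already kept. */
        for (size_t j = 0; j < kept; ++j) {
            if (strcmp(tokens[j], tokens[i]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            tokens[kept++] = tokens[i];
        }
    }
    return kept;
}
```

Given the made-up list `-lm -L/opt/b -lfoo -lm -L/opt/a -lfoo`, this keeps `-lm -L/opt/b -lfoo -L/opt/a` in that order. A sort-based approach would also remove the duplicates but would reorder the remaining flags, which is the behavior the commit message describes as problematic.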
Field G. Van Zee
b3c0309009 CHANGELOG update (0.6.1)
2020-05-21 11:42:04 +05:30
Field G. Van Zee
d6496d55cc ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.

Change-Id: I2aa6f944ce2584de85ae7b6921ff0193b3b7020a
2020-05-21 11:41:49 +05:30
Field G. Van Zee
51f87f3e42 Removed 'attic/windows' (to prevent confusion).
Details:
- Finally removed 'attic/windows' and its contents. This directory once
  contained "proto" Windows support for BLIS, but we've since moved on
  to (thanks to Isuru Fernando) providing Windows DLL support via
  AppVeyor's build artifacts. Furthermore, since 'windows' was the only
  subdirectory within 'attic', the directory path would show up in
  GitHub's listing at https://github.com/flame/blis, which probably led
  to someone being confused about how BLIS provides Windows support. I
  assume (but don't know for sure) that nobody is using these files, so
  this is admittedly a case of shoot first and ask questions later.
2020-05-21 11:41:10 +05:30
Field G. Van Zee
142df1b1e9 CREDITS file update.
2020-05-21 11:41:10 +05:30
Dave Love
291ee5f748 Fix parsing in vpu_count on workstation SKX (#351)
* Fix parsing in vpu_count on workstation SKX

* Document Skylake-X as Haswell for single FMA

* Update vpu_count for Skylake and Cascade Lake models

* Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.

* Move bli_log outside the cpp condition, and use it where intended

* Add Fixme comment (Skylake D)

* Mostly superficial edits to commits towards #351.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.

* Fix comment typos

Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-05-21 11:40:57 +05:30
Field G. Van Zee
99da76fd64 Fixed 'configure' breakage introduced in 6433831.
Details:
- Added a missing 'fi' (endif) keyword to a conditional block added in
  the configure script in commit 6433831.
2020-05-21 11:40:00 +05:30
Field G. Van Zee
38ecda47e7 Updated 1m draft article link in README.md.
2020-05-21 11:40:00 +05:30
Jeff Hammond
570d51483b blacklist ICC 18 for knl/skx due to test failures
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-05-21 11:40:00 +05:30
Jeff Hammond
afc57adc1b blacklist Intel 19+
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-05-21 11:40:00 +05:30
Jeff Hammond
dd54e792a7 fix link to docs
the comment contains an incorrect link, which is trivially fixed here.

@fgvanzee I hope you don't mind that I committed directly to master but this cannot break anything.
2020-05-21 11:40:00 +05:30
Field G. Van Zee
d988a5bbd7 Fixed bugs in cblas_sdsdot(), sdsdot_().
Details:
- Fixed a bug in sdsdot_sub() that redundantly added the "alpha" scalar,
  named 'sb'. This value was already being added by the underlying
  sdsdot_() function. Thus, we no longer add 'sb' within sdsdot_sub().
  Thanks to Simon Lukas Märtens for reporting this bug via #367.
- Fixed a second bug in order of typecasting intermediate products in
  sdsdot_(). Previously, the "alpha" scalar was being added after the
  "outer" typecast to float. However, the operation is supposed to first
  add the dot product to the (promoted) scalar and THEN downcast the sum
  to float. Thanks to Devin Matthews for catching this bug.
2020-05-21 11:40:00 +05:30
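Note on d988a5bbd7: the second fix concerns where the downcast to float happens relative to adding the scalar. The following is a minimal C sketch of the two orderings, assuming unit strides; the function names are hypothetical and this is not the BLIS source.

```c
#include <assert.h>
#include <stddef.h>

/* Ordering described as correct in the commit message: accumulate the
   dot product in double, add the promoted scalar sb, and only then
   downcast the sum to float. */
float sdsdot_sketch(size_t n, float sb, const float *x, const float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    double acc = (double)sb;
    for (size_t i = 0; i < n; ++i) {
        acc += (double)x[i] * (double)y[i];
    }
    return (float)acc;
}

/* Ordering described as buggy: the dot product is downcast to float
   first, and sb is added afterwards in single precision. */
float sdsdot_downcast_first(size_t n, float sb, const float *x, const float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    double dot = 0.0;
    for (size_t i = 0; i < n; ++i) {
        dot += (double)x[i] * (double)y[i];
    }
    float rounded = (float)dot;
    return rounded + sb;
}
```

The difference shows up when the double-precision dot product is not representable in float. With x = {4097}, y = {4097}, and sb = 1, the exact product is 16785409 and the exact sum is 16785410, which is representable in float. Downcasting first rounds the product to 16785408, and on platforms that evaluate float arithmetic in single precision, adding 1 afterwards rounds back to 16785408.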
Field G. Van Zee
afee36b251 Annotated missing thread-related symbols for export.
Details:
- Added BLIS_EXPORT_BLIS annotation to function prototypes for

    bli_thrcomm_bcast()
    bli_thrcomm_barrier()
    bli_thread_range_sub()

  so that these functions are exported to shared libraries by default.
  This (hopefully) fixes issue #366. Thanks to Kyungmin Lee for
  reporting this bug.
- CREDITS file update.
2020-05-21 11:40:00 +05:30
Nicholai Tukanov
718b64814d Add prototypes for POWER9 reference kernels (#365)
Updates and fixes to power9 subconfig.

Details:
- Register s,c,z reference gemm and trsm ukernels that assume elements
  of B have been broadcast.
- Added prototypes for level-3 ukernels that assume elements of B have
  been broadcast. Also added prototype for an spackm function that
  employs a duplication/broadcast factor of 4.
- Register virtual gemmtrsm ukernels that work with broadcasting of B.
- Disable right-side hemm, symm, trmm, and trmm3 in bli_family_power9.h.
- Thanks to Nicholai Tukanov for providing these updates.
2020-05-21 11:40:00 +05:30
Meghana Vankadari
9ea0472f4c Replaced all the instances of zen_basic with zen_ref_c
Change-Id: Id53f2c1ce7e9878991a831c3651061f0b679b080
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-885]
2020-05-19 20:27:17 +05:30
Meghana Vankadari
4fcc4e499d Optimized DGEMV kernel and changed BLAS interface call
Details:
- Optimized daxpyf kernel with fuse_factor=5 and iter_unroll=2.
- Modified framework files of dgemv to remove dependency on cntx variable.
- Updated cntx_init file of zen2 to choose optimized kernels.
- Modified BLAS interface call for DGEMV to reduce framework overhead.
- Currently these changes are applicable for zen2 configuration.
  They will be enabled for zen family processors in future.
- Changed naming convention for new BLAS macros to indicate their use.
- Added new optimized kernel for axpyf under zen2 folder.
- Implemented basic GEMV kernel without using axpyv or axpyf.
  This kernel is chosen for small sizes.

Change-Id: I4278d37e494854879c71499b8b9da8c5dbe3bf5b
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-885]
2020-05-19 06:40:44 -04:00
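Note on 4fcc4e499d: an axpyf operation fuses several axpy updates into one pass over y, and a gemv can be assembled from it by walking the matrix in column panels. The following is a scalar C sketch of that idea, assuming column-major storage, unit strides, no transposition, and a fuse factor of 5 as mentioned in the commit message. The function names are hypothetical, and the actual zen2 kernels are vectorized and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

enum { FUSE_FACTOR = 5 }; /* value taken from the commit message */

/* y[0..m) += alpha * A[:, 0..b) * x[0..b), where A is column-major with
   leading dimension lda and b <= FUSE_FACTOR. Each element of y is
   loaded and stored once for all b columns, instead of once per column
   as in b separate axpy calls. */
void daxpyf_sketch(size_t m, size_t b, double alpha,
                   const double *a, size_t lda,
                   const double *x, double *y)
{
    assert(b <= FUSE_FACTOR);

    double chi[FUSE_FACTOR] = {0.0};
    for (size_t j = 0; j < b; ++j) {
        chi[j] = alpha * x[j]; /* pre-scale x once per panel */
    }
    for (size_t i = 0; i < m; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < b; ++j) {
            sum += chi[j] * a[i + j * lda];
        }
        y[i] += sum;
    }
}

/* y += alpha * A * x for an m-by-n column-major A, processed in panels
   of up to FUSE_FACTOR columns; the last panel may be narrower. */
void dgemv_via_axpyf_sketch(size_t m, size_t n, double alpha,
                            const double *a, size_t lda,
                            const double *x, double *y)
{
    for (size_t j = 0; j < n; j += FUSE_FACTOR) {
        size_t b = (n - j < FUSE_FACTOR) ? (n - j) : FUSE_FACTOR;
        daxpyf_sketch(m, b, alpha, a + j * lda, lda, x + j, y);
    }
}
```

For a 2-by-6 matrix, the driver issues one full panel of five columns followed by a remainder panel of one column.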
managalv
af1ad806f2 CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices
Details:
Supports cgemm SUP all storage formats for XXR format

Change-Id: I1f1ac6b47f0b54141acac65e2cb4f3a2aaa3bac6
2020-05-18 21:06:57 +05:30
managalv
310dda928f CPUPL-709: Improve Complex GEMM performance - Level 1 Optimization
Details:
Added SUP support for cgemm in M direction
SUP kernels 3x8m, 3x4m, and 3x2m are implemented
Sub kernels are implemented to support various dimensions
SUP CGEMM supports row/col major matrices C & A; matrix B must be row major

Change-Id: Ia6854b929d3b5741a4900422d05df1257f5d014d
2020-05-18 20:43:49 +05:30
Nallani Bhaskar
b3a308b689 CPUPL-948: Selective Packing changes are implemented in sgemm sup
Description:
Panel strides are updated using variables rather than constant values to
support selective packing in sgemm sup kernels

Change-Id: Ic098eb70592d12d7d2174a1166aebf3bc749140c
2020-05-18 11:46:33 +05:30
Devrajegowda, Kiran
884f2febd1 Revert "Block parameters tuning to improve sgemm performance on Rome"
Details:
    - Reverts commit 1c76723320.
    - Regression in Multi-Threaded sgemm performance with new block parameters on Rome

Change-Id: I67b050f6434f6ade2c982b3cd10aa863c0077601
Signed-off-by: Kiran ND <Kiran.Devrajegowda@amd.com>
AMD-Internal: [CPUPL-920]
2020-05-14 11:18:46 +05:30
Devrajegowda, Kiran
6f33fd6aac Modified Function definition for BLAS and CBLAS interfaces of ?SCALV API
Details:
    -Kernel is called directly from API call to avoid framework
     overhead in case of single and double precisions.
    -Currently these changes are applicable only for zen2 configuration.
     They will be enabled for zen family processors in future.
    -These changes improve performance of BLAS and CBLAS interfaces of API.
     They do not affect BLIS-specific APIs.
    -setv simd kernel is added for single and double precision elements

Change-Id: I1b343aa232f2571717c2b01ada5914f869883e1a
    Signed-off-by: Kiran ND <Kiran.Devrajegowda@amd.com>
    AMD-Internal: [CPUPL-817]
2020-05-13 01:51:48 -04:00
Meghana Vankadari
d6db8d1d2c Merge "Modified Function definition for BLAS and CBLAS interfaces of DOTV and SWAPV APIs" into amd-staging-rome-2.2
2020-05-06 00:34:37 -04:00
Nallani Bhaskar
830f1a44c6 CPUPL-849: BLIS SGEMM general stride test cases fail for smaller matrix sizes
Details:
Support for inputs with general strides is not yet implemented in sup. This check will redirect to default path.

Change-Id: I594672e56ffb60c8d89a634e27f30f2ac2a7e38f
2020-05-05 16:23:46 +05:30
Meghana
28bb28b79f Modified Function definition for BLAS and CBLAS interfaces of DOTV and SWAPV APIs
Details:
-Kernel is called directly from API call to avoid framework
 overhead in case of single and double precisions.
-Currently these changes are applicable only for zen2 configuration.
 They will be enabled for zen family processors in future.
-These changes improve performance of BLAS and CBLAS interfaces of API.
 They do not affect BLIS-specific APIs.

Change-Id: I1eb7ca470ced82c3cfa8b22f2b53000d42fef96c
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-847,CPUPL-816]
2020-05-04 15:06:07 +05:30
Nallani Bhaskar
49cd7a96d5 CPUPL-866: ZenDNN gtest cases failing with blis 2.1 and later releases
Change-Id: Ib9ddfb133576d06cea6642fc3fefd818317fe922
2020-05-03 13:00:43 +05:30
Devrajegowda, Kiran
4ad5b1a5e6 Update zen2 kernel context with number of level1 kernels
Details:
       - Adding copyv simd kernel changes that were missed while rebasing.

Change-Id: Iaedabd39fdf297fefec1e48e7d2c1a2f3d7eb08d
Signed-off-by: Kiran N D <kiran.Devrajegowda@amd.com>
AMD-Internal: [CPUPL-818]
2020-04-24 12:22:24 +05:30
Devrajegowda, Kiran
4caee59466 Adding a simd kernel for copyv function
Details:
    - Separate kernel for copyv function added to improve performance.
    - Modified cntx_init file in zen and zen2 configuration
    - Added test_copyv.c in test folder
    - Modified test/Makefile to include test_copyv.c

Change-Id: I297f539f2ddd2d71997b127a71a460991cd07b41
Signed-off-by: Kiran N D <kiran.Devrajegowda@amd.com>
AMD-Internal: [CPUPL-818]
2020-04-24 01:55:25 -04:00
Nallani Bhaskar
ba00f75f64 Merge "JIRA: CPUPL-853: Fix for the redefinition of _unsigned int __get_cpuid_max(unsigned int, unsigned int*)_. http://ontrack-internal.amd.com/browse/CPUPL-853 https://github.com/flame/blis/issues/393" into amd-staging-rome-2.2 2020-04-24 01:12:00 -04:00
Nallani Bhaskar
ea3865fbf2 JIRA: CPUPL-853: Fix for the redefinition of _unsigned int __get_cpuid_max(unsigned int, unsigned int*)_. http://ontrack-internal.amd.com/browse/CPUPL-853 https://github.com/flame/blis/issues/393
Change-Id: I88c23b2fdad0beb3796d0e6acbcf215fe9daab2d
2020-04-23 17:14:24 +05:30
Meghana
f80e21ca7b Modified Function definition for BLAS and CBLAS interfaces of I?AMAX API
Details:
-Kernel is called directly from API call to avoid framework
 overhead in case of single and double precisions.
-Currently these changes are applicable only for zen2 configuration.
 They will be enabled for zen family processors in future.
-These changes improve performance of BLAS and CBLAS interfaces of API.
 They do not affect BLIS-specific APIs.

Change-Id: Ib12f5a4f66a3227681fb3028207a08cb69cc2406
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-855]
2020-04-22 17:43:55 +05:30
Meghana
138bc75063 Modified function definition for AXPY CBLAS interface
Details:
-Kernel is called directly from API call to avoid framework
 overhead in case of single and double precisions.
-Currently these changes are applicable only for zen2 configuration.
 They will be enabled for zen family processors in future.

Change-Id: Ifa17dc28d3b38e1e16b28bb785d9fdf4a223d909
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-805]
2020-04-21 07:59:07 -04:00
Devrajegowda, Kiran
1c76723320 Block parameters tuning to improve sgemm performance on Rome
Details:
    - Tuned block sizes to get better performance for sgemm default path.

Change-Id: I892e8642fa2d03a07a6d53537131536e6b1b091e
Signed-off-by: Kiran N D <kiran.Devrajegowda@amd.com>
AMD-Internal: [CPUPL-832]
2020-04-21 07:34:12 -04:00
Meghana Vankadari
139fbbb77f Merge "Added opt kernels for SWAPV" into amd-staging-rome-2.2
2020-04-21 05:54:19 -04:00
Kiran Varaganti
0fdb539d40 Fixed CPUPL-845 - expert interfaces consistent with other interfaces w.r.t. disabling selective packing in sup by default
Change-Id: Id678ee727e8e9197e1c5b48a994fafd7797c48f2
2020-04-20 16:15:05 +05:30
Meghana
b846059bcf Added opt kernels for SWAPV
Details:
-Added SIMD kernels for SWAPV for both single and double precisions.
-Modified cntx_init file for zen and zen2 configurations to choose opt kernels for
 SWAPV.
-Added test_swapv.c in test folder.
-Modified test/Makefile to include test_swapv.c

Change-Id: Ida786eec722e634aee0dacdd51c327823c80f01a
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-847]
2020-04-20 01:21:44 -05:00
Dipal Madhukar Zambare
f7bb291f6b Merge "Disable execution and debug trace by default." into amd-staging-rome-2.2
2020-04-20 01:16:04 -04:00
dzambare
80de43a483 Disable execution and debug trace by default.
Change-Id: I126336bc3d8a49019b083e66621c6a79725f7f0d
2020-04-20 10:44:32 +05:30
Meghana
80086fad15 Modified function definition for AXPY BLAS interface
Details:
-Calling the kernel directly from API call to avoid framework
overhead.
-Currently these changes are only applicable for zen2 configuration.
 They will be enabled for zen family processors in future.

Change-Id: I0139e185178f726f5cd8cba0ff6a441a00d67868
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-805]
2020-04-19 23:55:27 -05:00
Vasanthakumar Rajagopal
489d501f2e Merge "Execution and Debug trace support." into amd-staging-rome-2.2
2020-04-15 06:30:50 -04:00
Meghana
e56cf63a3f Optimized "bli_dotv_zen_int10" kernels
Details:
- Fixed issues in "bli_dotv_zen_int10" kernels and optimized them.
- Changed cntx_init file to choose "bli_dotv_zen_int10" kernel for dotv
 API call.

Change-Id: Iee8d7519f3a22a2d41166390be6047e9cb37557f
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-824]
2020-04-14 09:52:57 +05:30
dzambare
d40edf7dac Execution and Debug trace support.
Added support for debug logging, execution trace, and decode.

Change-Id: I024bf6165daa9e23a62423f2401c0f1c5de459ba
AMD-Internal: [CPUPL-806]
2020-04-07 08:48:59 +05:30
Meghana
b5fe75e104 Closing input and output files in test_gemm.c and test_trsm.c
Change-Id: I75cdd5adc2bd2dac7d0eca9c050e06dbd52bec26
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
2020-03-24 09:09:58 +05:30