Mirror of https://github.com/amd/blis.git (synced 2026-05-11 09:39:59 +00:00)
commit 036cc634918463b1caa0fd89c9a211f2f5639af7 (HEAD, tag: 0.1.3, master)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 13:48:17 2014 -0500

    Version file update (0.1.3)

commit 09d9a3bf6763932d9f571085b2cfd1b8631eccba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 13:43:26 2014 -0500

    Reverting version file to test new version script.

    Details:
    - Changed version file contents to 0.1.2 so that I can test out a new
      version file bumping script.

commit ebb33965981dcb2b0bdee5fc7fdf6c959420f311
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 11:22:50 2014 -0500

    Added 'version' file.

commit 2cb9a5501a3cbeb6692cf68e896087ba73b6af69
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 10:42:29 2014 -0500

    Removed 'version' from .gitignore file.

commit b40dcefc5ee31f67aa3990e2e9d2ef8ed1386a25 (origin/master)
Merge: 7101a8e b693b0c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Jun 23 10:39:05 2014 -0500

    Merge pull request #11 from Maratyszcza/stable

    [sc]axpy kernels for PNaCl

commit b693b0cddcfb41450e3c09a3ab97acb44c1ccdec
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 22 13:44:25 2014 -0700

    [SC]AXPY kernels for PNaCl

commit 7101a8eec0327d6c3a7eb36eb4b0fd45c1c6d162
Merge: ad48dca 020a831
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 19 21:46:50 2014 -0500

    Merge pull request #10 from Maratyszcza/stable

    Portable Native Client port

commit 020a831bc5f61744cb8354886aa679b99b1285f6
Author: Marat Dukhan <maratek@gmail.com>
Date:   Thu Jun 19 00:58:26 2014 -0700

    Code clean-up in PNaCl port

commit 491be4f91ed725522f5cc7184053857c6c376ada
Author: Marat Dukhan <maratek@gmail.com>
Date:   Thu Jun 19 00:45:44 2014 -0700

    Optimized dot product kernels for PNaCl

commit 4b8e71aab80182873a2e138eb07902b8d8fd5480
Author: Marat Dukhan <maratek@gmail.com>
Date:   Thu Jun 19 00:43:25 2014 -0700

    Use AR rcs flags for PNaCl target to avoid warning

commit 031deb2a5c718d569bde842590a791b812f4cf1d
Author: Marat Dukhan <maratek@gmail.com>
Date:   Wed Jun 18 03:11:34 2014 -0700

    PNaCl configuration: use pnacl-ar instead of ar (fixes build issue on Mac)

commit 68a02976e3c3638f0a9821342e269a1743e3ace3
Author: Marat Dukhan <maratek@gmail.com>
Date:   Wed Jun 18 03:10:25 2014 -0700

    Compile pnacl configuration in GNU11 mode to avoid warning about non-standard features

commit 6f8462eb0ec278b89731e73ef583386a3371d095
Author: Marat Dukhan <maratek@gmail.com>
Date:   Wed Jun 18 03:08:46 2014 -0700

    Fix inconsistent VERBOSE macro in Makefile

commit b2ffb4de8b6872cb23537ad282e557d11dcd9c8b
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 15 18:41:30 2014 -0400

    Reformatted PNaCl GEMM kernels

commit 6de2d472d98baa215264a776f3d5291780a6a085
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 15 08:44:31 2014 -0400

    CGEMM and ZGEMM kernels for PNaCl

commit f064711a5e6fb3852c17c7520909b09dc27665f2
Author: Marat Dukhan <maratek@gmail.com>
Date:   Sun Jun 15 06:27:37 2014 -0400

    SGEMM and DGEMM kernels for PNaCl

commit ad48dca22913a363899f0bef45553898718eebb1
Merge: ee2b679 7118f87
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Jun 14 15:10:13 2014 -0500

    Merge pull request #9 from tkelman/memalign_windows

    Use _aligned_malloc instead of posix_memalign on Windows

commit 7118f87e18b4941423472afc00215c1d1f2a1fcd
Author: Tony Kelman <tony@kelman.net>
Date:   Sat Jun 14 06:53:20 2014 -0700

    Use _aligned_malloc instead of posix_memalign on Windows
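
The portability issue behind this change can be sketched as follows. posix_memalign() is a POSIX call, while Windows C runtimes provide _aligned_malloc() and _aligned_free() instead. The wrapper names sketch_aligned_alloc and sketch_aligned_free are hypothetical and are not taken from BLIS; this is only a minimal illustration of the technique.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for posix_memalign() on POSIX systems */
#endif
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>               /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper (not a BLIS function): returns 'size' bytes aligned
   to 'align' bytes, or NULL on failure. 'align' must be a power of two and
   a multiple of sizeof(void*). */
static void* sketch_aligned_alloc( size_t align, size_t size )
{
#ifdef _WIN32
    return _aligned_malloc( size, align );   /* note: size first, then alignment */
#else
    void* p = NULL;
    if ( posix_memalign( &p, align, size ) != 0 ) return NULL;
    return p;
#endif
}

/* Hypothetical wrapper: releases memory obtained from sketch_aligned_alloc. */
static void sketch_aligned_free( void* p )
{
#ifdef _WIN32
    _aligned_free( p );   /* blocks from _aligned_malloc must not go to free() */
#else
    free( p );
#endif
}
```

Keeping allocation and release behind one pair of wrappers means callers never need to know which platform call produced the block.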

commit ee2b679281ca45fb40b2198e293bc3bc3d446632
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Jun 6 12:41:55 2014 -0500

    Only include omp.h if BLIS_ENABLE_OPENMP is set

commit 19c05dfaac43c627f86e897c8c00f1f9440754aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Jun 5 10:54:16 2014 -0500

    CHANGELOG update (for 0.1.2).

commit 00f232f8ed1f7c41619b12ebf779ebe2c3b2d3cd (tag: 0.1.2)
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Jun 2 13:40:57 2014 -0500

    Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi

commit 3fc60e491426f6248c0feae88d971e4d1f88fb95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed May 21 11:34:42 2014 -0500

    Fixed ldim alignment bug in core2 gemm ukernel.

    Details:
    - Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
      a segmentation fault if a column-stored matrix's starting address was
      aligned, but its leading dimension was such that its second column was
      unaligned. Basically, the micro-kernel was assuming that aligned load
      instructions were safe when they actually were not. An extra condition
      that checks the alignment of cs_c (i.e., the leading dimension in the
      column storage case) has now been added. Thanks to Michael Lehn for
      reporting this bug.
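
The condition described above can be sketched as a small predicate. The helper name and the choice to pass the alignment as a parameter are assumptions made for illustration; the actual micro-kernel code is not shown in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: aligned vector loads from every column of a
   column-stored matrix C are safe only if BOTH the base address and the
   byte distance between consecutive columns (cs_c elements) are multiples
   of 'align'. cs_c is assumed to be non-negative. */
static int sketch_columns_are_aligned( const double* c, ptrdiff_t cs_c, size_t align )
{
    int base_ok   = ( (uintptr_t)c % align ) == 0;
    int stride_ok = ( ( (size_t)cs_c * sizeof( double ) ) % align ) == 0;
    return base_ok && stride_ok;
}
```

With a 16-byte requirement, an aligned base with cs_c = 3 fails the test, because the second column starts 24 bytes after the first; that is the case the bug report describes.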

commit 77a2d8dac8b242d7a202c9aabda3927ab68cf987
Merge: 8c5d607 21fb089
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue May 20 09:53:19 2014 -0500

    Merge pull request #8 from tlrmchlsmth/master

    Added multithreading to most level-3 operations.

commit 21fb089387ee7c87f6dc53b0f60f68b48d3ff3e8
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon May 19 20:38:55 2014 -0700

    Reverting changes to dunnington and reference configs

    Now they are unchanged from the main branch of BLIS

commit 8a0ef0e0db5880730425926f8ba56b457a2ba764
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri May 16 13:44:14 2014 -0500

    Fixed rounding error in bli_get_range_weighted

commit 0b4b1680334528b1b60bc696537600f763198e92
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri May 16 12:23:37 2014 -0500

    Fixed bug with disabling JC loop threading for right sided trmm

commit 5c048a90d8dfa1dbde4e45fbc10ffcbdfe59d960
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed May 14 16:20:06 2014 -0500

    Disabled parallelism for right-sided TRMM JC loop

    The loop has dependent iterations.

commit 13a4c717ed0e273359dbaf5554cc4fa70b087d71
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed May 14 14:59:04 2014 -0500

    Fixed bug with bli_get_range_weighted

commit 45957cc7745e9bb1698408d72f53ef192e960820
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue May 13 17:14:46 2014 -0500

    Allowed threading to be turned off

    No longer requires OpenMP to compile
    Define the following in bli_config.h in order to enable multithreading:
    BLIS_ENABLE_MULTITHREADING
    BLIS_ENABLE_OPENMP

    Also fixes a bug with bli_get_range_weighted
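
A minimal sketch of how two such switches can keep OpenMP optional is shown below. The two macro names come from the commit message; the helper sketch_max_threads is hypothetical and only illustrates the guard pattern, not the code in BLIS.

```c
/* bli_config.h (sketch): uncomment both lines to request OpenMP threading. */
/* #define BLIS_ENABLE_MULTITHREADING */
/* #define BLIS_ENABLE_OPENMP */

#ifdef BLIS_ENABLE_OPENMP
#include <omp.h>   /* only pulled in when OpenMP support is requested */
#endif

/* Hypothetical helper: number of threads the library may use. */
static int sketch_max_threads( void )
{
#if defined( BLIS_ENABLE_MULTITHREADING ) && defined( BLIS_ENABLE_OPENMP )
    return omp_get_max_threads();
#else
    return 1;   /* threading disabled: no OpenMP header or runtime needed */
#endif
}
```

With both macros left undefined, the file compiles without any OpenMP header or compiler flag, which matches the "No longer requires OpenMP to compile" note.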

commit bd1dc98ce599d74513a553fe3b37a2ebca1c3812
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon May 12 17:26:19 2014 -0500

    Disabled multithreading of the kc loop

commit 456df0372170bd7ca2c7e2d85365a69f1f04de88
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Apr 30 12:28:00 2014 -0500

    Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity

commit f4fdfe8fc573553eb36795b79cdf681270dab71b
Merge: 31bb065 8c5d607
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Apr 30 11:46:35 2014 -0500

    Merge http://github.com/flame/blis

commit 8c5d6071e24ba10a53669390a47287e86ff354ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 29 12:26:12 2014 -0500

    Added _check() routines for fprint[mv], rand[mv].

    Details:
    - Added _check() routines for fprintm, fprintv, randm, and randv.
    - Added invocations to the above routines from their respective
      front-ends.

commit 262cdabcc885bcf6636f4d8bb7d320f95e81d820
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 28 16:48:25 2014 -0500

    Changed treatment of NULL object buffers.

    Details:
    - Relaxed the constraint in bli_obj_attach_buffer_check(), which required
      the buffer address being attached to be non-NULL. This is acceptable
      because the user was already able to create and use objects with NULL
      buffers (via bli_obj_create_without_buffer(), which initializes the
      buffer to NULL).
    - Inserted calls to newly defined function, bli_check_object_buffer(),
      into nearly all operations' _check() or _int_check() functions. This
      allows BLIS to abort peacefully if a computational routine is called
      with an object containing a NULL buffer. By contrast, under such
      conditions, BLAS would typically fail with a segmentation fault.
    - Within operation front-ends, moved the calls to _check()/_int_check()
      so that zero dimensions are checked first (and if found, execution
      returns with trivial or no computation). This resolves issue #7. Thanks
      to Jack Poulson for reporting this bug.
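
The check ordering described in the last item can be sketched with a toy object type. Everything named sketch_* below is hypothetical and much simpler than the BLIS obj_t API; in particular, this sketch returns a status code where the commit describes BLIS aborting.

```c
#include <stddef.h>

/* Hypothetical, simplified matrix object (not the BLIS obj_t). */
typedef struct { size_t m, n; double* buffer; } sketch_obj_t;

typedef enum
{
    SKETCH_OK,
    SKETCH_NOTHING_TO_DO,
    SKETCH_ERR_NULL_BUFFER
} sketch_status_t;

/* Front-end ordering: test for zero dimensions first and return early, and
   only then require a non-NULL buffer. An empty object with a NULL buffer
   is therefore not reported as an error. */
static sketch_status_t sketch_scale( sketch_obj_t* a, double alpha )
{
    if ( a->m == 0 || a->n == 0 ) return SKETCH_NOTHING_TO_DO;
    if ( a->buffer == NULL )      return SKETCH_ERR_NULL_BUFFER;

    for ( size_t i = 0; i < a->m * a->n; ++i ) a->buffer[ i ] *= alpha;
    return SKETCH_OK;
}
```

Swapping the two tests would reject a 0-by-n object that legitimately has no storage, which is the kind of false error the reordering avoids.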

commit 31bb065ba40ae0c5a614e743b8025abca012b99e
Merge: 20e2443 7c61959
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Apr 23 12:30:19 2014 -0500

    Merge http://github.com/flame/blis

commit 7c61959955c8ba78160d0ed4d1979022029d963b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 10 17:18:36 2014 -0500

    Can now query register blocksizes from blk algs.

    Details:
    - Added a new field to blksz_t objects that allows one to attach a
      sub-object. Doing this allows us to associate a register blocksize with
      any given cache blocksize. That way, the register blocksize can be
      queried wherever the cache blocksize would normally be accessible
      (e.g. a blocked algorithm).
    - Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
      blocksizes are attached to the cache blocksizes after they are created.

commit 58671597d3d450817b2eda576c05ed6dadd8af6d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 10 15:35:30 2014 -0500

    Minor cleanups to level-2 _cntl.c files.

    Details:
    - Changed level-2 _cntl.c files so that the blocksizes for gemv are
      imported and used, rather than blocksizes being declared locally.
    - Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
      4m/3m variants).
    - Removed test/old/test_blis2.c.

commit 20e24430a772bc0fbaf24dec2f8c544096fd3f4e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Tue Apr 8 17:50:44 2014 +0000

    Some fixes for the bgq kernels

commit bde697f75ec1e7f2decebee0c9bd620b4c134cd5
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:43:44 2014 -0500

    Add -openmp to ldflags as well

commit c332be8cd471eeace7b4fa4ae7443088b6a68ec3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:37:50 2014 -0500

    Added -openmp flag to Xeon Phi build for convenience

commit e7ca9e4b4a24d585c9aec8293fc7bb79e4171ad0
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:31:15 2014 -0500

    Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC

commit 7b9b228c6fa4cfb70b1ebb855b009a036e85fac3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 16:29:10 2014 -0500

    Fix for tree barrier freeing bug

commit 5ec93bd9a76096312d51c326ccde1e9bd0a436ab
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 15:09:10 2014 -0500

    Bunch of minor fixes

    Removed barrier after unpackm in all level3 blocked variants
    Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)

    Moved the enabling of the tree barriers into bli_config.h
    Fed the default MR and NR for double precision into bli_get_range instead of the number 8

commit 575fb9b0b08f3bdb56ccde056da619d1585617c1
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 12:13:29 2014 -0500

    Changed default blocking factor to default double precision MR and NR

commit ab9c7880335c281432d5809fe0dec46753d22569
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 11:38:11 2014 -0500

    Added faster tree barriers necessary for performance for Xeon Phi

    Fixed up some stuff in the thread info free functions
    Disabled threading for TRSM so that it actually works when threading environment variables are set

commit ec58a7923cccac08632670caadf3cf6ff5dce766
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 10:22:48 2014 -0500

    Freeing thread info paths.

    Also made herk IC and JC loops do weighted partitioning

commit 2b6848b2397d6d84ca4e5f792fc51ad05e351a36
Merge: 4e3eb39 21a0efb
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Apr 4 09:54:54 2014 -0500

    Merge http://github.com/flame/blis

    Conflicts:
        kernels/bgq/1/bli_axpyv_opt_var1.c
        kernels/bgq/1/bli_dotv_opt_var1.c

commit 4e3eb39aca4df0b9fdc003d468f368a2f2ba597d
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Fri Apr 4 14:50:03 2014 +0000

    Some fixes to the bgq config
    MR and NR for double complex were wrong
    Default fusing factor for double precision was wrong as well

commit 21a0efb33d7435139e9c43c1a4787a6bff533e26
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 3 16:38:44 2014 -0500

    Fixed follow-up to issue #6.

commit c318157a9bee8ea6e59be16f99f65d9271fe0d27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 3 16:24:34 2014 -0500

    Fixed issue #6 (incorrect 'restrict' usage).

    Details:
    - Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
      (However, there may be other instances of similar misuse elsewhere in
      BLIS.) Thanks to Jeff Hammond for reporting this issue.
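
The commit does not say exactly how restrict was misused in the bgq kernels. As general background only, the sketch below shows what the keyword promises in an axpyv-style loop; both function names are hypothetical and neither is the BLIS kernel.

```c
#include <stddef.h>

/* y := y + alpha * x. 'restrict' promises the compiler that, inside this
   function, the elements written through y are not also accessed through
   x. Callers must therefore pass non-overlapping vectors; otherwise the
   behavior is undefined. */
static void sketch_axpyv( size_t n, double alpha,
                          const double* restrict x, double* restrict y )
{
    for ( size_t i = 0; i < n; ++i ) y[ i ] += alpha * x[ i ];
}

/* Variant without the promise: well defined even when x and y are the
   same vector, at the cost of fewer optimization opportunities. */
static void sketch_axpyv_may_alias( size_t n, double alpha,
                                    const double* x, double* y )
{
    for ( size_t i = 0; i < n; ++i ) y[ i ] += alpha * x[ i ];
}
```

A library routine that cannot rule out aliased arguments is safer with the second form, since the qualifier is a contract on callers rather than something the compiler checks.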

commit b5150a1bf3bd89598e2b3aeac110eb5b44ac6c12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 3 12:25:45 2014 -0500

    Added #include "arm_neon.h" to ARM gemm ukernel.

    Details:
    - Inserted #include "arm_neon.h" into gemm ukernel source file for
      arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.

commit 2041c264517b6c590fd4f7e8253e6911b622d1c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Apr 3 10:30:03 2014 -0500

    Added barriers needed prior to doing scalar reset for rank-k updates.

commit 47a90e69dfde3f4f8fdf90654248a6b499fbadbc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 1 14:34:31 2014 -0500

    Attempted to fix uninitialized variable warnings.

    Details:
    - Added initialization statements to various macros used in level 1m and
      1m-like operations. I wasn't able to reproduce the reported behavior,
      so hopefully this takes care of it. Thanks to Jeff Hammond for the
      report.

commit d27b4f690c14b1f836f8c7a3c0e91e09d852f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 1 12:57:24 2014 -0500

    Use generic paths for toolchain in POWER7.

    Details:
    - Fixed issue #4. Thanks to Jeff Hammond for contributing changes.

commit 1584ae1c83c3a8c1af76acb46404747507650f19
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Fri Mar 28 15:15:48 2014 -0500

    Fixed race condition involving scalar reset

commit 459dde4acc09e49380da58fb7b246db488884ad9
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 27 17:06:45 2014 -0500

    Made barrier after packing implicit.

    This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
    but not the outer packing routines.
    This allowed, for instance, computation to begin before the block of B had finished being packed.

commit 9f78ec6e7e95fcad89a167b27cad7e2d74b6d122
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 27 14:18:46 2014 -0500

    Some fixes for the internal functions,
    was inappropriately only having thread chief do some things.

commit a6fd48345424e097f71652be013aa897e098b41e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Wed Mar 26 17:19:46 2014 +0000

    Added test drivers for level 3 BLAS that run tests in parallel using MPI

commit 73b3db594864be0f9be9a0eb29bf961fa9c95f29
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date:   Wed Mar 26 15:39:05 2014 +0000

    Some fixes for the bgq configuration

commit f0824a04fc75e231c3a3d7757fa4e7294173282f
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 24 15:21:42 2014 -0500

    Initial commit to enable threading in TRSM,

    Also enabled weighted partitioning for herk, trmm
    Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
    Correctly computed a_next and b_next for gemm, herk macrokernels
    a_next and b_next point to the current micropanels in trmm

commit 23d9eab354fbc88165889832955e126772bf8488
Merge: 5d5dc2e fd3e32a
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 20 16:54:35 2014 -0500

    Merge https://github.com/flame/blis

commit 5d5dc2eedef2f7c90d61371a1b457be5c06cf583
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Mar 20 16:43:36 2014 -0500

    Parallelized trmm and trmm3

    Also fixed bugs in packm

commit fd3e32a5f419fa412f46afe4dd1c3a26e15f3eb4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 20 13:59:48 2014 -0500

    Refined INSERT_GENTFUNC macro usage.

    Details:
    - Defined new INSERT_GENTFUNC macros so that the macro always takes
      exactly the number of arguments needed for the particular operation or
      variant being defined. Many operations were using INSERT_GENTFUNC
      macros that expected one auxiliary argument even though none were
      needed. Those instances have now been updated. Most of these instances
      were in the level-0 and -1v operations, as well as some operations
      defined in frame/util.

commit 9b0e715f29338a1a1d6445907d2445c35f011121
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 19 15:47:54 2014 -0500

    Minor simplifications to trmm, trsm macro-kernels.

    Details:
    - Simplified some code that would have allowed the diagonal of a trmm
      or trsm triangular matrix to intersect the short end of a micro-panel.
      This is disallowed via higher-level constraints on cache blocksizes, so
      this code was never needed and only served to obfuscate.
    - Updated some comments in trmm, trsm macro-kernels.

commit a3902750b9ab4923433f7e353f3669c3c419f8e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 19 12:35:17 2014 -0500

    Reorganized norm operations.

    Details:
    - Completely reorganized norm operations:
      - Renames:
        - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
        - absumv -> norm1v (vector 1-norm)
      - New operations:
        - norm1m (matrix 1-norm)
        - normiv, normim (infinity-norm)
        - amaxv (BLAS-like absolute maximum value index)
        - asumv (BLAS-like absolute sum)
    - Deprecated absumm, as it did not correspond to any actual norm.
      (However, an inlined version now exists in the testsuite module for
      randm.)

commit c0140cb752f27e99742f85d23be2181c00a1335e
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Wed Mar 19 11:21:16 2014 -0500

    Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state

    Now just performed by the master thread.

commit fb42983bd9943711baa7d1c6496de1215bb816ef
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 16:37:28 2014 -0500

    Fixed a barrier bug and a thread decorator bug

commit aa2405f8b23d0f8d2ec04790882f2176ef2e8fd8
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 15:23:09 2014 -0500

    Fixing function pointer issues with thread decorator

commit ec8b88f93533942d3711191873310e7ff281bda6
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 14:35:37 2014 -0500

    Enabled threading for packm blocked variants 3 and 4

commit 0ac534cdf657bbf04601abfe719ba2887aab5da7
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 18 13:26:27 2014 -0500

    Added decorator for calling parallelized internal functions

    Will allow for easy support for different threading models

commit 5296f58975f7d351f88909cc80b6d0cffd73def7
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 17 17:15:35 2014 -0500

    Fixing some bugs with herk parallelization

commit c51d0110831eb89361b4720bf7ed75edbd26ebce
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 17 15:00:47 2014 -0500

    Initial multithreading support for HERK

commit c720b141568d1f289146bf34ded08001f2c0dfbb
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 17 11:39:32 2014 -0500

    Switched to using environment variables to control threading.

    The environment variables all follow the format BLIS_X_NT,
    where X is the index of the loop as described in our paper
    Anatomy of High Performance Many-Threaded Matrix Multiplication.
    These indices are IR, JR, IC, KC, and JC.

    Also enabled parallelism for hemm and symm, but these are currently untested.
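
Reading one of these variables can be sketched as below. The variable names follow the BLIS_X_NT format given in the commit message; the helper sketch_read_nt and its fallback to 1 for unset or invalid values are assumptions of this sketch, not documented BLIS behavior.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads BLIS_<loop>_NT from the environment; for example, loop = "JC"
   reads BLIS_JC_NT. Returns 1 when the variable is unset, is not a
   number, or is smaller than 1 (an assumption of this sketch). */
static long sketch_read_nt( const char* loop )
{
    char name[ 32 ];
    snprintf( name, sizeof( name ), "BLIS_%s_NT", loop );

    const char* val = getenv( name );
    if ( val == NULL ) return 1;

    char* end = NULL;
    long  nt  = strtol( val, &end, 10 );
    if ( end == val || nt < 1 ) return 1;
    return nt;
}
```

Under this naming scheme a run might be launched as "BLIS_JC_NT=2 BLIS_IC_NT=4 ./app", with the remaining loops left at a single thread.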

commit 92233cf64274b27b2217c5cfffe75443ff6137a4
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 11 14:16:08 2014 -0500

    Some fixes to gemm thread info tree creation,
    Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
    instead of BLIS_SINGLE_THREADED

commit 020f80c30289d8bcaa688bf600b01fae9b23b54f
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Tue Mar 11 12:08:17 2014 -0500

    Added files specific to threading for gemm and packm operations

commit 8d8f4352a41926bc923e47be836365b6b726aff2
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 10 15:47:28 2014 -0500

    Added single threaded thread info data structures specifically for gemm and packm

commit 0e8677761175189583ca7d855e24b2bbdd2dada8
Merge: 2e727a0 b3bff63
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 10 15:16:21 2014 -0500

    Merge branch 'master' of https://github.com/tlrmchlsmth/blis

commit 2e727a025a8f796d2b6bd14f489d0ee72e7d1fc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Mon Mar 10 15:14:33 2014 -0500

    Modifying the thread info data structures

    This change makes each operation have its own thread info type,
    allowing more fine control of threading in operations that have different types of suboperations

commit a770590cf21a459f04bf941c58ee2afd272cc441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 3 14:31:44 2014 -0600

    Minor fixes to sumsqv, abmaxv.

    Details:
    - Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
      LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
      the vector (or matrix) contains a NaN.
    - Minor change to bli_abmaxv_unb_var1() to more closely mimic the
      behavior of netlib BLAS's izamax(). There, a "less than or equal to"
      operator is used in the search instead of "less than", which would
      change the element index returned if there were multiple maximum values.
    - Added macro function definitions for bli_isinf() and bli_isnan(), which
      are currently implemented in terms of isinf() and isnan() from math.h.
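
Why the choice of comparison operator matters for ties can be seen in a small sketch. The two functions below are hypothetical and work on real doubles for brevity; they are not the BLIS or netlib code, and which variant either library uses is not asserted here.

```c
#include <stddef.h>

/* Absolute value without relying on the math library. */
static double sketch_abs( double v ) { return v < 0.0 ? -v : v; }

/* Index of the largest |x[i]|, keeping the FIRST maximum: a later element
   replaces the current best only if it is strictly larger. */
static size_t sketch_amax_first( const double* x, size_t n )
{
    size_t best = 0;
    for ( size_t i = 1; i < n; ++i )
        if ( sketch_abs( x[ i ] ) > sketch_abs( x[ best ] ) ) best = i;
    return best;
}

/* Same search, but ties also replace the current best, so the LAST
   maximum is returned. */
static size_t sketch_amax_last( const double* x, size_t n )
{
    size_t best = 0;
    for ( size_t i = 1; i < n; ++i )
        if ( sketch_abs( x[ i ] ) >= sketch_abs( x[ best ] ) ) best = i;
    return best;
}
```

For x = { 1, -3, 3, 2 } the two functions return indices 1 and 2 respectively, which is the kind of difference the commit message refers to.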

commit b3bff631eadf98b15cb422fb4a8e2f855c23e8a7
Merge: 2c158fb e8757b0
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 16:53:24 2014 -0600

    Merge https://github.com/flame/blis

commit 2c158fb885c27f7b599dc1e85b57edd684f19223
Merge: e4738c4 c2b2ab6
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 16:46:23 2014 -0600

    Merge https://github.com/flame/blis

    Conflicts:
        frame/1m/packm/bli_packm_blk_var1.c

commit e8757b03a74f9891632242e9a90efb32150826f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 27 16:40:07 2014 -0600

    Use "%ld" as int format specifier in fprintm.

    Details:
    - Changed "%d" to "%ld" when printing integers via bli_fprintm().
    - Meant to include this in previous commit.
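
The reason the specifier matters is that printf-style functions trust the format string about each argument's type. The sketch below uses a hypothetical typedef, sketch_int_t, as a stand-in for a library integer type that is wider than int; it is not the BLIS type.

```c
#include <stdio.h>

/* Hypothetical stand-in for a library integer type that may be 64 bits. */
typedef long sketch_int_t;

/* Formats 'v' into 'buf'. "%ld" matches the long argument; "%d" would
   expect an int, and passing a long where an int is expected is undefined
   behavior. The explicit cast keeps the call correct even if the typedef
   later changes to another integer type. */
static int sketch_format_int( char* buf, size_t len, sketch_int_t v )
{
    return snprintf( buf, len, "%ld", (long)v );
}
```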

commit c663ce3b5170fee7dfb5b528b650d70c8e932cac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Feb 27 16:32:57 2014 -0600

    Fixed various bugs when C99 complex is enabled.

    Details:
    - Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
      elsewhere in the framework that were not yet set up to work properly
      when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
    - Extensive changes to f2c-derived files in frame/compat/f2c to allow
      C99 complex storage. Most of these changes center around accessing
      real and imaginary components via bli_?real()/bli_?imag() accessor
      macros, and setting of values via bli_?sets() assignment macros.
      (Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
      was broken.)

commit e4738c48e00b89391d9baa1fd0aa62d1ea2f95e6
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 16:29:46 2014 -0600

    Added support for parallelism in gemm micro-kernel

commit bfe214b633765ed40b57b330fbb84c332663aa40
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 15:53:10 2014 -0600

    Fixed bug with parallel packing, and bug with allocating an array of thread infos

    In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
    This dependency was removed, allowing each iteration to be executed in parallel.

    Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
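
Both fixes follow common C patterns, sketched below. The struct layout and all sketch_* names are hypothetical, since the log does not show the real thread info type or the packm loop.

```c
#include <stdlib.h>

/* Hypothetical per-thread record; the real structure is not shown here. */
typedef struct
{
    int    thread_id;
    int    n_threads;
    double scratch[ 4 ];
} sketch_thrinfo_t;

/* Allocates an array of n structs. The element size is taken from the
   pointee, sizeof( *infos ); writing sizeof( sketch_thrinfo_t* ) instead
   would size the block for n pointers and under-allocate. */
static sketch_thrinfo_t* sketch_alloc_infos( size_t n )
{
    sketch_thrinfo_t* infos = malloc( n * sizeof( *infos ) );
    if ( infos == NULL ) return NULL;
    for ( size_t i = 0; i < n; ++i )
    {
        infos[ i ].thread_id = (int)i;
        infos[ i ].n_threads = (int)n;
    }
    return infos;
}

/* Start offset of panel i, computed from i alone so that loop iterations
   do not depend on a running pointer that is bumped once per iteration. */
static size_t sketch_panel_offset( size_t i, size_t panel_stride )
{
    return i * panel_stride;
}
```

Deriving each offset from the loop index is what lets the iterations run in any order, or on different threads.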

commit 6193d9ceea552e67170dba45abde04c64271c705
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 14:09:19 2014 -0600

    Fixed bug in thread trees

commit ac5a2de1d17ffd460b00fee9757898525a09abae
Merge: 01b125e bd3c7ec
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 11:59:33 2014 -0600

    Merge branch 'master' of https://github.com/tlrmchlsmth/blis

commit 01b125e815f19410e8e0611d088b84570e499e93
Author: Tyler Smith <tms@cs.utexas.edu>
Date:   Thu Feb 27 11:55:45 2014 -0600

    First pass at adding parallelism to BLIS.

    Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
    Currently, gemm blocked variants 1f and 2f, and packm blocked variant 1 are parallelized.

commit c2b2ab62707e4174892aff3ce65f36f54878fae5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Feb 26 12:46:45 2014 -0600

    Deprecated panel stride alignment in bli_config.h.

    Details:
    - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
      configurations. It was already going unused in packm_init() since the
      recent 4m/3m commit. This setting was rarely, if ever, useful, and its
      existence only posed a potential risk for 4m/3m-based implementations.
    - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
    - Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
      micro-kernels.

commit f18aee83a5ac1b14808686fc3c5a3c846a1d99b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 25 17:58:42 2014 -0600

    CHANGELOG update (for 0.1.1).

commit fde5f1fdece19881f50b142e8611b772a647e6d2 (tag: 0.1.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Feb 25 13:34:56 2014 -0600

    Added extensive support for configuration defaults.

    Details:
    - Standard names for reference kernels (levels-1v, -1f and 3) are now
      macro constants. Examples:
        BLIS_SAXPYV_KERNEL_REF
        BLIS_DDOTXF_KERNEL_REF
        BLIS_ZGEMM_UKERNEL_REF
    - Developers no longer have to name all datatype instances of a kernel
      with a common base name; [sdcz] datatype flavors of each kernel or
      micro-kernel (level-1v, -1f, or 3) may now be named independently.
      This means you can now, if you wish, encode the datatype-specific
      register blocksizes in the name of the micro-kernel functions.
    - Any datatype instance of any kernel (1v, 1f, or 3) that is left
      undefined in bli_kernel.h will default to the corresponding reference
      implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
      it will be defined to be BLIS_DGEMM_UKERNEL_REF.
    - Developers no longer need to name level-1v/-1f kernels with multiple
      datatype chars to match the number of types the kernel WOULD take in
      a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
      sufficient, as in bli_daxpyv_opt().
    - There is no longer a need to define an obj_t wrapper to go along with
      your level-1v/-1f kernels. The framework now provides a _kernel()
      function which serves as the obj_t wrapper for whatever kernels are
      specified (or defaulted to) via bli_kernel.h
    - Developers no longer need to prototype their kernels, and thus no
      longer need to include any prototyping headers from within
      bli_kernel.h. The framework now generates kernel prototypes, with the
      proper type signature, based on the kernel names defined (or defaulted
      to) via bli_kernel.h.
    - If the complex datatype x (of [cz]) implementation of the gemm micro-
      kernel is left undefined by bli_kernel.h, but its same-precision real
      domain equivalent IS defined, BLIS will use a 4m-based implementation
      for the datatype x implementations of all level-3 operations, using
      only the real gemm micro-kernel.
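
The defaulting rule for undefined kernels can be sketched with the preprocessor. BLIS_DGEMM_UKERNEL, BLIS_DGEMM_UKERNEL_REF and BLIS_ZGEMM_UKERNEL_REF appear in the commit message; BLIS_ZGEMM_UKERNEL is assumed to follow the same pattern, and the sketch_* functions are hypothetical placeholders rather than real kernels.

```c
/* Hypothetical kernels; the return value only identifies which one ran. */
static int sketch_dgemm_ukr_ref( void ) { return 0; }
static int sketch_zgemm_ukr_ref( void ) { return 0; }
static int sketch_zgemm_ukr_opt( void ) { return 1; }

/* Standard names for the reference kernels, as macro constants. */
#define BLIS_DGEMM_UKERNEL_REF sketch_dgemm_ukr_ref
#define BLIS_ZGEMM_UKERNEL_REF sketch_zgemm_ukr_ref

/* What a configuration's bli_kernel.h might contain: only the double
   complex micro-kernel is overridden in this sketch. */
#define BLIS_ZGEMM_UKERNEL sketch_zgemm_ukr_opt

/* Defaulting step: any name left undefined falls back to the reference. */
#ifndef BLIS_DGEMM_UKERNEL
#define BLIS_DGEMM_UKERNEL BLIS_DGEMM_UKERNEL_REF
#endif
#ifndef BLIS_ZGEMM_UKERNEL
#define BLIS_ZGEMM_UKERNEL BLIS_ZGEMM_UKERNEL_REF
#endif
```

After these lines, code that calls BLIS_DGEMM_UKERNEL() reaches the reference kernel and BLIS_ZGEMM_UKERNEL() reaches the override, without either call site knowing which was configured.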
|
|
|
|
commit 15b51e990f1d21333b5f7af97c211756247336e5
|
|
Merge: 6363a9f fc04b5e
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 21 09:04:32 2014 -0600
|
|
|
|
Merge branch 'master' of github.com:fgvanzee/blis
|
|
|
|
commit fc04b5eb69868c341ce03f5ef1f02de4b8c121b0
|
|
Merge: b29e1c2 d1813c9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 21 09:04:13 2014 -0600
|
|
|
|
Merge pull request #3 from figual/master
|
|
|
|
New ARM armv7a kernels and Assembly file consideration in Makefile
|
|
|
|
commit d1813c9dee34410833db5061e6588ec1a6c9ecd4
|
|
Author: Francisco Igual <figual@pandaboard.(none)>
|
|
Date: Fri Feb 21 15:14:31 2014 +0100
|
|
|
|
Added new armv7a micro-kernels and configuration files from Werner Saar.
|
|
|
|
commit 0cd098c03a000ed9426a7e9135190696da8cadbc
|
|
Author: Francisco Igual <figual@pandaboard.(none)>
|
|
Date: Fri Feb 21 15:12:30 2014 +0100
|
|
|
|
o Modified Makefile to consider .S assembly microkernels.
|
|
|
|
commit 6363a9f658257fe3d814a3dce5308f807adb54a2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Feb 19 17:00:52 2014 -0600
|
|
|
|
Added level-3 support for complex via 4m-/3m.
|
|
|
|
Details:
|
|
- Added the ability to induce complex domain level-3 operations via new
|
|
virtual complex micro-kernels which are implemented via only real
|
|
domain micro-kernels. Two new implementations are provided: 4m and 3m.
|
|
4m implements complex matrix multiplication in terms of four real
|
|
matrix multiplications, where as 3m uses only three and thus is
|
|
capable of even higher (than peak) performance. However, the 3m method
|
|
has somewhat weaker numerical properties, making it less desirable
|
|
in general.
|
|
- Further refined packing routines, which were recently revamped, and
|
|
added packing functionality for 4m and 3m.
|
|
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
|
|
into micro-panels which were packed for 4m/3m virtual kernels.
|
|
- Added 4m and 3m interfaces for each level-3 operation.
|
|
- Various other minor changes to facilitate 4m/3m methods.
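
A minimal sketch of the arithmetic behind the two methods, shown on scalars rather than matrices. The names cplx_t, cmul_4m and cmul_3m are hypothetical and are not BLIS APIs; in BLIS each real product below would instead be a real-domain gemm micro-kernel call on packed micro-panels. The particular 3m grouping is one common formulation and is an assumption, since the commit message does not spell it out.

```c
/* Hypothetical illustration type; not part of BLIS. */
typedef struct { double re, im; } cplx_t;

/* 4m: the complex product expressed with four real multiplications. */
static cplx_t cmul_4m(cplx_t a, cplx_t b)
{
    cplx_t c;
    c.re = a.re * b.re - a.im * b.im;
    c.im = a.re * b.im + a.im * b.re;
    return c;
}

/* 3m: three real multiplications, paid for with extra additions. */
static cplx_t cmul_3m(cplx_t a, cplx_t b)
{
    double p1 = a.re * b.re;
    double p2 = a.im * b.im;
    double p3 = (a.re + a.im) * (b.re + b.im);
    cplx_t c;
    c.re = p1 - p2;
    c.im = p3 - p1 - p2;   /* subtraction here is where cancellation can occur */
    return c;
}
```

The imaginary part in the 3m form is recovered by subtracting two products from a third, which is consistent with the weaker numerical properties mentioned above.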
|
|
|
|
commit b29e1c2b278c177e104c84ba462820ee8296df6c
|
|
Merge: ee60377 bd3c7ec
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 14 14:11:54 2014 -0600
|
|
|
|
Merge pull request #2 from tlrmchlsmth/master
|
|
|
|
Fixes and improvements to xeon phi implementation.
|
|
|
|
commit bd3c7ecfb54a9b9851c7d364f41c21e4cff52f6f
|
|
Author: Tyler Smith <tms@cs.utexas.edu>
|
|
Date: Fri Feb 14 14:05:57 2014 -0600
|
|
|
|
Removing changes to input.general and input.operations
|
|
|
|
commit ce066863683cb4e910270cf8ab8e138b01ff3358
|
|
Author: Tyler Smith <tms@cs.utexas.edu>
|
|
Date: Fri Feb 14 13:40:24 2014 -0600
|
|
|
|
Fixed more Xeon Phi bugs, especially with scattered update
|
|
|
|
commit 31134b5c7076423aee1b4f494e925f27171d97e6
|
|
Author: Tyler Smith <tms@cs.utexas.edu>
|
|
Date: Fri Feb 14 11:19:44 2014 -0600
|
|
|
|
Some fixes, changes, and improvements to the microkernel for the Xeon Phi
|
|
|
|
commit ee60377e467862b9d8a7205c45dce5cf66c78c46
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 13 14:03:31 2014 -0600
|
|
|
|
Shifted some fields in info_t.
|
|
|
|
Details:
|
|
- Shifted the pack order, pack buffer type, and structure type fields
|
|
to make room for an extra bit in the pack type/status field.
|
|
|
|
commit bd3ab1ad4cf42f8bc30ab262acf8eccb49bb1a08
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 13 09:29:55 2014 -0600
|
|
|
|
Minor fixes to trsm consistent with prev on trmm.
|
|
|
|
Details:
|
|
- Removed use of bli_min() and bli_max() that were only being used to
|
|
try to support situations where the diagonal would intersect the
|
|
short end of some micro-panels, which is a situation that is disallowed
|
|
at a higher level by various constraints on the register and cache
|
|
blocksize. This only affected trsm_ll and trsm_lu.
|
|
- Use panel stride as passed into the macro-kernel rather than compute
|
|
it via k and PACKMR/PACKNR. This affects all macro-kernels of trsm.
|
|
|
|
commit 6260b0b5f8bd248f3f66e5a1c6854bdbd9d02ad0
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 13 09:19:56 2014 -0600
|
|
|
|
Fixed obscure bug in trmm_ll, trmm_lu.
|
|
|
|
Details:
|
|
- Fixed an obscure bug in left-hand trmm that would only manifest when
|
|
non-zero register blocksize extensions (PACKMR > MR or PACKNR > NR)
|
|
are used.
|
|
- Removed use of bli_min() and bli_max() that were only being used to
|
|
try to support situations where the diagonal would intersect the
|
|
short end of some micro-panels, which is a situation that is disallowed
|
|
at a higher level by various constraints on the register and cache
|
|
blocksize. This only affected trmm_ll and trmm_lu.
|
|
- Use panel stride as passed into the macro-kernel rather than compute
|
|
it via k and PACKMR/PACKNR. This affects all macro-kernels of trmm.
|
|
|
|
commit 16915c1c1e55c660bf82141cdadf7c0860d5b464
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 11 10:54:19 2014 -0600
|
|
|
|
Fixed an obscure bug in packm_cxk().
|
|
|
|
Details:
|
|
- Fixed a bug in packm_cxk() whereby the packm ukernel was being chosen
|
|
from ldp, which is always equal to PACKMR or PACKNR. The problem with
|
|
this is that the pack ukernels were implicitly assuming that the
|
|
panel dimension of the panel being packed was equal to ldp, which
|
|
is not the case when the register blocksize extensions are non-zero
|
|
(ie: when PACKMR > MR or PACKNR > NR, whichever is applicable). This
|
|
problem has been fixed by passing ldp into the pack ukernels, which
|
|
now walk through the packed micro-panel region by incrementing by this
|
|
value, rather than incrementing by the inherent panel dimension value
|
|
assumed by each packm ukernel (e.g. 4 in the case of packm_ref_4xk).
|
|
- Also fixed a very minor edge case inefficiency whereby pack ukernels
|
|
smaller than the default were not being used in edge cases, and instead
|
|
those situations were being handled by scal2m. This is related to the
|
|
issue above, because the pack ukernel itself was being chosen based on
|
|
ldp instead of the panel dimension.
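
The distinction between the panel dimension and ldp can be sketched as follows. This is a hypothetical routine, not the BLIS packm ukernel interface: pack_panel and its parameters are made-up names, the source is assumed to be column-major, and zero-filling the padding rows is an assumption made only so the packed buffer is fully defined.

```c
#include <stddef.h>

/* Hypothetical sketch; not the BLIS packm ukernel signature.
   Copies a panel_dim-by-k slab of a column-major matrix a (leading
   dimension lda) into the packed buffer p, whose columns are ldp
   elements apart, with ldp >= panel_dim. */
static void pack_panel(size_t panel_dim, size_t k,
                       const double *a, size_t lda,
                       double *p, size_t ldp)
{
    for (size_t j = 0; j < k; ++j) {
        /* Copy only panel_dim rows, but advance through p by ldp. */
        for (size_t i = 0; i < panel_dim; ++i)
            p[j * ldp + i] = a[j * lda + i];
        /* Assumption: zero the extension rows between panel_dim and ldp. */
        for (size_t i = panel_dim; i < ldp; ++i)
            p[j * ldp + i] = 0.0;
    }
}
```

Stepping through p by panel_dim instead of ldp would place every column after the first at the wrong offset whenever ldp > panel_dim, which is the kind of mismatch described above.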
|
|
|
|
commit b7da57b282c5a5e2208946e60309d2352f55351d
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 11 10:28:23 2014 -0600
|
|
|
|
Updated calls to packm_blk_var2() in testsuite.
|
|
|
|
Details:
|
|
- In ukernel testsuite modules, replaced calls to packm_blk_var2() with
|
|
_var1(). Meant to include this in previous commit.
|
|
|
|
commit c255a293e25b2223c88e8800267cd06ad2a90041
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Feb 10 14:31:24 2014 -0600
|
|
|
|
Consolidated packm_blk_var2 and var3.
|
|
|
|
Details:
|
|
- Consolidated the functionality previously supported by packm_blk_var2()
|
|
and packm_blk_var3() into a new variant, packm_blk_var1().
|
|
- Updates to packm_gen_cxk(), packm_herm_cxk(), and packm_tri_cxk()
|
|
to accommodate above changes.
|
|
- Removed packm_blk_var3() and retired packm_blk_var2() to
|
|
frame/1m/packm/old.
|
|
- Updated all level-3 _cntl_init() functions so that the new, more
|
|
versatile packm_blk_var1 is used for all level-3 matrix packing.
|
|
|
|
commit 32d8f264ae7b28155f5d7b21dcc5ecb78da2e0ab
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Feb 9 10:07:37 2014 -0600
|
|
|
|
Refactored packm variants.
|
|
|
|
Details:
|
|
- Revised packm_blk_var2() and _var3() by encapsulating the general,
|
|
hermitian/symmetric, and triangular panel-packing subproblems into
|
|
separate functions: packm_gen_cxk(), packm_herm_cxk(), and
|
|
packm_tri_cxk(), respectively. Also, homogenized the packm code as
|
|
well as the new specialized packm_*_cxk() code to further improve
|
|
readability.
|
|
|
|
commit 6c8067028707947fcdf4f856a272e15bb9ed91e3
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 7 11:27:15 2014 -0600
|
|
|
|
Renamed enumerated type in testsuite and modules.
|
|
|
|
Details:
|
|
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
|
|
renamed all corresponding "impl" variables to "iface".
|
|
|
|
commit 6c12598b1bc567f0b08f58aebdc753a1c1390378
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 6 18:26:35 2014 -0600
|
|
|
|
Employ simpler INSERT_ macro for ref ukernels.
|
|
|
|
Details:
|
|
- Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
|
|
argument--the base name of the function--and employed this macro
|
|
in the reference micro-kernel files instead of the _BASIC macro,
|
|
which takes one auxiliary argument. That argument was not being
|
|
used and probably just acted to unnecessarily obfuscate.
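
As a rough illustration of the idea, a one-argument insertion macro can stamp out one function per datatype from a single base name. Everything here (GENTFUNC_SKETCH, INSERT_BASIC0_SKETCH, add_sketch, and the s/d prefixes without a bli_ namespace) is hypothetical and only mimics the pattern; it is not the actual BLIS macro definition.

```c
/* Hypothetical per-type template: defines <ch><opname>() for one C type. */
#define GENTFUNC_SKETCH(ctype, ch, opname) \
static ctype ch##opname(ctype x, ctype y) { return x + y; }

/* Hypothetical one-argument insertion macro: only the base name is given,
   and the template is instantiated once per datatype. */
#define INSERT_BASIC0_SKETCH(opname) \
GENTFUNC_SKETCH(float,  s, opname)   \
GENTFUNC_SKETCH(double, d, opname)

/* Expands to sadd_sketch() and dadd_sketch(). */
INSERT_BASIC0_SKETCH(add_sketch)
```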
|
|
|
|
commit 32cae66326b68706d0e695cfd60c9ca5bc32c534
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 6 18:06:42 2014 -0600
|
|
|
|
Fixed some instances of sloppy 'restrict' usage.
|
|
|
|
Details:
|
|
- Fixed some technical incorrectness with some usage of the 'restrict'
|
|
keyword in the reference trsm micro-kernels.
|
|
- Tweak to testsuite/Makefile that causes rebuild if libblis was
|
|
touched.
|
|
|
|
commit 7aceef7683e2a2aff3c7ec2a73508036af2e19e2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 6 17:31:19 2014 -0600
|
|
|
|
Updated comments in macro-kernels.
|
|
|
|
Details:
|
|
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
|
|
section of macro-kernels.
|
|
- Changed register blocksizes of reference configuration to MR = 8 and
|
|
NR = 4. It's always good for MR != NR in the reference configuration
|
|
since it may help uncover bugs related to non-square micro-kernels.
|
|
|
|
commit 8fd292aa78950bcdf556605718f09d13f9575abc
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 6 14:32:21 2014 -0600
|
|
|
|
Pass panel dimensions into macro-kernels.
|
|
|
|
Details:
|
|
- Modified the interfaces to the datatype-specific macro-kernels so that:
|
|
- pd_a and pd_b are passed in (which contain the panel dimensions of
|
|
packed panels of a and b).
|
|
- rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
|
|
- Modified implementations of datatype-specific macro-kernels so pd_a,
|
|
pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
|
|
and PACKNR, respectively.
|
|
- Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
|
|
is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
|
|
header file bli_kernel_post_macro_defs.h.
|
|
|
|
commit 3404e6657eabb017cd1580a2f1dd8e6fb13df923
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Feb 5 11:19:10 2014 -0600
|
|
|
|
Deprecated incremental blocksize macro const defs.
|
|
|
|
Details:
|
|
- Removed macro constant definitions related to incremental blocksizes
|
|
from all configurations' bli_kernel.h files. This change is minor and
|
|
is mostly a cleanup related to a previous commit.
|
|
|
|
commit 1e9afd39a63e0a58167d4439c1a0a880a4a35657
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 4 20:15:19 2014 -0600
|
|
|
|
Comment updates (removed vestiges of "bd").
|
|
|
|
commit 5cf58f7c2d5bc0d2d94d9576f7158d8f133b7aac
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 4 09:15:19 2014 -0600
|
|
|
|
Added early returns for "object is zeros" case.
|
|
|
|
Details:
|
|
- Added some logic to packm_init(), pack_int() and gemm_int() so that
|
|
(a) objects marked as BLIS_ZEROS are not packed, and (b) those
|
|
objects are not computed with. This functionality is not currently
|
|
needed by any existing implementations, but may be used in the
|
|
future.
|
|
|
|
commit 6bbd4be769a9b344a55abe5ddaca1a99fd29f7b4
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Feb 3 13:15:25 2014 -0600
|
|
|
|
Added 'f' on some gemm and trmm blocked variants.
|
|
|
|
Details:
|
|
- Added 'f' to some block variant files/functions to be consistent with
|
|
other file/functions' naming convention. Here, the f indicates
|
|
partitioning in the "forward" direction.
|
|
|
|
commit eb13cb2c6b182df5e2a9b88c76f50e2cee25b9e0
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Feb 3 11:07:01 2014 -0600
|
|
|
|
Removed redundant non-gemm blksz_t creation.
|
|
|
|
Details:
|
|
- Removed code that creates duplicate blksz_t objects for herk, trmm,
|
|
and trsm. Instead, the gemm blksz_t objects are accessed via extern
|
|
and used directly. This reduces the amount of code associated with
|
|
each of the three _cntl_init() and _cntl_finalize() functions.
|
|
|
|
commit 0a023a7d9e58e53b8c204a5f49aa8ca9afeba938
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jan 29 14:02:08 2014 -0600
|
|
|
|
Introduced new level-3 front-end layer.
|
|
|
|
Details:
|
|
- Added new _front() functions for each level-3 operation. This is done
|
|
so that the choosing of the control tree (and *only* the choosing of
|
|
the control tree) happens in what was previously the "front end"
|
|
(e.g. bli_gemm()). That control tree is then passed into the _front()
|
|
function, which then performs up-front tasks such as parameter
|
|
checking.
|
|
|
|
commit 251c5d112196d37b183e554bc9d406104aed65fb
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Jan 28 19:40:29 2014 -0600
|
|
|
|
Removed redundant hemm, her2k control trees.
|
|
|
|
Details:
|
|
- Removed code that generated a control tree specifically for hemm and
|
|
symm. Instead, the gemm control tree is now configured so that it
|
|
works for gemm, hemm, or symm.
|
|
- Retired most her2k code, as it was not being used. (Currently, her2k is
|
|
implemented as two invocations of herk.) I couldn't think of many
|
|
situations where her2k variants were needed.
|
|
- Removed some older her2k code.
|
|
|
|
commit 5a36e5bf2f59d1e85d6dbce32a07d604c5e82d11
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jan 27 11:13:00 2014 -0600
|
|
|
|
Embed func_t microkernel objects in control trees.
|
|
|
|
Details:
|
|
- Modified all control tree node definitions to include a new field of
|
|
type func_t*, which is similar to a blksz_t except that it contains
|
|
one function pointer (each typed simply as void*) for each datatype.
|
|
We use the func_t* to embed pointers to the micro-kernels to use for
|
|
the leaf-level nodes of each control tree. This change is a natural
|
|
extension of control trees and will allow more flexibility in the
|
|
future.
|
|
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
|
|
from the incoming (previously ignored) control tree node and then pass
|
|
the queried pointer into the datatype-specific macro-kernel code, which
|
|
then casts the pointer to the appropriate type (new typedefs residing
|
|
in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
|
|
kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
|
|
is, determined when the datatype-specific macro-kernel functions are
|
|
instantiated by the C preprocessor).
|
|
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
|
|
base names if they do not exist already, and then use those to build
|
|
datatype-specific micro-kernel function names. This will allow
|
|
developers extra flexibility if they wanted to, for example, name each
|
|
of their datatype-specific micro-kernels differently (e.g. double
|
|
real might be named bli_dgemm_opt_4x4() while double complex might be
|
|
named bli_zgemm_opt_2x2()).
|
|
- Inserted appropriate code into _cntl_init() functions that allocates
|
|
and initializes a func_t object for the corresponding micro-kernels.
|
|
The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
|
|
and then reused via extern wherever possible.
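
The mechanism can be sketched as a small table with one kernel pointer per datatype, queried and cast by the caller. All names below (ukr_table_t, generic_fn_t, dgemm_ukr_t, toy_dgemm_ukr, call_double_ukr) are hypothetical, and the real func_t layout may differ. One deliberate deviation: the commit says the pointers are typed as void*, while this sketch uses a generic function-pointer type, since ISO C only guarantees round-trip conversions between function-pointer types.

```c
#include <stddef.h>

/* Hypothetical datatype indices. */
enum { DT_FLOAT, DT_DOUBLE, DT_SCOMPLEX, DT_DCOMPLEX, DT_COUNT };

/* Generic function-pointer type used for storage (assumption, see above). */
typedef void (*generic_fn_t)(void);

/* One kernel pointer per datatype, in the spirit of func_t. */
typedef struct {
    generic_fn_t ptr[DT_COUNT];
} ukr_table_t;

/* Typed signature the caller casts to before invoking the kernel. */
typedef void (*dgemm_ukr_t)(size_t k, const double *a, const double *b,
                            double *c);

/* Toy 1x1 "micro-kernel": c += sum over p of a[p] * b[p]. */
static void toy_dgemm_ukr(size_t k, const double *a, const double *b,
                          double *c)
{
    for (size_t p = 0; p < k; ++p)
        *c += a[p] * b[p];
}

/* Macro-kernel-like caller: query the table, cast, then call. */
static void call_double_ukr(const ukr_table_t *t, size_t k,
                            const double *a, const double *b, double *c)
{
    dgemm_ukr_t ukr = (dgemm_ukr_t)t->ptr[DT_DOUBLE];
    ukr(k, a, b, c);
}
```

Because the kernel is looked up at run time, a different table can be supplied without re-instantiating the caller.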
|
|
|
|
commit 6cbd6f1c7f1915180aa28939833afde48665c5ae
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jan 24 10:38:29 2014 -0600
|
|
|
|
Removed commented mixed domain macro-kernel code.
|
|
|
|
Details:
|
|
- Removed commented-out code from macro-kernels that was supposed to
|
|
facilitate implementing mixed domain (complex times real) matrix
|
|
multiplication. This functionality is still (probably) possible,
|
|
but I'm getting tired of looking at the code every time I edit
|
|
a macro-kernel. Plus, there are probably ways of doing it at a
|
|
higher level, via control trees.
|
|
|
|
commit 29778be1119f1a884330d7f8dc424a2df4101d58
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jan 22 16:03:11 2014 -0600
|
|
|
|
Removed b_aux field from cntl nodes.
|
|
|
|
Details:
|
|
- Removed b_aux field from all control tree node definitions. This field
|
|
was being used in certain optimizations (incremental blocking) that were
|
|
not actually being employed within BLIS, and are probably not employed
|
|
by others.
|
|
- Updated all _cntl_obj_create() function definitions and invocations
|
|
according to above change.
|
|
- Retired bli_gemm_blk_var4.c, which was one such function that employed
|
|
incremental blocking, but which was never called by BLIS itself.
|
|
|
|
commit 06ac727a42ec9e832c7832745036702014638f99
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jan 15 16:44:52 2014 -0600
|
|
|
|
Updated some comments in level-3 front ends.
|
|
|
|
commit d628bf1da1560f1f5126a1ddfed8714f0a4b8da3
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jan 15 11:40:12 2014 -0600
|
|
|
|
Consolidated pack_t enums; retired VECTOR value.
|
|
|
|
Details:
|
|
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
|
|
its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
|
|
makes room in the three pack_t bits of the info field of obj_t so that
|
|
two values are now unused, and may be used for other future purposes.
|
|
- Updated sloppy terminology usage in comments in level-2 front-ends.
|
|
(Replaced "is contiguous" with more accurate "has unit stride".)
|
|
|
|
commit ddc8c1c379b4787be5954802906593d7ea144452
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jan 13 14:55:43 2014 -0600
|
|
|
|
Suppress warning in Makefile (UNINSTALL_LIBS).
|
|
|
|
Details:
|
|
- Redirect errors to /dev/null when using 'find' to locate libraries that
|
|
would be uninstalled upon executing "make uninstall-old". Before, if the
|
|
Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
|
|
or directory" message was emitted. This message was harmless, but is now
|
|
suppressed in this situation.
|
|
|
|
commit f8f67d7251bffc05020e20527c100c8115fd5e55
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jan 10 09:06:11 2014 -0600
|
|
|
|
Typecast bli_getopt() return value in testsuite.
|
|
|
|
Details:
|
|
- In the test suite driver, inserted an explicit typecast of the return
|
|
value of bli_getopt() prior to parsing. The lack of typecast caused a
|
|
problem on at least one system whereby a return value of -1 was
|
|
interpreted as a garbage character. Thanks to Francisco Igual for finding
|
|
and submitting this fix.
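
One plausible reading of this class of bug, offered only as an assumption since the commit does not show the code, is that a -1 sentinel does not survive being held in a character type on platforms where plain char is unsigned. The sketch below uses hypothetical names (next_opt, count_opts, sentinel_survives_unsigned_char) and a stand-in option source rather than bli_getopt() itself.

```c
/* Hypothetical stand-in for a getopt-style function: yields 'g' while
   options remain, then -1. */
static int next_opt(int *remaining)
{
    return (*remaining)-- > 0 ? 'g' : -1;
}

/* Keeping the result in an int preserves the -1 sentinel. */
static int count_opts(int n)
{
    int count = 0;
    int c;
    while ((c = next_opt(&n)) != -1)
        ++count;
    return count;
}

/* Emulates storing -1 in a char on a platform where char is unsigned:
   the value becomes 255 after promotion and no longer equals -1. */
static int sentinel_survives_unsigned_char(void)
{
    unsigned char c = (unsigned char)-1;
    int promoted = c;
    return promoted == -1;
}
```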
|
|
|
|
commit e7f154fe2ed3e10e2323cefe5d25c2c23ac902c4
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jan 10 08:48:07 2014 -0600
|
|
|
|
Applied edge case fix to arm/neon microkernel.
|
|
|
|
Details:
|
|
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
|
|
double precision real gemm microkernel in kernels/arm/neon/3.
|
|
|
|
commit 89c76a8a51d070d263c13bfa5ace65769509f2b4
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Jan 9 12:08:37 2014 -0600
|
|
|
|
Allow building outside source distribution.
|
|
|
|
Details:
|
|
- Modified build system (mostly configure and top-level Makefile) so that
|
|
a user can build a BLIS library outside of the top-level directory of
|
|
the source distribution.
|
|
- Added "test" target to Makefile so that the user can run "make test",
|
|
which will compile, link, and run the testsuite binary. This works even
|
|
if the build directory is externally located, thanks to the test suite
|
|
binary's new -g and -o command-line options. Also, when creating the
|
|
test suite via the top-level Makefile, the linking is against the
|
|
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
|
|
- Modified testsuite/Makefile so that it links against the library built
|
|
locally, in ../lib/<configname>.
|
|
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
|
|
- Various other cleanups to build system.
|
|
|
|
commit 12fa82ec12cc340ab28552997d9d50f7c98691f8
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jan 8 16:09:26 2014 -0600
|
|
|
|
Implemented bli_getopt().
|
|
|
|
Details:
|
|
- Added bli_getopt.c and .h files to frame/base. These files implement
|
|
a custom version of getopt(), which may be used to parse command line
|
|
options passed into a program via argc/argv. I am implementing this
|
|
function myself, as opposed to using the version available via unistd.h,
|
|
for portability reasons, as the only requirement is string.h (which
|
|
is available via the standard C library).
|
|
- Modified test suite to allow the user to specify the file name (and/or
|
|
path) to the parameters and operations input files: -g may be used to
|
|
specify the general input file and -o to specify the operations input
|
|
file). If -g or -o or both are not given, default filenames are assumed
|
|
(as well as their existence in the current directory).
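
A minimal sketch of the -g/-o behavior described above, using only string.h. It does not reproduce bli_getopt(); inputs_t and parse_inputs are hypothetical names, and the default file names used here are an assumption.

```c
#include <string.h>

/* Hypothetical holder for the two input file names. */
typedef struct {
    const char *general;
    const char *operations;
} inputs_t;

/* Returns 0 on success, -1 on an unknown option or a missing argument. */
static int parse_inputs(int argc, char **argv, inputs_t *out)
{
    /* Assumed defaults, looked up in the current directory. */
    out->general = "input.general";
    out->operations = "input.operations";

    for (int i = 1; i < argc; ++i) {
        if (strcmp(argv[i], "-g") == 0 && i + 1 < argc)
            out->general = argv[++i];
        else if (strcmp(argv[i], "-o") == 0 && i + 1 < argc)
            out->operations = argv[++i];
        else
            return -1;
    }
    return 0;
}
```

Either option may be omitted, in which case the corresponding default is kept.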
|
|
|
|
commit cafb58e86ea5cfb21b9eedc57ca8ebbf24252098
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jan 6 13:28:36 2014 -0600
|
|
|
|
Updated template micro-kernels to use auxinfo_t.
|
|
|
|
Details:
|
|
- Updated template micro-kernel implementations (located in
|
|
config/template/kernels), to adhere to the new auxinfo_t interface.
|
|
Meant to include this change in a0331fb1.
|
|
- Changed template configuration to use 64-bit integers (for both BLIS
|
|
and the BLAS compatibility layer).
|
|
|
|
commit 9ab126b499c3805045020cb89a8a5848e28d3bf5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jan 6 12:13:26 2014 -0600
|
|
|
|
Removed error checks in netlib->BLIS param mapping
|
|
|
|
Details:
|
|
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
|
|
If the char value input to these functions was not one of the defined
|
|
values, bli_check_error_code() with the appropriate error code value
|
|
would be called, resulting in an abort(). This was unnecessary and
|
|
redundant since these routines are currently only used within the
|
|
BLAS compatibility layer, and they are only called AFTER parameter
|
|
checking has already been performed on the original BLAS char values.
|
|
If the application tried to override xerbla() to prevent an abort()
|
|
from being called, this error checking would still get in the way.
|
|
Thus, instead of reporting the error situation to the framework (ie:
|
|
calling abort()), an arbitrary BLIS parameter value is now chosen and
|
|
the function returns normally. Thanks to Jeff Hammond for finding and
|
|
reporting this issue.
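
The resulting behavior can be sketched as a mapping that falls back to an arbitrary value instead of aborting. The enum and function names here are hypothetical, and which fallback value BLIS actually picks is not stated in the commit.

```c
/* Hypothetical stand-in for a BLIS transposition parameter. */
typedef enum { NO_TRANSPOSE, TRANSPOSE, CONJ_TRANSPOSE } trans_sketch_t;

/* Maps a netlib BLAS trans character to the sketch enum. Unknown input is
   assumed to have been rejected earlier by BLAS parameter checking, so the
   default case returns an arbitrary value rather than calling abort(). */
static trans_sketch_t map_netlib_trans(char c)
{
    switch (c) {
    case 'n': case 'N': return NO_TRANSPOSE;
    case 't': case 'T': return TRANSPOSE;
    case 'c': case 'C': return CONJ_TRANSPOSE;
    default:            return NO_TRANSPOSE; /* arbitrary fallback */
    }
}
```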
|
|
|
|
commit 2cb13600f9f9601c60e7f96f4ca159d169ade9cb
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jan 3 12:29:13 2014 -0600
|
|
|
|
Updated year in copyright headers to 2014.
|
|
|
|
commit 290fa54e0083c9c837188b8321b13b1b282e7b0c
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Dec 20 14:10:26 2013 -0600
|
|
|
|
Store variable panel strides in trmm/trsm auxinfo.
|
|
|
|
Details:
|
|
- Changed the value being stored into the auxinfo_t structure in trmm
|
|
and trsm macro-kernels. Whereas before we stored whatever value was
|
|
provided to the macro-kernel implementation via ps_a/ps_b, now we
|
|
store the stride that will advance to the next variable-length
|
|
micro-panel of the triangular matrix A (left) or B (right).
|
|
- Whitespace changes to the files affected above.
|
|
|
|
commit e3a6c7e77667fd749248df3f75f880266c3136ec
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 19 16:29:31 2013 -0600
|
|
|
|
Macroized conditionals for a2/b2 in macro-kernels.
|
|
|
|
Details:
|
|
- Replaced conditional expressions in macro-kernels related to computing
|
|
the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
|
|
invocation, bli_is_last_iter(), that tests the same condition.
|
|
- Updated gemm_ukr module to use auxinfo_t argument.
|
|
- Whitespace changes in test suite ukr modules.
|
|
|
|
commit a0331fb10a50393e31d16339053b75b944132da1
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 19 14:50:11 2013 -0600
|
|
|
|
Introduced auxinfo_t argument to micro-kernels.
|
|
|
|
Details:
|
|
- Removed a_next and b_next arguments to micro-kernels and replaced them
|
|
with a pointer to a new datatype, auxinfo_t, which is simply a struct
|
|
that holds a_next and b_next. The struct may hold other auxiliary
|
|
information that may be useful to a micro-kernel, such as micro-panel
|
|
stride. Micro-kernels may access struct fields via accessor macros
|
|
defined in bli_auxinfo_macro_defs.h.
|
|
- Updated all instances of micro-kernel definitions, micro-kernel calls,
|
|
as well as macro-kernels (for declaring and initializing the structs)
|
|
according to above change.
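
A minimal sketch of the pattern, assuming a struct with just the two fields named above. The type, macro, and function names are hypothetical and are not copied from bli_auxinfo_macro_defs.h.

```c
#include <stddef.h>

/* Hypothetical auxiliary-info struct holding the next micro-panel addresses. */
typedef struct {
    const void *a_next;
    const void *b_next;
} aux_sketch_t;

/* Hypothetical accessor macros. */
#define aux_set_next_a(aux, p) ((aux).a_next = (p))
#define aux_set_next_b(aux, p) ((aux).b_next = (p))
#define aux_next_a(aux)        ((aux).a_next)
#define aux_next_b(aux)        ((aux).b_next)

/* Toy 1x1 micro-kernel: one struct pointer replaces two trailing arguments. */
static void toy_ukr(size_t k, const double *a, const double *b, double *c,
                    const aux_sketch_t *aux)
{
    /* A real kernel might prefetch from these; here they are only read. */
    const void *a_next = aux_next_a(*aux);
    const void *b_next = aux_next_b(*aux);
    (void)a_next;
    (void)b_next;

    for (size_t p = 0; p < k; ++p)
        *c += a[p] * b[p];
}
```

Adding a field such as a micro-panel stride later would change only the struct and its accessors, not every kernel signature.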
|
|
|
|
commit 392428dea4001fe4384efe29f6cde32f8abeeb35
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 12 19:01:47 2013 -0600
|
|
|
|
Added "ri" scalar macros.
|
|
|
|
Details:
|
|
- Added set of basic scalar macros that take arguments' real and
|
|
imaginary components separately, named like the previous set except
|
|
with the "ris" (instead of "s") suffix.
|
|
- Redefined the previous set of scalar macros (those that take arguments
|
|
"whole") in terms of the new "ri" set.
|
|
- Renamed setris and getris macros to sets and gets.
|
|
- Renamed setimag0 macros to seti0s.
|
|
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
|
|
|
|
commit f60c8adc2f61eaba06b892f4e73000159de93056
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Dec 10 14:39:56 2013 -0600
|
|
|
|
Minor updates to dunnington configuration.
|
|
|
|
Details:
|
|
- Added commented alternatives to dunnington configuration's bli_kernel.h.
|
|
- Minor reformatting of optimization flag variables in make_defs.mk.
|
|
|
|
commit 4ef20150492db254b5baf2368add62e19b0ac11b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 9 18:53:03 2013 -0600
|
|
|
|
Tweaks to dunnington configuration (x86_64/core2).
|
|
|
|
Details:
|
|
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
|
|
- Enabled cache blocksize extension of up to 25% for MC and KC (for
|
|
double-precision real).
|
|
|
|
commit 5ad2ce7bf5ba3ea955e6d517bfd270e02820263b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 9 18:30:49 2013 -0600
|
|
|
|
Minor x86_64 (core2) kernel fixes.
|
|
|
|
Details:
|
|
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
|
|
for x86_64/core2 were calling the wrong reference code (l instead
|
|
of u).
|
|
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
|
|
kernels.
|
|
- Minor typecasting fix in testsuite/src/test_libblis.c.
|
|
- Makefile updates.
|
|
|
|
commit d289f5d3a9c0e1a68a17c1c32b736e282a289c4c
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 5 10:56:13 2013 -0600
|
|
|
|
Whitespace changes to level-2 blocked variants.
|
|
|
|
Details:
|
|
- Joined some lines in level-2 blocked variants to match formatting used
|
|
in level-3 blocked variants.
|
|
- Streamlined implementation of bli_obj_equals() in bli_query.c.
|
|
|
|
commit b444489f100d218bc8ef29b01ff8489c358559f9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Dec 3 16:08:30 2013 -0600
|
|
|
|
Added new "attached" scalar representation.
|
|
|
|
Details:
|
|
- Added infrastructure to support a new scalar representation, whereby
|
|
every object contains an internal scalar that defaults to 1.0. This
|
|
facilitates passing scalars around without having to house them in
|
|
separate objects. These "attached" scalars are stored in the internal
|
|
atom_t field of the obj_t struct, and are always stored to be the same
|
|
datatype as the object to which they are attached. Level-3 variants no
|
|
longer take scalar arguments; however, level-3 internal back-ends still
|
|
do; this is so that the calling function can perform subproblems such
|
|
as C := C - alpha * A * B on-the-fly without needing to change either
|
|
of the scalars attached to A or B.
|
|
- Removed scalar argument from packm_int().
|
|
- Observe and apply attached scalars in scalm_int(), and removed scalar
|
|
from interface of scalm_unb_var1().
|
|
- Renamed the following functions (and corresponding invocations):
|
|
|
|
bli_obj_init_scalar_copy_of()
|
|
-> bli_obj_scalar_init_detached_copy_of()
|
|
bli_obj_init_scalar() -> bli_obj_scalar_init_detached()
|
|
bli_obj_create_scalar_with_attached_buffer()
|
|
-> bli_obj_create_1x1_with_attached_buffer()
|
|
bli_obj_scalar_equals() -> bli_obj_equals()
|
|
|
|
- Defined new functions:
|
|
|
|
bli_obj_scalar_detach()
|
|
bli_obj_scalar_attach()
|
|
bli_obj_scalar_apply_scalar()
|
|
bli_obj_scalar_reset()
|
|
bli_obj_scalar_has_nonzero_imag()
|
|
bli_obj_scalar_equals()
|
|
|
|
- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
|
|
- Renamed the following macros:
|
|
|
|
bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
|
|
bli_obj_is_scalar() -> bli_obj_is_1x1()
|
|
|
|
- Defined new macros to set and copy internal scalars between objects:
|
|
|
|
bli_obj_set_internal_scalar()
|
|
bli_obj_copy_internal_scalar()
|
|
|
|
- In level-3 internal back-ends, added conditional blocks where alpha and
|
|
beta are checked for non-unit-ness. Those values for alpha and beta are
|
|
applied to the scalars attached to aliases of A/B/C, as appropriate,
|
|
before being passed into the variant specified by the control tree.
|
|
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
|
|
alpha and/or beta.
|
|
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
|
|
attached to A and B are multiplied together to obtain alpha, while beta
|
|
is obtained directly from C.
|
|
- In level-3 front-ends, removed old function calls meant to provide
|
|
future support for mixed domain/precision. These can be added back later
|
|
once that functionality is given proper treatment. Also, removed the
|
|
creation of copy-casts of alpha and beta since typecasting of scalars
|
|
is now implicitly handled in the internal back-ends when alpha and
|
|
beta are applied to the attached scalars.
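
The flow of attached scalars can be sketched in the real domain as follows. The sobj_t type and the sobj_* and kernel_* functions are hypothetical simplifications; the actual obj_t stores the scalar in an atom_t field and supports all four datatypes.

```c
/* Hypothetical object with an attached scalar that defaults to 1.0. */
typedef struct {
    double        scalar;
    const double *buf;
} sobj_t;

static sobj_t sobj_create(const double *buf)
{
    sobj_t o = { 1.0, buf };
    return o;
}

/* Fold an external scalar into the one attached to the object. */
static void sobj_apply_scalar(sobj_t *o, double s)
{
    o->scalar *= s;
}

/* In the macro-kernel, alpha is the product of the scalars on A and B... */
static double kernel_alpha(const sobj_t *a, const sobj_t *b)
{
    return a->scalar * b->scalar;
}

/* ...while beta is read directly from C. */
static double kernel_beta(const sobj_t *c)
{
    return c->scalar;
}
```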
|
|
|
|
commit 992de486d6f23e69a623abd15ae77d7881d13871
|
|
Merge: 9552e6e fd4ac63
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 2 13:58:46 2013 -0600
|
|
|
|
Unimplemented kernels now call reference.
|
|
|
|
Details:
|
|
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
|
|
datatypes call the corresponding reference kernel. Previously, these
|
|
kernel functions called abort() with a "not yet implemented" error
|
|
message.
|
|
|
|
commit fd4ac636d9a55cec1476a444bd4e70def219dc8f
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 2 13:50:36 2013 -0600
|
|
|
|
Unimplemented kernels now call reference.
|
|
|
|
Details:
|
|
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
|
|
unimplemented kernel functions simply call the corresponding reference
|
|
implementation. (Previously, these unimplemented functions would
|
|
abort() with a "not yet implemented" message.)
|
|
|
|
commit 9552e6ee824d4345d5e908e869e071d19829819a
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Nov 24 11:40:31 2013 -0600
|
|
|
|
Removed optional scaling from packm control tree.
|
|
|
|
Details:
|
|
- Removed does_scale field from packm control tree node and
|
|
bli_packm_cntl_obj_create() interface. Adjusted all invocations of
|
|
_cntl_obj_create() accordingly.
|
|
- Redefined/renamed macros that are used in aliasing so that now,
|
|
bli_obj_alias_to() does a full alias (shallow copy) while
|
|
bli_obj_alias_for_packing() does a partial alias that preserves the
|
|
pack_mem-related fields of the aliasing (destination) object.
|
|
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
|
|
will work just fine for bli_trmm3().
|
|
- Removed some commented vestiges of the typecasting functionality needed
|
|
to support heterogeneous datatypes.
|
|
|
|
commit e65c476284db9ef64b23191a21c2584b1083342f
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Nov 19 10:05:35 2013 -0600
|
|
|
|
Minor updates to packm_blk_var2.c and _blk_var3.c.
|
|
|
|
Details:
|
|
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
|
|
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
|
|
instead of setm(), scal2m().
|
|
|
|
commit 9e1d0d4bca48eda54301d8976f203e2544c9df3a
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 18 18:11:07 2013 -0600
|
|
|
|
Added trsm_l, trsm_u ukernels for x86_64/core2.
|
|
|
|
Details:
|
|
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
|
|
These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
|
|
that already existed in kernels/x86_64/core2-sse3/3.
|
|
|
|
commit 85e7e02ea3a9190b6fcff5d46b00d41c79cb1242
|
|
Merge: 67761e2 7072005
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 18 12:02:00 2013 -0600
|
|
|
|
Merge branch 'master'. Forgot to git-pull.
|
|
|
|
commit 67761e224c92500eecf9c1540cc72bdd2fb27679
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 18 11:57:40 2013 -0600
|
|
|
|
Attempting to fix errors in bgq build.
|
|
|
|
Details:
|
|
- Removed restrict declaration from b_cast and c_cast from
|
|
bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
|
|
are causing problems for xlc only in those two files and no other
|
|
macro-kernels.
|
|
- Fixed (hopefully) kernel function parameter type declarations in
|
|
kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
|
|
|
|
commit 707200541d344f98cf34c9801954dbb36fbe0447
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 18 11:17:31 2013 -0600
|
|
|
|
Syntax error fix in x86_64/core2 gemmtrsm_u ukr.
|
|
|
|
commit bbe2b84a49e7785d4d0c514cda34adfbe66478b0
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 18 11:11:06 2013 -0600
|
|
|
|
Updated Makefile in test, testsuite.
|
|
|
|
Details:
|
|
- Updated Makefiles in test and testsuite directories to use the new
|
|
BLIS header installation directory scheme, which is to compile with
|
|
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.
|
|
|
|
commit 9bd7fcfd436625ca2108128086671319362f4d92
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 18 10:58:09 2013 -0600
|
|
|
|
Outer-to-inner 'restrict' fix in macro-kernels.
|
|
|
|
Details:
|
|
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
|
|
macro-kernels. Previously, all restricted pointers were being declared
|
|
at the outer-most function scope level. While this violates the C99
|
|
standard, very few of the compilers used with BLIS so far have seemed
|
|
to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
|
|
for identifying this bug (and suggesting the fix).
|
|
|
|
commit 50549a6a31dd26cf63a013e0ede16b2c7ce835b6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Nov 17 18:31:27 2013 -0600
|
|
|
|
Changed header install directory to include/blis.
|
|
|
|
Details:
|
|
- Changed top-level Makefile so that headers are installed to
|
|
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
|
|
named by version/configuration and then symlinked.)
|
|
- Added uninstall targets, including uninstall-old to clean out old
|
|
library archives.
|
|
- Added GREP makefile definitions to all configurations' make_defs.mk.
|
|
|
|
commit d70733abddfb9a95661897e1e4f3c1f3cfa7cbaa
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sat Nov 16 17:34:25 2013 -0600
|
|
|
|
Added ARM kernels, configurations.
|
|
|
|
Details:
|
|
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
|
|
Thanks to Francisco Igual for contributing these kernels and
|
|
configurations.
|
|
|
|
commit d37c2cff62089c86983c2f79762f4b5329037373
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Nov 13 10:47:11 2013 -0600
|
|
|
|
Minor comment and Makefile changes.
|
|
|
|
Details:
|
|
- Added missing 'check-config' and 'check-make-defs' targets to
|
|
testsuite/Makefile.
|
|
- Removed unused 'test' target from top-level Makefile.
|
|
- Comment changes to testsuite input files.
|
|
|
|
commit 19885f893a17b91ee79bead0620d0f913392d4c5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 11 12:09:21 2013 -0600
|
|
|
|
Updated some kernel comment headers.
|
|
|
|
Details:
|
|
- Updated bgq and piledriver comment headers to use BLIS copyright header
|
|
instead of libflame.
|
|
|
|
commit 1a4d698f42981d74fe5f29b980031e1ee7dc42d5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 11 10:15:40 2013 -0600
|
|
|
|
CHANGELOG update (for 0.1.0).
|
|
|
|
commit 089048d5895a30221b6b1976c9be93ad6443420d (tag: 0.1.0)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sat Nov 9 17:18:00 2013 -0600
|
|
|
|
Added object wrappers to 1f test suite modules.
|
|
|
|
Details:
|
|
- Added missing object wrappers to level-1f test suite modules. This was
|
|
only apparent if you were configuring with something other than the
|
|
reference configuration.
|
|
- Commented out object-wrappers in level-1f front-ends. These were not
|
|
working as intended unless the reference configuration was selected, because
|
|
most kernel sets, such as those in the template set, do not have object
|
|
wrappers.
|
|
- Whitespace changes to template micro-kernels.
|
|
- Comment changes to template level-1f kernel headers.
|
|
|
|
commit 9ef3752079de10124bed906b5d28479d04aa8187
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Nov 8 17:20:47 2013 -0600
|
|
|
|
Updated template kernels wrt KernelsHowTo wiki.
|
|
|
|
Details:
|
|
- Merged latest state of KernelsHowTo wiki into template micro-kernels
|
|
located in config/template/kernels/3.
|
|
|
|
commit 376bbb59c8944e29c5c1ff6637920d8451370afa
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Nov 8 11:17:34 2013 -0600
|
|
|
|
Removed support for duplication.
|
|
|
|
Details:
|
|
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
|
|
and all framework code.
|
|
- Updated test suite modules according to above changes.
|
|
|
|
commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Nov 7 11:36:11 2013 -0600
|
|
|
|
Added comments to testsuite/input.operations.
|
|
|
|
Details:
|
|
- Added extensive comments to the top of testsuite/input.operations,
|
|
which describe how to edit the file.
|
|
- Removed input.operations.0 and input.operations.1.
|
|
- Changed input.general to test all datatypes ("sdcz") by default.
|
|
|
|
commit a98f78b715fb256a519870071bb5266130d70b21
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Nov 6 15:32:47 2013 -0600
|
|
|
|
Changed dim_t and inc_t to be signed integers.
|
|
|
|
Details:
|
|
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
|
|
This will facilitate interoperability with Fortran in the future.
|
|
(Fortran does not support unsigned integers.)
|
|
- Redefined many instances of stride-related macros so that they return
|
|
or use the absolute value of the strides, rather than the raw strides
|
|
which may now be signed. Added new macros bli_is_row_stored_f() and
|
|
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
|
|
and changed the packm_blk_var[23] variants to use these macros instead
|
|
of the existing bli_is_row_stored(), bli_is_col_stored().
|
|
- Added/adjusted typecasting to various functions/macros, including
|
|
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
|
|
related macros in bli_param_macro_defs.h.
|
|
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
|
|
layer properly handles situations where vector increments are negative.
|
|
Thanks to Vladimir Sukharev for pointing out this issue.
|
|
- Changed type of increment parameters in bli_adjust_strides() from dim_t
|
|
to inc_t. Likewise in bli_check_matrix_strides().
|
|
- Defined bli_check_matrix_object(), which checks for negative strides.
|
|
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
|
|
that they also check for negative stride.
|
|
- Added instances of bli_check_matrix_object() to various operations'
|
|
_check routines.
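The negative-increment handling mentioned above follows the usual BLAS convention: when incx is negative, the logically first element of the vector sits at the far end of the buffer and the vector is walked backwards. The following is a minimal sketch of that convention only; first_element() and gather_strided() are hypothetical helpers written for illustration and are not the bli_convert_blas_incv() macro itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the BLIS macro): given a BLAS-style vector
   (n, x, incx), return a pointer to the logically first element. With a
   negative increment, that element lives (n-1)*|incx| entries past x. */
const double* first_element(int n, const double* x, int incx)
{
    assert(n >= 1 && incx != 0);
    if (incx < 0)
        return x + (ptrdiff_t)(n - 1) * (-(ptrdiff_t)incx);
    return x;
}

/* Copy the n logical elements of x into the contiguous buffer out,
   indexing from the logical first element with the signed increment.
   This relies on the increment type being signed. */
void gather_strided(int n, const double* x, int incx, double* out)
{
    const double* x0 = first_element(n, x, incx);
    for (int i = 0; i < n; ++i)
        out[i] = x0[(ptrdiff_t)i * incx];
}
```

For a buffer holding 10, 20, 30, 40, 50, a call with n = 3 and incx = -2 visits the entries 50, 30, 10 in that order, while incx = 2 visits 10, 30, 50.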
|
|
|
|
commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Nov 6 10:09:10 2013 -0600
|
|
|
|
Minor comment update to BLAS compat files.
|
|
|
|
commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 4 15:50:00 2013 -0600
|
|
|
|
Fixed bugs in scalv and setv.
|
|
|
|
Details:
|
|
- Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
|
|
a segmentation fault may occur if beta is not the same type as
|
|
the vector operand for scalv and setv.
|
|
- Changed axpyv and scal2v front-ends in a similar fashion.
|
|
|
|
commit f5953259a1842ee48e5833c22ac86e68a337bfe1
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Nov 4 14:43:55 2013 -0600
|
|
|
|
Fixed a bug related to Hermitian matrix diagonals.
|
|
|
|
Details:
|
|
- Fixed a bug whereby BLIS assumed that the imaginary components of the
|
|
diagonal elements of Hermitian matrices were already zero. This property
|
|
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
|
|
to Vladimir Sukharev for reporting this bug.
|
|
- Minor comment updates to template kernels.
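The diagonal fix described above amounts to discarding whatever imaginary values the caller left on the diagonal while the matrix is copied into its packed form. A minimal sketch, assuming a column-major matrix and a plain contiguous copy; the hzsketch_t type and pack_hermitian_sketch() function are hypothetical and far simpler than bli_packm_blk_var2:

```c
#include <assert.h>

/* Hypothetical stand-in for a double-precision complex element. */
typedef struct { double real; double imag; } hzsketch_t;

/* Copy an m x m column-major Hermitian matrix a (leading dimension lda)
   into a contiguous m x m buffer p, forcing the imaginary part of each
   diagonal element to zero along the way. */
void pack_hermitian_sketch(int m, const hzsketch_t* a, int lda, hzsketch_t* p)
{
    assert(m >= 0 && lda >= m);
    for (int j = 0; j < m; ++j)
    {
        for (int i = 0; i < m; ++i)
        {
            hzsketch_t v = a[i + j * lda];
            if (i == j) v.imag = 0.0;   /* enforce a real diagonal */
            p[i + j * m] = v;
        }
    }
}
```

Doing this at packing time leaves the caller's matrix untouched; only the packed copy consumed by the computation is normalized.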
|
|
|
|
commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sat Nov 2 17:19:40 2013 -0500
|
|
|
|
Added scaling to abval2s, sqrt2s macros.
|
|
|
|
Details:
|
|
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
|
|
and overflow from squaring the real and imaginary components. (This is
|
|
the same technique used to fix recent bugs in invscals/invscaljs and
|
|
inverts.)
|
|
|
|
commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Nov 1 10:28:04 2013 -0500
|
|
|
|
Added new dotxaxpyf variant 2.
|
|
|
|
Details:
|
|
- Added a new variant for dotxaxpyf that is based on dotxf and axpyf
|
|
kernels. By default, this variant is not used by any other operation.
|
|
|
|
commit 97f89fbcf202d72fc440b614708e352ea31633e2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Nov 1 10:16:39 2013 -0500
|
|
|
|
Fixed bug in complex invscals.
|
|
|
|
Details:
|
|
- Fixed complex inversion in invscals and invscaljs whereby the
|
|
imaginary component was being computed incorrectly.
|
|
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
|
|
in inverts, invscals, and invscaljs.
|
|
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
|
|
operator instead of "<".
|
|
|
|
commit eda42a21d17a2742eab69ab801ed530b82488c8a
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Oct 31 18:00:44 2013 -0500
|
|
|
|
Defined missing symbols in bla_rotg.c
|
|
|
|
Details:
|
|
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
|
|
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
|
|
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
|
|
these bugs.
|
|
|
|
commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Oct 30 14:39:01 2013 -0500
|
|
|
|
Fixed bugs in scalm and setm.
|
|
|
|
Details:
|
|
- Fixed bugs in scalm and setm that resulted in segmentation faults when
|
|
beta is not the same type as the matrix operand. Thanks to Vladimir
|
|
Sukharev for reporting this bug.
|
|
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
|
|
and setm; namely, the alpha scalar is copy-cast to the type of the first
|
|
matrix operand.
|
|
- Changed the template and reference configurations' bli_config.h files
|
|
so that the number of memory allocator blocks of A and B are set based
|
|
on BLIS_MAX_NUM_THREADS.
|
|
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
|
|
|
|
commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Oct 24 14:32:20 2013 -0500
|
|
|
|
Fixed over/under-flow in complex inversion.
|
|
|
|
Details:
|
|
- Fixed the complex bli_?inverts() macros, which were inverting elements
|
|
in an "unsafe" manner, such that very large and very small values were
|
|
unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
|
|
reporting this bug.
|
|
- Comment update to bli_sumsqv_unb_var1.c.
|
|
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
|
|
- Changed 1.0F to 1.0 for bli_drands() macro.
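The "unsafe" inversion mentioned above is the textbook formula 1/(a+bi) = (a-bi)/(a^2+b^2), whose squares can overflow or underflow even when the result is representable. One common remedy, sketched below, is to divide both components by s = max(|a|,|b|) before forming the denominator. This is an illustration of the scaling idea only, not the actual bli_?inverts() macro; the izsketch_t type and invert_scaled() function are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for a double-precision complex element. */
typedef struct { double real; double imag; } izsketch_t;

static double abs_sketch(double v) { return v < 0.0 ? -v : v; }

/* Scaled inversion of x = a + bi. With ar = a/s and ai = b/s, where
   s = max(|a|,|b|), both scaled components have magnitude at most 1,
   and the denominator d = ar*a + ai*b equals (a^2 + b^2)/s without
   ever forming a^2 or b^2 directly. */
izsketch_t invert_scaled(izsketch_t x)
{
    double ma = abs_sketch(x.real);
    double mb = abs_sketch(x.imag);
    double s  = ma > mb ? ma : mb;
    assert(s > 0.0);                 /* zero has no inverse */

    double ar = x.real / s;
    double ai = x.imag / s;
    double d  = ar * x.real + ai * x.imag;

    izsketch_t r = { ar / d, -ai / d };
    return r;
}
```

For x = 1e200 + 1e200i, the naive denominator a^2 + b^2 overflows in double precision, whereas the scaled form yields components near 5e-201 in magnitude.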
|
|
|
|
commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Oct 23 12:15:25 2013 -0500
|
|
|
|
Fixed parameter checking issue in BLAS syr[2]k.
|
|
|
|
Details:
|
|
- Fixed a minor parameter checking bug in the BLAS compatibility layer
|
|
for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
|
|
trans parameter of either operation, it is (a) allowed, and (b) treated
|
|
as 'T' (whereas previously it was disallowed). Thanks to Vladimir
|
|
Sukharev for finding and reporting this bug.
|
|
|
|
commit a091a219bda55e56817acd4930c2aa4472e53ba5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Oct 14 10:11:29 2013 -0500
|
|
|
|
Minor fixes to piledriver configuration, ukernel.
|
|
|
|
Details:
|
|
- Applied a patch from Tyler that fixes minor staleness in the piledriver
|
|
configuration and gemm micro-kernel.
|
|
- Very minor changes to test suite input files.
|
|
|
|
commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Oct 11 11:37:19 2013 -0500
|
|
|
|
Added Fran's Sandy Bridge kernels/configuration.
|
|
|
|
Details:
|
|
- Added a kernel directory for kernels developed by Francisco Igual for
|
|
the Sandy Bridge architecture, including a dgemm ukernel coded with
|
|
AVX intrinsics.
|
|
- Added a configuration for Sandy Bridge using values supplied by Fran.
|
|
|
|
commit 03106d650e4030d4c9831683448376f92fc52d41
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Oct 11 10:40:38 2013 -0500
|
|
|
|
Fixed minor perf bug in gemm_ker_var2.
|
|
|
|
Details:
|
|
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
|
|
bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
|
|
computed correctly (i.e., do not wrap around) at the edge cases. Thanks to
|
|
Tze Meng for helping me identify this bug.
|
|
|
|
commit b053337387dbdef9035be03538222670a21707ca
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Oct 10 18:26:55 2013 -0500
|
|
|
|
Added fusing factors, MR/NR to test suite output.
|
|
|
|
Details:
|
|
- Updated the test suite driver (and modules where appropriate) so that
|
|
the level-1f fusing factors are output along with the variable dimension.
|
|
While this is not strictly necessary, since the fusing factors are output
|
|
in the initial parameter summary, it allows extra reassurance to the user
|
|
since the fusing factors appear alongside the variable dimension, which
|
|
together give a complete picture of the problem size. Similar changes were
|
|
made for outputting the register blocksizes when reporting results for the
|
|
micro-kernel test modules.
|
|
|
|
commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Oct 10 14:20:06 2013 -0500
|
|
|
|
Added test suite modules for level-1f, 3 kernels.
|
|
|
|
Details:
|
|
- Added test modules in test suite for level-1f kernels and level-3
|
|
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
|
|
supported by these test modules.)
|
|
- Added section override switches to test suite's input.operations file.
|
|
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
|
|
facilitate the level-1f test modules. Also added front-end for dupl
|
|
operation.
|
|
- Added obj_t-based check routines for level-1f operations, which are
|
|
called from the new front-ends mentioned above.
|
|
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
|
|
factors as a function of datatype, which is needed by their respective
|
|
test modules.
|
|
- Whitespace changes to bli_kernel.h of all existing configurations.
|
|
|
|
commit 680188d46bb15b9a1a2867638104939dc77ca2a1
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Oct 10 13:23:37 2013 -0500
|
|
|
|
Cleaned up old test drivers.
|
|
|
|
Details:
|
|
- Minor updates to old test drivers in preparation for our participation
|
|
in ACM TOMS's replicated results initiative.
|
|
|
|
commit 3690bdd4f95769c935c410414112102cc3e108b1
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Oct 10 11:45:33 2013 -0500
|
|
|
|
More updates to level-1f kernels for core2-sse3.
|
|
|
|
Details:
|
|
- Changed types in function signatures to match new prototypes. Meant to
|
|
include this in previous commit.
|
|
|
|
commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Oct 10 11:27:27 2013 -0500
|
|
|
|
Fixed outdated fusing factor macros in 1f kernels.
|
|
|
|
Details:
|
|
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
|
|
macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
|
|
this out.
|
|
|
|
commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Oct 1 17:01:18 2013 -0500
|
|
|
|
Added section overrides to test suite.
|
|
|
|
Details:
|
|
- Added new lines of input to the test suite's input.operations file, which
|
|
allows the user to disable entire sections (levels) of tests. Before this
|
|
change, the user had to manually disable each operation test's "master
|
|
switch". (This is why input.operations.0 existed: to allow a more
|
|
convenient starting point for someone who only wanted to test one or a
|
|
few operations.)
|
|
|
|
commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Sep 30 12:58:18 2013 -0500
|
|
|
|
Added template implementations and other tweaks.
|
|
|
|
Details:
|
|
- Added a 'template' configuration, which contains stub implementations of the
|
|
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
|
|
lots of in-file comments and documentation.
|
|
- Modified some variable/parameter names for some 1/1f operations. (e.g.
|
|
renaming vector length parameter from m to n.)
|
|
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
|
|
to bli_kernel.h.
|
|
- Modified test suite to print out fusing factors for axpyf, dotxf, and
|
|
dotxaxpyf, as well as the default fusing factor (which are all equal
|
|
in the reference and template implementations).
|
|
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
|
|
reference variants were implemented in terms of front-end routines rather
|
|
than directly in terms of the kernels. (For example, axpy2v was implemented
|
|
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
|
|
- Changed the interface to dotxf so that it matches that of axpyf, in that
|
|
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
|
|
as A^T.
|
|
- Minor variable naming and comment changes to reference micro-kernels in
|
|
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
|
|
|
|
commit 97aaf220a847363b4da35935eca17790c0ef71f6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Sep 17 10:51:36 2013 -0500
|
|
|
|
Added new kernels, configurations.
|
|
|
|
Details:
|
|
- Added various micro-kernels for the following architectures:
|
|
Intel MIC
|
|
IBM BG/Q
|
|
IBM Power7
|
|
AMD Piledriver
|
|
Loongson 3A
|
|
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
|
|
and Xianyi Zhang for contributing these kernels.
|
|
- Added configurations corresponding to above architectures, and renamed
|
|
"clarksville" configuration to "dunnington".
|
|
|
|
commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Sep 13 14:31:53 2013 -0500
|
|
|
|
Removed default configuration behavior.
|
|
|
|
Details:
|
|
- Changed the configure script so that it no longer defaults to the
|
|
reference configuration. This change is being made so that the
|
|
developer has a firm awareness of which configuration is being used
|
|
to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
|
|
suggested change.
|
|
|
|
commit da77e9614f54f92f703f01e3b9bd67a83280150c
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Sep 13 12:00:37 2013 -0500
|
|
|
|
Minor improvements to static memory allocator.
|
|
|
|
Details:
|
|
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
|
|
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
|
|
functionality includes computing the pool size for each datatype (using
|
|
that datatype's cache blocksizes) and using the maximum to size the
|
|
actual pool array. This addresses the somewhat common pitfall whereby a
|
|
developer updates cache blocksizes in bli_kernel.h for only one datatype
|
|
(say, single-precision real), while the memory pools are sized using the
|
|
double-precision real values. Then, when the developer attempts to link
|
|
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
|
|
a message saying the static memory pool was exhausted. Clearly, this
|
|
message is misleading when the pool was not sized properly to begin with.
|
|
- Removed previously disabled code in bli_kernel_macro_defs.h that was
|
|
meant to check for size consistency among the various cache blocksizes.
|
|
(Obviously the memory pool size-based solution mentioned above is better.)
|
|
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
|
|
reasonable place to put these constants, rather than further crowd up
|
|
bli_config.h.
|
|
- Updated testsuite driver to output memory pool sizes for A, B, and C.
|
|
- Minor comment updates to bli_config.h.
|
|
- Removed 'flame' configuration. It was beginning to get out-of-date, and
|
|
I hadn't used it in months. We can always re-create it later.
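The pool-sizing change described above (compute the required size per datatype, then size the array by the maximum) can be sketched with the C preprocessor as follows. All SK_-prefixed names and the blocksize values are hypothetical and chosen only to reproduce the pitfall; they are not the contents of bli_mem_pool_macro_defs.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache blocksizes (MC per datatype, shared KC). Here only
   the single-precision real MC has been enlarged, as in the pitfall. */
#define SK_MC_S 512
#define SK_MC_D 128
#define SK_MC_C 128
#define SK_MC_Z 64
#define SK_KC   256

/* Bytes needed for one MC x KC block of A, per datatype. */
#define SK_POOL_A_S ((size_t)SK_MC_S * SK_KC * sizeof(float))
#define SK_POOL_A_D ((size_t)SK_MC_D * SK_KC * sizeof(double))
#define SK_POOL_A_C ((size_t)SK_MC_C * SK_KC * 2 * sizeof(float))
#define SK_POOL_A_Z ((size_t)SK_MC_Z * SK_KC * 2 * sizeof(double))

/* Size the pool by the largest per-datatype requirement. */
#define SK_MAX(a, b) ((a) > (b) ? (a) : (b))
#define SK_POOL_A_SIZE \
    SK_MAX(SK_MAX(SK_POOL_A_S, SK_POOL_A_D), SK_MAX(SK_POOL_A_C, SK_POOL_A_Z))

char sk_pool_a[SK_POOL_A_SIZE];

size_t sk_pool_a_size(void) { return sizeof(sk_pool_a); }

/* Hand out the pool for a request of the given size; a request larger
   than the pool indicates a sizing error rather than true exhaustion. */
void* sk_pool_a_block(size_t bytes)
{
    assert(bytes <= sizeof(sk_pool_a));
    return sk_pool_a;
}
```

With these values, a pool sized from the double-precision blocksizes alone would be too small for a single-precision block, which is the misleading "exhausted" situation the commit describes.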
|
|
|
|
commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Sep 10 17:17:28 2013 -0500
|
|
|
|
Added ESSL and Accelerate targets to test drivers.
|
|
|
|
Details:
|
|
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
|
|
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
|
|
/ providing this patch.
|
|
|
|
commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Sep 10 16:35:12 2013 -0500
|
|
|
|
Various changes to treatment of integers.
|
|
|
|
Details:
|
|
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
|
|
assigned values of 32, 64, or some other value. The former two result in
|
|
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
|
|
causes integers to be defined in terms of a default type (e.g. long int).
|
|
- Updated bli_config.h in reference and clarksville configurations according
|
|
to above changes.
|
|
- Updated test drivers in test and testsuite to avoid type warnings associated
|
|
with format specifiers not matching the types of their arguments to printf()
|
|
and scanf().
|
|
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
|
|
inclusion in d141f9eeb6d1).
|
|
- Added explicit typecasting of dim_t and inc_t to macros in
|
|
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
|
|
- Slight changes to CREDITS and INSTALL files.
|
|
- Slight tweaks to Windows build system, mostly in the form of switching to
|
|
Windows-style CRLF newlines for certain files.
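The integer-size selection described in the first item above can be sketched as a preprocessor switch. The SKETCH_INT_TYPE_SIZE macro and sketch_ type names below are hypothetical stand-ins for BLIS_INT_TYPE_SIZE and gint_t/guint_t; the actual definitions in BLIS may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration knob: 32, 64, or anything else for the
   default C integer type. */
#ifndef SKETCH_INT_TYPE_SIZE
#define SKETCH_INT_TYPE_SIZE 64
#endif

#if SKETCH_INT_TYPE_SIZE == 32
typedef int32_t           sketch_gint_t;
typedef uint32_t          sketch_guint_t;
#elif SKETCH_INT_TYPE_SIZE == 64
typedef int64_t           sketch_gint_t;
typedef uint64_t          sketch_guint_t;
#else
typedef long int          sketch_gint_t;
typedef unsigned long int sketch_guint_t;
#endif

/* Sanity check that the signed and unsigned types have the same width. */
void sketch_check_int_types(void)
{
    assert(sizeof(sketch_gint_t) == sizeof(sketch_guint_t));
}
```

Fixing the width explicitly keeps the library's integer size independent of the platform's long int, which differs between common 64-bit ABIs.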
|
|
|
|
commit 068437736b41d51a1f5ec47839f059bf58a20413
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Sep 9 14:07:58 2013 -0500
|
|
|
|
Fixed set-but-not-used compiler (gcc) warnings.
|
|
|
|
Details:
|
|
- Used void-casts of certain variables to appease gcc (and perhaps other
|
|
compilers) when such variables are only used in the complex instances of
|
|
the functions. Special thanks to Karl Rupp for suggesting a portable fix
|
|
for these warnings.
|
|
|
|
commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Sep 9 13:48:52 2013 -0500
|
|
|
|
Small fix to Windows defs.mk makefile fragment.
|
|
|
|
Details:
|
|
- Commented out a !include statement that was attempting to include a
|
|
version file that does not yet exist. For now, the version string is
|
|
hard-coded into defs.mk.
|
|
|
|
commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Sep 9 13:09:16 2013 -0500
|
|
|
|
Added Windows build system.
|
|
|
|
Details:
|
|
- Added a 'windows' directory, which contains a Windows build system
|
|
similar to that of libflame's. Thanks to Martin for getting this up
|
|
and running.
|
|
- Spun off system header #includes into bli_system.h, which is included
|
|
in blis.h
|
|
- Added a Windows section to bli_clock.c (similar to libflame's).
|
|
|
|
commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Sep 9 11:04:46 2013 -0500
|
|
|
|
Edited bli_?lamch.c to avoid Windows keyword.
|
|
|
|
Details:
|
|
- Renamed "small" variable to "smnum" to avoid collision with Windows type
|
|
by the same name. This change is needed in advance of the upcoming Windows
|
|
build system.
|
|
|
|
commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Sep 4 13:36:07 2013 -0500
|
|
|
|
Switched integer typedefs (again) to C types.
|
|
|
|
Details:
|
|
- Redefined gint_t and guint_t in terms of the standard C types long int
|
|
and unsigned long int, respectively.
|
|
- Changed testsuite default max problem size to 500.
|
|
- Changed testsuite input.operations to use square problems for level-3
|
|
operation tests.
|
|
|
|
commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Sep 4 12:09:11 2013 -0500
|
|
|
|
Falling back to 32-bit integers for dim_t, etc.
|
|
|
|
Details:
|
|
- In light of recent segfaulting issues when compiling on 32-bit systems,
|
|
I've changed the default typedef for gint_t and guint_t from int64_t and
|
|
uint64_t to int32_t and uint32_t, respectively.
|
|
- Disabled 64-bit integers in the blas2blis layer for the reference
|
|
configuration.
|
|
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
|
|
to introductory output of the testsuite.
|
|
|
|
commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Sep 3 21:58:07 2013 -0500
|
|
|
|
Applied temp fix to typecasting bug in testsuite.
|
|
|
|
Details:
|
|
- Applied a temporary fix to the typecasting bug in the testsuite driver.
|
|
The fix involves casting both numerator and denominator to unsigned long.
|
|
This fix is more voodoo than science, as I can't be sure why it even
|
|
works.
|
|
|
|
commit 9ee6e125373869c4213c017ce772c38ecefba103
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Sep 3 21:53:27 2013 -0500
|
|
|
|
Changed dimension spec for gemm in testsuite.
|
|
|
|
Details:
|
|
- Encountered a bizarre typecasting bug whereby the test suite was not
|
|
computing the proper dimension from the problem size and dimension
|
|
specification when the latter was set to -3. Will investigate.
|
|
Thanks to Fran for finding this "bug".
|
|
|
|
commit e8be081e68c385ab44d0fea8dade21d40c200b79
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Aug 28 15:52:34 2013 -0500
|
|
|
|
Generalized matlab and file output in testsuite.
|
|
|
|
Details:
|
|
- Added a new option in input.general that allows outputting in
|
|
matlab/octave format so that one can output in matlab format
|
|
independently from outputting to files.
|
|
- Adjusted input.operations according to above.
|
|
- Added input.operations.0 and input.operations.1 with all options
|
|
disabled and enabled, respectively.
|
|
|
|
commit d352c746e5683037d41b5061dfb5ce08e1d0843b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Aug 27 13:41:46 2013 -0500
|
|
|
|
Added single/real gemm micro-kernel for x86_64.
|
|
|
|
Details:
|
|
- Added a single-precision real gemm micro-kernel in
|
|
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
|
|
- Adjusted the single-precision real register blocksizes in
|
|
config/clarksville/bli_kernel.h to be 8x4.
|
|
- Added a missing comment to bli_packm_blk_var2.c that was present in
|
|
bli_packm_blk_var3.c
|
|
|
|
commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Aug 19 12:07:41 2013 -0500
|
|
|
|
Fixed bug in bli_acquire_mpart_t2b(), _l2r().
|
|
|
|
Details:
|
|
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
|
|
that caused incorrect partitioning when SUBPART0 was requested. This
|
|
bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
|
|
this bug.
|
|
- Removed dupl kernels from kernels/x86_64/3 directory.
|
|
- Uncommented beta == 0 optimization code in
|
|
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
|
|
|
|
commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Aug 8 14:39:35 2013 -0500
|
|
|
|
Moved init_safe(), finalize_safe() to BLAS compat.
|
|
|
|
Details:
|
|
- Moved the bli_init_safe() and bli_finalize_safe() function calls from the
|
|
BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
|
|
initializers in the BLIS layer wasn't buying us anything because the user
|
|
could still call the library with uninitialized global scalar constants,
|
|
for example. Thus, we will just have to live with the constraint that
|
|
bli_init() MUST be called before calling ANY routine with a bli_ prefix.
|
|
- Added the missing _init_safe() and _finalize_safe() calls to the level-1
|
|
BLAS compatibility wrappers.
|
|
|
|
commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Aug 8 13:30:19 2013 -0500
|
|
|
|
Miscellaneous updates.
|
|
|
|
Details:
|
|
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
|
|
BLIS_CACHE_LINE_SIZE (typically 64).
|
|
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
|
|
kernels.
|
|
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
|
|
kernels, in that the interior and edge-case handling is expressed once
|
|
inside the loops in the n and m dimensions, rather than the edge-case
|
|
handling being "unrolled" and expressed as distinct code regions. The
|
|
previous macro-kernel now lives in retired form in the subdirectory
|
|
other/bli_gemm_ker_var2.c.old.
|
|
- Updated experimental gemm_ker_var5 according to above change.
|
|
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
|
|
applied to optimize the macro-kernel's access pattern on C when C is
|
|
row-stored.
|
|
- Various updates inside of test/exec_sizes.
|
|
|
|
commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Aug 7 12:27:04 2013 -0500
|
|
|
|
Fixed bug in interface of bla_ger_check().
|
|
|
|
Details:
|
|
- Fixed the misplaced lda parameter in the function signature of
|
|
bla_ger_check(). Thanks to Tyler for finding this bug.
|
|
|
|
commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Aug 6 12:25:51 2013 -0500
|
|
|
|
Fixed cpp guard typos in frame/compat/check files.
|
|
|
|
Details:
|
|
- Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
|
|
BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
|
|
- Fixed various syntax errors in the code that had yet to be compiled
|
|
due to the aforementioned bug.
|
|
|
|
commit f4ec28e723d28d998f1038f82da6986e44320ef6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Aug 1 11:24:23 2013 -0500
|
|
|
|
Added basic OpenMP-based gemm and packm files.
|
|
|
|
Details:
|
|
- Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
|
|
into the following auxiliary files
|
|
|
|
frame/1m/packm/other/bli_packm_blk_var2.c
|
|
frame/3/gemm/other/bli_gemm_ker_var2.c
|
|
|
|
The routine in the first file uses a basic OpenMP parallel region to
|
|
parallelize the packing of blocks of A and panels of B, while the
|
|
second uses a similar parallel region to parallelize along the n
|
|
dimension of the gemm macro-kernel.
|
|
|
|
commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
|
|
Merge: 67a8b94 6e7e452
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jul 26 11:14:27 2013 -0500
|
|
|
|
Merge branch 'master' of https://code.google.com/p/blis
|
|
|
|
commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jul 26 11:12:37 2013 -0500
|
|
|
|
Added missing cpp kernel blocksize constraints.
|
|
|
|
Details:
|
|
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
|
|
constraints on the register blocksizes relative to the cache blocksizes.
|
|
Thanks to Tyler for helping me stumble across this issue.
|
|
|
|
commit 6e7e452343014e8f86640874dc1dbadca4a642a1
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jul 22 14:50:57 2013 -0500
|
|
|
|
Fixed minor warnings and misc issues.
|
|
|
|
Details:
|
|
- Fixed various warnings output by gcc 4.6.3-1, including removing some
|
|
set-but-not-used variables and addressing some instances of typecasting
|
|
of pointer types to integer types of different sizes.
|
|
|
|
commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jul 22 12:54:32 2013 -0500
|
|
|
|
Tightened some macros that detect datatypes.
|
|
|
|
Details:
|
|
- Modified the definitions of some macros, such as bli_is_real(), so that
|
|
the "special" bit is taken into account so that BLIS_INT is differentiated
|
|
from BLIS_FLOAT.
|
|
- Whitespace changes to bli_obj_macro_defs.h.
|
|
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
|
|
being used.
|
|
|
|
commit b33e2f4443b9043b554963320280ff7783773652
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jul 19 17:15:03 2013 -0500
|
|
|
|
CHANGELOG update (for 0.0.9).
|
|
|
|
commit 0680916fdd532f7a4716b11a2515243b2c08d00f (tag: 0.0.9)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Jul 18 18:04:34 2013 -0500
|
|
|
|
Added BLAS error checking to compatibility layer.
|
|
|
|
Details:
|
|
- Added frame/compat/check directory, which now houses companion _check()
|
|
routines for each of the BLAS wrappers in frame/compat. These _check()
|
|
routines are called from the compatibility wrappers and mimic the
|
|
error-checking present in the netlib BLAS.
|
|
- Edited bla_xerbla.c so that xerbla() translates the operation string to
|
|
uppercase before printing.
|
|
- Redefined util routines in frame/compat/f2c/util in terms of level0
|
|
macros.
|
|
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
|
|
- Commented out prototypes in test/test_*.c since Fortran integers are now
|
|
int64_t by default (and the prototypes that were present in the files
|
|
used int).
|
|
- Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
|
|
since blis.h was already being included.
|
|
- Other minor changes to code in frame/compat/f2c.
|
|
|
|
commit 4e80ad28c97273db3366428ec44020da7944964d
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Jul 18 17:53:31 2013 -0500
|
|
|
|
Added support for C99 complex types/arithmetic.
|
|
|
|
Details:
|
|
- Added support for C99 complex types to bli_type_defs.h and overloaded
|
|
complex arithmetic to the scalar-level macros in include/level0. This
|
|
includes a somewhat substantial reorganization and re-layering of much
|
|
of the existing machinery present in the level0 macros.
|
|
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
|
|
commented-out by default, which optionally enables the use of built-in
|
|
C99 complex types and arithmetic.
|
|
- Minor changes to clarksville and reference configs' make_defs.mk files.
|
|
- Removed a macro definition from bli_param_macro_defs.h that was not being
|
|
used (bli_proj_dt_to_real_if_imag_eq0).
|
|
|
|
commit 6072d7c848e837ba20d607f7b727438ada31bdcf
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jul 17 12:27:45 2013 -0500
|
|
|
|
Fixed bugs in trsm, trmm macro-kernels.
|
|
|
|
Details:
|
|
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
|
|
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
|
|
incorrectly being adjusted upward by MR, instead of NR. The rl and ru
|
|
trmm macro-kernels were updated in a similar fashion.
|
|
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
|
|
diagoffb when recomputing k to skip a zero region below where the
|
|
diagonal intersects the right side of the block. The corresponding
|
|
trmm macro-kernel was also updated.
|
|
- Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
|
|
needed to be placed AFTER the block that recomputes k to skip the zero
|
|
region (if present). The other three trsm macro-kernels, as well as the
|
|
trmm macro-kernels, were updated in the same manner, for consistency.
|
|
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
|
|
being updated to skip a zero region to the left of where the diagonal
|
|
of A intersects the top edge of the block.
|
|
- Comment updates to all trsm and trmm macro-kernels.
|
|
- Comment updates to bli_packm_init.c.
|
|
|
|
commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jul 10 14:53:59 2013 -0500
|
|
|
|
Added f2c'ed Givens rotation wrappers.
|
|
|
|
Details:
|
|
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
|
|
along with other wrappers for which no BLIS implementation exists.
|
|
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
|
|
rotm, and rotmg operations.
|
|
|
|
commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jul 10 13:40:12 2013 -0500
|
|
|
|
Removed copynz defs from bli_kernel.h files.
|
|
|
|
Details:
|
|
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
|
|
configuration. (Meant to include this in previous commit.)
|
|
|
|
commit aec12d90f596e8c04b1ad178258a1cd38108f59d
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jul 10 13:33:30 2013 -0500
|
|
|
|
Removed copynzv, copynzm and related codes.
|
|
|
|
Details:
|
|
- Removed copynzv and copynzm operation directories. These operations
|
|
implemented a variation of copyv/m that, in the case of real source
|
|
and complex destination operands, leaves the imaginary component
|
|
untouched (rather than setting it to zero). I realize now that the
|
|
special case(s) (e.g. gemm with real A and B but complex C) that I
|
|
thought required this operation actually can be handled more simply.
|
|
- Removed level0 scalar macros implementing copynzs, copynzjs.
|
|
|
|
commit b0a0a0f274a761788531b5d281cc3b411b7124ed
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Jul 9 17:15:38 2013 -0500
|
|
|
|
Added handling of restrict, stdint.h for non-C99.
|
|
|
|
Details:
|
|
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
|
|
in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
|
|
manually typedefs the types we need (which, for now, are unconditionally
|
|
int64_t and uint64_t).
|
|
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
|
|
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
|
|
nothing for C++ and non-C99.
|
|
|
|
commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jul 8 15:20:34 2013 -0500
|
|
|
|
Migrated integer usage to stdint.h types.
|
|
|
|
Details:
|
|
- Changed the way bli_type_defs.h defines integer types so that dim_t,
|
|
inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
|
|
integer) or guint_t (general unsigned integer).
|
|
- Renamed Fortran types fchar and fint to f77_char and f77_int.
|
|
- Define f77_int as int64_t if a new configuration variable,
|
|
BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
|
|
These types are defined in stdint.h, which is now included in blis.h.
|
|
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
|
|
in terms of scomplex.
|
|
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
|
|
of char.
|
|
- Updated bla_amax() wrappers so that the return type is defined directly
|
|
as f77_int, rather than letting the prototype-generating macro decide
|
|
the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
|
|
so I removed them. Also, changed the body of the wrapper so that a
|
|
gint_t is passed into abmaxv, which is THEN typecast to an f77_int
|
|
before returning the value.
|
|
- Updated f2c code that accessed .r and .i fields of complex and
|
|
doublecomplex types so that they use .real and .imag instead (now that
|
|
we are using scomplex and dcomplex).
|
|
|
|
commit 372501398564fdba3d5a3db86c30bc1039b185ff
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jul 8 11:24:18 2013 -0500
|
|
|
|
Added experimental bli_gemm_ker_var5().
|
|
|
|
Details:
|
|
- Added support for an experimental gemm macro-kernel that incrementally
|
|
packs one micro-panel of B at a time. This is useful for certain
|
|
special cases of gemm where m is small.
|
|
- Minor changes to default values of clarksville configuration.
|
|
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
|
|
do not yet have any use (or implementation support) for block storage.
|
|
- Comment update to bli_packm_init.c.
|
|
|
|
commit 9915d667a79f23e3a2a2516247c560e9063a1646
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Jul 7 13:28:39 2013 -0500
|
|
|
|
Defined "total" blocksize query functions.
|
|
|
|
Details:
|
|
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
|
|
the default blocksize plus blocksize extension (using the type or the type
|
|
of an object).
|
|
- Comment update in bli_packm_cxk.c.
|
|
|
|
commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Jun 27 13:19:56 2013 -0500
|
|
|
|
Consolidated lower/upper her[2]k blocked variants.
|
|
|
|
Details:
|
|
- Consolidated lower and upper blocked variants for herk and her2k, and
|
|
renamed the resulting variants, according to the same changes recently
|
|
made to trmm and trsm.
|
|
- Implemented support for four new subpartition types:
|
|
BLIS_SUBPART1T
|
|
BLIS_SUBPART1B
|
|
BLIS_SUBPART1L
|
|
BLIS_SUBPART1R
|
|
which correspond to "merged" partitions that include the middle "1"
|
|
partition as well as either the neighboring "0" or "2" partition. This is
|
|
used to clean up code in herk/her2k var2 that attempts to partition away
|
|
the strictly zero region above or below the diagonal of a matrix operand
|
|
that is being marched through diagonally.
|
|
- Added safeguards to herk macro-kernels that skip any leading or trailing
|
|
zero region in the panel of C that is passed in. This is now needed given
|
|
that herk/her2k var1 no longer partitions off this zero region before
|
|
calling the macro-kernel (via bli_her[2]k_int()).
|
|
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
|
|
|
|
commit 02002ef6f3d2746665982793db36714bd69bccc9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jun 24 17:08:14 2013 -0500
|
|
|
|
Added row-storage optimizations for trmm, trsm.
|
|
|
|
Details:
|
|
- Implemented algorithmic optimizations for trmm and trsm whereby the right
|
|
side case is now handled explicitly, rather than induced indirectly by
|
|
transposing and swapping strides on operands. This allows us to walk through
|
|
the output matrix with favorable access patterns no matter how it is stored,
|
|
for all parameter combinations.
|
|
- Renamed trmm and trsm blocked variants so that there is no longer a
|
|
lower/upper distinction. Instead, we simply label the variants by which
|
|
dimension is partitioned and whether the variant marches forwards or
|
|
backwards through the corresponding partitioned operands.
|
|
- Added support for row-stored packing of lower and upper triangular matrices
|
|
(as provided by bli_packm_blk_var3.c).
|
|
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
|
|
blocksize extensions (if non-zero) were not being used to appropriately size
|
|
the first iteration (ie: the bottom/right edge case).
|
|
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
|
|
whole multiples of MR AND NR. This is needed for the case of trsm_r where,
|
|
in order to reuse existing left-side gemmtrsm fused micro-kernels, the
|
|
packing of A (left-hand operand) and B (right-hand operand) is done with
|
|
NR and MR, respectively (instead of MR and NR).
|
|
|
|
commit d1e81ddc848ee47bc188735883d14582bdd0cabc
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Jun 13 11:14:21 2013 -0500
|
|
|
|
Minor generalizing tweaks to trmm blk var1, var2.
|
|
|
|
commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jun 12 16:40:04 2013 -0500
|
|
|
|
CHANGELOG update.
|
|
|
|
commit 5b641c3bab31eac6a1795b9f6e3f86c59651ca50 (tag: 0.0.8)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Jun 12 16:02:12 2013 -0500
|
|
|
|
Use separate CFLAGS for "kernels" directories.
|
|
|
|
Details:
|
|
- Added a new "special" directory type: any source code within directories
|
|
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
|
|
compiler flags. This allows the developer to specify a separate set of
|
|
flags (e.g. optimization flags) for compiling kernels while maintaining a
|
|
standard set for regular framework code.
|
|
- Fixed a bug in the top-level Makefile that was causing "noopt" code
|
|
to be compiled with the standard set of compilation flags.
|
|
- Updated make_defs.mk in reference, flame, and clarksville configurations
|
|
according to above changes.
|
|
|
|
commit 08475e7c7653ba598665071a617d10f0d8f763c2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Jun 11 12:18:39 2013 -0500
|
|
|
|
Various level-3 optimizations for row storage.
|
|
|
|
Details:
|
|
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
|
|
packing from a lower or upper-stored symmetric/Hermitian matrix to column
|
|
panels (which are row-stored). Previously one could only pack to row panels
|
|
(which are column-stored).
|
|
- Implemented various optimizations in the level-3 front-ends that allow more
|
|
favorable access through row-stored matrices for gemm, hemm, herk, her2k,
|
|
symm, syrk, and syr2k.
|
|
- Cleaned up code in level-3 front-ends that has to do with setting target and
|
|
execution datatypes.
|
|
|
|
commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jun 7 11:04:10 2013 -0500
|
|
|
|
Added beta == 0 optimization to x86_64 ukernel.
|
|
|
|
Details:
|
|
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
|
|
from memory (nor scaled by beta).
|
|
- Fixed minor bug in test suite driver when "Test all combinations of storage
|
|
schemes?" switch is disabled, which would result in redundant tests being
|
|
executed for matrix-only (e.g. level-1m, level-3) operations if multiple
|
|
vector storage schemes were specified.
|
|
- Restored debug flags as default in clarksville configuration.
|
|
|
|
commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Jun 6 13:36:06 2013 -0500
|
|
|
|
Whitespace changes to old test drivers.
|
|
|
|
Details:
|
|
- Replaced tabs with four spaces in places where indentation was already
|
|
in place.
|
|
|
|
commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Jun 4 14:57:46 2013 -0500
|
|
|
|
Fixed unaligned handling in axpyf, dotxaxpyf.
|
|
|
|
Details:
|
|
- Fixed over-cautious handling of unaligned operands in vector intrinsic
|
|
implementation of axpyf kernel.
|
|
- Fixed over- and under-cautious handling of unaligned operands in vector
|
|
intrinsic implementation of dotxaxpyf kernel.
|
|
|
|
commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Jun 3 16:54:52 2013 -0500
|
|
|
|
Updated level-1/-1f [vector intrinsic] kernels.
|
|
|
|
Details:
|
|
- Updated level-1/-1f kernels so that non-unit stride and unaligned cases are
|
|
handled by reference implementation (rather than aborted).
|
|
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
|
|
configuration.
|
|
- Defined bli_offset_from_alignment() macro.
|
|
- Minor edits to old test drivers.
|
|
|
|
commit 0288c827d3659bb225ac9c10f168b623ed0106a2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sat Jun 1 08:02:23 2013 -0500
|
|
|
|
Updated ukernels for x86_64.
|
|
|
|
Details:
|
|
- Tweaked micro-kernels and configuration for clarksville.
|
|
- Updated/cleaned up old test drivers in test directory.
|
|
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
|
|
recently).
|
|
|
|
commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon May 6 11:05:08 2013 -0500
|
|
|
|
Replaced axpys usage with subs in trsv.
|
|
|
|
Details:
|
|
- Replaced instances of axpys with alpha equal to -1 with subs.
|
|
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
|
|
sizeof(dcomplex).
|
|
|
|
commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri May 24 16:28:10 2013 -0500
|
|
|
|
Fixed x86_64 kernel bugs and other minor issues.
|
|
|
|
Details:
|
|
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
|
|
unaligned subpartitions. We were already going out of our way a bit to
|
|
handle edge cases in the first iteration for blocked variants, and this
|
|
was simply the unblocked-fused extension of that idea.
|
|
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
|
|
into account how the choice of variant needed to be altered for
|
|
upper-stored matrices (given that only lower-stored algorithms are
|
|
explicitly implemented).
|
|
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
|
|
macros to provide inlined versions of bli_determine_blocksize_[fb]() for
|
|
use by unblocked-fused variants.
|
|
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
|
|
consistency with that of the bugfix for trmv/trsv (both of which now
|
|
use the same macros).
|
|
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
|
|
vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
|
|
conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
|
|
was invalid only because the code was expecting 1 (for purposes of
|
|
performing contiguous vector loads) but got a value greater than 1 because
|
|
the column stride of the object (e.g. rho) was inflated for alignment
|
|
purposes (albeit unnecessarily since there is only one element in the
|
|
object).
|
|
- Replaced some old invocations of set0 with set0s.
|
|
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
|
|
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
|
|
- Added safeguard to test modules so that testing a problem with a zero
|
|
dimension does not result in a failure.
|
|
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
|
|
internal back-ends to correctly handle cases where output operand still
|
|
needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
|
|
|
|
commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri May 3 17:35:32 2013 -0500
|
|
|
|
Renamed _trans_status() macro.
|
|
|
|
Details:
|
|
- Mistakenly forgot to rename the _trans_status() macro and instances in
|
|
previous commit.
|
|
|
|
commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri May 3 17:24:58 2013 -0500
|
|
|
|
Renamed _set_trans(), _trans_status() macros.
|
|
|
|
Details:
|
|
- Renamed the following macros:
|
|
bli_obj_set_trans() -> bli_obj_set_onlytrans()
|
|
bli_obj_trans_status() -> bli_obj_onlytrans_status()
|
|
to remove ambiguity as to which bits are read/updated.
|
|
|
|
commit 2f8174509ea9f844db11ebd9389de5168e85b132
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed May 1 15:06:30 2013 -0500
|
|
|
|
Unconditionally check memory pool(s) for errors.
|
|
|
|
Details:
|
|
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
|
|
memory pool is exhausted before checking out and returning a block, even
|
|
if BLIS error checking has been disabled. These errors are useful because
|
|
they likely indicate that BLIS was improperly configured for the code
|
|
being run.
|
|
|
|
commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed May 1 15:00:30 2013 -0500
|
|
|
|
CHANGELOG update.
|
|
|
|
commit 6bfa96f84887dec0b4cf8be5d38dd634c2f8951d (tag: 0.0.7)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Apr 30 19:35:54 2013 -0500
|
|
|
|
Absorbed blocksize extensions into main objects.
|
|
|
|
Details:
|
|
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
|
|
fields to the blksz_t object rather than have them as separate structs.
|
|
- Updated all packm interfaces/invocations according to above change.
|
|
- Generalized bli_determine_blocksize_?() so that edge case optimization
|
|
happens if and only if cache blocksizes are created with non-zero
|
|
extensions.
|
|
- Updated comments in bli_kernel.h files to indicate that the edge case
|
|
blocksize extension mechanism is now available for use.
|
|
|
|
commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Apr 25 17:16:59 2013 -0500
|
|
|
|
Added option to disable err checking in testsuite.
|
|
|
|
Details:
|
|
- Added a new line to input.general that allows one to specify the error-
|
|
checking level to use for each BLIS experiment. The only two levels
|
|
supported for now are "no error checking" and "full error checking".
|
|
|
|
commit 096b366ddcfe386f44419ef84d8df8be13825f86
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Apr 25 16:43:43 2013 -0500
|
|
|
|
Use cntl trees that block in n dimension.
|
|
|
|
Details:
|
|
- Updated _cntl.c files for each level-3 operation to induce blocked
|
|
algorithms that first partition in the n dimension with a blocksize
|
|
of NC. Typically this is not an issue since only very large problems
|
|
exceed NC. But developers often run very large problems, and
|
|
so this extra blocking should be the default.
|
|
- Removed some recently introduced but now unused macros from
|
|
bli_param_macro_defs.h.
|
|
|
|
commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Apr 25 12:06:12 2013 -0500
|
|
|
|
Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
|
|
|
|
Details:
|
|
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
|
|
and PASTEMAC3) with those that only use a single type (PASTEMAC).
|
|
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
|
|
accommodate above change.
|
|
- Fixed comment typo in bli_config.h files.
|
|
- Added .nfs* pattern to .gitignore.
|
|
|
|
commit df80acf517dde180ddcc5835c6136b2fa7556d4b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Apr 23 19:43:23 2013 -0500
|
|
|
|
Fixed computation of b_next in L3 macro-kernels.
|
|
|
|
Details:
|
|
- Restructured herk_l and herk_u macro-kernels in the image of trmm
|
|
and trsm, in that the edge cases are captured by the main loop, rather
|
|
than trying to have "cleanup" sections that result in four distinct
|
|
parts (interior, bottom edge, right edge, bottom-right edge) of the
|
|
code.
|
|
- Fixed the way b_next was being computed in the non-gemm level-3
|
|
macro-kernels (herk, trmm, trsm). The way they are computed now matches
|
|
that of gemm.
|
|
|
|
commit 3671528cf8efe4b445d196665143a5c50c2c6048
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Apr 23 19:12:14 2013 -0500
|
|
|
|
Fixed minor bug in computing b_next in gemm.
|
|
|
|
commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Apr 23 17:49:10 2013 -0500
|
|
|
|
Fixed rare edge case bug in herk_l macro-kernel.
|
|
|
|
Details:
|
|
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
|
|
chosen to be much larger than NR, then one could encounter edge cases
|
|
in the MC dimension that fall entirely below the diagonal, which
|
|
the previous implementation of the herk_l macro-kernel was not allowing
|
|
for.
|
|
|
|
commit 1dab11e37d1cb403cbe75b73a644c00de534f104
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Apr 23 17:17:11 2013 -0500
|
|
|
|
Updated x86 gemmtrsm ukernels to use alpha.
|
|
|
|
commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Apr 23 16:00:18 2013 -0500
|
|
|
|
Added a_next, b_next arguments to micro-kernels.
|
|
|
|
Details:
|
|
- Added two more arguments to the gemm and gemmtrsm microkernels: the
|
|
addresses of the next micro-panels of A and B. By passing these
|
|
pointers into the micro-kernel, we allow the micro-kernel author to
|
|
prefetch micro-panels of A and B as necessary (though this is
|
|
completely optional; these addresses may also be safely ignored).
|
|
- Updated all seven macro-kernels so that they compute and pass in
|
|
a_next and b_next. Note that ONLY the gemm macro-kernel computes
|
|
a_next and b_next with the precise semantics we want. I will go back
|
|
and fix the other macro-kernels in the near future.
|
|
- Added 'restrict' to various micro-kernels from which it was missing.
|
|
|
|
commit f3815dc84d385c514a5acaf1e925424a57be2f51
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Apr 23 11:12:33 2013 -0500
|
|
|
|
Added code for backward edge-case blocking.
|
|
|
|
Details:
|
|
- Edited bli_determine_blocksize_b() to include experimental (and
|
|
currently disabled) code that computes extended blocks.
|
|
- Updated comments related to above changes.
|
|
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.
|
|
|
|
commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Apr 22 19:00:43 2013 -0500
|
|
|
|
Updated dupl implementation to use PACKNR and NR.
|
|
|
|
Details:
|
|
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
|
|
explicitly to navigate b1 so that situations where PACKNR > NR are
|
|
supported.
|
|
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
|
|
frame/3/trsm/ukernels to kernels/c99/.
|
|
- Updated clarksville and flame configurations.
|
|
|
|
commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Apr 21 15:10:34 2013 -0500
|
|
|
|
Disabled blocksize checks for memory pools.
|
|
|
|
Details:
|
|
- Temporarily disabled checks that ensure that enough memory will be allocated
|
|
by the contiguous memory allocator for all types, given that the values for
|
|
double precision real are the ones used to allocate the space. These checks
|
|
can easily go awry in certain situations, especially if you are developing for
|
|
only one datatype. So for now, they are probably more trouble than they are
|
|
worth.
|
|
|
|
commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Apr 21 15:00:24 2013 -0500
|
|
|
|
Allow ldim of packed micro-panels != MR, NR.
|
|
|
|
Details:
|
|
- Made substantial changes throughout the framework to decouple the leading
|
|
dimension (row or column stride) used within each packed micro-panel from
|
|
the corresponding register blocksize. It appears advantageous on some
|
|
systems to use, for example, packed micro-panels of A where the column
|
|
stride is greater than MR (whereas previously it was always equal to MR).
|
|
- Changes include:
|
|
- Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
|
|
to use when packing micro-panels of A and B.
|
|
- Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
|
|
where appropriate, instead of MR and NR.
|
|
- Added pd field (panel dimension) to obj_t.
|
|
- New interface to bli_packm_cntl_obj_create().
|
|
- Renamed bli_obj_packed_length()/_width() macros to
|
|
bli_obj_padded_length()/_width().
|
|
- Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
|
|
- Print out new cache and register blocksize extensions in test suite.
|
|
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
|
|
blocksize for edge cases, which can improve performance at the margins.
|
|
|
|
commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Apr 19 15:26:29 2013 -0500
|
|
|
|
Fixed bug in compatibility layer (her2k/syr2k).
|
|
|
|
Details:
|
|
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
|
|
and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
|
|
interface caller requests the [conjugate-]transpose case. Thanks to Bryan
|
|
Marker for reporting the behavior that led to this bug.
|
|
|
|
commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Apr 18 19:39:13 2013 -0500
|
|
|
|
Changed old level3 test drivers to call front-ends.
|
|
|
|
Details:
|
|
- Changed old level-3 test drivers, in 'test' directory, to always call the
|
|
front-end object API instead of the internal back-end with the locally
|
|
defined control tree.
|
|
|
|
commit 83e45de23e565138b8fde06fb11cfedc973b7246
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Apr 18 18:33:03 2013 -0500
|
|
|
|
Allow packm_init() to reacquire a too-small mem_t.
|
|
|
|
Details:
|
|
- Changed bli_packm_init() to react differently to a situation where a pack
|
|
obj_t has an already-allocated mem_t entry that has a buffer that is smaller
|
|
than what will be needed to hold the block/panel that now needs to be
|
|
packed. Previously, this situation was treated with an abort() since I
|
|
assumed something was horribly wrong. I have changed the code so that it now
|
|
reacts by releasing the previous mem_t and re-acquires a new mem_t with the
|
|
new information. (This change was done at the request of Bryan Marker to
|
|
facilitate code generation via DxT.)
|
|
|
|
commit a6990434173b0cf651f8521194f3aef738deb7d2
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Apr 18 13:52:47 2013 -0500
|
|
|
|
Fixed bug in packing block of A for hemm/symm.
|
|
|
|
Details:
|
|
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
|
|
of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
|
|
symmetric matrix where the block of A being packed intersects the diagonal,
|
|
but some of its micro-panels do not intersect the diagonal and lie completely
|
|
in the unstored region. Thanks to Francisco Igual for reporting this bug.
|
|
- Comment updates to both _blk_var2.c and _blk_var3.c.
|
|
|
|
commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500

Activated bli_packm_acquire_mpart_t2b().

Details:
- Removed the overly-paranoid bli_abort() from the end of
  bli_packm_acquire_mpart_t2b(), to allow others to experiment with
  partitioning through packed blocks of A. Also, and more importantly,
  changed an earlier check that was causing an erroneous (but
  coincidentally redundant) abort(). Also, updated some of the comments
  in bli_packm_part.c.

commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500

Allow creation of "empty" objects.

Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
  modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.

commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500

Fixed "root" object bug in bli_her[2]k/syr[2]k.

Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
  that manifested as the incorrect triangle being updated. It occurred when
  the user would pass in a matrix object that was correctly marked as
  symmetric/Hermitian and lower-stored, but whose root object was never marked
  as lower (or upper). We now alias and re-assign root status for matrix C
  within the front-ends. Note that trmm and trsm were already doing this,
  albeit for a slightly different reason (to allow the internal back-end to
  choose which algorithm to run--lower or upper--based on the uplo of the root
  object for both left and right side cases). Thanks to Bryan Marker for
  leading me to this bug.

commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500

Fixed overzealous type-checking in bli_getsc().

Details:
- Relaxed type checking in getsc so that the input object could be a constant
  and not just a proper floating-point type. (If it is a constant, default to
  extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
  bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.

commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500

Fixed bug in bli_obj_is_packed() and renamed.

Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.

commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500

Removed local reference ukernel blocksize macros.

Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
  reference microkernel definition and header. Meant to include this in
  a recent/previous commit (0020ef7c8271).

commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500

Formatting change to mods in previous commit.

commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500

Set structure of objects in level-2 BLIS APIs.

Details:
- Added missing statement to set structure field of local objects in
  top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
  reporting this bug.

commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500

Tweak to test suite function string construction.

Details:
- Fixed a minor bug in the way that the test suite would construct function
  name strings when the user anchored all parameters in input.operations.
  In this case, the test driver would mistake this situation for one where
  the operation simply had no parameters to begin with, and thus would not
  include the parameter string in the function string that is output for
  every result.

commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500

Fixed a bug in reference implementation of dupl.

Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
  which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
  creation interface.
- Added 'restrict' to x86 gemm microkernel interface.

commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500

Modified bli_kernel.h include order in blis.h.

Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
  _kernel.h includes an optimized microkernel header, which uses BLIS types
  such as dim_t and inc_t, which would precede the definition of those types
  in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
  (immediately after that of bli_kernel.h).

commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500

CHANGELOG update.

commit ec16c52f2ecf419c749175ce0a297441c10f1c68 (tag: 0.0.6)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:41:16 2013 -0500

Updated INSTALL file (now redirects to website).

commit 0020ef7c82711a7ebf08e5174f939bee2563184c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 15:26:35 2013 -0500

Removed gemmtrsm-, trsm-specific blocksize macros.

Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
  instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
  micro-kernel header files.
  (Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.

commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 15:25:54 2013 -0500

Added/renamed alignment constants to _config.h.

Details:
- Added new memory alignment constants:
    BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
    BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
    BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
  and renamed existing ones
    BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
    BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
  to better convey what the alignment factor is used for (and what it is
  not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
  disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
  into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
  use bli_align_dim_to_size(), which takes a third argument (the desired
  alignment).

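Editor's sketch of what aligning a dimension to a byte size can look like, to illustrate the bli_align_dim_to_size() change in the commit above. The function name align_dim_to_size_sketch(), its argument order, and its rounding rule are assumptions for illustration, not the BLIS implementation. It assumes align_size is 1 or a multiple of elem_size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the smallest dimension >= dim whose length in
   bytes (dim * elem_size) is a multiple of align_size. An align_size of 1
   leaves dim unchanged, i.e. alignment is effectively disabled. */
size_t align_dim_to_size_sketch(size_t dim, size_t elem_size, size_t align_size)
{
    assert(elem_size > 0 && align_size > 0);

    size_t bytes  = dim * elem_size;
    /* Round the byte count up to the next multiple of align_size. */
    size_t padded = ((bytes + align_size - 1) / align_size) * align_size;

    /* Convert back to a (unitless) number of elements. */
    return padded / elem_size;
}
```

For example, five doubles (40 bytes) padded to a 16-byte multiple occupy 48 bytes, so the aligned dimension is 6.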
commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 11:40:55 2013 -0500

Fixed a bug in axpyv/axpym when alpha is unit.

Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
  rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
  this bug.

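Editor's sketch of the distinction behind the axpyv fix above: with alpha = 1, y := y + alpha * x reduces to an add (y := y + x), not a copy (y := x). The function axpyv_sketch() is a hypothetical, real-valued, unit-stride stand-in and not the BLIS routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference axpyv: y := y + alpha * x. */
void axpyv_sketch(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    if (alpha == 0.0)
        return;                         /* y is left unchanged */

    if (alpha == 1.0) {
        /* Unit alpha: skip the multiply, but still accumulate into y. */
        for (size_t i = 0; i < n; ++i)
            y[i] += x[i];
        return;
    }

    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

A copy in the alpha = 1 branch (y[i] = x[i]) would silently discard the previous contents of y, which is the kind of error the commit describes.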
commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:39:25 2013 -0500

Moved _POSIX_C_SOURCE def to compiler cmd line.

Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
  and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
  the compiler command line arguments in make_defs.mk (for both configs).
  Thanks to Devin Matthews for suggesting this change.

commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:28:17 2013 -0500

Prepended 'f2c_' to abs, min, max macros in f2c.h.

Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.

commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 11:11:52 2013 -0500

Added new kernel blocksize macro aliases.

Details:
- Added new macros that alias level-3 cache and register blocksize macros
  to names that can be constructed via the PASTEMAC macro. These aliased
  macro definitions live inside bli_kernel_macro_defs.h, which is now
  #included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
  operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
  (found in macro-kernel header files).

commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:35:39 2013 -0500

Updated CREDITS file.

commit 79328c15410215737f3f14cd069328cf52aa11fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:32:14 2013 -0500

Reverted testsuite object files' home to 'obj'.

Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick for tracking
  empty directories in git.

commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 17:45:39 2013 -0500

Renamed/moved object scalar constant macros.

Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
  simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.

commit 357893f5be5c56ab7b062874005e77e614b23f06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 14:48:15 2013 -0500

Applied fix from prev commit to gemmtrsm_?_ref_4x4

Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
  bli_gemmtrsm_u_ref_4x4.c.

commit 54988e8dca44475610bcaee5a7bc1c40e8921402
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 19:08:43 2013 -0500

Fixed a performance bug in trsm.

Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
  (bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
  reference gemm microkernel was hard-coded, and thus always called, even
  when GEMM_UKERNEL was defined to point to an optimized microkernel. This
  manifested as artificially low trsm performance for all problem sizes, but
  especially for small problem sizes as it only affected blocks of A that
  intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
  find this bug.

commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 16:08:22 2013 -0500

Generate testsuite objects in 'src'.

Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
  rather than 'obj', since (a) the top-level .gitignore dictates that
  obj directories are to be ignored, and (b) git has problems
  tracking empty directories. Now, users do not need to create their own
  obj directories within their own local clones of BLIS.

commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:18:42 2013 -0500

Minor formatting changes.

commit a571af816d72727e16cad37007e7043b9d6fa362
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:00:13 2013 -0500

Fixed definition of bli_is_packed_object() macro.

Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
  value of the pack schema bits in the info field of obj_t, rather than
  comparing the obj_t buffer with that of the mem_t entry. This was the cause
  of a very low probability bug whereby uninitialized memory caused the macro
  to evaluate to TRUE even though the object in question was not packed.
  Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.

commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 6 12:54:45 2013 -0500

Updated information in testsuite output header.

Details:
- Added to the information that is echoed at the beginning of the test suite's
  output, and also re-labeled some existing information.

commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 5 17:19:43 2013 -0500

Fixed edge case handling bug in herk macrokernels.

Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
  only manifests when BLIS is configured such that MR != NR. The bug involves
  incorrectly detecting edge cases, which resulted in some parts of matrix C
  potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
  8 and 4, respectively, so that I can better stress the framework on a
  day-to-day basis. (The fact that they were both equal to 4 for so long is
  why I did not stumble upon this bug much sooner.)

commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 4 15:25:43 2013 -0500

Added reference microkernels for arbitrary MR, NR.

Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
  single-char macros are also defined.

commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 13:06:20 2013 -0500

Implemented amax operation and related changes.

Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
  used for.

commit fb68087f8727cd5fd656a742a110e54fb1c91db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:10:16 2013 -0500

More memory alignment-related tweaks.

Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
  passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
  contiguous memory alignment (rather than the system/malloc alignment).

commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 14:14:53 2013 -0500

Always define memory alignment size cpp constant.

Details:
- Removed guard around #define for memory alignment size constant.
  Memory alignment should always be enabled, and so this value should
  always be defined.

commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 13:59:19 2013 -0500

Renamed memory alignment macro constant.

Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
  BLIS_MEMORY_ALIGNMENT_SIZE.

commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 12:43:14 2013 -0500

Align packed panel strides with system alignment.

Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
  subsequent packed panel of A and B begins at an aligned address. (The
  first panel is presumably aligned to system alignment because it is
  aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
  blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
  is used to allocate enough space for each block no matter what kind of
  register blocking is used (even if register blocksize is unit and every
  row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
  and KC*NC result in identical footprints for all datatypes.

commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:18:12 2013 -0500

CHANGELOG update.

commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4 (tag: 0.0.5)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:01:49 2013 -0500

Migrated 'bl2' prefix to 'bli'.

Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.

commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:49:36 2013 -0500

Removed several 'old' directories and files.

Details:
- Removed most of the 'old' directories scattered throughout the framework,
  which includes alternate/half-baked/broken implementations.

commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:00:10 2013 -0500

Removed #include "blis2.h" from low-level headers.

Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
  header files throughout the framework. Given that these low-level headers
  are included within blis2.h in a very specific order, #include'ing blis2.h
  within them directly is unnecessary.

commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 17:18:58 2013 -0500

Added cpp guards to conflicting libflame typedefs.

Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
  This is a temporary hack to allow interoperability with libflame. (Similarly
  temporary changes are being made to libflame's type definitions file.)

commit f469907503fcdc24dff0174c569170e6e756e045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:20:15 2013 -0500

Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.

Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
  BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
  (e.g. "prefetch" instructions, which are different than the particular
  kind of prefetching/preloading referred to by this constant).

commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:09:59 2013 -0500

Removed build/old directory.

commit 718888849c48d99f83eea6b8f83bc1998cffef7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:07:01 2013 -0500

Deprecated 'flame' configuration.

Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.

commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 18:07:40 2013 -0500

Added missing conjbeta argument to scald.

commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 15:37:20 2013 -0500

Relocated packed mem_t dimension fields to obj_t.

Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
  m_packed and n_packed fields to obj_t. These new fields track the same as
  the old ones. From an abstraction standpoint, it seemed awkward to store
  those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
  is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
  functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
  respectively.
- Updated packm variants to access the packed length and width fields from
  their new locations.

commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 10:37:03 2013 -0500

CHANGELOG update.

commit e7d41229d3b1674e74f47d7f29fae004a745201a (tag: 0.0.4)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 17:12:36 2013 -0500

Re-implemented contiguous memory allocator.

Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
  allocator instantiates and initializes three separate memory pool objects,
  each one associated with a separate array of contiguous memory blocks, each
  block of fixed and uniform size. (The three pools are for allocating mc-by-kc
  blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
  objects use a stack structure internally to track which blocks in the region
  have been "checked out" to a thread and which are still available. Critical
  regions are now clearly marked and adaptable to parallel environments (e.g.
  OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
  of packed buffer is being allocated. The enumerated type for this argument
  is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
  packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
  bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
  number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
  bl2_align_dim_to_mult(). Turns out that typically we don't need to align
  a dimension to the system alignment, since that value has to do with
  starting addresses, whereas the values we are dealing with are unitless
  dimensions.

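Editor's sketch of a stack-based pool of fixed-size blocks, in the spirit of the allocator described in the commit above. All names here (pool_sketch_t, pool_sketch_init(), pool_sketch_acquire(), pool_sketch_release(), POOL_SKETCH_MAX_BLOCKS) are hypothetical, and the sketch is single-threaded: where the commit mentions critical regions, a threaded build would need a lock around acquire and release.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SKETCH_MAX_BLOCKS 8   /* hypothetical pool capacity */

typedef struct {
    void  *free_blocks[POOL_SKETCH_MAX_BLOCKS]; /* stack of available blocks */
    size_t top;                                 /* number of available blocks */
} pool_sketch_t;

/* Carve a contiguous region into num_blocks blocks of block_size bytes each
   and push every block onto the stack of available blocks. */
void pool_sketch_init(pool_sketch_t *pool, void *region,
                      size_t num_blocks, size_t block_size)
{
    assert(pool != NULL && region != NULL);
    assert(num_blocks <= POOL_SKETCH_MAX_BLOCKS);

    for (size_t i = 0; i < num_blocks; ++i)
        pool->free_blocks[i] = (char *)region + i * block_size;
    pool->top = num_blocks;
}

/* Check out a block by popping it; returns NULL if none are available. */
void *pool_sketch_acquire(pool_sketch_t *pool)
{
    return (pool->top > 0) ? pool->free_blocks[--pool->top] : NULL;
}

/* Return a previously checked-out block by pushing it back. */
void pool_sketch_release(pool_sketch_t *pool, void *block)
{
    assert(pool->top < POOL_SKETCH_MAX_BLOCKS);
    pool->free_blocks[pool->top++] = block;
}
```

Under these assumptions, one such pool would exist per buffer kind (blocks of A, panels of B, panels of C), each with its own block size.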
commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 12:21:42 2013 -0500

Perform her2k var1 loops in sequence.

Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
  and accumulated in sequence rather than fused into one loop. This is
  necessary if BLIS is to be configured to provide only enough contiguous
  memory for one panel of B.

commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:42:15 2013 -0600

Enhanced tracking of dimensions for mem_t objects.

Details:
- Added new fields to mem_t struct definition to track the allocated (as
  opposed to the currently used) dimensions of the memory region. This
  allows packm_init() to be more robust in situations where memory is
  already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
  in packm_init(), to update the "currently used" dimensions of the mem_t
  object if the requested dimensions are smaller than the allocated
  dimensions.

commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:00:10 2013 -0600

Fixed test suite flop formulas for ops with side.

Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
  trmm3, and trsm.
- Comment updates in herk macro-kernels.

commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:47:06 2013 -0600

Added "version" to .gitignore.

Details:
- Added "version" to .gitignore file so that the file does not show up when
  running 'git status', or accidentally get pulled into the index when
  running 'git add' or 'git add --all'.

commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:43:54 2013 -0600

Removed version file from version control.

Details:
- Removed version file from version control to prevent git errors that occur
  when trying to pull new commits.

commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 1 12:55:42 2013 -0600

Updated behavior of bl2_obj_induce_trans() macro.

Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
  updated as part of the macro. All current uses of the macro have been
  coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.

commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 18:15:41 2013 -0600

Replaced banded/packed BLAS2 stubs with f2c code.

Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
  implemented" message. This includes all of the level-2 banded and packed
  routines.
- Replaced the aforementioned with the corresponding netlib implementations
  having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.

commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:38:45 2013 -0600

Moved Fortran name-mangling macro to bl2_config.h.

Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
  configuration directory (bl2_config.h, specifically) given that it can be
  expected to be tweaked by some developers.

commit ede75693e5a36c6006087c4a7df834175b604504 (tag: 0.0.3)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:11:24 2013 -0600

Implemented blas2blis compatibility layer.

Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
  includes virtually all of the BLAS, including banded and packed level-2
  operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
  initialization, which stores the "exit status" in an err_t, which is then
  read by the latter function to determine whether finalization should actually
  take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
  level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
  it automatically initializes itself (via bl2_init_safe()), until/unless the
  application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
  templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types in bl2_type_defs.h for use with BLAS
  wrappers.

commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 14:30:50 2013 -0600

Updated version file. (Forgot to in prev commit).

commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 12:00:17 2013 -0600

Fixed some scalar types in BLAS-like Herm APIs.

Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
  alpha and beta in herk, and beta in her2k, need to be real. These
  arguments were typed incorrectly as the complex types. This has been
  fixed. Note the issue was only present in the BLAS-like APIs for
  these operations (not the native object-based interfaces).

commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 15:50:54 2013 -0600

Updated version file. (Forgot to in prev commit).

commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 14:11:36 2013 -0600

Changed API of packm_init_pack() to use blksz_t.

Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
  are passed in as type blksz_t* instead of dim_t.
- Made a similar change for packv_init_pack().

commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 15 09:59:48 2013 -0600

Minor changes to lower levels of scalm and setm.

Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
  in only the strictly lower or upper triangle of the matrix being modified.

commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:42:56 2013 -0600

Updated beta == zero semantics of mulsc.

Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
  operation that needed updating.
- Added Devin to CREDITS file.

commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:18:00 2013 -0600

Removed some calls to setv() in test modules.

Details:
- Removed calls to setv() in test modules whose sole purpose was to
  initialize vectors to zero to ensure that nan's and inf's would not
  taint the computation. Now that beta == zero semantics have been
  updated to clear the output operand (when beta is zero), rather than
  multiply against it, these setv() calls are no longer needed.

commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:44:59 2013 -0600

Properly implemented beta == 0 semantics.

Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
  respectively.
- Added code to the following operations that sets the output operand to
  zero if the corresponding scalar is zero (rather than performing the
  floating-point multiply, or in the case of setv, copying the value).
  This will prevent nan's and inf's from creeping into results from
  uninitialized memory.
  - axpy
  - dotxv
  - scalv
  - scal2v
  - setv
  - gemv
  - ger
  - hemv
  - her
  - her2
  - gemm reference ukernels

commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:29:53 2013 -0600

Fixed stale interface to packm_unb_var1().

Details:
- Removed the control tree from the interface to packm_unb_var1(), which
  I meant to do when it was un-deprecated.

commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 13:21:00 2013 -0600

Un-deprecated packm_unb_var1.c (needed by l2 ops).

Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
  operations still need this routine for packing matrices. Now, whether
  level-2 operations should be packing matrices to begin with is another
  matter. But this fixes the segmentation fault one would have gotten when
  running bl2_gemv() on a general stride matrix.

commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 18:39:35 2013 -0600

Removed cntl tree usage from packm implementation.

Details:
- Added new fields to obj_t info field:
  - invert_diag
  - pack_order_if_upper
  - pack_order_if_lower
  These fields allow packm_init() to embed information that begins
  in the control tree into the object so that the packm implementation
  does not need to use control trees at all. This is being done to aid
  Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
  to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
  These were part of prototype implementations and are no longer needed.

commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:39:30 2013 -0600

Replaced bl2_abs() with _fabs() where appropriate.

commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:23:48 2013 -0600

Removed level-0 macros projrs, grabis.

Details:
- Replaced instances of projrs and grabis macros with newer,
  more general-purpose getris.

commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 11:45:34 2013 -0600

Restored executable permissions to scripts.

Details:
- Restored executable (0755) permissions to scripts that were touched by
  the recursive sed script that updated the copyright headers in the
  previous commit.

commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 14:37:47 2013 -0600

Updated copyright headers from 2012 to 2013.

commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:38:07 2013 -0600

CHANGELOG update.

commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99 (tag: 0.0.2)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:20:44 2013 -0600

Added unified test suite, and many fixes.

Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queryable version string.

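The log only says that bl2_clock_min_diff() helps measure the wall clock time of a code block. One plausible reading, shown here purely as an assumption with hypothetical names, is a helper that computes the time elapsed since a start stamp and keeps the smaller of that and the best time seen so far, so that repeated trials report their minimum.

```c
#include <time.h>

/* Hypothetical wall-clock reader based on C11 timespec_get(). */
double wall_clock_sketch(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Hypothetical sketch: return the smaller of the best time so far
   (time_min) and the time elapsed since time_start. */
double clock_min_diff_sketch(double time_min, double time_start)
{
    double elapsed = wall_clock_sketch() - time_start;

    return (elapsed < time_min) ? elapsed : time_min;
}
```

In a timing loop one would initialize the running minimum to a very large value, take a start stamp before each trial, and pass both to clock_min_diff_sketch() after the trial finishes.
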
- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpha is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the macro-
  kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.

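The inverts bug in the list above (a double value typecast to float before inversion) can be illustrated with two small functions. Both names are hypothetical and this is not the BLIS macro; the sketch only shows how a round trip through float discards precision before the division.

```c
/* Buggy pattern described in the log: the double operand is narrowed to
   float before it is inverted, so only about 7 significant digits survive. */
double invert_via_float_sketch(double x)
{
    return 1.0 / (double)(float)x;
}

/* Intended behavior: invert at full double precision. */
double invert_double_sketch(double x)
{
    return 1.0 / x;
}
```

For a value such as 0.1, which is not exactly representable in float, the two functions return noticeably different results; for values that float represents exactly, such as 4.0, they agree.
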
commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:55:21 2013 -0600

Added missing 'd' to fused gemmtrsm function name.

commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:37:27 2013 -0600

Added debug statements to bl2_mm_acquire_m().

Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
  with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
  configuration's bl2_kernel.h

commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:07:50 2012 -0600

Defined Frobenius norm operations.

Details:
- Added level-0 grabis macro operation to grab imaginary component of one
  variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
  of the elements of a vector. This implementation is modeled after ?lassq
  in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
  vectors and matrices, respectively. These operations are treated as one-
  operand operations where the output norm value is the real projection of
  the datatype of the input operand. Both operations are implemented in terms
  of sumsqv.

commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:02:55 2012 -0600

Added GENT*R macros; tweaked bl2_machval defs.

Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
  GENTPROTR, which are one-operand macros with auxiliary real projection
  types.
- Tweaked bl2_machval files to use new macros.

commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 11:35:14 2012 -0600

Fixed harmless macro bug in level-1m operations.

Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
  bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
  despite the bug, which is why I had not discovered it until now.

commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 15:07:36 2012 -0600

Renamed x86,x86_64 kernels to indicate 'd' fusing.

Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
  to emphasize that the fusing shape is not for all datatype instances, but
  rather just for one (that of double-precision real). Other fusing shapes
  would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.

commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:34:02 2012 -0600

More tweaks to _config.h, _kernel.h; smem tweaks.

Details:
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
  accomplishes the same thing (enabling posix_memalign()) without enabling
  all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
  as well as two new constants that determine how many MCxKC blocks and
  how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
  bl2_abort() with a specific error code call.

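The pool sizing described above can be sketched as simple arithmetic. The function and parameter names below are hypothetical, and the exact formula is an assumption: the log only says the size depends on MC, KC, NC and on how many MCxKC and KCxNC blocks are reserved, so any padding or alignment the real code may add is ignored here.

```c
#include <stddef.h>

/* Hypothetical sketch: bytes needed for a static pool that holds
   n_mc_x_kc blocks of MC x KC elements and n_kc_x_nc blocks of
   KC x NC elements, each element being elem_size bytes. */
size_t static_pool_bytes_sketch(size_t mc, size_t kc, size_t nc,
                                size_t n_mc_x_kc, size_t n_kc_x_nc,
                                size_t elem_size)
{
    size_t a_block_elems = mc * kc;   /* one packed block of A */
    size_t b_block_elems = kc * nc;   /* one packed panel of B */

    return (n_mc_x_kc * a_block_elems + n_kc_x_nc * b_block_elems)
           * elem_size;
}
```

For example, with MC = 4, KC = 2, NC = 8, one block of each kind and 8-byte elements, the pool needs (8 + 16) * 8 = 192 bytes.
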
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 16:07:36 2012 -0600

Minor reordering of bl2_config.h definitions.

commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 12:35:54 2012 -0600

Consolidated configuration headers.

Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
  clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.

commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 14 12:45:26 2012 -0600

Fixed bug in reference gemm ukernels.

Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
  was not correctly accumulated and scaled (by alpha) into the output matrix
  C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.

commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 18:17:54 2012 -0600

Expanded reference packm/unpackm kernel set to 16.

Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
  unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
  kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
  to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.

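The fallback described above, where a packing front-end uses a generic scaled copy when no kernel exists for the requested panel size, can be sketched with a small dispatch table. Everything here is hypothetical (names, the column-major layout, and the single specialized 2xk kernel); it only illustrates the dispatch-with-fallback pattern, with the generic routine playing the role that scal2m plays in the log entry.

```c
#include <stddef.h>

#define SKETCH_MAX_PANEL_DIM 16

typedef void (*pack_kernel_fn)(size_t k, double kappa,
                               const double *a, size_t lda, double *p);

/* Hypothetical specialized kernel for panels with exactly 2 rows. */
void pack_2xk_sketch(size_t k, double kappa,
                     const double *a, size_t lda, double *p)
{
    for (size_t j = 0; j < k; ++j) {
        p[0 + j * 2] = kappa * a[0 + j * lda];
        p[1 + j * 2] = kappa * a[1 + j * lda];
    }
}

/* Generic scaled copy of an m x k column-major panel into packed storage. */
void pack_generic_sketch(size_t m, size_t k, double kappa,
                         const double *a, size_t lda, double *p)
{
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < m; ++i)
            p[i + j * m] = kappa * a[i + j * lda];
}

/* Dispatch on the panel dimension m. Sizes with no table entry, or
   beyond the table, silently take the generic path. */
void pack_cxk_sketch(size_t m, size_t k, double kappa,
                     const double *a, size_t lda, double *p)
{
    static const pack_kernel_fn kernels[SKETCH_MAX_PANEL_DIM + 1] = {
        [2] = pack_2xk_sketch
    };
    pack_kernel_fn fn = (m <= SKETCH_MAX_PANEL_DIM) ? kernels[m] : NULL;

    if (fn != NULL)
        fn(k, kappa, a, lda, p);
    else
        pack_generic_sketch(m, k, kappa, a, lda, p);
}
```

A request for a 20xk panel exceeds the table and is handled by the generic copy instead of reading past the end of the kernel array.
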
commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 17:23:32 2012 -0600

Minor updates towards 0.0.1.

commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271 (tag: 0.0.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 16:18:40 2012 -0600

Tweaks to get BLIS compiling again on clarksville.

Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.

commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 14:55:12 2012 -0600

Added template fragment.mk; updated .gitignore.

commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 19:54:04 2012 -0600

Added 'changelog' make target; other tweaks.

Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
  overwrites CHANGELOG with the output.
- Other trivial changes.

commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:34:53 2012 -0600

Define static memory pool size in bl2_config.h.

commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:18:00 2012 -0600

Refined INSTALL text; added 'showconfig' target.

Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
  to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.

commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 15:34:53 2012 -0600

Added auto-detection of version string (via git).

Details:
- Added build/update-version-file.sh script for auto-detecting "version"
  string and updating 'version' file accordingly. (If .git directory is
  not present, then it is assumed this copy of BLIS is a downloaded
  release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.

commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 14:27:11 2012 -0600

Wrote first draft of INSTALL file.

commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785 (tag: 0.0.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 12:42:35 2012 -0600

Updated standalone test Makefile and other fixes.

Details:
- Major edits to test/Makefile to bring up-to-date wrt new build system;
  should no longer be broken.
- Minor edits to top-level Makefile.
- Fixed copy-and-paste bugs in
  - frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
  - frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c

commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 19:22:14 2012 -0600

Added build system and continued reorganization.

Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
  (by the user in bl2_kernels.h).

commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 12:36:11 2012 -0600

Initial commit.