Mirror of https://github.com/amd/blis.git, synced 2026-04-19 23:28:52 +00:00
ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.

Change-Id: I2aa6f944ce2584de85ae7b6921ff0193b3b7020a
Committed by Devrajegowda, Kiran
Parent: 51f87f3e42
Commit: d6496d55cc
@@ -4,6 +4,7 @@
## Contents

* [Changes in 0.6.1](ReleaseNotes.md#changes-in-061)
* [Changes in 0.6.0](ReleaseNotes.md#changes-in-060)
* [Changes in 0.5.2](ReleaseNotes.md#changes-in-052)
* [Changes in 0.5.1](ReleaseNotes.md#changes-in-051)
@@ -35,6 +36,61 @@
* [Changes in 0.0.2](ReleaseNotes.md#changes-in-002)
* [Changes in 0.0.1](ReleaseNotes.md#changes-in-001)

## Changes in 0.6.1
January 14, 2020

Improvements present in 0.6.1:

Framework:
- Added support for pre-broadcast when packing B. This causes elements of B to be repeated (broadcast) in the packed copy of B, so that a subsequent vector load brings an already-broadcast element into the vector register.
- Added support for selective packing to `gemmsup` (controlled via environment variables and/or the `rntm_t` object). (AMD)
- Fixed a bug in `sdsdot_sub()` that redundantly added the "alpha" scalar and a separate bug in the order of typecasting intermediate products in `sdsdot_()`. (Simon Lukas Märtens, Devin Matthews)
- Fixed an obscure bug in `bli_acquire_mpart_mdim()`/`bli_acquire_mpart_ndim()`. (Minh Quan Ho)
- Fixed a subtle and complicated bug that only manifested via the BLAS test drivers in the `generic` subconfiguration, and possibly any other subconfiguration that did not register complex-domain `gemm` ukernels, or registered ONLY real-domain ukernels as row-preferential. (Dave Love)
- Always use `sumsqv` to compute `normfv` instead of the "dot product trick" that was previously employed for performance reasons. (Roman Yurchak, Devin Matthews, and Isuru Fernando)
- Fixed a bug in `thrinfo_t` debugging/printing code.
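
As a rough illustration of the pre-broadcast item in the Framework list, the following C sketch packs a small row-major panel of B so that every element is stored `bcast` times in a row. The function name `pack_b_prebroadcast`, its parameters, and the row-major layout are assumptions made for this example; it is not BLIS's packing code or API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a BLIS function).
 * Packs a k x nr row-major panel of B into `packed`, repeating each
 * element `bcast` times consecutively.
 *   b      : source panel, row stride ldb (ldb >= nr)
 *   packed : destination with room for k * nr * bcast doubles
 */
void pack_b_prebroadcast(const double *b, size_t k, size_t nr,
                         size_t ldb, size_t bcast, double *packed)
{
    assert(bcast > 0 && ldb >= nr);

    size_t out = 0;
    for (size_t p = 0; p < k; ++p) {             /* rows of the panel      */
        for (size_t j = 0; j < nr; ++j) {        /* elements within a row  */
            for (size_t r = 0; r < bcast; ++r) { /* repeat (pre-broadcast) */
                packed[out++] = b[p * ldb + j];
            }
        }
    }
}
```

If `bcast` matches the number of elements in a vector register, an ordinary vector load from `packed` yields a register filled with one element of B, which is the effect the release note describes. The trade-off in this sketch is that the packed buffer grows by a factor of `bcast`.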

Kernels:
- Implemented and registered an optimized `dgemm` microkernel for the `power9` kernel set. (Nicholai Tukanov)
- Pacify a `restrict` warning in the `gemmtrsm4m1` reference ukernel. (Dave Love, Devin Matthews)

Build system:
- Fixed parsing in `vpu_count()` on some SkylakeX workstations. (Dave Love)
- Reimplemented `bli_cpuid_query()` for ARM to use `stdio`-based functions instead of `popen()`. (Dave Love)
- Use `-march=znver1` for clang on `zen2` subconfig.
- Updated `-march` flags for `sandybridge`, `haswell` subconfigurations to use newer syntax (e.g. `haswell` instead of `core-avx2` and `sandybridge` instead of `corei7-avx`).
- Correctly use `-qopenmp-simd` for reference kernels when compiling with icc. (Victor Eijkhout)
- Added `-march` support for select gcc version ranges where flag syntax changes or new flags are added. The ranges we identify are: versions older than 4.9.0; versions older than 6.1.0 (but newer than 4.9.0); versions older than 9.1.0 (but newer than 6.1.0).
- Use `-funsafe-math-optimizations` and `-ffp-contract=fast` for all reference kernels when using gcc or clang.
- Updated MC cache blocksizes used by `haswell` subconfig.
- Updated NC cache blocksizes used by `zen` subconfig.
- Fixed a typo in the context registration of the `cortexa53` subconfiguration in `bli_gks.c`. (Francisco Igual)
- Output a more informative error when the user manually targets a subconfiguration that configure places in the configuration blacklist. (Tze Meng Low)
- Set execute bits of shared library at install-time. (Adam J. Stewart)
- Added missing thread-related symbols for export to shared libraries. (Kyungmin Lee)
- Removed (finally) the `attic/windows` directory since we offer Windows DLL support via AppVeyor's build artifacts, and thus that directory was likely only confusing people.

Testing:
- Fixed latent testsuite microkernel module bug for `power9` subconfig. (Jeff Hammond)
- Added `test/1m4m` driver directory for test drivers related to the 1m paper.
- Added libxsmm support to `test/sup` drivers. (Robert van de Geijn)
- Updated `.travis.yml` and `do_sde.sh` to automatically accept SDE license and download SDE directly from Intel. (Devin Matthews, Jeff Hammond)
- Updated standalone test drivers to iterate backwards through the specified problem space. This often helps avoid the situation whereby the CPU doesn't immediately throttle up to its maximum clock frequency, which can produce strange discontinuities (sharply rising "cliffs") in performance graphs.
- Pacify an unused variable warning in `blastest/f2c/lread.c`. (Jeff Hammond)
- Various other minor fixes/tweaks to test drivers.
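
The backwards iteration mentioned in the Testing list can be sketched as a loop that visits problem sizes from largest to smallest. The helper `problem_sizes_descending` and its parameters are hypothetical names chosen for this example and are not taken from the BLIS test drivers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the BLIS test drivers).
 * Writes problem sizes p_max, p_max - p_inc, ... into `sizes`, stopping
 * before going below p_min or after `cap` entries.
 * Returns the number of sizes written.
 */
size_t problem_sizes_descending(size_t p_min, size_t p_max, size_t p_inc,
                                size_t *sizes, size_t cap)
{
    assert(p_inc > 0 && p_min <= p_max);

    size_t n = 0;
    for (size_t p = p_max; n < cap; p -= p_inc) {
        sizes[n++] = p;
        /* Stop before the unsigned subtraction could drop below p_min. */
        if (p - p_min < p_inc) {
            break;
        }
    }
    return n;
}
```

A driver built this way would time the largest problem first, so the longest-running measurement happens while the clock frequency is still ramping up, and the smaller problems run afterwards.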

Documentation:
- Added libxsmm results to `docs/PerformanceSmall.md`.
- Added BLASFEO results to `docs/PerformanceSmall.md`.
- Added the page size and location of the performance drivers to `docs/Performance.md` and `docs/PerformanceSmall.md`. (Dave Love)
- Added notes to `docs/Multithreading.md` regarding the nuances of setting multithreading parameters the manual way vs. the automatic way. (Jérémie du Boisberranger)
- Added a section on reproduction to `docs/Performance.md` and `docs/PerformanceSmall.md`. (Dave Love)
- Documented Eigen `-march=native` hack in `docs/Performance.md` and `docs/PerformanceSmall.md`. (Sameer Agarwal)
- Inserted multithreading links and disclaimers into `BuildSystem.md`. (Jeff Diamond)
- Fixed typo in description for `bli_?axpy2v()` in `docs/BLISTypedAPI.md`. (Shmuel Levine)
- Added "How to Download BLIS" section to `README.md`. (Jeff Diamond)
- Various other minor documentation fixes.

## Changes in 0.6.0
June 3, 2019