Commit Graph

1886 Commits

Author SHA1 Message Date
Devin Matthews
8945a1512d This version gets ~1550 GFLOPs on KNL with 16x4. 2016-08-03 11:28:24 -05:00
praveeng
cdfb3c3f29 Merge master code as on 2016_07_29 to amd-staging branch by praveeng
Change-Id: Ic78b84d8b8d10158fb2a612f9a64bbc7b1f9b486
2016-07-29 12:46:21 +05:30
praveeng
4bc842ca3a Merge branch 'master' of publicrepo 2016-07-28 17:32:12 +05:30
praveeng
117f883851 Revert commits 357c990bdd
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
2016-07-28 17:30:53 +05:30
praveeng
2fcdc28f10 Revert commits 8aee306
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
2016-07-28 17:30:53 +05:30
praveeng
1b5d104afe removed changes from readme file which are giving conflicts
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
2016-07-28 17:30:52 +05:30
praveeng
d81273047b first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
2016-07-28 17:30:04 +05:30
praveeng
65905c3011 small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
2016-07-28 17:28:55 +05:30
praveeng
23cca231be first commit
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
2016-07-28 17:26:49 +05:30
praveeng
922e309170 small modification to readme for git push test
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
2016-07-28 17:26:49 +05:30
praveeng
b0d510bf0e Revert commits 357c990bdd
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
2016-07-28 15:11:08 +05:30
praveeng
5ebeece5b4 Revert commits 8aee306
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
2016-07-28 15:01:36 +05:30
Devin Matthews
6ce4c022eb Switch back to 24x8. I could only squeeze 24.5GFLOP out of 8x24, and scalability is not improved. 2016-07-27 16:26:36 -05:00
Field G. Van Zee
d52cb76715 Merge branch 'master' into compose 2016-07-27 16:04:55 -05:00
Field G. Van Zee
c31b1e7b9d Relax alignment restrictions for sandybridge ukrs.
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
  in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
  instead of vmovaps/vmovapd. These changes mimic those made to the haswell
  microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
  directory to use DBL_MAX as the initial time candidate. Thanks to Devin
  Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
2016-07-27 15:58:07 -05:00
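
The DBL_MAX change in c31b1e7b9d can be pictured with a small sketch. This is not the BLIS test driver code; `best_time` is a hypothetical helper, and the only thing taken from the commit message is the idea of seeding a best-of-N timing with DBL_MAX from float.h so that the first real sample always replaces it.

```c
#include <float.h>   /* DBL_MAX */
#include <stddef.h>  /* size_t */

/* Hypothetical helper (not a BLIS function): return the smallest of n
   timing samples. Seeding the candidate with DBL_MAX means no arbitrary
   "large enough" constant has to be chosen, and any finite sample wins. */
double best_time(const double *samples, size_t n)
{
    double best = DBL_MAX;  /* initial time candidate */

    for (size_t i = 0; i < n; ++i)
    {
        if (samples[i] < best)
            best = samples[i];
    }

    /* If n == 0, DBL_MAX is returned unchanged. */
    return best;
}
```

A caller would time the same operation several times, store each elapsed time in an array, and report `best_time(samples, n)`.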
Devin Matthews
b8f2b55532 Try an 8x24 kernel for the hell of it. 2016-07-27 15:22:55 -05:00
Devin Matthews
7ede5863ae Allocate pack buffer on MCDRAM for KNL. 2016-07-27 13:42:32 -06:00
Devin Matthews
ad89ed2e82 Merge branch 'knl' of github.com:devinamatthews/blis into knl 2016-07-27 11:45:40 -05:00
Devin Matthews
2c9de740ed This version gets ~26GF on one core. 2016-07-27 11:44:54 -05:00
Devin Matthews
81e2b05f31 Add optimized packing kernels for KNL. 2016-07-27 11:39:05 -05:00
Devin Matthews
a7d8ca97b8 All fixed. 2016-07-25 15:15:13 -05:00
Devin Matthews
963d0393b0 Add 24xk pack kernel. 2016-07-25 14:40:53 -05:00
Devin Matthews
117b76739a In the midst of debugging. 2016-07-25 13:53:07 -05:00
Devin Matthews
8c0a4fd1d3 Fix some row/column confusion. 2016-07-25 13:09:24 -05:00
Devin Matthews
c44f9f9693 Simplify displacements -- clang assembler was badly botching EVEX compressed displacements giving false alarms for instruction length. 2016-07-25 12:02:24 -05:00
Devin Matthews
e0cce177cc Minor fixes for 8x24 KNL kernel. 2016-07-25 10:02:25 -05:00
praveeng
50a2f2efcb Merge master code as on 2016_07_25 to amd-staging branch by praveeng
Change-Id: I84886ae241db2aac0bef6b7ef399f04aa8bca16d
2016-07-25 17:07:38 +05:30
praveeng
cfd46c88d5 Merge remote-tracking branch 'publicrepo/master' 2016-07-25 15:38:13 +05:30
praveeng
f493bf4d70 removed changes from readme file which are giving conflicts
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
2016-07-25 14:14:00 +05:30
Devin Matthews
65735bbedf Switch to 24x8 kernel, unrolled by 16. 2016-07-24 21:50:32 -05:00
Devin Matthews
45d5dc9717 Add 24x8 "KNC-style" kernel for KNL. 2016-07-24 14:25:26 -05:00
Field G. Van Zee
95abea46f8 Merge branch 'master' into compose 2016-07-23 15:38:33 -05:00
Field G. Van Zee
a017062fdf Integrated "memory broker" (membrk_t) abstraction.
Details:
- Integrated a patch originally authored and submitted by Ricardo Magana
  of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
  (memory broker) that allows multiple sets of memory pools on, for example,
  separate NUMA nodes, each of which has a separate memory space.
- Added membrk field to cntx_t and defined corresponding accessor macros.
- Added membrk field to mem_t object and defined corresponding accessor macros.
- Created new bli_membrk.c file, which contains the new memory broker API,
  including:
    bli_membrk_init(), bli_membrk_finalize()
    bli_membrk_acquire_[mv](), bli_membrk_release(),
    bli_membrk_init_pools(), bli_membrk_reinit_pools(),
    bli_membrk_finalize_pools(),
    bli_membrk_pool_size()
- In bli_mem.c, changed function calls to
    bli_mem_init_pools()     -> bli_membrk_init()
    bli_mem_reinit_pools()   -> bli_membrk_reinit()
    bli_mem_finalize_pools() -> bli_membrk_finalize()
- In bli_packv_init.c, bli_packm_init.c, changed function calls to:
    bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
    bli_mem_release()      -> bli_membrk_release()
- Added bli_mutex.c and related files to frame/thread. These files define
  abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
  single-threaded execution. This new API is employed within functions
  such as bli_membrk_acquire_[mv]() and bli_membrk_release().
2016-07-22 17:02:59 -05:00
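
The locking pattern described in a017062fdf, where acquire and release operations on a pool are guarded by a mutex, might look roughly like the sketch below. Every name here (`broker_t`, `broker_init`, `broker_acquire`, `broker_release`, `broker_finalize`, `POOL_BLOCKS`, `BLOCK_SIZE`) is hypothetical and chosen for illustration; the real membrk_t layout and the bli_membrk_*() and bli_mutex signatures are not shown in this log. The sketch also assumes pthreads, one of the three backends the commit mentions.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_BLOCKS 4   /* hypothetical pool capacity */
#define BLOCK_SIZE  64  /* hypothetical block size in bytes */

/* Hypothetical, simplified stand-in for a memory broker: a fixed set of
   blocks kept on a free stack and guarded by a single mutex. */
typedef struct
{
    pthread_mutex_t lock;
    char            storage[POOL_BLOCKS][BLOCK_SIZE];
    void           *free_list[POOL_BLOCKS];
    int             n_free;  /* number of blocks currently available */
} broker_t;

void broker_init(broker_t *b)
{
    pthread_mutex_init(&b->lock, NULL);
    for (int i = 0; i < POOL_BLOCKS; ++i)
        b->free_list[i] = b->storage[i];
    b->n_free = POOL_BLOCKS;
}

/* Pop a block from the free stack, or return NULL if the pool is empty. */
void *broker_acquire(broker_t *b)
{
    void *p = NULL;

    pthread_mutex_lock(&b->lock);
    if (b->n_free > 0)
        p = b->free_list[--b->n_free];
    pthread_mutex_unlock(&b->lock);

    return p;
}

/* Push a previously acquired block back onto the free stack. */
void broker_release(broker_t *b, void *p)
{
    pthread_mutex_lock(&b->lock);
    if (b->n_free < POOL_BLOCKS)
        b->free_list[b->n_free++] = p;
    pthread_mutex_unlock(&b->lock);
}

void broker_finalize(broker_t *b)
{
    pthread_mutex_destroy(&b->lock);
}
```

Giving each NUMA node its own broker instance, each with its own lock, is one way to read the "multiple sets of memory pools" in the commit message: threads on different nodes would then not contend for the same mutex.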
Devin Matthews
8ff2e069c4 Add 4x unrolled variant for KNL microkernel. 2016-07-22 16:22:26 -05:00
Devin Matthews
9cb2ed9b0c Get rid of one RBX update. 2016-07-22 16:10:30 -05:00
Devin Matthews
451bde076f Add some more knobs to twiddle for KNL microkernel. 2016-07-22 15:43:00 -05:00
Devin Matthews
8c6e621c09 Make knl conform to new kernel dir structure. 2016-07-22 15:05:15 -05:00
Devin Matthews
ce7214c661 Merge remote-tracking branch 'origin/master' into knl 2016-07-22 14:59:53 -05:00
Field G. Van Zee
ce59f81108 Merge pull request #88 from devinamatthews/32bit-dim_t
Handle 32-bit dim_t in 64-bit microkernels.
2016-07-22 14:48:14 -05:00
Devin Matthews
707a2b7fac Somehow forgot the most important microkernel. 2016-07-22 13:49:44 -05:00
Devin Matthews
47ec045056 Merge remote-tracking branch 'upstream/master' into 32bit-dim_t 2016-07-22 13:45:23 -05:00
Devin Matthews
08f1d6b6fa Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit. 2016-07-22 13:44:37 -05:00
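
The problem addressed by 08f1d6b6fa can be illustrated in portable C. This is a sketch under stated assumptions, not the microkernel code: `dim32_t`, `load64_from_32bit_slot`, and `widen_k` are hypothetical names, and `memcpy` stands in for the 64-bit load that an assembly microkernel would issue.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t dim32_t;  /* hypothetical stand-in for a 32-bit dim_t */

/* Mimics a 64-bit load aimed at a 32-bit variable: the load also picks
   up the 4 bytes that happen to follow it in memory (pair[1] here). */
uint64_t load64_from_32bit_slot(const dim32_t pair[2])
{
    uint64_t v;
    memcpy(&v, pair, sizeof v);  /* reads 8 bytes: pair[0] and pair[1] */
    return v;
}

/* The fix described in the commit message: copy k into a 64-bit
   intermediate first, so a 64-bit load sees a full-width value.
   Dimensions are assumed to be non-negative. */
uint64_t widen_k(dim32_t k)
{
    uint64_t k64 = (uint64_t)k;
    return k64;
}
```

With `pair = {5, 7}`, the first function returns a value that mixes both elements and is therefore not 5, whereas `widen_k(5)` returns exactly 5.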
Field G. Van Zee
ff41153f4e Merge pull request #86 from devinamatthews/haswell-vmovups
Remove alignment restrictions on C in haswell kernel.
2016-07-22 13:21:03 -05:00
Devin Matthews
e0d2fa0d83 Relax alignment restrictions for haswell sgemm. 2016-07-22 12:56:51 -05:00
Field G. Van Zee
f9214ced97 Merge pull request #85 from devinamatthews/qopenmp
Change -openmp to -fopenmp for icc.
2016-07-22 12:16:39 -05:00
Devin Matthews
ee2c139df6 Remove alignment restrictions on C in haswell kernel. 2016-07-22 12:06:03 -05:00
Devin Matthews
08666eaa20 Change -openmp to -fopenmp for icc. 2016-07-22 11:07:34 -05:00
Devin Matthews
119d039942 Add 8x24 KNL kernel. 2016-07-22 10:23:31 -05:00
praveeng
1aa77dfc1d Merge master code as on 2016_07_21 to amd-staging branch by praveeng
Change-Id: Ic7d0a21101358f08147736e7f1884e7409937344
2016-07-21 14:23:41 +05:30
Devin Matthews
b58cda9eba Merge remote-tracking branch 'origin/master' into knl
# Conflicts:
#	frame/base/bli_threading.h
#	frame/include/blis.h
#	frame/thread/bli_thread.c
2016-07-19 14:09:09 -05:00