Devin Matthews
8945a1512d
This version gets ~1550 GFLOPs on KNL with 16x4.
2016-08-03 11:28:24 -05:00
praveeng
cdfb3c3f29
Merge master code as on 2016_07_29 to amd-staging branch by praveeng
...
Change-Id: Ic78b84d8b8d10158fb2a612f9a64bbc7b1f9b486
2016-07-29 12:46:21 +05:30
praveeng
4bc842ca3a
Merge branch 'master' of publicrepo
2016-07-28 17:32:12 +05:30
praveeng
117f883851
Revert commits 357c990bdd
...
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
2016-07-28 17:30:53 +05:30
praveeng
2fcdc28f10
Revert commits 8aee306
...
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
2016-07-28 17:30:53 +05:30
praveeng
1b5d104afe
Removed changes from the readme file that were causing conflicts
...
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
2016-07-28 17:30:52 +05:30
praveeng
d81273047b
first commit
...
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
2016-07-28 17:30:04 +05:30
praveeng
65905c3011
small modification to readme for git push test
...
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
2016-07-28 17:28:55 +05:30
praveeng
23cca231be
first commit
...
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
2016-07-28 17:26:49 +05:30
praveeng
922e309170
small modification to readme for git push test
...
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
2016-07-28 17:26:49 +05:30
praveeng
b0d510bf0e
Revert commits 357c990bdd
...
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
2016-07-28 15:11:08 +05:30
praveeng
5ebeece5b4
Revert commits 8aee306
...
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
2016-07-28 15:01:36 +05:30
Devin Matthews
6ce4c022eb
Switch back to 24x8. I could only squeeze 24.5GFLOP out of 8x24, and scalability is not improved.
2016-07-27 16:26:36 -05:00
Field G. Van Zee
d52cb76715
Merge branch 'master' into compose
2016-07-27 16:04:55 -05:00
Field G. Van Zee
c31b1e7b9d
Relax alignment restrictions for sandybridge ukrs.
...
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
instead of vmovaps/vmovapd. These changes mimic those made to the haswell
microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
directory to use DBL_MAX as the initial time candidate. Thanks to Devin
Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
2016-07-27 15:58:07 -05:00
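The DBL_MAX item in the commit above concerns "best of N" timing, where the smallest measured time is kept. The following is a minimal sketch of that idea only, not the testsuite's code; the function name demo_best_time and the array-of-samples interface are assumptions made for illustration.

```c
#include <float.h>   /* DBL_MAX */
#include <stddef.h>  /* size_t */

/* Hypothetical helper, not part of BLIS: return the smallest of n measured
   times. Starting from DBL_MAX means any finite measurement, however slow,
   replaces the initial candidate. */
double demo_best_time(const double* times, size_t n)
{
    double best = DBL_MAX;

    for (size_t i = 0; i < n; ++i)
        if (times[i] < best)
            best = times[i];

    return best;
}
```

With a smaller fixed starting value, a sufficiently slow run could never replace the initial candidate and the reported time would be wrong; DBL_MAX avoids having to guess an upper bound.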
Devin Matthews
b8f2b55532
Try an 8x24 kernel for the hell of it.
2016-07-27 15:22:55 -05:00
Devin Matthews
7ede5863ae
Allocate pack buffer on MCDRAM for KNL.
2016-07-27 13:42:32 -06:00
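The commit above does not say how the pack buffer is placed in MCDRAM. One common route on KNL is the memkind library's hbwmalloc interface; the sketch below assumes that route and falls back to malloc when the hypothetical macro DEMO_HAVE_MEMKIND is not defined. The demo_ function names are illustrative and are not BLIS symbols.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef DEMO_HAVE_MEMKIND
#include <hbwmalloc.h>  /* memkind's high-bandwidth memory interface */
#endif

/* Hypothetical allocator for a packing buffer. With DEMO_HAVE_MEMKIND
   defined (an assumption of this sketch) it requests high-bandwidth
   memory; otherwise it uses the ordinary heap. */
void* demo_pack_buffer_alloc(size_t size)
{
#ifdef DEMO_HAVE_MEMKIND
    return hbw_malloc(size);
#else
    return malloc(size);
#endif
}

/* Release a buffer obtained from demo_pack_buffer_alloc(). */
void demo_pack_buffer_free(void* buf)
{
#ifdef DEMO_HAVE_MEMKIND
    hbw_free(buf);
#else
    free(buf);
#endif
}
```

Keeping allocation and release behind one pair of functions means the choice of memory kind stays in a single place.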
Devin Matthews
ad89ed2e82
Merge branch 'knl' of github.com:devinamatthews/blis into knl
2016-07-27 11:45:40 -05:00
Devin Matthews
2c9de740ed
This version gets ~26GF on one core.
2016-07-27 11:44:54 -05:00
Devin Matthews
81e2b05f31
Add optimized packing kernels for KNL.
2016-07-27 11:39:05 -05:00
Devin Matthews
a7d8ca97b8
All fixed.
2016-07-25 15:15:13 -05:00
Devin Matthews
963d0393b0
Add 24xk pack kernel.
2016-07-25 14:40:53 -05:00
Devin Matthews
117b76739a
In the midst of debugging.
2016-07-25 13:53:07 -05:00
Devin Matthews
8c0a4fd1d3
Fix some row/column confusion.
2016-07-25 13:09:24 -05:00
Devin Matthews
c44f9f9693
Simplify displacements -- clang assembler was badly botching EVEX compressed displacements giving false alarms for instruction length.
2016-07-25 12:02:24 -05:00
Devin Matthews
e0cce177cc
Minor fixes for 8x24 KNL kernel.
2016-07-25 10:02:25 -05:00
praveeng
50a2f2efcb
Merge master code as on 2016_07_25 to amd-staging branch by praveeng
...
Change-Id: I84886ae241db2aac0bef6b7ef399f04aa8bca16d
2016-07-25 17:07:38 +05:30
praveeng
cfd46c88d5
Merge remote-tracking branch 'publicrepo/master'
2016-07-25 15:38:13 +05:30
praveeng
f493bf4d70
Removed changes from the readme file that were causing conflicts
...
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
2016-07-25 14:14:00 +05:30
Devin Matthews
65735bbedf
Switch to 24x8 kernel, unrolled by 16.
2016-07-24 21:50:32 -05:00
Devin Matthews
45d5dc9717
Add 24x8 "KNC-style" kernel for KNL.
2016-07-24 14:25:26 -05:00
Field G. Van Zee
95abea46f8
Merge branch 'master' into compose
2016-07-23 15:38:33 -05:00
Field G. Van Zee
a017062fdf
Integrated "memory broker" (membrk_t) abstraction.
...
Details:
- Integrated a patch originally authored and submitted by Ricardo Magana
of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
(memory broker) that allows multiple sets of memory pools on, for example,
separate NUMA nodes, each of which has a separate memory space.
- Added membrk field to cntx_t and defined corresponding accessor macros.
- Added membrk field to mem_t object and defined corresponding accessor macros.
- Created new bli_membrk.c file, which contains the new memory broker API,
including:
bli_membrk_init(), bli_membrk_finalize()
bli_membrk_acquire_[mv](), bli_membrk_release(),
bli_membrk_init_pools(), bli_membrk_reinit_pools(),
bli_membrk_finalize_pools(),
bli_membrk_pool_size()
- In bli_mem.c, changed function calls to
bli_mem_init_pools() -> bli_membrk_init()
bli_mem_reinit_pools() -> bli_membrk_reinit()
bli_mem_finalize_pools() -> bli_membrk_finalize()
- In bli_packv_init.c, bli_packm_init.c, changed function calls to:
bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
bli_mem_release() -> bli_membrk_release()
- Added bli_mutex.c and related files to frame/thread. These files define
abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
single-threaded execution. This new API is employed within functions
such as bli_membrk_acquire_[mv]() and bli_membrk_release().
2016-07-22 17:02:59 -05:00
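The memory broker described in the commit above pairs a set of memory pools with a lock, so that several independent brokers can coexist. The following is a much simplified sketch of that idea, assuming pthreads; every demo_ name and the fixed-size pool layout are hypothetical and do not reflect the actual membrk_t definition or the bli_membrk_*() signatures.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_NUM_BLOCKS 2
#define DEMO_BLOCK_SIZE 64

/* Hypothetical stand-in for a memory broker: each instance owns its own
   pool of blocks and its own lock. */
typedef struct
{
    unsigned char   storage[DEMO_NUM_BLOCKS][DEMO_BLOCK_SIZE];
    void*           free_blocks[DEMO_NUM_BLOCKS]; /* stack of available blocks */
    int             num_free;
    pthread_mutex_t lock;
} demo_membrk_t;

void demo_membrk_init(demo_membrk_t* brk)
{
    for (int i = 0; i < DEMO_NUM_BLOCKS; ++i)
        brk->free_blocks[i] = brk->storage[i];
    brk->num_free = DEMO_NUM_BLOCKS;
    pthread_mutex_init(&brk->lock, NULL);
}

/* Check a block out of the pool; returns NULL if the pool is exhausted. */
void* demo_membrk_acquire(demo_membrk_t* brk)
{
    void* block = NULL;

    pthread_mutex_lock(&brk->lock);
    if (brk->num_free > 0)
        block = brk->free_blocks[--brk->num_free];
    pthread_mutex_unlock(&brk->lock);

    return block;
}

/* Return a previously acquired block to the pool. */
void demo_membrk_release(demo_membrk_t* brk, void* block)
{
    pthread_mutex_lock(&brk->lock);
    if (brk->num_free < DEMO_NUM_BLOCKS)
        brk->free_blocks[brk->num_free++] = block;
    pthread_mutex_unlock(&brk->lock);
}

void demo_membrk_finalize(demo_membrk_t* brk)
{
    pthread_mutex_destroy(&brk->lock);
}
```

Because the lock lives inside the broker, two brokers (for example, one per NUMA node) never contend with each other; only threads sharing one broker serialize on acquire and release.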
Devin Matthews
8ff2e069c4
Add 4x unrolled variant for KNL microkernel.
2016-07-22 16:22:26 -05:00
Devin Matthews
9cb2ed9b0c
Get rid of one RBX update.
2016-07-22 16:10:30 -05:00
Devin Matthews
451bde076f
Add some more knobs to twiddle for KNL microkernel.
2016-07-22 15:43:00 -05:00
Devin Matthews
8c6e621c09
Make knl conform to new kernel dir structure.
2016-07-22 15:05:15 -05:00
Devin Matthews
ce7214c661
Merge remote-tracking branch 'origin/master' into knl
2016-07-22 14:59:53 -05:00
Field G. Van Zee
ce59f81108
Merge pull request #88 from devinamatthews/32bit-dim_t
...
Handle 32-bit dim_t in 64-bit microkernels.
2016-07-22 14:48:14 -05:00
Devin Matthews
707a2b7fac
Somehow forgot the most important microkernel.
2016-07-22 13:49:44 -05:00
Devin Matthews
47ec045056
Merge remote-tracking branch 'upstream/master' into 32bit-dim_t
2016-07-22 13:45:23 -05:00
Devin Matthews
08f1d6b6fa
Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit.
2016-07-22 13:44:37 -05:00
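The commit above addresses what happens when a 64-bit load reads a k that is stored in only 32 bits. The following sketch emulates the problem in portable C rather than in assembly; demo_dim_t, demo_load64 and the other demo_ names are hypothetical, and memcpy stands in for the microkernel's 64-bit load instruction.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t demo_dim_t; /* hypothetical 32-bit stand-in for dim_t */

/* Emulates a 64-bit load instruction: reads 8 bytes starting at addr. */
static uint64_t demo_load64(const void* addr)
{
    uint64_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

/* Loading 8 bytes directly from a 32-bit k also picks up 4 bytes of
   whatever happens to be stored next to it. */
uint64_t demo_k_direct(demo_dim_t k, demo_dim_t neighbour)
{
    demo_dim_t mem[2] = { k, neighbour };
    return demo_load64(&mem[0]);
}

/* Copying k into a 64-bit intermediate first gives the load a full,
   well-defined 8 bytes. */
uint64_t demo_k_widened(demo_dim_t k)
{
    uint64_t k64 = (uint64_t)k;
    return demo_load64(&k64);
}
```

For example, demo_k_widened(100) yields 100, whereas demo_k_direct(100, 7) does not, because the neighbouring value leaks into the loaded 64 bits.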
Field G. Van Zee
ff41153f4e
Merge pull request #86 from devinamatthews/haswell-vmovups
...
Remove alignment restrictions on C in haswell kernel.
2016-07-22 13:21:03 -05:00
Devin Matthews
e0d2fa0d83
Relax alignment restrictions for haswell sgemm.
2016-07-22 12:56:51 -05:00
Field G. Van Zee
f9214ced97
Merge pull request #85 from devinamatthews/qopenmp
...
Change -openmp to -fopenmp for icc.
2016-07-22 12:16:39 -05:00
Devin Matthews
ee2c139df6
Remove alignment restrictions on C in haswell kernel.
2016-07-22 12:06:03 -05:00
Devin Matthews
08666eaa20
Change -openmp to -fopenmp for icc.
2016-07-22 11:07:34 -05:00
Devin Matthews
119d039942
Add 8x24 KNL kernel.
2016-07-22 10:23:31 -05:00
praveeng
1aa77dfc1d
Merge master code as on 2016_07_21 to amd-staging branch by praveeng
...
Change-Id: Ic7d0a21101358f08147736e7f1884e7409937344
2016-07-21 14:23:41 +05:30
Devin Matthews
b58cda9eba
Merge remote-tracking branch 'origin/master' into knl
...
# Conflicts:
# frame/base/bli_threading.h
# frame/include/blis.h
# frame/thread/bli_thread.c
2016-07-19 14:09:09 -05:00