88 Commits

Author | SHA1 | Message | Date
Chao Liu | 79e6abbda8 | update cuda cmake config | 2019-02-15 02:14:26 -06:00
Chao Liu | b2888adfbe | change file extension to hip.hpp and hip.cpp | 2019-02-15 02:13:21 -06:00
Chao Liu | a414e3fdf8 | update build | 2019-02-15 02:06:34 -06:00
Chao Liu | 67c6f73ffe | hip build | 2019-02-15 00:54:30 -06:00
Chao Liu | 121693b3d3 | update cmake config | 2019-02-14 15:12:29 -06:00
Chao Liu | e80fbbdd71 | refactor build, clean up | 2019-02-14 15:10:16 -06:00
Chao Liu | 28354a0fa3 | make LDS double buffer works, 1x1 conv now hits 80% of peak | 2019-02-12 00:57:08 -06:00
Chao Liu | 61ac08661d | tune for 1x1 | 2019-02-11 22:36:17 -06:00
Chao Liu | abf75ac039 | refactor | 2019-02-11 17:45:22 -06:00
Chao Liu | 120ab94aa1 | update with new copy op | 2019-02-07 01:31:00 -06:00
Chao Liu | 07f16673c9 | add lds double buffer for cnhw implicit gemm | 2019-02-07 00:56:53 -06:00
Chao Liu | c866773642 | unroll some loop, register double buffer gemm | 2019-02-06 23:44:21 -06:00
Chao Liu | 1b323316a8 | add another blockwise gemm | 2019-02-06 23:10:08 -06:00
Chao Liu | 5e77650415 | fixed LDS alignment bug | 2019-02-06 01:54:13 -06:00
Chao Liu | 079d63a788 | bug fixes | 2019-02-05 23:19:57 -06:00
Chao Liu | 42f4c7fd56 | refactor | 2019-02-05 18:04:23 -06:00
Chao Liu | 6614729a68 | add another version of blockwise 2d copy, refactor | 2019-02-05 17:06:53 -06:00
Chao Liu | 4b616aad52 | refactor | 2019-02-05 00:51:37 -06:00
Chao Liu | 0741a8ab88 | working on reducing index calculation... | 2019-02-04 17:16:28 -06:00
Chao Liu | 9bbe9073ab | refactor | 2019-02-04 15:40:34 -06:00
Chao Liu | 3439e4b5b7 | padding works (sort of), but code looks ugly. Tuned some resnet configs | 2019-01-25 02:50:28 -06:00
Chao Liu | 8bd6ea1a97 | improve implicit gemm NCHW, SRCK, NKHW, and tuned | 2019-01-24 22:24:30 -06:00
Chao Liu | 1de6fd0753 | fixed a bug, and refactored | 2019-01-24 21:20:29 -06:00
Chao Liu | 1410850ecc | add another implicit gemm: CHWN, CSRK, KHWN | 2019-01-24 21:03:21 -06:00
Chao Liu | bd811e2c20 | refactor | 2019-01-24 16:15:51 -06:00
Chao Liu | c39c573eb8 | refactor | 2019-01-24 16:02:24 -06:00
Chao Liu | c9af4dece0 | implicit gemm: LDS double buffer | 2019-01-24 14:28:46 -06:00
Chao Liu | 1f3870ca19 | another version of blockwise 2d tensor copy | 2019-01-23 16:42:57 -06:00
Chao Liu | e9ac4855f8 | tune | 2019-01-21 16:38:13 -06:00
Chao Liu | b5b4fd28ed | refactor | 2019-01-21 15:33:34 -06:00
Chao Liu | c64f63d5ec | refactor | 2019-01-21 11:36:31 -06:00
Chao Liu | 2096847297 | add 2nd variation of implicit gemm | 2019-01-20 21:14:35 -06:00
Chao Liu | 3bd51021ab | tune implicit_gemm | 2019-01-17 00:01:21 -06:00
Chao Liu | 216e3da609 | bug fix and tune implicit gemm | 2019-01-16 23:24:49 -06:00
Chao Liu | caf4d7e6f5 | refactor | 2019-01-16 16:11:08 -06:00
Chao Liu | 5872b710df | refactor | 2019-01-16 15:45:02 -06:00
Chao Liu | 2b52fbd24a | bug fix | 2019-01-16 12:58:44 -06:00
Chao Liu | ff7a62198d | refactor | 2019-01-16 11:58:12 -06:00
Chao Liu | 89ee259752 | adding implicit gemm | 2019-01-16 02:44:10 -06:00
Chao Liu | 913afaeb5d | adding implicit gemm | 2019-01-16 02:01:56 -06:00
Chao Liu | e7b8705b91 | adding implicit gemm | 2019-01-15 18:11:41 -06:00
Chao Liu | 84d9802d30 | adding implicit gemm | 2019-01-15 00:11:30 -06:00
Chao Liu | aa0199a31c | adding implicit gemm | 2019-01-14 11:13:36 -06:00
Chao Liu | dc60d16962 | adding implicit gemm | 2019-01-09 19:12:22 -06:00
Chao Liu | 0597116330 | refactor | 2019-01-09 19:11:45 -06:00
Chao Liu | df228b3cf5 | refactor | 2019-01-08 16:56:46 -06:00
Chao Liu | 0b8e67ef08 | refactor | 2019-01-08 14:05:03 -06:00
Chao Liu | ac1f62be3f | refactor | 2019-01-07 23:01:41 -06:00
Chao Liu | 3dbd47252c | added threadwise tensor reorder operation | 2019-01-04 15:34:13 -06:00
Chao Liu | 21c918162e | added blockwise tensor reorder operation | 2019-01-04 14:48:57 -06:00