zjing14
fbdf4332c7
Add xdlops v4r4r4 into online compilation (#48)
* init for v4r4 xdlops olc
* refactor wrap
* init impl of v4r4 nchw xdlops olc
* tuning
* test perf
* fixed v4r4 nhwc
* tuned v4r4 nhwc
* use gridwise_gemm_xdlops_v2r3
* swap a/b
* add pointer support into offline v2r3
* debugging v4r4r4 transform for olc
* change timer of olc
* refactor v4r4 xdlops nchw olc
* remove transform function in v4r4 xdlops nhwc olc
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2021-07-16 23:27:08 -05:00