Commit Graph

18 Commits

Author SHA1 Message Date
Chao Liu
67ad47e7c1 refactor 2021-08-16 21:01:33 +00:00
Chao Liu
16effa767c refactor 2021-08-16 20:36:47 +00:00
Chao Liu
f63a23acb1 [MIOpen Downstream] Initial MIOpen integration (#52)
* update online kernel wrapper bundle all descriptors in a tuple

* change __CONSTANT__ to CONSTANT

* rename

* adding tuning

* added IsValidCompileParameter

* reorginze

* adding tunable for fp16 and int8

* fix kernel compile warning and bug fixes

* suppress warning about cast CONSTANT (address space 4) pointer

* fix building issue
2021-07-27 00:02:27 -05:00
zjing14
3835318cc3 xdlops_v4r4_fwd fp32/fp16 (#34)
* create files for xdlops

* working on blockwise_gemm_xdlops

* add KReduction

* add m/n repeats

* add 2x2 pipeline

* added 128x128 wavegemm

* use StaticBuffer of vector_type

* break vector type to blk_size

* add kpack into xldops_gemm and blockwise_gemm

* abroadcast only

* add fp32 mfma instructions

* adding fp16 mfma

* pack half4_t

* rename kperwave to kpack

* add 32x32x8fp16

* add fp16 mfma

* clean code

* clean code

* V4r4 xdlops kpack (#35)

* add kpack with incorrect results

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* add 1x1 kernel

* add gridwise_gemm_v2 - single_buffer

* enabled dwordx4 for fp16

Co-authored-by: Chao Liu <chao.liu2@amd.com>

* refactor fwd-v4r4-xdlops

* add v4r4-nhwc-xdlop

* improve some perf of nhwc and nchw by tuning parameters, and change scheuduling in gridwise-gemm loop

* tweak scheduling in gridwise gemm

* add v4r3 with a single output copy

* init commit: output with slice win

* adding sliceWin

* add multiple repeats pattern

* starting adding bwd-v4r1-xdlops

* use tuple as SrcBuffer

* adding bwd-data v4r1 nhwc xdlops

* fix bug in make_dynamic_naive_tensor_descriptor_aligned_v2()

* fix bug in host bwd-data conv

* initial implementation of bwd-data v4r1 nhwc xdlops

* add launch bound flags

* enable launch bound

* add m/nrepeat=4

* tweak bwd-data v4r1 nhwc xdlops

* added bwd-data v4r1 nhwc xlops with output A and weight B

* add fwd-v4r4 nhwc xdlops, A input, B weight, C output

Co-authored-by: Chao Liu <chao.liu2@amd.com>
2021-07-01 14:33:00 -05:00
Chao Liu
30072aec37 Restructure gridwise and blockwise GEMM, add tensor contraction and FWD-v4r5 (#36)
* experimenting magic number division

* overhauling fwd-v4r4 to clearly reflect transformation graph

* added fwd-v4r5

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* bug fix and added sanity-check in transform_dynamic_tensor_descriptor

* added conv_driver_v2
2021-06-09 23:53:08 -05:00
Chao Liu
01055d95d9 No raw index calculation (#31)
* Replace most raw index calculation to coordinate transformation
* Overhaul blockwise and threadwise GEMM
* Overhaul driver for gridwies GEMM kernel

Co-authored-by: Jing Zhang <jizhan@amd.com>
2021-05-11 00:09:25 -05:00
Chao Liu
fcbb978828 Dynamic tensor descriptor (#24)
* support dynamic tensor descriptor

* use buffer load OOB feature for padding case

* add navi support

* add int8x4 inference kernel

Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>
2021-03-25 13:51:11 -05:00
Chao Liu
5c7cec1115 Code clean up (#20)
* tuning para,

* testing on v100

* add fp16

* remove deprecated tensor descriptor

* sync with miopen

* update build script

Co-authored-by: Jing Zhang <jizhan@amd.com>
2020-06-23 20:31:27 -05:00
Chao Liu
1a66e35b6f MIopen integration (#13)
* update for miopen integration: cosmetic refactor
2020-02-17 09:53:20 -06:00
Chao Liu
3406a1148a Update for recent MIOpen integration (#11)
* update for MIOpen integration
2020-01-27 15:29:33 -06:00
Chao Liu
c5da0377fb Added bwd data v3r1 v4r1, tweaking v1 (#10)
* Added bwd data v3r1: breaking down compute into a series of load balanced GEMM, and launch in a single kernel
* Added bwd data v4r1: like v3r1, but launch GEMMs in multiple kernels
* Tweaked v1r1  and v1r2 (atomic) on AMD GPU
2020-01-20 10:20:03 -06:00
Chao Liu
8f5f64960e backward data (#7)
* enabled atomic add in tensor copy
* added gridwise GEMM
* added backward data conv using GEMM + atomic
* added backward data conv using GEMM, no atomic
2019-12-03 01:16:12 -06:00
Chao Liu
ca42e9101d adding merge transform 2019-09-10 01:53:49 -05:00
Chao Liu
7a7fe16086 more utility code 2019-09-09 00:29:33 -05:00
Chao Liu
0c05f4279f adding dimension tranformation 2019-09-05 00:19:06 -05:00
Chao Liu
37b82b7e54 refactor 2019-06-19 22:26:45 -05:00
Chao Liu
1f2cfcebb3 fixed amd build 2019-06-19 18:51:19 -05:00
Chao Liu
21f7e9f103 refactor 2019-06-19 17:43:56 -05:00