Chao Liu
c67332b930
Use Tuple and vector_type instead of Array for holding tensor data (#30)
* replacing array with tuple and vector for tensor data
[ROCm/composable_kernel commit: d075adf126]
2021-04-28 13:10:33 -05:00
Chao Liu
2501a44530
Overhaul vector_type and use real vector for int8x4_t instead of aliasing from int32_t (#29)
* overhaul vector_type, make int8x4_t a real vector instead of aliasing from int32_t
[ROCm/composable_kernel commit: e4790c250c]
2021-04-12 23:48:43 -05:00
zjing14
2457224dc9
Hybrid direct + implicit GEMM forward convolution NCHWc v5r1 (#25)
* Hybrid direct + implicit GEMM forward convolution NCHWc v5r1. Input tensor bypasses LDS. Supports fp32/fp16/int8.
[ROCm/composable_kernel commit: 792a20fa5b]
2021-04-07 16:47:29 -05:00
Chao Liu
e2753e68bd
Dynamic tensor descriptor (#24)
* support dynamic tensor descriptor
* use buffer load OOB feature for padding case
* add navi support
* add int8x4 inference kernel
Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>
[ROCm/composable_kernel commit: fcbb978828]
2021-03-25 13:51:11 -05:00
Chao Liu
a3c89131fa
Code clean up (#20)
* tuning parameters
* testing on v100
* add fp16
* remove deprecated tensor descriptor
* sync with miopen
* update build script
Co-authored-by: Jing Zhang <jizhan@amd.com>
[ROCm/composable_kernel commit: 5c7cec1115]
2020-06-23 20:31:27 -05:00
Chao Liu
bd24dfbea7
Update for recent MIOpen integration (#11)
* update for MIOpen integration
[ROCm/composable_kernel commit: 3406a1148a]
2020-01-27 15:29:33 -06:00
Chao Liu
7c9100b53f
Added bwd data v3r1 v4r1, tweaking v1 (#10)
* Added bwd data v3r1: breaks the computation down into a series of load-balanced GEMMs, launched in a single kernel
* Added bwd data v4r1: like v3r1, but launches the GEMMs in multiple kernels
* Tweaked v1r1 and v1r2 (atomic) on AMD GPU
[ROCm/composable_kernel commit: c5da0377fb]
2020-01-20 10:20:03 -06:00
Chao Liu
24f7d66609
update implicit GEMM forward v4r4 to use gridwise gemm (#9)
* updated fwd v4r4 to use gridwise gemm
* updated gridwise gemm api calls in bwd-data v1r1 and v2r1
[ROCm/composable_kernel commit: e2b4c5b469]
2019-12-05 12:36:36 -06:00
Chao Liu
4414e495ed
backward data (#7)
* enabled atomic add in tensor copy
* added gridwise GEMM
* added backward data conv using GEMM + atomic
* added backward data conv using GEMM, no atomic
[ROCm/composable_kernel commit: 8f5f64960e]
2019-12-03 01:16:12 -06:00