Chao Liu
|
e823d518cb
|
ckProfiler and device-level XDL GEMM operator (#48)
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instances
* Create README.md
Add build instruction for ckProfiler
* Create README.md
Add Readme for gemm_xdl example
* Update README.md
Remove build instruction from top most folder
* Update README.md
* clean up
|
2021-11-14 11:28:32 -06:00 |
|
Chao Liu
|
643ebd4f3e
|
tidy
|
2021-08-10 07:07:11 +00:00 |
|
Chao Liu
|
3d32ae9404
|
add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files
|
2021-07-30 17:50:17 -05:00 |
|
Chao Liu
|
82fae390fb
|
update to clang-format-10
|
2021-07-30 16:37:00 -05:00 |
|
Chao Liu
|
fcbb978828
|
Dynamic tensor descriptor (#24)
* support dynamic tensor descriptor
* use buffer load OOB feature for padding case
* add navi support
* add int8x4 inference kernel
Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>
|
2021-03-25 13:51:11 -05:00 |
|
Chao Liu
|
5c7cec1115
|
Code clean up (#20)
* tuning para,
* testing on v100
* add fp16
* remove deprecated tensor descriptor
* sync with miopen
* update build script
Co-authored-by: Jing Zhang <jizhan@amd.com>
|
2020-06-23 20:31:27 -05:00 |
|