ltqin
|
32c128bcc5
|
NHWC conv 2d: fwd bf16/int8, device-level tuning and host API (#73)
* add fwd bf16 conv
* change tuning parameter
* add int8 for conv fwd
* remove comments
* change tuning parameter for int8
* change init int8 example
* add test for conv2d fwd
* change device operation file position because of merge with develop
* fwd int8 use reference
* test_conv_fwd use reference
* add bracket for if statement
* rename fwd example name
* remove StaticBufferOfVectorTypeV2
* tweak example
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: 880fbee957]
|
2022-02-11 20:06:40 -06:00 |
|
Chao Liu
|
b9f9ed96ac
|
ckProfiler and device-level XDL GEMM operator (#48)
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instance
* Create README.md
Add build instructions for ckProfiler
* Create README.md
Add Readme for gemm_xdl example
* Update README.md
Remove build instructions from topmost folder
* Update README.md
* clean up
[ROCm/composable_kernel commit: e823d518cb]
|
2021-11-14 11:28:32 -06:00 |
|
Chao Liu
|
590dde14c8
|
tidy
[ROCm/composable_kernel commit: 56fc0842b3]
|
2021-08-09 19:27:49 +00:00 |
|
Chao Liu
|
f94e566273
|
reorganize files to prepare for MIOpen integration (#51)
* change olc cmake
* adding online compile to fwd-v4r5r2
* update scripts
* rename fwd-v4r5r2 to fwd-v6r1
* clean up
[ROCm/composable_kernel commit: 1264925422]
|
2021-07-18 00:43:05 -05:00 |
|