Bartłomiej Kocot
10732847e7
Grouped conv bwd wei NDHWGC/NDHWGK (#804)
2023-07-21 12:00:55 -05:00
Bartłomiej Kocot
1ee99dcaa6
Support NHWGC conv2d_bwd_weight (#769)
* Support NHWGC conv2d_bwd_weight
* Fix client example
* Fix client example
* Fix comments
* Redesign grouped_conv_bwd_weight instances
* Clang format fix
---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
2023-07-12 08:25:02 -05:00
Bartłomiej Kocot
63388e84ab
Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757)
* Support bf16/f32/f16 and NHWGC conv2d_bwd_data
* Add interface test
* clang format
* Comment fixes
* Add more friendly error message
2023-06-21 08:20:31 -05:00
Bartłomiej Kocot
fc9f97568f
Add DeviceBatchedGemmMultipleD_Dl (#732)
* Add DeviceBatchedGemmMultipleD_Dl
* Fix batched_gemm tests
* Fix comments
* test_batched_gemm_multi_d fixes
* Fix args for isSupported batchedGemmMultipleDDl
* Disable tests for gfx90a
2023-06-12 08:37:15 -05:00
Bartłomiej Kocot
642d5e9155
Add contraction profiler and tests (#701)
* Add contraction profiler and tests
* Build and style fixes
* Allow to use any elementwise operator for ref_contraction
* Introduce profile_contraction_scale and profile_contraction_bilinear
* Make ref_contraction generic and extend interface tests
* Stylistic minor fixes
* Extend test_contraction_interface
2023-05-15 09:46:52 -05:00
Chao Liu
cd167e492a
Compile for gfx908 and gfx90a (#130)
* adding compilation for multiple targets
* fix build
* clean
* update Jenkinsfile
* update readme
* update Jenkins
* use ck::half_t instead of ushort for bf16
* rename enum classes
* clean
* rename
* clean
2022-03-31 12:33:34 -05:00
Chao Liu
5d37d7bff4
Reorganize files, Part 1 (#119)
* delete obsolete files
* move files
* build
* update cmake
* update cmake
* fix build
* reorg examples
* update cmake for example and test
2022-03-08 21:46:36 -06:00
Chao Liu
e823d518cb
ckProfiler and device-level XDL GEMM operator (#48)
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instances
* Create README.md
Add build instruction for ckProfiler
* Create README.md
Add Readme for gemm_xdl example
* Update README.md
Remove build instruction from top most folder
* Update README.md
* clean up
2021-11-14 11:28:32 -06:00