rocking5566
6f928a0876
Support alpha beta scaling for GEMM (#78)
* [What] Add 2d version of bias, prepare to implement alpha / beta scaling
* Add alpha / beta functor
* Refine parameter of example
* [What] Use real type instead of template
[Why] Prevent implicit cast
* Rename parameter for general operator
* Remove redundant comment
* Fix compile error
Co-authored-by: rocking <chunylai@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-02-11 00:48:41 -06:00
ltqin
4be7f0198e
add split-k GEMM (#59)
* add DeviceGemmSplitKXdl
* add file device_gemm_splitk_xdl.hpp
* set c matrix zero
* using atomic
* add all tuning parameter to f32 mkkn
* grid size change to 720
* add tuning parameter for NT
* add tuning parameter for TN
* add tuning parameter for TT
* add m=96 tuning parameter
* add lost config
* add element wise operation
* fixed MPerBlock=96
* remove macro for split-k switch
* add test
* add new line at the end of device_gemm_xdl_instance.hpp
* remove step hack
* separate split-k instance files
* add tuning parameters
* change desired grid size to parameters
* remove slice length
* add desiredgridsize parameter to ckProfiler
* add missing file device_gemm_xdl_splitk_instance.hpp
* change desired grid size to kbatch
* format
* format
* clean up
* add selection of device_instances
* clean code
* fix build issue
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
2022-02-02 22:47:27 -06:00
Chao Liu
41cdd3801a
GEMM/Conv+BiasAdd+ReLU+Add (#55)
* gemm+activation
* move C pointwise operation into threadwise copy
* add pointwise operation to A/B matrix
* update ckProfiler
* adding bias add
* adding bias add
* adding bias add
* added bias add; worked around compiler issues
* clean up
* clean up
* Update README.md
* Update README.md
* Update README.md
* clean up
* add conv_xdl example
* adding conv_xdl_bias_relu_add example
* add conv+bias+relu+add, but has register spill issue
* tweak
* tweak
* refactor
* Update README.md
update readme for example/2_gemm_xdl_bias_relu_add
* clean up
* Update README.md
update readme for example/3_conv_xdl
* Update README.md
2021-12-02 20:07:37 -06:00
Chao Liu
e823d518cb
ckProfiler and device-level XDL GEMM operator (#48)
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instance
* Create README.md
Add build instruction for ckProfiler
* Create README.md
Add Readme for gemm_xdl example
* Update README.md
Remove build instruction from topmost folder
* Update README.md
* clean up
2021-11-14 11:28:32 -06:00