Chao Liu
d1db6a0c3e
Absolute include path ( #281 )
...
* ad gelu and fast_gelu
* added GeLU and fast GeLU
* clean up
* add gemm+fastgelu example
* add gemm+gelu instances
* update profiler
* clean up
* clean up
* adding gemm+bias+activation
* clean
* adding bias
* clean
* adding gemm multiple d
* debugging
* add gemm bias add fastgelu
* rename, clean
* refactoring; add readme
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* fix
* fix
* update example
* update example
* rename
* update example
* add ckProfiler
* clean
* clean
* clean
* clean
* add client app example
* update readme
* delete obselete files
* remove old client app
* delete old file
* cleaning
* clean
* remove half
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path for all examples
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* revert client app example
* clean build
* fix build
* temporary disable client test on Jenkins
* clean
* clean
* clean
2022-06-24 20:51:04 -05:00
JD
cec69bc3bc
Add host API ( #220 )
...
* Add host API
* manually rebase on develop
* clean
* manually rebase on develop
* exclude tests from all target
* address review comments
* update client app name
* fix missing lib name
* clang-format update
* refactor
* refactor
* refactor
* refactor
* refactor
* fix test issue
* refactor
* refactor
* refactor
* upate cmake and readme
Co-authored-by: Chao Liu <chao.liu2@amd.com >
2022-05-12 09:21:01 -05:00
Jianfeng Yan
34c661e71c
Batched gemm and reduction ( #156 )
...
* adding batched_gemm_and_reduction
* batched_gemm_reduce works with bactch_count=1
* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
* adding profiler for batched_gemm_fp16
* fixed a bug in declaration of d1 and d0; both example and profiler work
* clang-format
* cleanup
* batched_gemm_reduce: add test
* minor change
* fixed some typo in function names
2022-03-30 11:21:18 -05:00
Chao Liu
f95267f166
Gemm+Reduce Fusion ( #128 )
...
* add gridwise gemm v4r1
* rename
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* use sfc in shuffling
* remove hardcode
* remove hardcode
* refactor
* fix build
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* adding gemm+reduce
* format
* clean
* adding gemm+reduce
* adding profiler for gemm+reduce
* adding gemm+reduce profiler
* fix build
* clean up
* gemm+reduce
* fix build
* update DeviceGemm_Xdl_CShuffle; update enum to enum class
* clean up
* add test for gemm+reduce
* clean up
* refactor
* fix build
* fix build
2022-03-23 22:18:42 -05:00