Illia Silin
b94fd0b227
update copyright headers ( #726 )
2023-05-31 18:46:57 -05:00
Chao Liu
d3051d7517
add license in file ( #303 )
2022-06-24 23:32:43 -05:00
Chao Liu
d1db6a0c3e
Absolute include path ( #281 )
...
* ad gelu and fast_gelu
* added GeLU and fast GeLU
* clean up
* add gemm+fastgelu example
* add gemm+gelu instances
* update profiler
* clean up
* clean up
* adding gemm+bias+activation
* clean
* adding bias
* clean
* adding gemm multiple d
* debugging
* add gemm bias add fastgelu
* rename, clean
* refactoring; add readme
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* fix
* fix
* update example
* update example
* rename
* update example
* add ckProfiler
* clean
* clean
* clean
* clean
* add client app example
* update readme
* delete obselete files
* remove old client app
* delete old file
* cleaning
* clean
* remove half
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path for all examples
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* fix header path
* revert client app example
* clean build
* fix build
* temporary disable client test on Jenkins
* clean
* clean
* clean
2022-06-24 20:51:04 -05:00
rocking5566
aafc3ac27a
elementwise op ( #238 )
...
* Add elementwise operation kernel and example
* Add comment
* Add template argument of dim . Prepare to support multiple dimension
* Rename example
* Support 1 dimension
* Add static assert
* Add comment
* Extract pad
* Remove redundant argument
* Support any dimension for elementwise operation
* Remove line
* Let it be the multiple number of CU
* Move thread per block to the parameter of constructor
* rename threadPerBlock with blockSize
* Support double
* rename kernel function name
* remove redundant include header
* Refine type
* Need to the final dimension
* Refine variable name
* Refine type
* Use index_t instead of int in API
Co-authored-by: rocking <chunylai@amd.com >
2022-05-18 23:34:35 -05:00
Chao Liu
ec7c2e912e
Code refactor ( #175 )
...
* format
* improving pipeline
* fix typo
* format
* adding thread group
* adding thread group
* adding thread group
* adding gemm pipeline
* tweak
* refactor
* refactor
* add missing type convert
* refactor
* refactor
* refactor
* clean
* fix build
* refactor
* format
* clean up
* use remove_cvref_t
* clean
* clean up
* clean up
* clean up
2022-05-09 14:57:59 -05:00
Chao Liu
cd167e492a
Compile for gfx908 and gfx90a ( #130 )
...
* adding compilation for multiple targets
* fix build
* clean
* update Jekinsfile
* update readme
* update Jenkins
* use ck::half_t instead of ushort for bf16
* rename enum classes
* clean
* rename
* clean
2022-03-31 12:33:34 -05:00