ltqin
|
370efa6c08
|
batched_gemm + multiple_d + gemm + multiple_d (#394)
* refactor
* start
* add device gemm file
* add BatchStrideD0
* add stride d0
* add gridwise file
* add d0 parameters to gridwise gemm
* add c layout transformer
* add d0 threadwise copy
* init kernel
* init kernel
* regular code
* nm desc put to out
* kernel parameter can not use reference
* host add bias+gelu
* run right for bias+gelu
* change AddFastGelu into another file
* interface add d1 bias parameters
* add d1 parameter to argument
* add d1 parameter to gridwise
* first complete code, not verified
* gelu change to relu and GetElementSpaceSize bug
* add instance
* start add to ckprofiler
* ckprofiler finish code
* change input parameter for ckProfiler
* fix host bias+gelu bug
* show help for ckProfiler
* fix bug where kernel launch ignored parameters
* add pad and fix about bug
* multiple d0
* add dynamic d0_element_op
* change profiler and instance to multiple d0
* example have 2 d0
* remove some unused comments
* change 2 d0 have self parameters
* change d element_op name
* change class name(multiple_d)
* fix bug
* fix bug where file was not found
* update profiler
* refactor
* update profiler
* clean
* revert example change
* add gon layout
* optimize parameter for gno
* add gon to gemm+gemm
* change helping input parameters
* change to GemmPadder_v2
* using ForEach
* fix gb_per_sec
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
Co-authored-by: ltqin <letaoqin@amd.com>
|
2022-09-14 17:54:18 -05:00 |
|