Aviral Goel
0aadb4b2c4
chore(copyright): update copyright header for profiler directory (#3205)
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
* chore(copyright): update copyright header for test_data directory
* chore(copyright): update copyright header for python directory
* chore(copyright): update copyright header for profiler directory
2025-11-14 11:19:25 -08:00
Illia Silin
b94fd0b227
update copyright headers (#726)
2023-05-31 18:46:57 -05:00
Po Yen Chen
8784a72e23
Modularize ckProfiler operations (#514)
* Re-structure ckProfiler source files
* Rename profiler.cpp to main.cpp
* Modularize ckProfiler operations
* Add description for profiler operations
* Use longer name to avoid name collision
* Use macro to delay expansion
* Use std::move() to avoid object copying
* Prohibit users from calling dtor
* Use macro to eliminate redundant code
* Make friend function hidden
* Add missing include directive <iostream>
* Fix wrong include directives
* Remove int8 from batchnorm-forward instances since it is not needed for forward training and could fail test
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2022-12-01 15:15:02 -06:00
ltqin
370efa6c08
batched_gemm + multiple_d + gemm + multiple_d (#394)
* refactor
* start
* add device gemm file
* add BatchStrideD0
* add stridd0
* add gridwise file
* add d0 parameters to gridwise gemm
* add c layout transformer
* add d0 threadwise copy
* init kernel
* init kernel
* regular code
* nm desc put to out
* kernel parameter can not use reference
* host add bias+gelu
* run right for bias+gelu
* change AddFastGelu into another file
* interface add d1 bias parameters
* add d1 parameter to argument
* add d1 parameter to gridwise
* first all code,not verify
* gelu change to relu and GetElementSpaceSize bug
* add instance
* start add to ckprofiler
* ckprofiler finish code
* change input parameter for ckProfiler
* fix host bias+gelu bug
* show help for ckProfiler
* fix bug for lunch kernel ignore parametes
* add pad and fix about bug
* mutiple d0
* add dynamic d0_element_op
* change profiler and instance to mutiple d0
* example have 2 d0
* remove some comments not using
* change 2 d0 have self parameters
* change d element_op name
* change class name(multiple_d)
* fix bug
* fix bug that don't find file
* update profiler
* refactor
* update profiler
* clean
* revert example change
* add gon layout
* optimize parameter for gno
* add gon to gemm+gemm
* change helping input parameters
* change to GemmPadder_v2
* using ForEach
* fix gb_per_sec
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
Co-authored-by: ltqin <letaoqin@amd.com>
2022-09-14 17:54:18 -05:00