logicat
fec81109f1
Remove unnecessary hip_fp16 include from stream_config (#3549)
2026-01-16 10:40:05 -08:00
Aviral Goel
f5ac3ee359
chore(copyright): update copyright header for include directory (#3224)
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
* chore(copyright): update copyright header for test_data directory
* chore(copyright): update copyright header for python directory
* chore(copyright): update copyright header for profiler directory
* chore(copyright): update copyright header for library directory
* chore(copyright): update copyright header for include directory
2025-11-18 10:17:18 -08:00
ltqin
f448d179b7
Universal gemm flush cache (#1251)
* add flush cache to device op
* add flush cache parameter to ckProfiler
* change calculate size a and b method
* change evaluation time method from AVERAGE to MEDIAN
* format code
* adjust some code
* fix core dumped
* remove loop call flush icache in kernel
* remove loop(outer) call flush icache
---------
Co-authored-by: letaoqin <letaoqin@amd.com>
2024-04-25 15:07:14 -05:00
Haocong WANG
bb63b9732c
[GEMM] Optimization for MI200/300. (#1135)
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk instances
* clang format + typo fix
* Fix a bug
2024-01-19 07:02:22 -06:00
zjing14
ae5e5181aa
recover default niter (#1064)
2023-11-28 12:18:42 -08:00
zjing14
e8cddfdc3b
Improve 4k gemm perf (#1047)
* improve 4k gemm perf
* add f8 instances
* format
---------
Co-authored-by: Jing Zhang <jizha@amd.com>
2023-11-17 07:06:24 -06:00
Illia Silin
b94fd0b227
update copyright headers (#726)
2023-05-31 18:46:57 -05:00
Chao Liu
43c898f6ff
disable print for group conv multiple D (#421)
2022-09-16 09:46:32 -05:00
Chao Liu
d3051d7517
add license in file (#303)
2022-06-24 23:32:43 -05:00
JD
cec69bc3bc
Add host API (#220)
* Add host API
* manually rebase on develop
* clean
* manually rebase on develop
* exclude tests from all target
* address review comments
* update client app name
* fix missing lib name
* clang-format update
* refactor
* refactor
* refactor
* refactor
* refactor
* fix test issue
* refactor
* refactor
* refactor
* update cmake and readme
Co-authored-by: Chao Liu <chao.liu2@amd.com>
2022-05-12 09:21:01 -05:00