10 Commits

Author SHA1 Message Date
logicat
2f59f74334 Remove unnecessary hip_fp16 include from stream_config (#3549)
[ROCm/composable_kernel commit: fec81109f1]
2026-01-16 10:40:05 -08:00
Aviral Goel
0cfa802f89 chore(copyright): update copyright header for include directory (#3224)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

[ROCm/composable_kernel commit: f5ac3ee359]
2025-11-18 10:17:18 -08:00
ltqin
b4f3b8e693 Universal gemm flush cache (#1251)
* add flush cache to device op

* add flush cache parameter to ckProfiler

* change the method for calculating the sizes of a and b

* change evaluation time method from AVERAGE to MEDIAN

* format code

* adjust some code

* fix core dump

* remove loop call flush icache in kernel

* remove loop(outer) call flush icache

---------

Co-authored-by: letaoqin <letaoqin@amd.com>

[ROCm/composable_kernel commit: f448d179b7]
2024-04-25 15:07:14 -05:00
Haocong WANG
ec7e5b1331 [GEMM] Optimization for MI200/300. (#1135)
* Optimize GEMM on MI200/300:
1. Add new blockwise gemm pipeline
2. Add irregular splitk instances

* clang format + typo fix

* Fix a bug

[ROCm/composable_kernel commit: bb63b9732c]
2024-01-19 07:02:22 -06:00
zjing14
04675fdf63 recover default niter (#1064)
[ROCm/composable_kernel commit: ae5e5181aa]
2023-11-28 12:18:42 -08:00
zjing14
b88a739b88 Improve 4k gemm perf (#1047)
* improve 4k gemm perf

* add f8 instances

* format

---------

Co-authored-by: Jing Zhang <jizha@amd.com>

[ROCm/composable_kernel commit: e8cddfdc3b]
2023-11-17 07:06:24 -06:00
Illia Silin
b57fbee2f1 update copyright headers (#726)
[ROCm/composable_kernel commit: b94fd0b227]
2023-05-31 18:46:57 -05:00
Chao Liu
c1cfd4c894 disable print for group conv multiple D (#421)
[ROCm/composable_kernel commit: 43c898f6ff]
2022-09-16 09:46:32 -05:00
Chao Liu
31706d4896 add license in file (#303)
[ROCm/composable_kernel commit: d3051d7517]
2022-06-24 23:32:43 -05:00
JD
69d5f78b16 Add host API (#220)
* Add host API

* manually rebase on develop

* clean

* manually rebase on develop

* exclude tests from all target

* address review comments

* update client app name

* fix missing lib name

* clang-format update

* refactor

* refactor

* refactor

* refactor

* refactor

* fix test issue

* refactor

* refactor

* refactor

* update cmake and readme

Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: cec69bc3bc]
2022-05-12 09:21:01 -05:00