Ville Pietilä
60db94037e
Fix compilation issues on MI300.
2025-10-22 12:51:02 +00:00
Ville Pietilä
784c84c54b
Benchmarking script improvements.
2025-10-21 15:09:57 +00:00
Ville Pietilä
a5afa4c07e
Script to convert MIOpenDriver commands to CK profiler input.
2025-10-21 08:16:14 +00:00
Ville Pietilä
159bcf6750
Small script improvements.
2025-10-21 06:38:49 +00:00
Ville Pietilä
2efd174b33
Add new kernel instances.
2025-10-20 12:01:57 +00:00
Ville Pietilä
307ca52156
Improve benchmarking and analysis script.
2025-10-20 10:50:34 +00:00
Ville Pietilä
1247133245
Parallel compilation of the CK Tile instances.
2025-10-20 09:31:53 +00:00
Ville Pietilä
3aec8b5493
Improve profiler output.
2025-10-20 08:04:59 +00:00
Ville Pietilä
dc65dc98e1
Optimize calculation of the CPU reference.
2025-10-17 14:48:12 +00:00
Ville Pietilä
949bf1149f
Add back BF16 instances.
2025-10-17 14:47:39 +00:00
Ville Pietilä
697dd2e6f1
Create runner script to run CK and CK Tile profilers.
2025-10-17 14:27:52 +00:00
Ville Pietilä
28055fdd9a
Improve profiler output.
2025-10-17 13:17:29 +00:00
Ville Pietilä
7722f901df
Fix validation.
2025-10-17 13:07:06 +00:00
Ville Pietilä
6789c219c1
Add missing header.
2025-10-17 12:43:49 +00:00
Ville Pietilä
a92c965667
Fix fwd layouts.
2025-10-17 11:07:39 +00:00
Ville Pietilä
ef3e871e6e
Add grouped conv fwd direction profiling into CK Tile profiler.
2025-10-17 10:47:23 +00:00
Ville Pietilä
0e0fb54b9f
Rename conv factory.
2025-10-17 06:26:41 +00:00
Ville Pietilä
a708b177fc
Add double smem buffer instances.
2025-10-17 06:24:11 +00:00
Ville Pietilä
c0b68c8a85
Add more instances.
2025-10-16 14:18:40 +00:00
Ville Pietilä
6c5531a4ae
Disqualify benchmarking results from kernels that do not pass validation.
2025-10-16 12:22:51 +00:00
Ville Pietilä
76ffa1bf0a
Add more instances.
2025-10-16 11:33:06 +00:00
Ville Pietilä
044bcfcb1e
Take universal GEMM pipeline into use for grouped convolutions.
2025-10-16 11:03:14 +00:00
Ville Pietilä
e99b5a8c28
Merge remote-tracking branch 'origin/develop' into vpietila/ck-vs-ck-tile-conv-benchmarking
2025-10-16 07:33:08 +00:00
Ville Pietilä
9b3c61cac2
Add more instances.
2025-10-16 07:32:52 +00:00
Ville Pietilä
19fac39880
Enable vector loads in grouped conv bwd weight kernels.
2025-10-16 07:17:12 +00:00
Haocong WANG
013ba3c737
Enable storelse for fmha_fwd_trload kernel (#3023)
2025-10-16 13:51:23 +08:00
Emily Martins
0dbd173500
Fix compiler noreturn error for ck tile permute test (#3036)
2025-10-15 19:42:02 -07:00
Aviral Goel
232523d9fa
docs: add quant mode comparison to readme (#3032)
* docs: add quant mode comparison to readme
* Update example/ck_tile/38_block_scale_gemm/README.md
Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
---------
Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
2025-10-15 18:35:06 -07:00
Illia Silin
87d0a3ac17
use branch develop to test hipTensor (#3034)
2025-10-15 15:40:34 -07:00
Illia Silin
3348f01e6f
re-enable clang-format by default (#3030)
* re-enable clang-format by default
* fix clang format
2025-10-15 07:43:11 -07:00
Ville Pietilä
a5b60ed2f2
Add more instances.
2025-10-15 14:33:01 +00:00
Christopher Millette
bde5f26db3
Disable streamk extended regression tests for now (#3016)
2025-10-15 09:05:47 -05:00
Ville Pietilä
96a7c26a0b
Better split-K handling in the template instantiation.
2025-10-15 13:47:04 +00:00
Ville Pietilä
bbe13f4635
Add more instances.
2025-10-15 13:23:55 +00:00
Ville Pietilä
23aa650172
Add min blocks per CU to invoker name.
2025-10-15 13:21:29 +00:00
Ville Pietilä
57dbd2f4a4
Remove unnecessary compilations.
2025-10-15 13:20:58 +00:00
Ville Pietilä
3c08ce1e64
Improve the grouped conv kernel name generation in CK Tile.
2025-10-15 11:02:21 +00:00
felix
4c826abfff
Felix/opt sorting (#2902)
* merge felix/sorting
* opt moe sorting (#2822)
* opt moe storing for 2k
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
2025-10-15 09:24:03 +08:00
AviralGoelAMD
ca1ab083a7
test(grouped_gemm_multi_d): add unit test for bf16 support
2025-10-14 18:00:43 -04:00
AviralGoelAMD
8d8b49dec2
feat(grouped_gemm_multi_d): add support for bf16
2025-10-14 18:00:43 -04:00
Geo Min
706c2b281c
fixing group id (#3002)
2025-10-14 08:51:52 -07:00
joyeamd
b9d74e7746
update s_barrier's logic in gfx12 architecture (#3003)
change s_waitcnt's logic in gfx1250
change s_waitcnt's logic in gfx1250
update comment
2025-10-14 08:49:34 -07:00
Illia Silin
e4298e55c7
Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)
This reverts commit d2bbca3eca.
2025-10-14 08:43:14 -07:00
Ville Pietilä
3d0db2ca63
Fix transferring data back to host for validation.
2025-10-14 15:02:51 +00:00
jakpiase
6deaaa92cc
[CK_TILE] Switch into universal gemms for conv bwds (#2981)
* switch into universal gemms for conv bwds
* some fixes and support universal gemm in conv fwd
* add reviewer comments
2025-10-14 16:09:16 +02:00
Ville Pietilä
bbed3a62dc
Fully functional CK Tile profiler.
2025-10-14 13:35:37 +00:00
msaffari-amd
589e242eda
Fix: Handle JSON boolean values (pad_m, pad_n, pad_k and persistent) in gemm_instance_builder (#3008)
2025-10-14 13:20:25 +02:00
Ville Pietilä
0f6bf78caa
Add empty instance factory.
2025-10-14 07:13:20 +00:00
Ville Pietilä
eaf9ba4e45
Rename CK Tile grouped conv factory.
2025-10-14 06:31:34 +00:00
ClementLinCF
e1b0bdfbfa
[CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm (#2540)
* [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm
* Update rmsnorm host reference
* Update tree reduction of rmsnorm for reference host
* Fix cross warp for m > 1 cases
* Add RMSNorm model selectable option for host reference
* Fix save_unquant cases
* Update reference rmsnorm forward function to use enum for model sensitivity
* Update reference rmsnorm calculation for model sensitivity
* Fix m warp for layernorm
* Adjust parameter of reference for twoPass
* Fix clang format
* Run clang-format-overwrite.sh to fix formatting issue
* fix clang format
---------
Co-authored-by: MHYang <mengyang@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-10-13 11:52:37 -07:00