Commit Graph

2508 Commits

Author SHA1 Message Date
Ville Pietilä
60db94037e Fix compilation issues on MI300. 2025-10-22 12:51:02 +00:00
Ville Pietilä
784c84c54b Benchmarking script improvements. 2025-10-21 15:09:57 +00:00
Ville Pietilä
a5afa4c07e Script to convert MIOpenDriver commands to CK profiler input. 2025-10-21 08:16:14 +00:00
Ville Pietilä
159bcf6750 Small script improvements. 2025-10-21 06:38:49 +00:00
Ville Pietilä
2efd174b33 Add new kernel instances. 2025-10-20 12:01:57 +00:00
Ville Pietilä
307ca52156 Improve benchmarking and analysis script. 2025-10-20 10:50:34 +00:00
Ville Pietilä
1247133245 Parallel compilation of the CK Tile instances. 2025-10-20 09:31:53 +00:00
Ville Pietilä
3aec8b5493 Improve profiler output. 2025-10-20 08:04:59 +00:00
Ville Pietilä
dc65dc98e1 Optimize calculation of the CPU reference. 2025-10-17 14:48:12 +00:00
Ville Pietilä
949bf1149f Add back BF16 instances. 2025-10-17 14:47:39 +00:00
Ville Pietilä
697dd2e6f1 Create runner script to run CK and CK Tile profilers. 2025-10-17 14:27:52 +00:00
Ville Pietilä
28055fdd9a Improve profiler output. 2025-10-17 13:17:29 +00:00
Ville Pietilä
7722f901df Fix validation. 2025-10-17 13:07:06 +00:00
Ville Pietilä
6789c219c1 Add missing header. 2025-10-17 12:43:49 +00:00
Ville Pietilä
a92c965667 Fix fwd layouts. 2025-10-17 11:07:39 +00:00
Ville Pietilä
ef3e871e6e Add grouped conv fwd direction profiling into CK Tile profiler. 2025-10-17 10:47:23 +00:00
Ville Pietilä
0e0fb54b9f Rename conv factory. 2025-10-17 06:26:41 +00:00
Ville Pietilä
a708b177fc Add double smem buffer instances. 2025-10-17 06:24:11 +00:00
Ville Pietilä
c0b68c8a85 Add more instances. 2025-10-16 14:18:40 +00:00
Ville Pietilä
6c5531a4ae Disqualify benchmarking results from kernels that do not pass validation. 2025-10-16 12:22:51 +00:00
Ville Pietilä
76ffa1bf0a Add more instances. 2025-10-16 11:33:06 +00:00
Ville Pietilä
044bcfcb1e Take universal GEMM pipeline into use for grouped convolutions. 2025-10-16 11:03:14 +00:00
Ville Pietilä
e99b5a8c28 Merge remote-tracking branch 'origin/develop' into vpietila/ck-vs-ck-tile-conv-benchmarking 2025-10-16 07:33:08 +00:00
Ville Pietilä
9b3c61cac2 Add more instances. 2025-10-16 07:32:52 +00:00
Ville Pietilä
19fac39880 Enable vector loads in grouped conv bwd weight kernels. 2025-10-16 07:17:12 +00:00
Haocong WANG
013ba3c737 Enable storelse for fmha_fwd_trload kernel (#3023) 2025-10-16 13:51:23 +08:00
Emily Martins
0dbd173500 Fix compiler noreturn error for ck tile permute test (#3036) 2025-10-15 19:42:02 -07:00
Aviral Goel
232523d9fa docs: add quant mode comparison to readme (#3032)
* docs: add quant mode comparison to readme

* Update example/ck_tile/38_block_scale_gemm/README.md

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>

---------

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
2025-10-15 18:35:06 -07:00
Illia Silin
87d0a3ac17 use branch develop to test hipTensor (#3034) 2025-10-15 15:40:34 -07:00
Illia Silin
3348f01e6f re-enable clang-format by default (#3030)
* re-enable clang-format by default

* fix clang format
2025-10-15 07:43:11 -07:00
Ville Pietilä
a5b60ed2f2 Add more instances. 2025-10-15 14:33:01 +00:00
Christopher Millette
bde5f26db3 Disable streamk extended regression tests for now (#3016) 2025-10-15 09:05:47 -05:00
Ville Pietilä
96a7c26a0b Better split-K handling in the template instantiation. 2025-10-15 13:47:04 +00:00
Ville Pietilä
bbe13f4635 Add more instances. 2025-10-15 13:23:55 +00:00
Ville Pietilä
23aa650172 Add min blocks per CU to invoker name. 2025-10-15 13:21:29 +00:00
Ville Pietilä
57dbd2f4a4 Remove unnecessary compilations. 2025-10-15 13:20:58 +00:00
Ville Pietilä
3c08ce1e64 Improve the grouped conv kernel name generation in CK Tile. 2025-10-15 11:02:21 +00:00
felix
4c826abfff Felix/opt sorting (#2902)
* merge felix/sorting
* opt moe sorting  (#2822)
* opt moe storing for 2k
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
2025-10-15 09:24:03 +08:00
AviralGoelAMD
ca1ab083a7 test(grouped_gemm_multi_d): add unit test for bf16 support 2025-10-14 18:00:43 -04:00
AviralGoelAMD
8d8b49dec2 feat(grouped_gemm_multi_d): add support for bf16 2025-10-14 18:00:43 -04:00
Geo Min
706c2b281c fixing group id (#3002) 2025-10-14 08:51:52 -07:00
joyeamd
b9d74e7746 update s_barrier's logic in gfx12 architecture (#3003)
change s_waitcnt's logic in gfx1250

update comment
2025-10-14 08:49:34 -07:00
Illia Silin
e4298e55c7 Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)
This reverts commit d2bbca3eca.
2025-10-14 08:43:14 -07:00
Ville Pietilä
3d0db2ca63 Fix transferring data back to host for validation. 2025-10-14 15:02:51 +00:00
jakpiase
6deaaa92cc [CK_TILE] Switch into universal gemms for conv bwds (#2981)
* switch into universal gemms for conv bwds

* some fixes and support universal gemm in conv fwd

* add reviewer comments
2025-10-14 16:09:16 +02:00
Ville Pietilä
bbed3a62dc Fully functional CK Tile profiler. 2025-10-14 13:35:37 +00:00
msaffari-amd
589e242eda Fix: Handle JSON boolean values (pad_m, pad_n, pad_k and persistent) in gemm_instance_builder (#3008) 2025-10-14 13:20:25 +02:00
Ville Pietilä
0f6bf78caa Add empty instance factory. 2025-10-14 07:13:20 +00:00
Ville Pietilä
eaf9ba4e45 Rename CK Tile grouped conv factory. 2025-10-14 06:31:34 +00:00
ClementLinCF
e1b0bdfbfa [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm (#2540)
* [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm

* Update rmsnorm host reference

* Update tree reduction of rmsnorm for reference host

* Fix cross warp for m > 1 cases

* Add RMSNorm model selectable option for host reference

* Fix save_unquant cases

* Update reference rmsnorm forward function to use enum for model sensitivity

* Update reference rmsnorm calculation for model sensitivity

* Fix m warp for layernorm

* Adjust parameter of reference for twoPass

* Fix clang format

* Run clang-format-overwrite.sh to fix formatting issue

* fix clang format

---------

Co-authored-by: MHYang <mengyang@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-10-13 11:52:37 -07:00