Commit Graph

2496 Commits

Author SHA1 Message Date
Ville Pietilä
7722f901df Fix validation. 2025-10-17 13:07:06 +00:00
Ville Pietilä
6789c219c1 Add missing header. 2025-10-17 12:43:49 +00:00
Ville Pietilä
a92c965667 Fix fwd layouts. 2025-10-17 11:07:39 +00:00
Ville Pietilä
ef3e871e6e Add grouped conv fwd direction profiling into CK Tile profiler. 2025-10-17 10:47:23 +00:00
Ville Pietilä
0e0fb54b9f Rename conv factory. 2025-10-17 06:26:41 +00:00
Ville Pietilä
a708b177fc Add double smem buffer instances. 2025-10-17 06:24:11 +00:00
Ville Pietilä
c0b68c8a85 Add more instances. 2025-10-16 14:18:40 +00:00
Ville Pietilä
6c5531a4ae Disqualify benchmarking results from kernels that do not pass validation. 2025-10-16 12:22:51 +00:00
Ville Pietilä
76ffa1bf0a Add more instances. 2025-10-16 11:33:06 +00:00
Ville Pietilä
044bcfcb1e Take universal GEMM pipeline into use for grouped convolutions. 2025-10-16 11:03:14 +00:00
Ville Pietilä
e99b5a8c28 Merge remote-tracking branch 'origin/develop' into vpietila/ck-vs-ck-tile-conv-benchmarking 2025-10-16 07:33:08 +00:00
Ville Pietilä
9b3c61cac2 Add more instances. 2025-10-16 07:32:52 +00:00
Ville Pietilä
19fac39880 Enable vector loads in grouped conv bwd weight kernels. 2025-10-16 07:17:12 +00:00
Haocong WANG
013ba3c737 Enable storelse for fmha_fwd_trload kernel (#3023) 2025-10-16 13:51:23 +08:00
Emily Martins
0dbd173500 Fix compiler noreturn error for ck tile permute test (#3036) 2025-10-15 19:42:02 -07:00
Aviral Goel
232523d9fa docs: add quant mode comparison to readme (#3032)
* docs: add quant mode comparison to readme

* Update example/ck_tile/38_block_scale_gemm/README.md

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>

---------

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
2025-10-15 18:35:06 -07:00
Illia Silin
87d0a3ac17 use branch develop to test hipTensor (#3034) 2025-10-15 15:40:34 -07:00
Illia Silin
3348f01e6f re-enable clang-format by default (#3030)
* re-enable clang-format by default

* fix clang format
2025-10-15 07:43:11 -07:00
Ville Pietilä
a5b60ed2f2 Add more instances. 2025-10-15 14:33:01 +00:00
Christopher Millette
bde5f26db3 Disable streamk extended regression tests for now (#3016) 2025-10-15 09:05:47 -05:00
Ville Pietilä
96a7c26a0b Better split-K handling in the template instantiation. 2025-10-15 13:47:04 +00:00
Ville Pietilä
bbe13f4635 Add more instances. 2025-10-15 13:23:55 +00:00
Ville Pietilä
23aa650172 Add min blocks per CU to invoker name. 2025-10-15 13:21:29 +00:00
Ville Pietilä
57dbd2f4a4 Remove unnecessary compilations. 2025-10-15 13:20:58 +00:00
Ville Pietilä
3c08ce1e64 Improve the grouped conv kernel name generation in CK Tile. 2025-10-15 11:02:21 +00:00
felix
4c826abfff Felix/opt sorting (#2902)
* merge felix/sorting
* opt moe sorting (#2822)
* opt moe storing for 2k
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
2025-10-15 09:24:03 +08:00
AviralGoelAMD
ca1ab083a7 test(grouped_gemm_multi_d): add unit test for bf16 support 2025-10-14 18:00:43 -04:00
AviralGoelAMD
8d8b49dec2 feat(grouped_gemm_multi_d): add support for bf16 2025-10-14 18:00:43 -04:00
Geo Min
706c2b281c fixing group id (#3002) 2025-10-14 08:51:52 -07:00
joyeamd
b9d74e7746 update s_barrier's logic in gfx12 architecture (#3003)
change s_waitcnt's logic in gfx1250

update comment
2025-10-14 08:49:34 -07:00
Illia Silin
e4298e55c7 Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)
This reverts commit d2bbca3eca.
2025-10-14 08:43:14 -07:00
Ville Pietilä
3d0db2ca63 Fix transferring data back to host for validation. 2025-10-14 15:02:51 +00:00
jakpiase
6deaaa92cc [CK_TILE] Switch into universal gemms for conv bwds (#2981)
* switch into universal gemms for conv bwds

* some fixes and support universal gemm in conv fwd

* add reviewer comments
2025-10-14 16:09:16 +02:00
Ville Pietilä
bbed3a62dc Fully functional CK Tile profiler. 2025-10-14 13:35:37 +00:00
msaffari-amd
589e242eda Fix: Handle JSON boolean values (pad_m, pad_n, pad_k and persistent) in gemm_instance_builder (#3008) 2025-10-14 13:20:25 +02:00
Ville Pietilä
0f6bf78caa Add empty instance factory. 2025-10-14 07:13:20 +00:00
Ville Pietilä
eaf9ba4e45 Rename CK Tile grouped conv factory. 2025-10-14 06:31:34 +00:00
ClementLinCF
e1b0bdfbfa [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm (#2540)
* [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm

* Update rmsnorm host reference

* Update tree reduction of rmsnorm for reference host

* Fix cross warp for m > 1 cases

* Add RMSNorm model selectable option for host reference

* Fix save_unquant cases

* Update reference rmsnorm forward function to use enum for model sensitivity

* Update reference rmsnorm calculation for model sensitivity

* Fix m warp for layernorm

* Adjust parameter of reference for twoPass

* Fix clang format

* Run clang-format-overwrite.sh to fix formatting issue

* fix clang format

---------

Co-authored-by: MHYang <mengyang@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-10-13 11:52:37 -07:00
Ville Pietilä
fc6a9e3931 Create invoker for the kernel and a factory for creating invokers. 2025-10-13 15:22:50 +00:00
John Shumway
fc2a121c44 Enable GMock and improve gtest configuration (#2976)
Our current cmake/gtest.cmake file does not enable gmock. Gmock provides the matchers needed for more readable unit tests. This PR enables gmock and does a little cleanup in gtest.cmake:

* Enable BUILD_GMOCK by default (was previously disabled)
* Patch gtest-src/googlemock/CMakeLists.txt for broken include path.
* Add configuration to gmock if the target is used.

No other changes in this PR, but I've verified I can use gmock matchers correctly once I include these changes in other code.
2025-10-13 08:11:51 -07:00
Ville Pietilä
a60dab521e Added a placeholder conv bwd instance factory for CK Tile profiler. 2025-10-13 14:32:20 +00:00
Ville Pietilä
6dcee56fee WIP: CK Tile conv bwd profiler. 2025-10-13 13:03:21 +00:00
Sami Remes
d2bbca3eca [CK_TILE] Non-K Major from old CK to CK-Tile (#2442)
* Enable the adapted LDS B layout for Row-Major

* fix formatting

* Implement specialized col-major A LDS block descriptor

* Fix formatting

* Use VecLoadSize for AK1/BK1

* Fix some thread access pattern values

* Use GetVectorSizeA for A

* Fix formatting

* Add extra condition to avoid division by zero

* disable layout for wave32

* remove extra else

* fix formatting

* Fix formatting

* Rename one remaining TileDistributionEncodingPattern2D

* Use integer ceil division

* revert remod.py changes

* also revert utility.hpp

* use getA/BTileAccessPattern everywhere

* use integer_divide_ceil for AK0 too

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2025-10-13 14:27:02 +02:00
aledudek
634634f5c0 [CK_TILE] Blockwise GEMM pipeline v6 - port of v5 from old CK (#2955)
* First checkpoint

* Second checkpoint - hot loop scheduler

* Third checkpoint - init main operator

* Fourth checkpoint - main loop ready

* Fifth checkpoint - main loop fix

* Sixth checkpoint - ReadWritecompFunc

* Seventh checkpoint - Tail finished

* [CK_TILE] Blockwise gemm pipeline v5 complete

* Working

* Working fixes 2

* Rename v5 to v77 temporarily

* Data type adjustment

* Data type adjustment 2

* [CK_TILE] Blockwise Gemm pipeline v5 add tests

* [CK_TILE] Fix calculation error

* TEMP: check pipeline

* Fix name to V6

* naming and documentation changes

* WIP dump

* Try fixing v1

* Failing tests v5

* Debugging

* Changes v2

* F16 tests working great

* Working BlockwiseGemmPipelineV5 as V6

* Cleanup and format

* Merging changes part1

* [CK_TILE] Blockwise Gemm Pipeline Comp V5/V6

* Remove commented code

* Fix gfx950 build issues

* Fix file formatting

* Review changes, more concat info, add bf16 bf8 tests

* Fix formatting

* Add bf16 and bf8 tests

---------

Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2025-10-13 13:57:37 +02:00
aledudek
3021604213 [CK_TILE] Batched Gemm Kernel IsSupported function checks (#2860)
* Add valid check batched gemm part1

* [CK_TILE] Add batched gemm kernel IsSupported func checks

* revert broken pre-commit hook changes

* revert broken pre-commit hook changes v2

* Clarify error messages
2025-10-13 13:55:23 +02:00
Ville Pietilä
d62f34348a Skeleton for the ckTileProfiler. 2025-10-13 11:40:31 +00:00
damien-lejeune
46c10c316d Update include path to break the remod's cyclic dep issue (#2978)
* Update include path to break the cyclic dep issue

* Use ck_tile::permute_vectors_i4x4_b in tile engine

---------

Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-10-13 13:24:47 +02:00
msaffari-amd
e9f0cc83a8 [CK Tile] contraction multi d - kernel & example (#2901)
* Initial commit. create batched_contraction_kernel file

* initial problem definition

* implement initial example to launch kernel

* add universal gemm to contraction. initial phase

* complete implementation for special case all Dims are 1 and no Ds

* clean code

* initial changes to support multi dimensional G

* more progress in implementing multiple G

* tmp commit

* manage dynamic NumDimG in kernel

* improving example for multi M,N,K,G handling. start generalizing kernel. it is a temporary commit

* implement the example for general Multi dimension G M N K and test different reference calculation algorithms

* 2 functions for reference using multi dimensional and flat indexing

* clean the code for multi dimensional G, M, N, K contraction and add some logs

* Add Make descriptor function in kernel for merging Ms, Ns, Ks for A, B, E

* some cleaning on kernel

* clean the code for calculating the offsets from flatten batch number

* Start adding MultiD support to kernel and example

* more changes to manage multi D in kernel and example

* manage passing multi d to kernel and testing.

* complete multi D support in kernel. modify example code to support it

* Correct algorithm to calc the correct offset values for D tensor batches and some code cleaning

* Minor fix

* Generalize example code for variable NumD tensors and apply cleanup based on review feedback

* Refactored code and addressed review feedback

* refactoring, cleaning, add documents, in kernel side and example codes

* Optimize batch offset calculation in kernel

* Inline CalculateBatchOffset in batched contraction kernel, update CHANGELOG.md

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-10-13 12:30:28 +02:00
Ville Pietilä
94569f3991 Build only grouped conv profilers. 2025-10-13 10:01:42 +00:00
Yi DING
95bdc7410c [CK_TILE] FMHA BWD Add Instance for D48 on GFX950 (#2866)
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
2025-10-13 15:03:46 +08:00