Commit Graph

2488 Commits

Author SHA1 Message Date
Ville Pietilä
76ffa1bf0a Add more instances. 2025-10-16 11:33:06 +00:00
Ville Pietilä
044bcfcb1e Take universal GEMM pipeline into use for grouped convolutions. 2025-10-16 11:03:14 +00:00
Ville Pietilä
e99b5a8c28 Merge remote-tracking branch 'origin/develop' into vpietila/ck-vs-ck-tile-conv-benchmarking 2025-10-16 07:33:08 +00:00
Ville Pietilä
9b3c61cac2 Add more instances. 2025-10-16 07:32:52 +00:00
Ville Pietilä
19fac39880 Enable vector loads in grouped conv bwd weight kernels. 2025-10-16 07:17:12 +00:00
Haocong WANG
013ba3c737 Enable storelse for fmha_fwd_trload kernel (#3023) 2025-10-16 13:51:23 +08:00
Emily Martins
0dbd173500 Fix compiler noreturn error for ck tile permute test (#3036) 2025-10-15 19:42:02 -07:00
Aviral Goel
232523d9fa docs: add quant mode comparison to readme (#3032)
* docs: add quant mode comparison to readme

* Update example/ck_tile/38_block_scale_gemm/README.md

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>

---------

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
2025-10-15 18:35:06 -07:00
Illia Silin
87d0a3ac17 use branch develop to test hipTensor (#3034) 2025-10-15 15:40:34 -07:00
Illia Silin
3348f01e6f re-enable clang-format by default (#3030)
* re-enable clang-format by default

* fix clang format
2025-10-15 07:43:11 -07:00
Ville Pietilä
a5b60ed2f2 Add more instances. 2025-10-15 14:33:01 +00:00
Christopher Millette
bde5f26db3 Disable streamk extended regression tests for now (#3016) 2025-10-15 09:05:47 -05:00
Ville Pietilä
96a7c26a0b Better split-K handling in the template instantiation. 2025-10-15 13:47:04 +00:00
Ville Pietilä
bbe13f4635 Add more instances. 2025-10-15 13:23:55 +00:00
Ville Pietilä
23aa650172 Add min blocks per CU to invoker name. 2025-10-15 13:21:29 +00:00
Ville Pietilä
57dbd2f4a4 Remove unnecessary compilations. 2025-10-15 13:20:58 +00:00
Ville Pietilä
3c08ce1e64 Improve the grouped conv kernel name generation in CK Tile. 2025-10-15 11:02:21 +00:00
felix
4c826abfff Felix/opt sorting (#2902)
* merge felix/sorting
* opt moe sorting  (#2822)
* opt moe storing for 2k
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
2025-10-15 09:24:03 +08:00
AviralGoelAMD
ca1ab083a7 test(grouped_gemm_multi_d): add unit test for bf16 support 2025-10-14 18:00:43 -04:00
AviralGoelAMD
8d8b49dec2 feat(grouped_gemm_multi_d): add support for bf16 2025-10-14 18:00:43 -04:00
Geo Min
706c2b281c fixing group id (#3002) 2025-10-14 08:51:52 -07:00
joyeamd
b9d74e7746 update s_barrier's logic in gfx12 architecture (#3003)
change s_waitcnt's logic in gfx1250

change s_waitcnt's logic in gfx1250

update comment
2025-10-14 08:49:34 -07:00
Illia Silin
e4298e55c7 Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)
This reverts commit d2bbca3eca.
2025-10-14 08:43:14 -07:00
Ville Pietilä
3d0db2ca63 Fix transferring data back to host for validation. 2025-10-14 15:02:51 +00:00
jakpiase
6deaaa92cc [CK_TILE] Switch into universal gemms for conv bwds (#2981)
* switch into universal gemms for conv bwds

* some fixes and support universal gemm in conv fwd

* add reviewer comments
2025-10-14 16:09:16 +02:00
Ville Pietilä
bbed3a62dc Fully functional CK Tile profiler. 2025-10-14 13:35:37 +00:00
msaffari-amd
589e242eda Fix: Handle JSON boolean values (pad_m, pad_n, pad_k and persistent) in gemm_instance_builder (#3008) 2025-10-14 13:20:25 +02:00
Ville Pietilä
0f6bf78caa Add empty instance factory. 2025-10-14 07:13:20 +00:00
Ville Pietilä
eaf9ba4e45 Rename CK Tile grouped conv factory. 2025-10-14 06:31:34 +00:00
ClementLinCF
e1b0bdfbfa [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm (#2540)
* [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm

* Update rmsnorm host reference

* Update tree reduction of rmsnorm for reference host

* Fix cross warp for m > 1 cases

* Add RMSNorm model selectable option for host reference

* Fix save_unquant cases

* Update reference rmsnorm forward function to use enum for model sensitivity

* Update reference rmsnorm calculation for model sensitivity

* Fix m warp for layernorm

* Adjust parameter of reference for twoPass

* Fix clang format

* Run clang-format-overwrite.sh to fix formating issue

* fix clang format

---------

Co-authored-by: MHYang <mengyang@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-10-13 11:52:37 -07:00
Ville Pietilä
fc6a9e3931 Create invoker for the kernel and a factory for creating invokers. 2025-10-13 15:22:50 +00:00
John Shumway
fc2a121c44 Enable GMock and improve gtest configuration (#2976)
Our current cmake/gtest.cmake file does not enable gmock. Gmock is needed for matchers that are needed for more readable unit tests. This PR enables gmock and does a little cleanup in gtest.cmake:

* Enable BUILD_GMOCK by default (was previously disabled)
* Patch gtest-src/googlemock/CMakeLists.txt for broken include path.
* Add configuration to gmock if the target is used.

No other changes in this PR, but I've verified I can use gmock matchers correctly once I include these changes in other code.
2025-10-13 08:11:51 -07:00
Ville Pietilä
a60dab521e Added a placeholder conv bwd instance factory for CK Tile profiler. 2025-10-13 14:32:20 +00:00
Ville Pietilä
6dcee56fee WIP: CK Tile conv bwd profiler. 2025-10-13 13:03:21 +00:00
Sami Remes
d2bbca3eca [CK_TILE] Non-K Major from old CK to CK-Tile (#2442)
* Enable the adapted LDS B layout for Row-Major

* fix formatting

* Implement specialized col-major A LDS block descriptor

* Fix formatting

* Use VecLoadSize for AK1/BK1

* Fix some thread access pattern values

* Use GetVectorSizeA for A

* Fix formatting

* Add extra condition to avoid division by zero

* disable layout for wave32

* remove extra else

* fix formatting

* Fix formatting

* Rename one remaining TileDistributionEncodingPattern2D

* Use integer ceil division

* revert remod.py changes

* also revert utility.hpp

* use getA/BTileAccessPattern everywhere

* use integer_divide_ceil for AK0 too

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2025-10-13 14:27:02 +02:00
aledudek
634634f5c0 [CK_TILE] Blockwise GEMM pipeline v6 - port of v5 from old CK (#2955)
* First checkpoint

* Second checkpoint - hot loop scheduler

* Third checkpoint - init main operator

* Fourth checkpoint - main loop ready

* Fifth checkpoint - main loop fix

* Sixth checkpoint - ReadWritecompFunc

* Seventh checkpoint - Tail finished

* [CK_TILE] Blockwise gemm pipeline v5 complete

* Working

* Working fixes 2

* Rename v5 to v77 temporarily

* Data type adjustment

* Data type adjustment 2

* [CK_TILE] Blockwise Gemm pipeline v5 add tests

* [CK_TILE] Fix calculation error

* TEMP: check pipeline

* Fix name to V6

* naming and documentation changes

* WIP dump

* Try fixing v1

* Failing tests v5

* Debugging

* Changes v2

* F16 tests working great

* Working BlockwiseGemmPipelineV5 as V6

* Cleanup and format

* Merging changes part1

* [CK_TILE] Blockwise Gemm Pipeline Comp V5/V6

* Remove commented code

* Fix gfx950 build issues

* Fix file formatting

* Review changes, more concat info, add bf16 bf8 tests

* Fix formatting

* Add bf16 and bf8 tests

---------

Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2025-10-13 13:57:37 +02:00
aledudek
3021604213 [CK_TILE] Batched Gemm Kernel IsSupported function checks (#2860)
* Add valid check batched gemm part1

* [CK_TILE] Add batched gemm kernel IsSupported func checks

* revert broken pre-commit hook changes

* revert broken pre-commit hook changes v2

* Clarify error messages
2025-10-13 13:55:23 +02:00
Ville Pietilä
d62f34348a Skeleton for the ckTileProfiler. 2025-10-13 11:40:31 +00:00
damien-lejeune
46c10c316d Update include path to break the remod's cyclic dep issue (#2978)
* Update include path to break the cyclic dep issue

* Use ck_tile::permute_vectors_i4x4_b in tile engine

---------

Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-10-13 13:24:47 +02:00
msaffari-amd
e9f0cc83a8 [CK Tile] contraction multi d - kernel & example (#2901)
* Initial commit. create batched_contraction_kernel file

* initial problem definition

* implement initial example to launch kernel

* add universal gemm to contraction. initial phase

* complete implementation for special case all Dims are 1 and no Ds

* clean code

* initial changes to support multi dimensional G

* more progress in implementing multiple G

* tmp commit

* manage dynamic NumDimG in kernel

* improving example for multi M,N,K,G handling. start generalizing kernel. it is a temporary commit

* implement the example for general Multi dimension G M N K and test different reference calculation algorithms

* 2 functions for reference using multi dimensional and flat indexing

* clean the code for muti dimentional G, M, N, K contraction and add some logs

* Add Make descriptor function in kernel for merging Ms, Ns, Ks for A, B, E

* some cleaning on kernel

* clean the code for  calculating the offsets from flatten batch number

* Start adding MultiD support to kernel and example

* more changes to manage multi D in kernel and example

* manage passing multi d to kernel and testing.

* complete multi D support in kernel. modify example code to support it

* Correct algorithm to calc the correct offset values for D tensor batches and some code cleaning

* Minor fix

* Generalize example code for variable NumD tensors and apply cleanup based on review feedback

* Refactored code and addressed review feedback

* refactoring, cleaning, add documents, in kernel side and example codes

* Optimize batch offset calculation in kernel

* Inline CalculateBatchOffset in batched contraction kernel, update CHANGELOG.md

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-10-13 12:30:28 +02:00
Ville Pietilä
94569f3991 Build only grouped conv profilers. 2025-10-13 10:01:42 +00:00
Yi DING
95bdc7410c [CK_TILE] FMHA BWD Add Instance for D48 on GFX950 (#2866)
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
2025-10-13 15:03:46 +08:00
Christopher Millette
f5708882a3 Streamk functional tests (#2974)
* Add initial fp16_mem_128x128x32_2x2x1_32x32x16_NonPersistent test suite

* Account for stride when computing K offsets for A and B tensor

This change ensures that the correct stride is used when computing the K
offsets into the A and B tensors in the Stream-K Kernel's operator()
function. This ensures that the kernel executes correct regardless of
whether A and B are row or column major.

* Move helper code to test_gemm_streamk_util.hpp

* Separate tests into smoke/regression/extended. Add bf16 datatype

* Run clang-format

* Refactor combinatorial macro expansion and naming

* Adjust the initialization values to account for better tolerance on bf16

* Correct BF16 datatypes in comments

* Move the extended tests under the REGRESSION_TESTS label

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Emily Martins <emily.martins@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-11 07:53:40 -05:00
John Shumway
0843815db7 Fix GCC 7 CTAD compilation error in test_fmha_bwd.cpp (#3001)
Fixes compilation error on SLES15 with GCC 7 for gfx942 builds:

error: 'vector' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]

Changes:

- Explicitly specify template argument for `std::vector<mode_enum>` instead of relying on C++17 CTAD
- Maintains compatibility with both older (GCC 7) and newer compilers
2025-10-10 19:13:34 -07:00
Khushbu Agarwal
3c39d279ab supporting prefill shapes for preshuffle block scale gemm (#2975)
* debugging

* debugging for prefill shapes

* comment unused code

* fix for prefill shapes

* clearing up the code

* add int4 to universal gemm example

* clang formatted

* adding test for prefill shapes in block scale gemm

* lil improv on the block pipeline

* Address Review Comment

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-10-10 15:36:24 -07:00
Max Podkorytov
9d060d3e3c [CK-Tile] functional support for transposed inputs in compute-bound double-lds-buffer pipeline with async loads from global memory to LDS (#2984)
* reuse local prefetch logic from compute v4 pipeline

add single-tile test

explicit lambda capture

reuse lds block descriptors from base policy for the transposed case

match the test case kernel configuration with compute v4

* add comments
2025-10-10 12:57:50 -07:00
yinglu
fada1a3cae Conv:TF32: add more instances - 2 (#2879)
* add instances of device_grouped_conv_fwd_xdl_f32_comp_instances
* add instances of device_grouped_conv_fwd_xdl_f32_tf32_mem_instances
* add instances of device_grouped_conv_fwd_xdl_large_tensor_f32_tf32_instances
* tf32:conv:add instances for base class DeviceConvFwd
* tf32:conv:add instances for base class DeviceGroupedConvBwdDataMultipleD
* tf32:conv:add instances for base class DeviceGroupedConvBwdWeight
* add tf32 in profiler
* remove gnhwc/ngchw/ngcdhw instances
* remove non-ndhwgc/nhwgc/nhwc instances
* add check in IsSupportedArgument()
2025-10-10 15:28:17 +08:00
Bartłomiej Kocot
ad7a215aba Fix splitK for grouped conv bwd data (#2991) 2025-10-10 09:24:21 +02:00
Yi DING
b6036bc76a [CK_TILE] FMHA Tests Enhancement (#2945)
* fmha-gtest-wip

* Thanks Copilot!
2025-10-10 11:34:47 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
fb66b4f5e4 [CK_TILE] fix pk_fp4 compilation for non-gfx950 GPUs (#2983)
See build error log from
https://github.com/ROCm/composable_kernel/issues/2271#issuecomment-3150218542

This PR make vector element access constexpr-safe by avoiding operator[] on
ext_vector_type(2) and replace those sites in the pk_fp4 conversions so they
can be used in constant expressions, as The operator[] on ext_vector_type(2)
isn't allowed in constant expressions, which caused "constexpr function never
produces a constant expression" with a note at x[0]. Using `bit_cast` to a
trivial array representation keeps it constexpr-compatible.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-09 07:43:41 -07:00