Commit Graph

4088 Commits

Author SHA1 Message Date
felix
b6f6b7cd2a Felix/opt sorting (#2902)
* merge felix/sorting
* opt moe sorting (#2822)
* opt moe storing for 2k
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>

[ROCm/composable_kernel commit: 4c826abfff]
2025-10-15 09:24:03 +08:00
assistant-librarian[bot]
b03940ab5f Merge commit 'ca1ab083a7da42a76a40f8a6802b72b61963efc1' into develop 2025-10-14 22:12:52 +00:00
AviralGoelAMD
f0b0b1e838 test(grouped_gemm_multi_d): add unit test for bf16 support
[ROCm/composable_kernel commit: ca1ab083a7]
2025-10-14 18:00:43 -04:00
AviralGoelAMD
49286aab3f feat(grouped_gemm_multi_d): add support for bf16
[ROCm/composable_kernel commit: 8d8b49dec2]
2025-10-14 18:00:43 -04:00
assistant-librarian[bot]
e787672715 Merge commit '706c2b281caa201d2c9064e8940e0eb6c9e6710b' into develop 2025-10-14 16:13:22 +00:00
Geo Min
266ab45f7a fixing group id (#3002)
[ROCm/composable_kernel commit: 706c2b281c]
2025-10-14 08:51:52 -07:00
joyeamd
ed83bcb9a2 update s_barrier's logic in gfx12 architecture (#3003)
change s_waitcnt's logic in gfx1250

update comment

[ROCm/composable_kernel commit: b9d74e7746]
2025-10-14 08:49:34 -07:00
Illia Silin
3a9bd7c1ff Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)
This reverts commit 3653a0d01edd715f5c9759eaf547a7644c762b8e.

[ROCm/composable_kernel commit: e4298e55c7]
2025-10-14 08:43:14 -07:00
assistant-librarian[bot]
db6db740c4 Merge commit '6deaaa92cc561f5bc29d956d6f6de903db19a079' into develop 2025-10-14 14:13:13 +00:00
jakpiase
72a1a1ca59 [CK_TILE] Switch into universal gemms for conv bwds (#2981)
* switch into universal gemms for conv bwds

* some fixes and support universal gemm in conv fwd

* add reviewer comments

[ROCm/composable_kernel commit: 6deaaa92cc]
2025-10-14 16:09:16 +02:00
assistant-librarian[bot]
4c0b5201eb Merge commit '589e242eda730958b36c4f78bfad1991c499b0d2' into develop 2025-10-14 12:17:41 +00:00
msaffari-amd
6b4d770179 Fix: Handle JSON boolean values (pad_m, pad_n, pad_k and persistent) in gemm_instance_builder (#3008)
[ROCm/composable_kernel commit: 589e242eda]
2025-10-14 13:20:25 +02:00
assistant-librarian[bot]
63d907604b Merge commit 'e1b0bdfbfa92f47006fdbced627c7470eacdea2b' into develop 2025-10-13 19:10:56 +00:00
ClementLinCF
7907a466de [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm (#2540)
* [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm

* Update rmsnorm host reference

* Update tree reduction of rmsnorm for reference host

* Fix cross warp for m > 1 cases

* Add RMSNorm model selectable option for host reference

* Fix save_unquant cases

* Update reference rmsnorm forward function to use enum for model sensitivity

* Update reference rmsnorm calculation for model sensitivity

* Fix m warp for layernorm

* Adjust parameter of reference for twoPass

* Fix clang format

* Run clang-format-overwrite.sh to fix formatting issue

* fix clang format

---------

Co-authored-by: MHYang <mengyang@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: e1b0bdfbfa]
2025-10-13 11:52:37 -07:00
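The rmsnorm commit above mentions a tree reduction in the host reference. This is a minimal sketch that assumes the usual RMSNorm definition y_i = x_i / sqrt(mean(x^2) + eps) * gamma_i. The host reference accumulates the sum of squares pairwise, so its rounding order is closer to that of a parallel reduction. The names `tree_sum_of_squares` and `rmsnorm_row` are hypothetical, and this is not CK-Tile's reference implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CK-Tile API): sum of squares via a pairwise tree
// reduction. Each pass folds the upper half of the partial sums onto the lower
// half, so additions happen in tree order rather than strictly left to right,
// similar in spirit to combining per-lane or per-warp partial results.
float tree_sum_of_squares(const std::vector<float>& x)
{
    if(x.empty())
        return 0.0f;
    std::vector<float> partial(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
        partial[i] = x[i] * x[i];
    std::size_t n = partial.size();
    while(n > 1)
    {
        const std::size_t half = (n + 1) / 2; // lower half keeps the extra element when n is odd
        for(std::size_t i = 0; i + half < n; ++i)
            partial[i] += partial[i + half];
        n = half;
    }
    return partial[0];
}

// Hypothetical RMSNorm reference for one row:
//   y[i] = x[i] / sqrt(mean(x^2) + eps) * gamma[i]
std::vector<float> rmsnorm_row(const std::vector<float>& x,
                               const std::vector<float>& gamma,
                               float eps)
{
    assert(!x.empty() && x.size() == gamma.size());
    const float mean_sq = tree_sum_of_squares(x) / static_cast<float>(x.size());
    const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);
    std::vector<float> y(x.size());
    for(std::size_t i = 0; i < x.size(); ++i)
        y[i] = x[i] * inv_rms * gamma[i];
    return y;
}
```

A sequential loop produces the same mathematical result. The tree order matters only when the host reference must track a device reduction's floating-point rounding closely.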
assistant-librarian[bot]
713691609c Merge commit 'fc2a121c4446b4ca939e977563528019b30e6114' into develop 2025-10-13 15:12:25 +00:00
John Shumway
d4601123d2 Enable GMock and improve gtest configuration (#2976)
Our current cmake/gtest.cmake file does not enable GMock. GMock provides the matchers we need for more readable unit tests. This PR enables GMock and does a little cleanup in gtest.cmake:

* Enable BUILD_GMOCK by default (was previously disabled)
* Patch gtest-src/googlemock/CMakeLists.txt for broken include path.
* Add configuration to gmock if the target is used.

No other changes in this PR, but I've verified I can use gmock matchers correctly once I include these changes in other code.

[ROCm/composable_kernel commit: fc2a121c44]
2025-10-13 08:11:51 -07:00
assistant-librarian[bot]
1fdfe40874 Merge commit 'd2bbca3eca2bd14014e3daae39ae70846ec8218b' into develop 2025-10-13 13:20:32 +00:00
Sami Remes
4426784f38 [CK_TILE] Non-K Major from old CK to CK-Tile (#2442)
* Enable the adapted LDS B layout for Row-Major

* fix formatting

* Implement specialized col-major A LDS block descriptor

* Fix formatting

* Use VecLoadSize for AK1/BK1

* Fix some thread access pattern values

* Use GetVectorSizeA for A

* Fix formatting

* Add extra condition to avoid division by zero

* disable layout for wave32

* remove extra else

* fix formatting

* Fix formatting

* Rename one remaining TileDistributionEncodingPattern2D

* Use integer ceil division

* revert remod.py changes

* also revert utility.hpp

* use getA/BTileAccessPattern everywhere

* use integer_divide_ceil for AK0 too

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>

[ROCm/composable_kernel commit: d2bbca3eca]
2025-10-13 14:27:02 +02:00
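The reverted commit above switches several tile-size computations to integer ceil division and adds a guard against division by zero. The arithmetic looks roughly like the minimal sketch below. `ceil_div` and `ak0_for` are hypothetical names rather than CK-Tile functions, and treating AK0 as "the number of AK1-wide vectors covering the K tile" is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for an integer ceil-division helper: the number of
// b-sized chunks needed to cover a, computed entirely in integer arithmetic.
// The precondition guards against division by zero.
int ceil_div(int a, int b)
{
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

// Hypothetical: number of AK1-wide vectors needed to cover a K tile of
// k_per_block elements. Plain division truncates when AK1 does not divide
// the tile, silently dropping the remainder.
int ak0_for(int k_per_block, int ak1)
{
    return ceil_div(k_per_block, ak1);
}
```

For example, with a 40-element K tile and AK1 = 16, truncating division gives 2 vectors (32 elements). Ceil division gives 3 and covers all 40.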
assistant-librarian[bot]
9bbf7016b6 Merge commit '634634f5c09a3b42f5f838a5af9c948602e246db' into develop 2025-10-13 12:17:18 +00:00
aledudek
bf0a5cbb11 [CK_TILE] Blockwise GEMM pipeline v6 - port of v5 from old CK (#2955)
* First checkpoint

* Second checkpoint - hot loop scheduler

* Third checkpoint - init main operator

* Fourth checkpoint - main loop ready

* Fifth checkpoint - main loop fix

* Sixth checkpoint - ReadWritecompFunc

* Seventh checkpoint - Tail finished

* [CK_TILE] Blockwise gemm pipeline v5 complete

* Working

* Working fixes 2

* Rename v5 to v77 temporarily

* Data type adjustment

* Data type adjustment 2

* [CK_TILE] Blockwise Gemm pipeline v5 add tests

* [CK_TILE] Fix calculation error

* TEMP: check pipeline

* Fix name to V6

* naming and documentation changes

* WIP dump

* Try fixing v1

* Failing tests v5

* Debugging

* Changes v2

* F16 tests working great

* Working BlockwiseGemmPipelineV5 as V6

* Cleanup and format

* Merging changes part1

* [CK_TILE] Blockwise Gemm Pipeline Comp V5/V6

* Remove commented code

* Fix gfx950 build issues

* Fix file formatting

* Review changes, more concat info, add bf16 bf8 tests

* Fix formatting

* Add bf16 and bf8 tests

---------

Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>

[ROCm/composable_kernel commit: 634634f5c0]
2025-10-13 13:57:37 +02:00
aledudek
f1c8acbd71 [CK_TILE] Batched Gemm Kernel IsSupported function checks (#2860)
* Add valid check batched gemm part1

* [CK_TILE] Add batched gemm kernel IsSupported func checks

* revert broken pre-commit hook changes

* revert broken pre-commit hook changes v2

* Clarify error messages

[ROCm/composable_kernel commit: 3021604213]
2025-10-13 13:55:23 +02:00
damien-lejeune
cca873a770 Update include path to break the remod's cyclic dep issue (#2978)
* Update include path to break the cyclic dep issue

* Use ck_tile::permute_vectors_i4x4_b in tile engine

---------

Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 46c10c316d]
2025-10-13 13:24:47 +02:00
assistant-librarian[bot]
19ea53596e Merge commit 'e9f0cc83a8f3f94ad8462e50a9d9a92d8dca3388' into develop 2025-10-13 11:11:52 +00:00
msaffari-amd
bcc9d9e514 [CK Tile] contraction multi d - kernel & example (#2901)
* Initial commit. Create batched_contraction_kernel file

* initial problem definition

* implement initial example to launch kernel

* add universal gemm to contraction. initial phase

* complete implementation for special case all Dims are 1 and no Ds

* clean code

* initial changes to support multi dimensional G

* more progress in implementing multiple G

* tmp commit

* manage dynamic NumDimG in kernel

* improving example for multi M,N,K,G handling. start generalizing kernel. it is a temporary commit

* implement the example for general Multi dimension G M N K and test different reference calculation algorithms

* 2 functions for reference using multi dimensional and flat indexing

* clean the code for multi-dimensional G, M, N, K contraction and add some logs

* Add Make descriptor function in kernel for merging Ms, Ns, Ks for A, B, E

* some cleaning on kernel

* clean the code for calculating the offsets from the flattened batch number

* Start adding MultiD support to kernel and example

* more changes to manage multi D in kernel and example

* manage passing multi d to kernel and testing.

* complete multi D support in kernel. modify example code to support it

* Correct algorithm to calc the correct offset values for D tensor batches and some code cleaning

* Minor fix

* Generalize example code for variable NumD tensors and apply cleanup based on review feedback

* Refactored code and addressed review feedback

* refactoring, cleaning, add documents, in kernel side and example codes

* Optimize batch offset calculation in kernel

* Inline CalculateBatchOffset in batched contraction kernel, update CHANGELOG.md

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: e9f0cc83a8]
2025-10-13 12:30:28 +02:00
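The contraction commit above computes batch offsets from a flattened batch number, with separate offsets for each D tensor. The minimal sketch below shows one common way to do this. It uses a hypothetical name and is not the kernel's actual code. The flat index is decomposed into per-dimension G coordinates, and the result is the dot product of those coordinates with that tensor's G strides. Only the stride vector differs between tensors, so the same routine could serve A, B, E and each D tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: convert a flattened batch index into an element offset
// for a tensor whose batch is spread over several G dimensions, each with its
// own stride. The last G dimension varies fastest in the flattened index.
std::int64_t batch_offset(std::int64_t flat_batch,
                          const std::vector<std::int64_t>& g_lengths,
                          const std::vector<std::int64_t>& g_strides)
{
    assert(g_lengths.size() == g_strides.size());
    std::int64_t offset = 0;
    for(std::size_t d = g_lengths.size(); d-- > 0;)
    {
        assert(g_lengths[d] > 0);
        const std::int64_t idx = flat_batch % g_lengths[d]; // coordinate along G dimension d
        flat_batch /= g_lengths[d];
        offset += idx * g_strides[d];
    }
    return offset;
}
```

For G lengths {2, 3}, flat batch 4 decomposes to coordinates (1, 1). With strides {1000, 100} that gives offset 1100. A D tensor with strides {30, 10} gets offset 40 for the same batch.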
assistant-librarian[bot]
23394088f7 Merge commit '95bdc7410c99096652618759ff2ef3586951a0d0' into develop 2025-10-13 07:13:38 +00:00
Yi DING
2d6547b18c [CK_TILE] FMHA BWD Add Instance for D48 on GFX950 (#2866)
Co-authored-by: asleepzzz <hanwen.chang@amd.com>

[ROCm/composable_kernel commit: 95bdc7410c]
2025-10-13 15:03:46 +08:00
assistant-librarian[bot]
ef06eef341 Merge commit 'f5708882a3c0f391b7d02f5af926964170bd8f4e' into develop 2025-10-11 13:14:03 +00:00
Christopher Millette
31f0642364 Streamk functional tests (#2974)
* Add initial fp16_mem_128x128x32_2x2x1_32x32x16_NonPersistent test suite

* Account for stride when computing K offsets for A and B tensor

This change ensures that the correct stride is used when computing the K
offsets into the A and B tensors in the Stream-K Kernel's operator()
function. This ensures that the kernel executes correct regardless of
whether A and B are row or column major.

* Move helper code to test_gemm_streamk_util.hpp

* Separate tests into smoke/regression/extended. Add bf16 datatype

* Run clang-format

* Refactor combinatorial macro expansion and naming

* Adjust the initialization values to account for better tolerance on bf16

* Correct BF16 datatypes in comments

* Move the extended tests under the REGRESSION_TESTS label

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Emily Martins <emily.martins@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: f5708882a3]
2025-10-11 07:53:40 -05:00
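The Stream-K fix above depends on layout: where a K slice starts inside A or B depends on whether K is the contiguous dimension. The minimal sketch below shows that offset arithmetic. It assumes A is M x K, B is K x N and `stride` is the leading dimension. `Layout`, `a_k_offset` and `b_k_offset` are hypothetical names, not the Stream-K kernel's code.

```cpp
#include <cassert>
#include <cstdint>

enum class Layout
{
    RowMajor,
    ColumnMajor
};

// Hypothetical sketch: element offset of the first element of a K slice that
// starts at k0.
//   A row-major:    A(m, k) = m * stride + k  -> K step is 1
//   A column-major: A(m, k) = m + k * stride  -> K step is stride
std::int64_t a_k_offset(std::int64_t k0, Layout layout, std::int64_t stride)
{
    assert(k0 >= 0 && stride > 0);
    return layout == Layout::RowMajor ? k0 : k0 * stride;
}

//   B row-major:    B(k, n) = k * stride + n  -> K step is stride
//   B column-major: B(k, n) = k + n * stride  -> K step is 1
std::int64_t b_k_offset(std::int64_t k0, Layout layout, std::int64_t stride)
{
    assert(k0 >= 0 && stride > 0);
    return layout == Layout::RowMajor ? k0 * stride : k0;
}
```

Using k0 directly as the offset is correct only when K is contiguous. That is why ignoring the stride breaks one of the two layouts.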
assistant-librarian[bot]
ccea5c423a Merge commit '0843815db7763cf5650f7803185a3ab9d24194d7' into develop 2025-10-11 02:35:26 +00:00
John Shumway
b1acf5cbf5 Fix GCC 7 CTAD compilation error in test_fmha_bwd.cpp (#3001)
Fixes compilation error on SLES15 with GCC 7 for gfx942 builds:

error: 'vector' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]

Changes:

- Explicitly specify template argument for `std::vector<mode_enum>` instead of relying on C++17 CTAD
- Maintains compatibility with both older (GCC 7) and newer compilers

[ROCm/composable_kernel commit: 0843815db7]
2025-10-10 19:13:34 -07:00
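The fix above comes down to the difference between the two declarations in this minimal sketch. Here `mode_enum` is a hypothetical stand-in with made-up enumerators, not the test's actual definition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the enum used by test_fmha_bwd.cpp.
enum class mode_enum
{
    batch,
    group
};

// Relying on CTAD:
//   std::vector modes{mode_enum::batch, mode_enum::group};
// asks the compiler to deduce std::vector<mode_enum>. Clang's
// -Wctad-maybe-unsupported warns when CTAD is applied to a class template that
// declares no deduction guides, and -Werror turns that warning into the error
// quoted above. Naming the element type explicitly skips deduction and works
// with any C++11 or later standard library.
std::vector<mode_enum> make_modes()
{
    std::vector<mode_enum> modes{mode_enum::batch, mode_enum::group};
    return modes;
}
```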
assistant-librarian[bot]
702281d223 Merge commit '3c39d279ab4569d1b33399e7746465744ed662c0' into develop 2025-10-10 23:11:29 +00:00
Khushbu Agarwal
99902d395c supporting prefill shapes for preshuffle block scale gemm (#2975)
* debugging

* debugging for prefill shapes

* comment unused code

* fix for prefill shapes

* clearing up the code

* add int4 to universal gemm example

* clang formatted

* adding test for prefill shapes in block scale gemm

* minor improvement to the block pipeline

* Address Review Comment

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 3c39d279ab]
2025-10-10 15:36:24 -07:00
assistant-librarian[bot]
07d14c9618 Merge commit '9d060d3e3c7c943a6609a95e11ff48c35b30edef' into develop 2025-10-10 20:21:35 +00:00
Max Podkorytov
5e588dba5c [CK-Tile] functional support for transposed inputs in compute-bound double-lds-buffer pipeline with async loads from global memory to LDS (#2984)
* reuse local prefetch logic from compute v4 pipeline

add single-tile test

explicit lambda capture

reuse lds block descriptors from base policy for the transposed case

match the test case kernel configuration with compute v4

* add comments

[ROCm/composable_kernel commit: 9d060d3e3c]
2025-10-10 12:57:50 -07:00
assistant-librarian[bot]
96578b8d43 Merge commit 'fada1a3cae190aa6c1568b44eac7d6b2d4e33740' into develop 2025-10-10 08:15:20 +00:00
yinglu
c1780cfebe Conv:TF32: add more instances - 2 (#2879)
* add instances of device_grouped_conv_fwd_xdl_f32_comp_instances
* add instances of device_grouped_conv_fwd_xdl_f32_tf32_mem_instances
* add instances of device_grouped_conv_fwd_xdl_large_tensor_f32_tf32_instances
* tf32:conv:add instances for base class DeviceConvFwd
* tf32:conv:add instances for base class DeviceGroupedConvBwdDataMultipleD
* tf32:conv:add instances for base class DeviceGroupedConvBwdWeight
* add tf32 in profiler
* remove gnhwc/ngchw/ngcdhw instances
* remove non-ndhwgc/nhwgc/nhwc instances
* add check in IsSupportedArgument()

[ROCm/composable_kernel commit: fada1a3cae]
2025-10-10 15:28:17 +08:00
Bartłomiej Kocot
c7f3bcc81e Fix splitK for grouped conv bwd data (#2991)
[ROCm/composable_kernel commit: ad7a215aba]
2025-10-10 09:24:21 +02:00
assistant-librarian[bot]
f6db0f34b6 Merge commit 'b6036bc76a5ce55ef85b7f8578ae81c990f5932d' into develop 2025-10-10 04:13:14 +00:00
Yi DING
249105f297 [CK_TILE] FMHA Tests Enhancement (#2945)
* fmha-gtest-wip

* Thanks Copilot!

[ROCm/composable_kernel commit: b6036bc76a]
2025-10-10 11:34:47 +08:00
assistant-librarian[bot]
c1c15f6645 Merge commit 'fb66b4f5e4b5b178e3eee04189224e139e939c0c' into develop 2025-10-09 15:31:27 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
c81483b230 [CK_TILE] fix pk_fp4 compilation for non-gfx950 GPUs (#2983)
See build error log from
https://github.com/ROCm/composable_kernel/issues/2271#issuecomment-3150218542

This PR makes vector element access constexpr-safe by avoiding operator[] on
ext_vector_type(2) and replacing those call sites in the pk_fp4 conversions, so
that the conversions can be used in constant expressions. operator[] on
ext_vector_type(2) isn't allowed in constant expressions, which caused
"constexpr function never produces a constant expression" with a note at x[0].
Using `bit_cast` to a trivial array representation keeps the code
constexpr-compatible.

Signed-off-by: Hollow Man <hollowman@opensuse.org>

[ROCm/composable_kernel commit: fb66b4f5e4]
2025-10-09 07:43:41 -07:00
Yashvardhan Agarwal
8ae2be1027 [CK_TILE] Pooling FWD (Lwpck 3683) (#2956)
* Pooling 2D/3D with reference

* Tests & cleanup

- added test for pooling
- cleanup
- removed 2d example

* Comment resolution

- README added
- example target name rectified
- appropriate arg description and comments added

* clang-format

* appropriate blocksize calc

* modifications for future indexing addition

- instead of transforming views we now transform the descriptors, so
that the same descriptor can be re-used for index tensor in the future

* some basic fixes

* comment resolutions

* comment resolutions

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 7b6451b68e]
2025-10-09 16:13:26 +02:00
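The pooling commit above says the input descriptors, rather than the views, are transformed so the same descriptor can later be reused for an index tensor. Such a transform encodes the same coordinate mapping that a plain host reference uses. The minimal sketch below shows 2D max pooling over a single H x W channel and assumes symmetric padding. `Pool2dParams`, `pooled_extent` and `max_pool2d` are hypothetical names, not the example's or CK-Tile's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical parameter bundle; padding is assumed symmetric on each side.
struct Pool2dParams
{
    int window_h, window_w, stride_h, stride_w, dilation_h, dilation_w, pad_h, pad_w;
};

// Output extent along one spatial dimension.
int pooled_extent(int in, int window, int stride, int dilation, int pad)
{
    const int eff_window = (window - 1) * dilation + 1;
    assert(stride > 0 && in + 2 * pad >= eff_window);
    return (in + 2 * pad - eff_window) / stride + 1;
}

// Input coordinate for output (ho, wo) and window tap (y, x):
//   hi = ho * stride_h + y * dilation_h - pad_h   (likewise for wi)
// Taps that fall into the padding region are skipped.
std::vector<float> max_pool2d(const std::vector<float>& in, int H, int W, const Pool2dParams& p)
{
    assert(static_cast<int>(in.size()) == H * W);
    const int Ho = pooled_extent(H, p.window_h, p.stride_h, p.dilation_h, p.pad_h);
    const int Wo = pooled_extent(W, p.window_w, p.stride_w, p.dilation_w, p.pad_w);
    std::vector<float> out(static_cast<std::size_t>(Ho) * Wo,
                           -std::numeric_limits<float>::infinity());
    for(int ho = 0; ho < Ho; ++ho)
        for(int wo = 0; wo < Wo; ++wo)
            for(int y = 0; y < p.window_h; ++y)
                for(int x = 0; x < p.window_w; ++x)
                {
                    const int hi = ho * p.stride_h + y * p.dilation_h - p.pad_h;
                    const int wi = wo * p.stride_w + x * p.dilation_w - p.pad_w;
                    if(hi < 0 || hi >= H || wi < 0 || wi >= W)
                        continue;
                    float& o = out[ho * Wo + wo];
                    o = std::max(o, in[hi * W + wi]);
                }
    return out;
}
```

Because the mapping from (ho, wo, y, x) to (hi, wi) is defined once, the same index arithmetic could in principle be reused to record which input position produced each maximum.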
assistant-librarian[bot]
64603db299 Merge commit '9d4bfe393276317b5c1f9dda990eb0bd6c1ec3e7' into develop 2025-10-09 07:12:56 +00:00
Sami Remes
e7ef841a68 Add KBatch support for gemm_ab_scale (#2740)
* Add KBatch support for gemm_ab_scale

* Revert kernel parameters change

* Remove printing

* fix formatting

* fix check

* Use {} in if

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 9d4bfe3932]
2025-10-09 08:33:16 +02:00
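KBatch is the split-K factor. The minimal sketch below shows the idea on the host. It uses the hypothetical name `gemm_split_k` and omits gemm_ab_scale's A/B block scales. K is cut into `k_batch` slices, each slice produces a partial M x N result, and the partials are summed into C. The sketch does not model how CK schedules the slices or performs the final reduction on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical split-K GEMM. Row-major A (M x K), B (K x N), C (M x N).
std::vector<float> gemm_split_k(const std::vector<float>& a,
                                const std::vector<float>& b,
                                int M, int N, int K, int k_batch)
{
    assert(k_batch > 0);
    assert(a.size() == static_cast<std::size_t>(M) * K);
    assert(b.size() == static_cast<std::size_t>(K) * N);
    const int k_per_batch = (K + k_batch - 1) / k_batch; // ceil division
    std::vector<float> c(static_cast<std::size_t>(M) * N, 0.0f);
    for(int kb = 0; kb < k_batch; ++kb)
    {
        const int k_begin = kb * k_per_batch;
        const int k_end   = std::min(K, k_begin + k_per_batch); // last slice may be short or empty
        for(int m = 0; m < M; ++m)
            for(int n = 0; n < N; ++n)
            {
                float partial = 0.0f;
                for(int k = k_begin; k < k_end; ++k)
                    partial += a[m * K + k] * b[k * N + n];
                c[m * N + n] += partial; // reduction of the per-slice partial results
            }
    }
    return c;
}
```

The slices are summed in a different order than a single pass over K. For general floating-point data, results can therefore differ in the last bits from a non-split GEMM.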
assistant-librarian[bot]
8620b69b7a Merge commit 'e99356dabce7c391423567297b934fae683e2c66' into develop 2025-10-09 00:33:38 +00:00
Aviral Goel
e9ade69185 Add Memory pipeline for AQuant Block Scale GEMM (#2987)
* WIP: add memory pipeline boilerplate code that compiles and works for one block

* WIP: tail handling works for memory pipeline

* WIP: numerical errors appear to be gone after adding block_sync_lds()

* fix: numerical error with memory pipeline by adding block_sync_lds() and new tail handler

* refactor: remove debug print statements and lints

* fix: remove redundant sync barriers

* chore: remove lint

* fix: remove unused code from tile handler and remove redundant block_sync_lds()

* fix: correct parent struct name for memory pipeline

* fix: remove static assert check from parent struct and add it to child struct because not all child structs need the static assert

* fix: defer block sync lds to just before prefill

[ROCm/composable_kernel commit: e99356dabc]
2025-10-08 17:22:30 -07:00
assistant-librarian[bot]
0737a185c2 Merge commit 'e29151b53321697fec4fc028cc91bc976086655c' into develop 2025-10-08 23:11:38 +00:00
JC
5683f584a6 [CI] Enable ccache w/ namespace for external use (#2988)
* Enable ccache w/ namespace for external use

* Add TheRock parent directory to log path

* Fix typo for TheRock

[ROCm/composable_kernel commit: e29151b533]
2025-10-08 16:03:22 -07:00
assistant-librarian[bot]
a00a3bb2b5 Merge commit '0a4c45b4d3d4dff423c0777f5883d3067c65da20' into develop 2025-10-08 22:11:29 +00:00
andrew clark
433b969e7d CI Skip and Status Checks Fix (#2952)
* Update Jenkinsfile

Adding logic to skip CI checks when a commit contains changes to non-relevant files like docs, .md, licenses, and .github workflow files.

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

Testing skip env var

* Update Jenkinsfile

Fixing syntax

* Update Jenkinsfile

Simplifying CI check logic

* Update Jenkinsfile

Testing skipping logic on stages.

* Update Jenkinsfile

Removing post block. The status for skipped stages are already reported.

* Testing Docs

Testing that modifications to files in the docs folder do not trigger the build and test stages.

* Testing Multifile Trigger

Removed Jenkinsfile from the skip patterns. Reversed change to docs file. This test should not skip CI checks.

* Clean code

Renamed setup stage to be more descriptive.
Added pipeline env variable for consistency.
Moved the performance test results stage conditional up a level so the parent stage appropriately reports the status if it is skipped.

* Fixing syntax error

* Updated CRON Flags

Added the FORCE_CI flag to the CRON instructions. This will ensure CI does not skip the job.

* Updating logging

Making logs more explicit.

* Comment update

Cleaning comments.

* Update Jenkinsfile

Reverting performance reports when condition.

* Parallel Test

Testing stage status with parallel stages

* Update Jenkinsfile

* Update Jenkinsfile

Removing stages for quick testing

* Update Jenkinsfile

* Testing skipped parallel stages

Testing the addition of a coordination stage to always pass and give an update to skipped parent stages with parallel sub-stages.

* Testing parallel stages

Adding a coordination stage to test whether the parent check status is correctly updated.

* Simplified performance results stage

Removed parent stage as there are no other parallel stages to execute (yet).

* Testing final clean up stage

* Testing check status update

Testing - forcing status to update after a stage skip.

* Testing results stage skip

* Removing test stage

* Testing pipeline

* Testing post status updates

* Process Test Results Post Event Update

The stage will report success when it skips or is successful.

* Testing non-relevant file change

This should skip build and test in CI

* Reverting test

updating regex file patterns to use strings instead of regex literal syntax.

* Fixing file matching regex

* Testing docs modification

* Fixing default env var value

* Correcting env var assignment

* Pipeline test

Updating docs file. Should skip CI.

* Testing Pipeline

Setting default run CI state.

* Adding debugging

* Removing debugging

* Pipeline test

Should skip pipeline

* Pipeline Test

Mixed files to trigger a CI run

* Adding additional status updates

The parent stage sometimes remains in pending even if the child stage completes when skipped. Added an additional status update for the parent stage.

* Fixing variable name

* Moving stage names

Moved the performance stage names to a single location because they are referenced multiple times. This reduces errors with typos in the future.

* Revert "Moving stage names"

This reverts commit 7cf6743e54.

* Update Jenkinsfile

Handle both truly empty arrays and arrays containing only empty strings.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: 0a4c45b4d3]
2025-10-08 15:48:08 -06:00
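The CI skip logic described in the commit above can be summarized as a rule. Run CI unless every changed file matches a skip pattern, and always run when FORCE_CI is set. The minimal sketch below is in C++, although the Jenkinsfile itself is Groovy. The patterns and `should_skip_ci` are illustrative assumptions, not the pipeline's real configuration. Treating an empty list, or one holding only empty strings, as "run CI" is also this sketch's own choice.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical skip decision. The patterns loosely follow the categories named
// in the commit (docs, .md files, licenses, .github workflows); they are
// assumptions, not the Jenkinsfile's actual list.
bool should_skip_ci(const std::vector<std::string>& changed_files, bool force_ci)
{
    if(force_ci)
        return false; // a forced run (e.g. scheduled builds) never skips
    static const std::vector<std::regex> skip_patterns = {
        std::regex(R"(^docs/.*)"),
        std::regex(R"(.*\.md$)"),
        std::regex(R"(^LICENSE.*)"),
        std::regex(R"(^\.github/.*)"),
    };
    bool saw_file = false;
    for(const auto& file : changed_files)
    {
        if(file.empty())
            continue; // ignore empty entries
        saw_file = true;
        bool matched = false;
        for(const auto& re : skip_patterns)
        {
            if(std::regex_match(file, re))
            {
                matched = true;
                break;
            }
        }
        if(!matched)
            return false; // one relevant file is enough to run CI
    }
    // No real file names means no evidence that only irrelevant files changed,
    // so this sketch conservatively runs CI.
    return saw_file;
}
```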