Commit Graph

1954 Commits

Author SHA1 Message Date
Khushbu Agarwal
7f9758a186 [CK_Tile] Fix gemm kernel for 4,64,16 and 64,4,16 warp tile sizes (#2262)
* debugging issue

* debugging issue

* debugging

* debugging

* reverting debugging code

* clang formatted

* updating default_config.json

* fix ci failure

* clang formatted

[ROCm/composable_kernel commit: 59a85cb4bc]
2025-06-03 20:16:10 -07:00
Khushbu Agarwal
945a691ce5 Update changelog for Rotating buffer (#2283)
* Updating changelog for Rotating buffer

* address review comment.

[ROCm/composable_kernel commit: 1037b21cfe]
2025-06-03 20:14:22 -07:00
Bartłomiej Kocot
12c18b697e Add Clamp/Relu bf16/fp16 cast fixes (#2279)
* Add Clamp/Relu bf16/fp16 fixes

* fix

[ROCm/composable_kernel commit: 6e5acee0f9]
2025-06-03 18:31:46 +02:00
Xiaodong Wang
1f65826b77 Move pragma ahead (#2231)
[ROCm/composable_kernel commit: 7f9eef40b0]
2025-06-03 07:27:51 -07:00
Aviral Goel
79d6c89f2d Add 0 as an acceptable argument for strides in CK GEMM example (Issue 2037) (#2268)
* add 0 as valid default argument for strides

* add 0 as valid default argument for strides

# Conflicts:
#	example/01_gemm/common.hpp

[ROCm/composable_kernel commit: 11f6c14e03]
2025-06-03 07:26:58 -07:00
Illia Silin
f973621968 Upgrade to ROCm6.4.1 and use generic targets for gfx1x. (#2274)
* upgrade to rocm6.4.1 and use gfx1x-generic targets

* add rocm version parsing

* fix the gfx10-3-generic syntax in cmake

[ROCm/composable_kernel commit: b76fdbe47f]
2025-06-03 07:17:35 -07:00
Khushbu Agarwal
2b6621fba8 Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"

This reverts commit 7baac527a1.

* fix regression

[ROCm/composable_kernel commit: 2e38eb4f1c]
2025-06-02 10:25:01 -07:00
dependabot[bot]
9bd2b80644 Bump rocm-docs-core[api_reference] from 1.19.1 to 1.20.0 in /docs/sphinx (#2272)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.1 to 1.20.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.1...v1.20.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: cffe8fa2a4]
2025-06-02 06:44:10 -07:00
valarLip
671f579903 extend buffer load for fp16/bf16x16 (#2270)
* extend buffer load for fp16/bf16x16

* format

[ROCm/composable_kernel commit: 0fdbf6bcd1]
2025-06-02 10:29:54 +08:00
Kiefer van Teutem
caad9d56fc Explicitly set the LINKER_LANGUAGE for the gemm_template_instances target to avoid Ninja build config failure. (#2265)
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>

[ROCm/composable_kernel commit: 2215a9edf0]
2025-05-30 13:32:28 -07:00
Illia Silin
06143ac1d6 Add a daily CI build on GFX950. (#2261)
* add CI build for gfx950

* make sure gfx950 CI always uses special docker and compiler

* enable codegen tests by default

[ROCm/composable_kernel commit: 654956bb02]
2025-05-30 12:50:08 -07:00
Mirza Halilčević
fff9030340 Define CHAR_BIT during hipRTC (#2264)
* Fix failing codegen tests.

* fix clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: fbce6c7bb6]
2025-05-30 08:23:44 -07:00
dependabot[bot]
f1cf8b77ad Bump rocm-docs-core[api_reference] from 1.19.0 to 1.19.1 in /docs/sphinx (#2263)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.0 to 1.19.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.0...v1.19.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.19.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 61e6c382c6]
2025-05-30 05:56:59 -07:00
slippedJim
ae9edce7ad remove restriction of group mode hd192 no lse (#2252)
Co-authored-by: Jim <jimguo12@amd.com>

[ROCm/composable_kernel commit: 57f497452a]
2025-05-30 10:14:21 +08:00
Illia Silin
27e480ff4c Revert "add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)" (#2260)
This reverts commit aef91ee587.

[ROCm/composable_kernel commit: 4e561af18c]
2025-05-29 16:22:16 -07:00
Paul Fultz II
6c3371e472 Export codegen targets (#2259)
[ROCm/composable_kernel commit: 306f4c537e]
2025-05-29 11:03:51 -07:00
joyeamd
aef91ee587 add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)
* add cshuffle's mxdlperwavepershuffle support, not finished

* add epilogue functions

* add cshuffle's mxdlperwavepershuffle support, not finished

* add epilogue functions

* update cshuffle logic

* update cshuffle_logics

* add some change within review

* update some codes following the code review

* update epilogue logic

* remove from problem

* update codes following review.

* fix some issues

[ROCm/composable_kernel commit: fd6a859b44]
2025-05-29 14:31:14 +02:00
Po Yen Chen
144377ae38 [CK_TILE] FMHA forward batch_prefill optimization for low CU utilization (#2251)
* Add constraint on traits/tile/pipeline

* Use kM0=128 if max_seqlen_q == 8192

* Re-format codegen script

* Remove redundant attr name postfix

* Fix import error: default field in dataclass

* Use kK0=64 & kK1=64 to hide latency

* Use CU utilization to decide tile size

[ROCm/composable_kernel commit: 28cd0dffc9]
2025-05-29 18:36:33 +09:00
Bartłomiej Kocot
e9723a3341 Change relu to clamp for grouped conv fwd instances (#2249)
[ROCm/composable_kernel commit: e7906dd644]
2025-05-29 00:51:25 +02:00
Adam Dickin
d9e29b81a7 Changes to allow MIOpen to build CK as part of its build. (#2247)
* tweaks to the miopen specific build.  add way to skip clang-tidy checks and a way to skip some custom build targets MIOpen also has.

* move the tidy if statement

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 6df1c56ad6]
2025-05-28 13:51:15 -07:00
BrianHarrisonAMD
e12b6b41d5 Add option to disable offload compress for CK builds (#2250)
* Add option to disable offload compress for CK builds

* Remove gemm exe offload compress flag conditional

[ROCm/composable_kernel commit: e91be7d96a]
2025-05-28 13:47:56 -07:00
Casey-Shi
105a5aedad change from ninja to make (#2253)
[ROCm/composable_kernel commit: 29574f05f7]
2025-05-28 09:25:05 -07:00
Illia Silin
7baac527a1 Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit 0f77aa335d.

[ROCm/composable_kernel commit: bbdaf79a52]
2025-05-28 09:46:52 -06:00
Casey-Shi
f2fe8424ce fix type hint (#2254)
[ROCm/composable_kernel commit: 4286eae09a]
2025-05-28 08:43:58 -07:00
Sami Remes
ab12990991 Remove extra if from CMakeLists.txt of gemm tests (#2213)
[ROCm/composable_kernel commit: 9bd01b624e]
2025-05-28 15:25:09 +02:00
Khushbu Agarwal
0f77aa335d [CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm

* adding changes in tile_engine

* Updated code to merge kernel_launch

* removing comments

* Enable rotating buffer changes to flatmm

* Created diff launch_kernel function for rotating buffer

* Simplified calculation using macros

* merge code with new changes in tile_engine

* clang formatted

* Redefine macros

[ROCm/composable_kernel commit: 99857e10e6]
2025-05-27 23:00:58 -07:00
Aviral Goel
9c99bdede7 Add catch blocks in example GEMM apps to enable better error handling (Issue: 1928) (#2234)
* added catch statements to examples

* clang format

[ROCm/composable_kernel commit: c52649ad57]
2025-05-27 22:32:42 -07:00
dependabot[bot]
de94750cf7 Bump rocm-docs-core[api_reference] from 1.18.4 to 1.19.0 in /docs/sphinx (#2237)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.18.4 to 1.19.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.4...v1.19.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 132bd5b874]
2025-05-27 06:53:12 -07:00
Casey-Shi
64b17847fa [Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark

* only support profile

* fix

* fix doc

* add default config

* add ci

* fix cmake

* tmp save for gen blobs

* fix bug

* merge

* range config

* test success

* fix

* fix

* move struct

* remove config property

* fix config

* remove comment

* add cmake option & modify

* add changelog

* fix

* format

* add pydantic module to the docker image

* fix

* add benchmark for cold and warm up

* python format

* add asm cache control

* fix README

* remove pydantic module

* modify changelog

* fix config

* recover benchmark_gemm and fix

* format python

* refactor profiler

* fix csv bug

* fix codegen bug

* add kernel instance object

* add benchmark gemm executable

* fix jenkins & delete extra header

* disable warning output & enable default config

* Disable sparsity for invalid warp tile combinations

* fix gemm host template func

* refactor gemm profiler

* filter out some instances

* default config test & fix codegen bug

* add sparse flag to gen more instances

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 128f5a1eab]
2025-05-26 22:32:36 -07:00
Po Yen Chen
7ce58f6b46 [CK_TILE] For FMHA forward kernels, assign block indices reversely if using mask (#2209)
* Assign block indices reversely if kHasMask=true

* Assign block indices reversely for splitkv kernel

[ROCm/composable_kernel commit: c42b957d65]
2025-05-27 10:58:58 +08:00
Yi DING
f21bf5cf8d Add operator/instance filters to ckProfiler (#2233)
[ROCm/composable_kernel commit: 5727af98d1]
2025-05-27 09:51:20 +08:00
Bartłomiej Kocot
a7466d97f0 Revert "Remove not needed bwd wei merged groups instances (#2218)" (#2235)
This reverts commit 434aaaa478.

[ROCm/composable_kernel commit: b1ed92b131]
2025-05-26 23:26:04 +02:00
Bartłomiej Kocot
434aaaa478 Remove not needed bwd wei merged groups instances (#2218)
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups

* Fix

* fix

* Revert "Restore oddc instances (#2201)"

This reverts commit f4aaf5e9e3.

* fix

---------

Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>

[ROCm/composable_kernel commit: 4583aeffad]
2025-05-26 22:46:18 +02:00
Bartłomiej Kocot
52fb9c990f Fix grid size calc for bwd wei (#2226)
[ROCm/composable_kernel commit: 037764bbc6]
2025-05-26 16:51:09 +02:00
Zzz9990
6538aae676 [VLLM V1] Add chunked prefill for FA to pass seq with small seqlen_q (#2221)
* fix splitkv compiler issue since lse is used to select kernel instances

* bypass seqlen == 1

* add chunked prefill into mha varlen

This reverts commit 5ba7148f7b34b1b438e9806748211c153ee8c433.

* skip compile when receipt 2-4 and add comments

* fix

---------

Co-authored-by: fsx950223 <fsx950223@outlook.com>

[ROCm/composable_kernel commit: ece38b9d7a]
2025-05-26 19:17:18 +08:00
Illia Silin
24057b3662 fix the buffer intrinsic names for clang >=20 (#2228)
[ROCm/composable_kernel commit: 8146e471f1]
2025-05-23 14:58:25 -07:00
Illia Silin
9c7b0a65f9 Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
This reverts commit 7d92e48278.

[ROCm/composable_kernel commit: 1b846143c6]
2025-05-22 15:41:17 -07:00
Illia Silin
1d71dd84c3 disable building device_mha_operations by default (#2225)
[ROCm/composable_kernel commit: bc2551ac3b]
2025-05-22 14:03:04 -07:00
Adam Dickin
a7ce23c1dc Add MIOPEN_REQ_LIBS_ONLY option for cmake to build only the libs MIOpen requires (#2224)
* cut out anything we don't need for MIOpen to test

* refactor exclusion code to be more streamlined.

[ROCm/composable_kernel commit: 417a6b65b6]
2025-05-22 11:14:33 -07:00
Aviral Goel
7d47d71bc3 Refactor tile_window.hpp, tile_window_linear.hpp into a CK Tile Hierarchy (#2214)
* window_origin variable now in base class

* abstracted more functions

* consolidated tile_window_static_distribution and tile_window_static_lengths

* clang format

* skeleton code for tile_window and tile_window_linear consolidation

* more abstraction

* moved variables from child to parent

* clang format

* removed comments

* removed debug code

* removed debug code

* abstracting traits WIP

* consolidated traits

* removed comments and clang formatted

[ROCm/composable_kernel commit: 534d4594d0]
2025-05-21 23:28:00 -07:00
Bartłomiej Kocot
598bf07121 Grouped conv bwd wei add for larger filter and Merge Groups optimization (#2197)
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups

* Fix

* fix

* Restore removed instances

---------

Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>

[ROCm/composable_kernel commit: ebc5a6ef87]
2025-05-21 22:47:34 +02:00
Aviral Goel
0d99d4dd29 Add Doxygen Documentation for HostTensor, HostTensorDescriptor, DeviceMem, FillUniformDistribution (#2160)
* added documentation for HostTensorDescriptor

* added documentation for DeviceMem and FillUniformDistribution

* fixed merging error

* fixed host_tensor_descriptor error

* clang format

[ROCm/composable_kernel commit: fa39c4e798]
2025-05-21 10:34:30 -07:00
Aviral Goel
5e2b1f23b9 added gemm universal example in readme (#2216)
[ROCm/composable_kernel commit: 990d645578]
2025-05-20 15:35:07 -07:00
SamiAario-AMD
a8b473a230 Fix 11_add_rmsnorm2d_rdquant (#2207)
[ROCm/composable_kernel commit: 380bca2b85]
2025-05-20 15:15:28 -07:00
Thomas Ning
2315555bc5 Add the instances for small sized GEMM in preshuffle and improve CMake Flag (#2212)
* Add small instance, add the bug fix, & improve the example CMake

* clang format

[ROCm/composable_kernel commit: 1386924749]
2025-05-20 15:05:08 -07:00
Sami Remes
038417bd3d [CK_TILE] Grouped GEMM tile loop (#2146)
* Add trait to use a persistent kernel and split the entrypoints in grouped gemm

* Some helper functions for persistent kernel case

* Get max occupancy grid using device properties

* Implement tile loop in main entry point to grouped gemm

* Enable GridSize() on device

* Handle offset tile index using real current block index

* Add persistent kernel choice to grouped gemm example

* Use a for-loop for iterating over the group

* Reduce VGPR spills by early-exit

* Enable persistent kernel choice in grouped_gemm example

* Add persistent kernel option to grouped_gemm test

* Fix formatting with remod.py

* Remove GridUpdateBlocks as blocks are now iteratively computed

* Add comment about VGPR spilling

* Fix formatting

* Use CK_TILE_HOST instead of __host__

* Enable all Row/Col combinations in grouped gemm unit test

* Add some KBatch=2 cases to grouped gemm tests

* Fix SplitK for grouped gemm

* Enable pipeline hotloop/tailnumber selection in-kernel for grouped gemm

* Add type traits

* Split examples to regular and tileloop

* Formatting

* Use hipExtStreamGetCUMask to get current active CUs for the given stream

* Align test and example kernel config, and disable validation for splitk repeats

* Remove debug options from CMakeLists.txt

* Separate the code paths for persistent/non-persistent in test

* Fix formatting

* Address review comments

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: d1e6f0982d]
2025-05-20 17:18:57 +03:00
Aviral Goel
bb8bb3d5c1 remove debug statements from CMakeLists (#2204)
[ROCm/composable_kernel commit: c4929225f6]
2025-05-19 17:31:04 -07:00
Jan Patrick Lehr
5233fc5efb [CMake] Disable newly added compiler warning -Wnrvo (#2210)
Recently a new warning was added to Clang to warn when no copy-elision
on return happens. That prevents our CK build. This disables the
warning.

[ROCm/composable_kernel commit: 0970f22221]
2025-05-19 17:30:15 -07:00
jefyang1
23a8bed9af Use new mfma instructions for FP8 on gfx950 (#2202)
* Add logic to use new mfma instructions for fp8 bf8

* Fix example_gemm_xdl_fp8_pk_i4_bpreshuffle_v3 on gfx950 and run clang format

* Update include/ck/tensor_operation/gpu/warp/xdlops_gemm.hpp

Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>

* Fix intrin_mfma f8 calls due to merge mistake

---------

Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>

[ROCm/composable_kernel commit: f18170064d]
2025-05-19 17:29:51 -07:00
Andriy Roshchenko
8c6eb1c0b8 MX GEMM - Expand MX MFMA Testing to BF8, FP6, and BF6 Data Types (#2199)
* Unify test interface for different layouts.

* WIP: Introducing FP4/FP6/FP8 abstractions

* WIP: Introducing packed storage abstraction

* WIP: Introducing packed storage abstraction

* WIP: Improved support for FP6 data type

* Refactor packed storage for f6_t

* WIP: FP6 MFMA test

* Test if we correctly represent all FP6/FP4 numbers

* Additional output for failed FP4 test.

* More failing conversion tests

* Even more failing conversion tests

* Working FP6 MFMA tests

* Expand MX MFMA testing to BF8/6

* Update and verify MX MFMA test for packed types

* Fix fp4 and fp6 conversions on host

* Working MX MFMA tests for FP8/6/4

* Cleanup

* Add missing type

* Cleanup

* Final cleanup

* Restrict FP6/4 values output to CK_LOGGING=1

* Use CHAR_BIT instead of number 8

* Fix typo

* Remove FP6 and FP4 from the list of native types

---------

Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>

[ROCm/composable_kernel commit: 57e0f5df29]
2025-05-19 16:52:51 -05:00