Bartłomiej Kocot
dbdf79d541
Add Clamp/Relu bf16/fp16 cast fixes ( #2279 )
...
* Add Clamp/Relu bf16/fp16 fixes
* fix
[ROCm/composable_kernel commit: 6e5acee0f9 ]
2025-06-03 18:31:46 +02:00
Xiaodong Wang
095f9dbb63
Move pragma ahead ( #2231 )
...
[ROCm/composable_kernel commit: 7f9eef40b0 ]
2025-06-03 07:27:51 -07:00
Aviral Goel
2234480b5d
Add 0 as an acceptable argument for strides in CK GEMM example (Issue 2037) ( #2268 )
...
* add 0 as valid default argument for strides
* add 0 as valid default argument for strides
# Conflicts:
# example/01_gemm/common.hpp
[ROCm/composable_kernel commit: 11f6c14e03 ]
2025-06-03 07:26:58 -07:00
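Note on the stride change above: a minimal sketch of how a 0 stride can stand for "use the packed default". The `Layout` enum and `resolve_stride` helper are hypothetical names for illustration, not the code in example/01_gemm/common.hpp, and treating only 0 as the sentinel is an assumption.

```cpp
#include <cassert>

// Hypothetical layout tag, for illustration only.
enum class Layout
{
    RowMajor,
    ColumnMajor
};

// Hypothetical helper: a stride of 0 means "use the packed default".
// For a rows x cols matrix the packed leading dimension is `cols` in
// row-major order and `rows` in column-major order.
inline long resolve_stride(long rows, long cols, long stride, Layout layout)
{
    assert(rows > 0 && cols > 0);
    if(stride != 0)
        return stride; // caller supplied an explicit stride
    return layout == Layout::RowMajor ? cols : rows;
}
```

With this sketch, `resolve_stride(4, 8, 0, Layout::RowMajor)` yields 8, while an explicit stride such as 16 is passed through unchanged.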
Illia Silin
d5d10f8e88
Upgrade to ROCm6.4.1 and use generic targets for gfx1x. ( #2274 )
...
* upgrade to rocm6.4.1 and use gfx1x-generic targets
* add rocm version parsing
* fix the gfx10-3-generic syntax in cmake
[ROCm/composable_kernel commit: b76fdbe47f ]
2025-06-03 07:17:35 -07:00
Khushbu Agarwal
42ace38c07
Rotating buffer PR CI fix ( #2257 )
...
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200 )" (#2256 )"
This reverts commit 2c31e1e716b20a268cc6ffca4af7cc5ecbe44e3f.
* fix regression
[ROCm/composable_kernel commit: 2e38eb4f1c ]
2025-06-02 10:25:01 -07:00
dependabot[bot]
e858372a9b
Bump rocm-docs-core[api_reference] from 1.19.1 to 1.20.0 in /docs/sphinx ( #2272 )
...
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.1 to 1.20.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.1...v1.20.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.20.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: cffe8fa2a4 ]
2025-06-02 06:44:10 -07:00
valarLip
7028a112d3
extend buffer load for fp16/bf16x16 ( #2270 )
...
* extend buffer load for fp16/bf16x16
* format
[ROCm/composable_kernel commit: 0fdbf6bcd1 ]
2025-06-02 10:29:54 +08:00
Kiefer van Teutem
8f421515c0
Explicitly set the LINKER_LANGUAGE for the gemm_template_instances target to avoid Ninja build config failure. ( #2265 )
...
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>
[ROCm/composable_kernel commit: 2215a9edf0 ]
2025-05-30 13:32:28 -07:00
Illia Silin
3726830d59
Add a daily CI build on GFX950. ( #2261 )
...
* add CI build for gfx950
* make sure gfx950 CI always uses special docker and compiler
* enable codegen tests by default
[ROCm/composable_kernel commit: 654956bb02 ]
2025-05-30 12:50:08 -07:00
Mirza Halilčević
a16458864a
Define CHAR_BIT during hipRTC ( #2264 )
...
* Fix failing codegen tests.
* fix clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
[ROCm/composable_kernel commit: fbce6c7bb6 ]
2025-05-30 08:23:44 -07:00
dependabot[bot]
40a75214f5
Bump rocm-docs-core[api_reference] from 1.19.0 to 1.19.1 in /docs/sphinx ( #2263 )
...
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.0 to 1.19.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.0...v1.19.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.19.1
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 61e6c382c6 ]
2025-05-30 05:56:59 -07:00
slippedJim
b70b4c5caf
remove restriction of group mode hd192 no lse ( #2252 )
...
Co-authored-by: Jim <jimguo12@amd.com>
[ROCm/composable_kernel commit: 57f497452a ]
2025-05-30 10:14:21 +08:00
Illia Silin
3eaca9f232
Revert "add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue ( #2185 )" ( #2260 )
...
This reverts commit cdec424edde58fe081fb7f63fa8e247f3975b8b7.
[ROCm/composable_kernel commit: 4e561af18c ]
2025-05-29 16:22:16 -07:00
Paul Fultz II
348bfece9f
Export codegen targets ( #2259 )
...
[ROCm/composable_kernel commit: 306f4c537e ]
2025-05-29 11:03:51 -07:00
joyeamd
c1f3d81e76
add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue ( #2185 )
...
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* update cshuffle logic
* update cshuffle_logics
* add some change within review
* update some codes following the code review
* update epilogue logic
* remove from problem
* update codes following review.
* fix some issues
[ROCm/composable_kernel commit: fd6a859b44 ]
2025-05-29 14:31:14 +02:00
Po Yen Chen
a7e36b2781
[CK_TILE] FMHA forward batch_prefill optimization for low CU utilization ( #2251 )
...
* Add constraint on traits/tile/pipeline
* Use kM0=128 if max_seqlen_q == 8192
* Re-format codegen script
* Remove redundant attr name postfix
* Fix import error: default field in dataclass
* Use kK0=64 & kK1=64 to hide latency
* Use CU utilization to decide tile size
[ROCm/composable_kernel commit: 28cd0dffc9 ]
2025-05-29 18:36:33 +09:00
Bartłomiej Kocot
f1ec9e0be5
Change relu to clamp for grouped conv fwd instances ( #2249 )
...
[ROCm/composable_kernel commit: e7906dd644 ]
2025-05-29 00:51:25 +02:00
Adam Dickin
fe30e881d6
Changes to allow MIOpen to build CK as part of its build. ( #2247 )
...
* tweaks to the miopen specific build. add way to skip clang-tidy checks and a way to skip some custom build targets MIOpen also has.
* move the tidy if statement
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: 6df1c56ad6 ]
2025-05-28 13:51:15 -07:00
BrianHarrisonAMD
67c4eb2e99
Add option to disable offload compress for CK builds ( #2250 )
...
* Add option to disable offload compress for CK builds
* Remove gemm exe offload compress flag conditional
[ROCm/composable_kernel commit: e91be7d96a ]
2025-05-28 13:47:56 -07:00
Casey-Shi
b7c31ca612
change from ninja to make ( #2253 )
...
[ROCm/composable_kernel commit: 29574f05f7 ]
2025-05-28 09:25:05 -07:00
Illia Silin
fa9625d940
Revert "[CK_tile] Add rotating buffer feature for universal gemm ( #2200 )" ( #2256 )
...
This reverts commit b021b5f1d3ae599305e0b455035a6e01ad81fe23.
[ROCm/composable_kernel commit: bbdaf79a52 ]
2025-05-28 09:46:52 -06:00
Casey-Shi
83c018bb92
fix type hint ( #2254 )
...
[ROCm/composable_kernel commit: 4286eae09a ]
2025-05-28 08:43:58 -07:00
Sami Remes
c7466cc9e7
Remove extra if from CMakeLists.txt of gemm tests ( #2213 )
...
[ROCm/composable_kernel commit: 9bd01b624e ]
2025-05-28 15:25:09 +02:00
Khushbu Agarwal
2ca6f22fab
[CK_tile] Add rotating buffer feature for universal gemm ( #2200 )
...
* Add rotating buffer feature for universal gemm
* adding changes in tile_engine
* Updated code to merge kernel_launch
* removing comments
* Enable rotating buffer changes to flatmm
* Created diff launch_kernel function for rotating buffer
* Simplified calculation using macros
* merge code with new changes in tile_engine
* clang formatted
* Redefine macros
[ROCm/composable_kernel commit: 99857e10e6 ]
2025-05-27 23:00:58 -07:00
Aviral Goel
bfdffc4704
Add catch blocks in example GEMM apps to enable better error handling (Issue: 1928) ( #2234 )
...
* added catch statements to examples
* clang format
[ROCm/composable_kernel commit: c52649ad57 ]
2025-05-27 22:32:42 -07:00
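Note on the catch-block change above: a minimal sketch of wrapping an example's body so that exceptions become a message and an exit code instead of an unhandled termination. `run_guarded`, `demo`, and the specific exit codes are hypothetical and are not taken from the CK examples.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical wrapper: runs `body` (a callable returning bool) and maps
// the outcome to an exit code. 0 = success, 1 = body reported failure,
// 2 = std::exception caught, 3 = any other exception caught.
template <typename F>
int run_guarded(F&& body)
{
    try
    {
        return body() ? 0 : 1;
    }
    catch(const std::exception& e)
    {
        std::cerr << "error: " << e.what() << '\n';
        return 2;
    }
    catch(...)
    {
        std::cerr << "error: unknown exception\n";
        return 3;
    }
}

// Usage: exercises the success path and both catch paths.
inline void demo()
{
    assert(run_guarded([] { return true; }) == 0);
    assert(run_guarded([]() -> bool { throw std::runtime_error("invalid stride"); }) == 2);
    assert(run_guarded([]() -> bool { throw 42; }) == 3);
}
```

An example's `main` could then return `run_guarded(...)` directly, so a bad argument produces a readable message on stderr.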
dependabot[bot]
40acbcfc60
Bump rocm-docs-core[api_reference] from 1.18.4 to 1.19.0 in /docs/sphinx ( #2237 )
...
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.18.4 to 1.19.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.4...v1.19.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.19.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 132bd5b874 ]
2025-05-27 06:53:12 -07:00
Casey-Shi
3bcbdd608e
[Tile Engine] Add benchmark for tile engine gemm. ( #2193 )
...
* initial commit -m benchmark
* only support profile
* fix
* fix doc
* add default config
* add ci
* fix cmake
* tmp save for gen blobs
* fix bug
* merge
* range config
* test success
* fix
* fix
* move struct
* remove config property
* fix config
* remove comment
* add cmake option & modify
* add changelog
* fix
* format
* add pydantic module to the docker image
* fix
* add benchmark for cold and warm up
* python format
* add asm cache control
* fix README
* remove pydantic module
* modify changelog
* fix config
* recover benchmark_gemm and fix
* format python
* refactor profiler
* fix csv bug
* fix codegen bug
* add kernel instance object
* add benchmark gemm executable
* fix jenkins & delete extra header
* disable warning output & enable default config
* Disable sparsity for invalid warp tile combinations
* fix gemm host template func
* refactor gemm profiler
* filter out some instances
* default config test & fix codegen bug
* add sparse flag to gen more instances
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 128f5a1eab ]
2025-05-26 22:32:36 -07:00
Po Yen Chen
4fc501ba2b
[CK_TILE] For FMHA forward kernels, assign block indices reversely if using mask ( #2209 )
...
* Assign block indices reversely if kHasMask=true
* Assign block indices reversely for splitkv kernel
[ROCm/composable_kernel commit: c42b957d65 ]
2025-05-27 10:58:58 +08:00
Yi DING
b1514f3b27
Add operator/instance filters to ckProfiler ( #2233 )
...
[ROCm/composable_kernel commit: 5727af98d1 ]
2025-05-27 09:51:20 +08:00
Bartłomiej Kocot
d2fe046545
Revert "Remove not needed bwd wei merged groups instances ( #2218 )" ( #2235 )
...
This reverts commit 14d9d42a6ec0e07553b3bdadc0d785a9ab6c2375.
[ROCm/composable_kernel commit: b1ed92b131 ]
2025-05-26 23:26:04 +02:00
Bartłomiej Kocot
525b200a33
Remove not needed bwd wei merged groups instances ( #2218 )
...
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups
* Fix
* fix
* Revert "Restore oddc instances (#2201 )"
This reverts commit 1590272e3f15dd147b9ff60422ad83b6cec6b2ac.
* fix
---------
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
[ROCm/composable_kernel commit: 4583aeffad ]
2025-05-26 22:46:18 +02:00
Bartłomiej Kocot
d14948041b
Fix grid size calc for bwd wei ( #2226 )
...
[ROCm/composable_kernel commit: 037764bbc6 ]
2025-05-26 16:51:09 +02:00
Zzz9990
3991de1cdc
[VLLM V1] Add chunked prefill for FA to pass seq with small seqlen_q ( #2221 )
...
* fix splitkv compiler issue since lse is used to select kernel instances
* bypass seqlen == 1
* add chunked prefill into mha varlen
This reverts commit 5ba7148f7b34b1b438e9806748211c153ee8c433.
* skip compile when receipt 2-4 and add comments
* fix
---------
Co-authored-by: fsx950223 <fsx950223@outlook.com>
[ROCm/composable_kernel commit: ece38b9d7a ]
2025-05-26 19:17:18 +08:00
Illia Silin
9305126446
fix the buffer intrinsic names for clang >=20 ( #2228 )
...
[ROCm/composable_kernel commit: 8146e471f1 ]
2025-05-23 14:58:25 -07:00
Illia Silin
b3be024c8c
Revert "Update the buffer load/store intrinsic names for clang>=20. ( #2192 )" ( #2227 )
...
This reverts commit 9553c67ab25cb25bf4b6e4d359937413e1f7fd6a.
[ROCm/composable_kernel commit: 1b846143c6 ]
2025-05-22 15:41:17 -07:00
Illia Silin
d0602443ec
disable building device_mha_operations by default ( #2225 )
...
[ROCm/composable_kernel commit: bc2551ac3b ]
2025-05-22 14:03:04 -07:00
Adam Dickin
0e7c84be7d
Add MIOPEN_REQ_LIBS_ONLY option for cmake to build only the libs MIOpen requires ( #2224 )
...
* cut out anything we don't need for MIOpen to test
* refactor exclusion code to be more streamlined.
[ROCm/composable_kernel commit: 417a6b65b6 ]
2025-05-22 11:14:33 -07:00
Aviral Goel
bf4e2dc0b3
Refactor tile_window.hpp, tile_window_linear.hpp into a CK Tile Hierarchy ( #2214 )
...
* window_origin variable now in base class
* abstracted more functions
* consolidated tile_window_static_distribution and tile_window_static_lengths
* clang format
* skeleton code for tile_window and tile_window_linear consolidation
* more abstraction
* moved variables from child to parent
* clang format
* removed comments
* removed debug code
* removed debug code
* abstracting traits WIP
* consolidated traits
* removed comments and clang formatted
[ROCm/composable_kernel commit: 534d4594d0 ]
2025-05-21 23:28:00 -07:00
Bartłomiej Kocot
c30af6a16a
Grouped conv bwd wei add for larger filter and Merge Groups optimization ( #2197 )
...
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups
* Fix
* fix
* Restore removed instances
---------
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
[ROCm/composable_kernel commit: ebc5a6ef87 ]
2025-05-21 22:47:34 +02:00
Aviral Goel
45972932a7
Add Doxygen Documentation for HostTensor, HostTensorDescriptor, DeviceMem, FillUniformDistribution ( #2160 )
...
* added documentation for HostTensorDescriptor
* added documentation for DeviceMem and FillUniformDistribution
* fixed merging error
* fixed host_tensor_descriptor error
* clang format
[ROCm/composable_kernel commit: fa39c4e798 ]
2025-05-21 10:34:30 -07:00
Aviral Goel
e2aa82930e
added gemm universal example in readme ( #2216 )
...
[ROCm/composable_kernel commit: 990d645578 ]
2025-05-20 15:35:07 -07:00
SamiAario-AMD
e6e42816b4
Fix 11_add_rmsnorm2d_rdquant ( #2207 )
...
[ROCm/composable_kernel commit: 380bca2b85 ]
2025-05-20 15:15:28 -07:00
Thomas Ning
f969f4d798
Add the instances for small sized GEMM in preshuffle and improve CMake Flag ( #2212 )
...
* Add small instance, add the bug fix, & improve the example CMake
* clang format
[ROCm/composable_kernel commit: 1386924749 ]
2025-05-20 15:05:08 -07:00
Sami Remes
5075c8de99
[CK_TILE] Grouped GEMM tile loop ( #2146 )
...
* Add trait to use a persistent kernel and split the entrypoints in grouped gemm
* Some helper functions for persistent kernel case
* Get max occupancy grid using device properties
* Implement tile loop in main entry point to grouped gemm
* Enable GridSize() on device
* Handle offset tile index using real current block index
* Add persistent kernel choice to grouped gemm example
* Use a for-loop for iterating over the group
* Reduce VGPR spills by early-exit
* Enable persistent kernel choice in grouped_gemm example
* Add persistent kernel option to grouped_gemm test
* Fix formatting with remod.py
* Remove GridUpdateBlocks as blocks are now iteratively computed
* Add comment about VGPR spilling
* Fix formatting
* Use CK_TILE_HOST instead of __host__
* Enable all Row/Col combinations in grouped gemm unit test
* Add some KBatch=2 cases to grouped gemm tests
* Fix SplitK for grouped gemm
* Enable pipeline hotloop/tailnumber selection in-kernel for grouped gemm
* Add type traits
* Split examples to regular and tileloop
* Formatting
* Use hipExtStreamGetCUMask to get current active CUs for the given stream
* Align test and example kernel config, and disable validation for splitk repeats
* Remove debug options from CMakeLists.txt
* Separate the code paths for persistent/non-persistent in test
* Fix formatting
* Address review comments
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
[ROCm/composable_kernel commit: d1e6f0982d ]
2025-05-20 17:18:57 +03:00
Aviral Goel
e322ed4ebd
remove debug statements from CMakeLists ( #2204 )
...
[ROCm/composable_kernel commit: c4929225f6 ]
2025-05-19 17:31:04 -07:00
Jan Patrick Lehr
8f1ae91af0
[CMake] Disable newly added compiler warning -Wnrvo ( #2210 )
...
Clang recently added a warning that fires when copy elision does not
happen on return. The warning breaks our CK build, so this change
disables it.
[ROCm/composable_kernel commit: 0970f22221 ]
2025-05-19 17:30:15 -07:00
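Note on the -Wnrvo change above: a minimal sketch of the code shape that kind of warning is about, namely a return where the named return value optimization (NRVO) cannot be applied. Both functions are hypothetical illustrations, not CK code, and whether a given compiler version actually diagnoses `pick` is an assumption.

```cpp
#include <cassert>
#include <vector>

// NRVO-friendly: one named local is returned on every path, so the
// compiler is allowed to construct it directly in the caller's storage.
inline std::vector<int> make_ramp(int n)
{
    assert(n >= 0);
    std::vector<int> out;
    for(int i = 0; i < n; ++i)
        out.push_back(i);
    return out;
}

// NRVO-unfriendly: two different named locals are returned on different
// paths, so they cannot both live in the return slot; the returned
// object is typically moved instead of elided.
inline std::vector<int> pick(bool ascending)
{
    std::vector<int> up{1, 2, 3};
    std::vector<int> down{3, 2, 1};
    if(ascending)
        return up;
    return down;
}
```

Both functions are correct either way; the difference is only whether the return can skip a move, which is why such a diagnostic is advisory rather than a correctness error.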
jefyang1
1d9c3ecf4f
Use new mfma instructions for FP8 on gfx950 ( #2202 )
...
* Add logic to use new mfma instructions for fp8 bf8
* Fix example_gemm_xdl_fp8_pk_i4_bpreshuffle_v3 on gfx950 and run clang format
* Update include/ck/tensor_operation/gpu/warp/xdlops_gemm.hpp
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
* Fix intrin_mfma f8 calls due to merge mistake
---------
Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
[ROCm/composable_kernel commit: f18170064d ]
2025-05-19 17:29:51 -07:00
Andriy Roshchenko
9128d5e5cc
MX GEMM - Expand MX MFMA Testing to BF8, FP6, and BF6 Data Types ( #2199 )
...
* Unify test interface for different layouts.
* WIP: Introducing FP4/FP6/FP8 abstractions
* WIP: Introducing packed storage abstraction
* WIP: Introducing packed storage abstraction
* WIP: Improved support for FP6 data type
* Refactor packed storage for f6_t
* WIP: FP6 MFMA test
* Test if we correctly represent all FP6/FP4 numbers
* Additional output for failed FP4 test.
* More failing conversion tests
* Even more failing conversion tests
* Working FP6 MFMA tests
* Expand MX MFMA testing to BF8/6
* Update and verify MX MFMA test for packed types
* Fix fp4 and fp6 conversions on host
* Working MX MFMA tests for FP8/6/4
* Cleanup
* Add missing type
* Cleanup
* Final cleanup
* Restrict FP6/4 values output to CK_LOGGING=1
* Use CHAR_BIT instead of number 8
* Fix typo
* Remove FP6 and FP4 from the list of native types
---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
[ROCm/composable_kernel commit: 57e0f5df29 ]
2025-05-19 16:52:51 -05:00
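Note on the "Use CHAR_BIT instead of number 8" item above: a minimal sketch of sizing packed sub-byte storage without hard-coding 8 bits per byte. `bit_width_of` and `packed_elements` are hypothetical helpers, not the packed storage abstraction from the commit.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Number of bits in T, derived from CHAR_BIT rather than a literal 8.
template <typename T>
constexpr std::size_t bit_width_of()
{
    return sizeof(T) * CHAR_BIT;
}

// Hypothetical helper: how many Storage words are needed to hold `count`
// values of `bits` bits each when packed back to back (rounded up).
template <typename Storage>
inline std::size_t packed_elements(std::size_t count, std::size_t bits)
{
    assert(bits > 0);
    const std::size_t word_bits = bit_width_of<Storage>();
    return (count * bits + word_bits - 1) / word_bits;
}

static_assert(bit_width_of<std::uint32_t>() == 32, "uint32_t is exactly 32 bits wide");
```

For instance, 16 six-bit values occupy 96 bits, so `packed_elements<std::uint32_t>(16, 6)` evaluates to 3.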
jefyang1
3602e0b139
Fix example_grouped_gemm_multiple_d_xdl_fp16 on gfx950 ( #2203 )
...
* Fix example_grouped_gemm_multiple_d_xdl_fp16 on gfx950
* Run clang format
[ROCm/composable_kernel commit: b8b12bb81e ]
2025-05-19 14:25:50 -07:00
Bartłomiej Kocot
531652604d
Restore oddc instances ( #2201 )
...
[ROCm/composable_kernel commit: 6342f6b5e8 ]
2025-05-16 18:42:02 -07:00