Commit Graph

2706 Commits

Author SHA1 Message Date
John Shumway
fc586d2de6 [CK_BUILDER] Update the testing documentation (#3312)
* [CKBuilder] Update the testing documentation

Now that we have clear sets for smoke tests and regression test, we rearange the CMakeLists.txt file to be organized and have description and instructional comments.

Move all the test targets that compile quickly into the smoke test suite.

Update the builder README.md to reflect this new test organization and functionality.

* Update experimental/builder/README.md

Clarify integration tests description from review comment.

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Correct README.md

The regression tests here still run very fast like smoke tests, but can take minutes or even tens of minutes to compile.  We test most of the builder functionality without compiling heavily-templated kernel code, but these regression tests do an expensive full build of the CK kernels.

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: 7873f8fa13]
2025-12-01 13:05:32 -08:00
John Shumway
6f2f67b0b6 [CK_BUILDER] Fix cosmetic problem with conv_description (#3333)
The ConvDescription::detailed command wasn't using TreeFormatter::writeLast correctly, which led to extra lines being drawn in the tree view. It's a simple fix, just a cosmetic improvment out reflection output (ASCII art).

[ROCm/composable_kernel commit: d17994f3df]
2025-12-01 12:45:04 -08:00
John Shumway
f645d827c8 Cleanup convolution description (#3329)
Remove obsolete feature for extracting a description from a builder, since this should apply directly to the instance type. Also add some documentation, including a README.md for reflection.

[ROCm/composable_kernel commit: abd6a4b3fc]
2025-12-01 10:03:58 +01:00
Yi DING
2688602697 [CK_TILE] Disable cast_tile_pk_fp16bf16_fp32 as It Causes Extra spills on Recent Compilers (#3327)
[ROCm/composable_kernel commit: 9ed9539ddf]
2025-12-01 14:48:22 +08:00
Gino Lu
0551d4412e [CK_TILE] Add unit test for fp4 warp gemm (#2817)
This update includes a unit test for warp GEMM

[ROCm/composable_kernel commit: ba6af9fe7c]
2025-12-01 13:56:48 +08:00
Aviral Goel
0861395425 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>

[ROCm/composable_kernel commit: 004784ef98]
2025-11-28 13:49:54 -08:00
Sami Remes
1c73a3d480 [CK_TILE] Fix Quant GEMM build (#3320)
* Fix build

* Fix ck_tile example 38 & 40

---------

Co-authored-by: Yi DING <yi.ding@amd.com>

[ROCm/composable_kernel commit: f981554c39]
2025-11-28 20:33:53 +08:00
msaffari-amd
1055485a38 Add validity checks for MoE FlatMM scatter and enable bf16 hardware atomic-add (#3236)
* Add validity checks for MoE FlatMM scatter and enable bf16 hardware atomic

* correct clang-format

* removed unused rtol_atol variable from example code

* clang format correction

* remove unused varable max_accumulated_value from example

[ROCm/composable_kernel commit: f875ab0bbc]
2025-11-28 09:43:01 +01:00
Cong Ma
f622d546f3 Tile engine for streamk (#3157)
* [CK TILE STREAMK] Introduce initial support for tile engine in streamk GEMM.

- This commit lays the groundwork for integrating the tile engine into streamk GEMM.
  It focuses on creating benchmark executables for streamk GEMM.
- Additional scripts like test_benchmark.sh and gemm_benchmark.py will be added once
  the streamk implementation reaches stability.

* [CK TILE STREAMK] Enable CI to execute tile engine benchmarks for StreamK GEMM

* [CK TILE STREAMK] Refactor: Extract common utility functions.

* [CK TILE STREAMK] Revise tile engine of streamk to align with the updated implementation

* Add pre-commit

* [CK TILE STREAMK] Add 'dp_persistent' and 'reduction_strategy' in output of CK TILE STREAMK

* [CK TILE STREAMK] Fix a bug about value of 'dp_persistent' of CK TILE STREAMK

* [CK TILE STREAMK] Update Jenkinsfile

* [CK TILE Engine] Update StreamK tile engine help message

Remove default value messages as they are automatically printed

* [CK TILE Engine] Update StreamK tile engine

- Remove namespace reboot

* [CK TILE Engine] Update StreamK tile engine

- Fix merge error

[ROCm/composable_kernel commit: 30727c48fc]
2025-11-27 15:49:57 -07:00
arai713
6d28d12b62 [CK_TILE] Move DataTypeTraits into a Common File (#3146)
This renames the typeToStr struct in the common utilities to DataTypeTraits and removes all duplication of DataTypeTraits across files in CK Tile.

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>

[ROCm/composable_kernel commit: 24d88d2472]
2025-11-27 09:09:54 -08:00
Matthias Gehre
f3e08fa9d6 Add support for gfx1153 (#3306)
[ROCm/composable_kernel commit: 678298d4c7]
2025-11-27 08:48:00 +01:00
Thomas Ning
997a24f3d0 Fix and improve the gemm quant pipeline infrastructure (#3245)
[ROCm/composable_kernel commit: a38aeceb21]
2025-11-26 18:04:27 -08:00
Max Podkorytov
0ce4a61da5 [CK Tile] enable building examples by default (#3259)
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch

* fix cpp17 compile error in the ck-tile examples

---------

Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>

[ROCm/composable_kernel commit: 79aae7c7f7]
2025-11-26 16:24:44 -08:00
andrew clark
544525b61d Automated Perfetto UI Notifications (#3255)
* Testing visualization generation

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Adding dummy test data

* Update Jenkinsfile

* Update Jenkinsfile

* Adding notifications

* Testing

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Image compression

* Update Jenkinsfile

* Moving capture logic to main Jenkins file

* Testing generation

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Fixing curl request

* Update Jenkinsfile

* Clean up

* Fix

* Fixing notification

* Testing message creation

* Adjusting message payload

* Testing notification generation

* Updating main jenkinsfile

* Fixing cleanup call

* Removing test pipeline code

* Comment clean up

* Testing pipeline

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Moving archive

Moving trace archive to safe location before source checkout

* Removing test pipeline

* Testing pipeline with unique file names

* Update Jenkinsfile

* Removing test files

Updated main pipeline

[ROCm/composable_kernel commit: 40d7217ac7]
2025-11-26 16:27:27 -07:00
Aviral Goel
ee7a68b10f chore(copyright): update copyright header for include directory (#3293)
[ROCm/composable_kernel commit: de6466481f]
2025-11-26 11:00:05 -07:00
John Shumway
d449d96c98 Fix template parameter macros (#3305)
Some of the device implementation templates have macros like GridwiseGemmMultiABDTemplateParameters that can cause build errors if multiple files are included together. This error comes up with our builder code.

To clean up the macros and make them safer, we follow these follow rules:
* Use more specific names to avoid duplication.
* Undefine the macro after it is used to avoid leaking out of the file scope.
* Use a prefix CK_ on the macro to avoid conflicting with other libraries.
* Use all caps with underscores for preprocessor macro names.

[ROCm/composable_kernel commit: 10a782d846]
2025-11-26 09:48:17 -08:00
Aviral Goel
cb0aaf8e90 fix: add dynamic selection of pipelines for aquant mode (#3282)
- Add conditional selection to use v3 pipeline when PreshuffleQuant is true
- Add static assertion in memory pipeline to prevent PreshuffleQuant usage
- Restore BaseBQuantGemmPipelineAgBgCrCompV3 for BQuant cases
- Update BaseGemmPipeline selection to handle all quant modes properly

[ROCm/composable_kernel commit: 35a4b26af0]
2025-11-26 10:58:09 +04:00
Yi DING
631655adb1 [CK_TILE] Refine warp_gemm_attribute_mfma (#3272)
[ROCm/composable_kernel commit: 8fa90025d0]
2025-11-26 10:57:15 +08:00
Yi DING
e303358608 [CK_TILE] Fix Compilation of Flatmm Examples (#3285)
[ROCm/composable_kernel commit: c7dce2ac29]
2025-11-26 10:11:43 +08:00
Illia Silin
250fd6be12 Enable ck_builder in CI. (#3296)
* build and run ck_builder tests

* add test_ckb_all to targets

* fix syntax

* fix test path

* Update CMake targets for builder testing in CI (#3290)

Our existing CMake only had build targets. Update CMakeLists.txt to have CTEST targets:
* smoke-builder
* regression-builder
* check-builder

Co-authored-by: John Shumway <jshumway@amd.com>

* use check-builder target

* get rid of test_ckb_all target

* call ninja check-builder separately

---------

Co-authored-by: John Shumway <jshumway@amd.com>

[ROCm/composable_kernel commit: a54f7b1138]
2025-11-25 17:45:59 -08:00
Aviral Goel
9f94579e5c chore(copyright): update copyright header for experimental & example directory (#3292)
[ROCm/composable_kernel commit: cd47293869]
2025-11-26 03:09:39 +04:00
Bartłomiej Kocot
91cc903d12 [CK TILE] Grouped Conv Explicit Gemm (#3289)
* [CK TILE] Grouped Conv Explicit Gemm

* fixes

* apply builder fixes

[ROCm/composable_kernel commit: 00dfa2f2ce]
2025-11-25 23:28:35 +01:00
Khushbu Agarwal
a6241c62cc [CK-Tile] fix block scale example for gfx1201 (#3283)
[ROCm/composable_kernel commit: 37ea160088]
2025-11-25 13:10:28 -08:00
Bartłomiej Kocot
083ea723a0 [CK_BUILDER] Add grouped conv bwd ck tile traits (#3281)
* [CK_BUILDER] Add grouped conv bwd ck tile traits

* copilot fixes

[ROCm/composable_kernel commit: 9ac2666d5b]
2025-11-25 14:57:43 +01:00
Aviral Goel
804730c0f3 chore(copyright): update copyright header for library directory (#3274)
* chore(copyright): update copyright header  for library directory

* chore(copyright): update copyright header for library directory

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: ab0101c59c]
2025-11-24 18:10:26 -08:00
Aviral Goel
91ffc9dd1e chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory

* chore(copyright): update copyright header for example directory

[ROCm/composable_kernel commit: d85f065b15]
2025-11-24 18:02:41 -08:00
rocking
9cb3d700da Fix batch prefill compile fail in aiter (#3279)
* Fix batch prefill aiter compile fail

* Fix compile error

[ROCm/composable_kernel commit: 229d43ea0c]
2025-11-25 09:46:32 +08:00
Thomas Ning
99e6b461db Reorganize of KPack in GEMM (#3247)
* add the reorganize of KPack

* fix the compilation error

* fix the compilation error

[ROCm/composable_kernel commit: de6a9590ab]
2025-11-24 12:38:59 -08:00
Aviral Goel
f65f0820ca chore(copyright): update copyright header for codegen directory (#3266)
[ROCm/composable_kernel commit: e95337c58c]
2025-11-24 10:12:40 -08:00
John Shumway
04f8fa2316 Guard a builder test to avoid gfx11 and gfx12 (#3268)
We're getting a compile error on gfx11 and gfx12 for an I8 test that doesn't have a supported WMMA implmentation. We'll need to build architecture support into the builder, but to get things green I'm just adding an ifndef guard around the test.

[ROCm/composable_kernel commit: 1bc7529977]
2025-11-24 10:10:09 -08:00
Christopher Millette
a049cdebba First look at mfma / wmma unification (#2704)
* First look at mfma / wmma unification

* Refactor

* Re-org file structure

* Restructure transform selection and WaveWiseMma class

* Update license files. Add missing gfx1151 support. Change wave size for HOST to 1. Update datatypes naming consistency

* Fixes default MmaSelector implentation

* Adds unit tests for amdgcn_mma and arch

* Consolidate common arch id checks to constexpr functions. Strongly type ids as amdgcn_target_arch_id object.

* Refactor is_any_value_of

* Fixes mma_selector logic

* Fix typo

* Add mma selector test for tile decomposition

* Fix compilation of mma.hpp

* Revert back to c++17 compatibility

* Fix compiler error by returning index_t from get_warp_size()

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fixes compiler error for missing is_wave32() function

* Fixes compiler error for host wave_size() should be 64

* Fixes compiler errors where __cpp_concepts is not defined

* Fixes compiler errors where __cpp_concepts is not defined

* Fix test failure for host is wave64 by default

---------

Co-authored-by: Chris Millette <you@example.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: b9c6cb1452]
2025-11-24 09:39:59 -08:00
Khushbu Agarwal
84b12586c6 [CK_Tile] Support for preshuffle weight(B) quant tensor for block scale gemm (#3165)
* formatted

* formatted

* formatting

* formatting

* formatting

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* enable prefill shapes

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* adding preshuffle quant as new parameter and its associated new files

* remove debugging statements

* adding test

* enable preshuffle quant with permuteN

* updating readme and correcponding gemmconfigs

* updating cmake file

* fixing CI failures for grouped quant gemm

* addressing review comments

* fixing CI issue

* addressing reveiw comments

* formatting

* formatting

* fixing aquant operator overlaoding

* formatting

---------

Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 8111572785]
2025-11-24 07:48:42 -08:00
Illia Silin
585a6a2048 disable CI on gfx1010 by default (#3280)
[ROCm/composable_kernel commit: e857e26bf6]
2025-11-24 07:06:41 -08:00
Qianfeng
8ec85b7617 Fix a bug for qr_ks_vs_async_trload pipeline (#3271)
[ROCm/composable_kernel commit: 81042ea574]
2025-11-24 21:31:48 +08:00
rocking
ca1a0da0c3 Support fp8 dynamic quantization for fmha (#3206)
* Support qscale for dynamic quant, remove static quant

* Support hdim=256

* Remove bias test case for fp8

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>

[ROCm/composable_kernel commit: 5948dbffe4]
2025-11-24 16:28:25 +08:00
Johannes Graner
dd7a2d199f [CK Tile] Fix example for conv fwd + bias + clamp (#3235)
* Fix clamp not being applied correctly

* Apply group offsets to D tensors

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 096f0a3b23]
2025-11-24 07:36:26 +01:00
Aviral Goel
1bec1dd091 chore(copyright): update copyright header for test directory (#3265)
[ROCm/composable_kernel commit: f6c999bddb]
2025-11-22 19:38:27 -05:00
Emily Martins
ede105dd91 Fix CK Tile DP + 2 Tile Stream-K Validation Errors (#3269)
When there are multiple workgroups contributing to a tile, when using
atomics, there may be round off error in cases where the accumulator
type is not the same as the C type. To compute an error tolerance for
test validation, the Stream-K Tile Partitioner has a function called
estimate_num_wgs_per_tile to estimate the number of workgroups per tile.
That said, this function only provides an estimate. In some cases for
DP+2TSK, the function returns 1 rather than the more accurate value of
2.

Thus, this change updates the estimate_num_wgs_per_tile function to
explicitely return the value of 2 in cases for DP+2TSK to ensure that we
have a better error tolerance to avoid test failures due to round-off
error.

[ROCm/composable_kernel commit: 02ab76c2cb]
2025-11-21 20:29:47 -07:00
Illia Silin
6d7d99f91b Enable daily builds on gfx1010 (#3258)
* add build/test on gfx1010

* only build and run on gfx1010 once daily

[ROCm/composable_kernel commit: 21ae743acd]
2025-11-21 07:22:01 -08:00
John Shumway
34c3e1f562 Fix builder errors. (#3260)
There were four errors to fix:
1. The checks for defaulted direction were not implemented in the predicate concept.
2. Had to delete an obsolete and undefined operation enum.
3. A factory was passing a boolean in place of an integer.
4. Some of the factory tests are not compiling correctly when linking in the full source (with CK_EXPERIMENTAL_BUILDER=ON), so I commented them out.

[ROCm/composable_kernel commit: ea6e4fcbbc]
2025-11-21 15:25:45 +01:00
John Shumway
345dbb25f8 Fix copyright messages in experimental/builder. (#3253)
Our copyright were were mostly correct, but we inconsistently used (C) instead of (c) like the rest of the CK code. This PR fixes that (using lowercase c) and adds a missing copyright header to one file.

[ROCm/composable_kernel commit: f38c3de9f9]
2025-11-20 17:40:55 -08:00
Aviral Goel
89e3931da8 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

* chore(copyright): update copyright header for test directory

[ROCm/composable_kernel commit: c8563f2101]
2025-11-20 20:36:57 -05:00
Aviral Goel
a42ac42d00 chore(copyright): update copyright header for cmake directory (#3254)
[ROCm/composable_kernel commit: a960c9950b]
2025-11-20 20:36:37 -05:00
lalala-sh
391dfbb074 fix static assert (#3178)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: f58bd56e6b]
2025-11-20 17:27:05 -08:00
yinglu
14d3bfa1eb fix:bf16x3:enable all instances on gfx950 (#3248)
* fix:bf16x3:enable all instances on gfx950

* fix clang-format fail

* fix clang-format fail

* fix:modified wrong params previously

[ROCm/composable_kernel commit: 4155eb24f9]
2025-11-20 17:09:43 -08:00
spolifroni-amd
3e09d4caf2 Spolifroni amd/update changelog 711 (#3211)
* Update CHANGELOG.md with 7.1.1 information

* Update CHANGELOG.md

[ROCm/composable_kernel commit: 938b8ed3bf]
2025-11-20 10:51:18 -08:00
Yi DING
f0702c1636 [CK_TILE] Refine FP32 => FP16/BF16 Conversion (#3215)
* [CK_TILE] Refine FP32 => FP16/BF16 Conversion

* Thank you Copilot

* Rename fix

* Fix example

* Fix accu checking

* Fix

* Fix

[ROCm/composable_kernel commit: 8b284a63a4]
2025-11-20 10:50:26 -08:00
Gavin Zhao
50e7d047f6 Add support for RDNA1 GPUs (#3220)
* Allow compilation for RDNA1 (__gfx101__)

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* More RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* Even more RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* cmake: skip build quantization for unsupported arches

* add gfx10-1-generic support as well

* add gfx1013 and complete gfx10-1-generic

* fix clang format

* enable DL kernels on gfx101x

---------

Signed-off-by: Gavin Zhao <git@gzgz.dev>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 07314ac543]
2025-11-20 10:45:57 -08:00
Robin Voetter
b558b6632c ck-builder: add remaining ck factory tests (#3223)
Now that the remaining reflection has been implemented, we
can add the remaining factory tests too. This is the complete set
of instances for forward grouped conv currently in CK.

[ROCm/composable_kernel commit: bb155ef678]
2025-11-20 10:42:36 -08:00
Robin Voetter
ea6bc81dd1 ck-builder: group transfer operations per tensor (#3217)
Grouping transfer operations per tensor makes it easier to
constrain on and operate with the transfer operations. As an
example, we can now deduplicate the logic for translating
the transfer operations from the ck-builder interface to the old
ck interface for the A and B tensors.

[ROCm/composable_kernel commit: 245c6011cf]
2025-11-20 10:40:48 -08:00