3949 Commits

Author SHA1 Message Date
assistant-librarian[bot]
7bd01a9f5f Merge commit '096f0a3b23a49ffaef1e2dbed74bf366e36ad15c' into develop 2025-11-24 07:13:25 +00:00
Johannes Graner
dd7a2d199f [CK Tile] Fix example for conv fwd + bias + clamp (#3235)
* Fix clamp not being applied correctly

* Apply group offsets to D tensors

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 096f0a3b23]
2025-11-24 07:36:26 +01:00
assistant-librarian[bot]
8abfd83364 Merge commit 'f6c999bddb9e0ae468c7b45bc68cc1410472dcf5' into develop 2025-11-23 00:40:28 +00:00
Aviral Goel
1bec1dd091 chore(copyright): update copyright header for test directory (#3265)
[ROCm/composable_kernel commit: f6c999bddb]
2025-11-22 19:38:27 -05:00
assistant-librarian[bot]
d7685c394a Merge commit '02ab76c2cb47143b82743bcf9d86389c540a608b' into develop 2025-11-22 04:13:58 +00:00
Emily Martins
ede105dd91 Fix CK Tile DP + 2 Tile Stream-K Validation Errors (#3269)
When multiple workgroups contribute to a tile via atomics, round-off
error may occur in cases where the accumulator type is not the same
as the C type. To compute an error tolerance for
test validation, the Stream-K Tile Partitioner has a function called
estimate_num_wgs_per_tile to estimate the number of workgroups per tile.
That said, this function only provides an estimate. In some cases for
DP+2TSK, the function returns 1 rather than the more accurate value of
2.

Thus, this change updates the estimate_num_wgs_per_tile function to
explicitly return the value of 2 for DP+2TSK cases to ensure that we
have a better error tolerance to avoid test failures due to round-off
error.
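
The round-off effect described above can be illustrated with a minimal sketch. It is a simplified model, not CK code: `to_fp16`, `atomic_accumulate_fp16`, and `tolerance` are hypothetical helpers, and the linear scaling of the tolerance with the workgroup count is an assumption made for illustration. The model rounds each partial result to fp16 and rounds again after every atomic add into the C tile.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def atomic_accumulate_fp16(partials):
    """Model atomic adds into an fp16 C tile: every add rounds to fp16."""
    c = 0.0
    for p in partials:
        c = to_fp16(c + to_fp16(p))
    return c


def tolerance(num_wgs_per_tile: int, fp16_eps: float = 2.0 ** -10) -> float:
    """Hypothetical relative tolerance: one rounding step per contributing workgroup."""
    return num_wgs_per_tile * fp16_eps


# One workgroup writes the whole sum: a single rounding to fp16.
single = to_fp16(1000.3 + 0.4)

# Two workgroups each contribute a partial sum through an atomic add.
split = atomic_accumulate_fp16([1000.3, 0.4])

exact = 1000.3 + 0.4
rel_err_split = abs(split - exact) / exact
```

In this model the two-workgroup result differs from the single-rounding result, which is why underestimating the number of contributing workgroups (1 instead of 2) can make a validation tolerance too tight.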

[ROCm/composable_kernel commit: 02ab76c2cb]
2025-11-21 20:29:47 -07:00
assistant-librarian[bot]
343e40d0e9 Merge commit '21ae743acd49c79913b3835236c5315983fa83ef' into develop 2025-11-21 16:13:44 +00:00
Illia Silin
6d7d99f91b Enable daily builds on gfx1010 (#3258)
* add build/test on gfx1010

* only build and run on gfx1010 once daily

[ROCm/composable_kernel commit: 21ae743acd]
2025-11-21 07:22:01 -08:00
assistant-librarian[bot]
323c839a2b Merge commit 'ea6e4fcbbc0bd76a562f246f743f5554edc312e4' into develop 2025-11-21 15:12:19 +00:00
John Shumway
34c3e1f562 Fix builder errors. (#3260)
There were four errors to fix:
1. The checks for defaulted direction were not implemented in the predicate concept.
2. Had to delete an obsolete and undefined operation enum.
3. A factory was passing a boolean in place of an integer.
4. Some of the factory tests are not compiling correctly when linking in the full source (with CK_EXPERIMENTAL_BUILDER=ON), so I commented them out.

[ROCm/composable_kernel commit: ea6e4fcbbc]
2025-11-21 15:25:45 +01:00
assistant-librarian[bot]
1829bc6596 Merge commit 'f38c3de9f9047e72429c796fd0445f36eceb142b' into develop 2025-11-21 03:31:42 +00:00
John Shumway
345dbb25f8 Fix copyright messages in experimental/builder. (#3253)
Our copyright messages were mostly correct, but we inconsistently used (C) instead of (c) like the rest of the CK code. This PR fixes that (using lowercase c) and adds a missing copyright header to one file.

[ROCm/composable_kernel commit: f38c3de9f9]
2025-11-20 17:40:55 -08:00
assistant-librarian[bot]
967480c146 Merge commit 'c8563f2101d864ed0cc1f68f02763ee4ec6aa59d' into develop 2025-11-21 01:40:40 +00:00
Aviral Goel
89e3931da8 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

* chore(copyright): update copyright header for test directory

[ROCm/composable_kernel commit: c8563f2101]
2025-11-20 20:36:57 -05:00
Aviral Goel
a42ac42d00 chore(copyright): update copyright header for cmake directory (#3254)
[ROCm/composable_kernel commit: a960c9950b]
2025-11-20 20:36:37 -05:00
lalala-sh
391dfbb074 fix static assert (#3178)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: f58bd56e6b]
2025-11-20 17:27:05 -08:00
yinglu
14d3bfa1eb fix:bf16x3:enable all instances on gfx950 (#3248)
* fix:bf16x3:enable all instances on gfx950

* fix clang-format fail

* fix clang-format fail

* fix:modified wrong params previously

[ROCm/composable_kernel commit: 4155eb24f9]
2025-11-20 17:09:43 -08:00
assistant-librarian[bot]
084d063087 Merge commit '938b8ed3bf40741176adbc897b66095c5453d15d' into develop 2025-11-20 19:11:49 +00:00
spolifroni-amd
3e09d4caf2 Spolifroni amd/update changelog 711 (#3211)
* Update CHANGELOG.md with 7.1.1 information

* Update CHANGELOG.md

[ROCm/composable_kernel commit: 938b8ed3bf]
2025-11-20 10:51:18 -08:00
Yi DING
f0702c1636 [CK_TILE] Refine FP32 => FP16/BF16 Conversion (#3215)
* [CK_TILE] Refine FP32 => FP16/BF16 Conversion

* Thank you Copilot

* Rename fix

* Fix example

* Fix accu checking

* Fix

* Fix

[ROCm/composable_kernel commit: 8b284a63a4]
2025-11-20 10:50:26 -08:00
Gavin Zhao
50e7d047f6 Add support for RDNA1 GPUs (#3220)
* Allow compilation for RDNA1 (__gfx101__)

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* More RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* Even more RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* cmake: skip build quantization for unsupported arches

* add gfx10-1-generic support as well

* add gfx1013 and complete gfx10-1-generic

* fix clang format

* enable DL kernels on gfx101x

---------

Signed-off-by: Gavin Zhao <git@gzgz.dev>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 07314ac543]
2025-11-20 10:45:57 -08:00
Robin Voetter
b558b6632c ck-builder: add remaining ck factory tests (#3223)
Now that the remaining reflection has been implemented, we
can add the remaining factory tests too. This is the complete set
of instances for forward grouped conv currently in CK.

[ROCm/composable_kernel commit: bb155ef678]
2025-11-20 10:42:36 -08:00
Robin Voetter
ea6bc81dd1 ck-builder: group transfer operations per tensor (#3217)
Grouping transfer operations per tensor makes it easier to
constrain on and operate with the transfer operations. As an
example, we can now deduplicate the logic for translating
the transfer operations from the ck-builder interface to the old
ck interface for the A and B tensors.
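
A rough sketch of the deduplication idea follows, written in Python rather than the C++ used by ck-builder. Every name here (`TensorTransfer`, `KernelTransfers`, `to_legacy`, and the field and key names) is hypothetical and chosen only for illustration; the actual ck-builder types are not shown in this log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorTransfer:
    """All transfer settings for one tensor, grouped together (hypothetical fields)."""
    thread_cluster: tuple[int, ...]
    vector_dim: int
    scalars_per_vector: int


@dataclass(frozen=True)
class KernelTransfers:
    """Per-tensor groups instead of one flat list of A/B-prefixed fields."""
    a: TensorTransfer
    b: TensorTransfer


def to_legacy(prefix: str, t: TensorTransfer) -> dict[str, object]:
    """Translate one tensor's group into flat, prefixed legacy-style keys."""
    return {
        f"{prefix}ThreadCluster": t.thread_cluster,
        f"{prefix}VectorDim": t.vector_dim,
        f"{prefix}ScalarsPerVector": t.scalars_per_vector,
    }


def translate(k: KernelTransfers) -> dict[str, object]:
    # The same helper serves both tensors, so the logic is written once.
    return {**to_legacy("A", k.a), **to_legacy("B", k.b)}


transfers = KernelTransfers(
    a=TensorTransfer(thread_cluster=(4, 64, 1), vector_dim=2, scalars_per_vector=8),
    b=TensorTransfer(thread_cluster=(4, 64, 1), vector_dim=2, scalars_per_vector=4),
)
legacy = translate(transfers)
```

Because each tensor's settings travel as one value, a constraint or translation can be written against `TensorTransfer` once and applied to A and B alike.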

[ROCm/composable_kernel commit: 245c6011cf]
2025-11-20 10:40:48 -08:00
Aviral Goel
63a6703a81 chore(copyright): update copyright header for library directory (#3239)
[ROCm/composable_kernel commit: fb43760c66]
2025-11-20 10:36:05 -08:00
Aviral Goel
0058fb65ff chore(copyright): update copyright header for test directory (#3243)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

[ROCm/composable_kernel commit: 7dfc46d73d]
2025-11-20 10:33:34 -08:00
assistant-librarian[bot]
8b93b58bcd Merge commit '2e4b8a8fc455a14ad5cf89f7f750060ff20c40bb' into develop 2025-11-20 17:12:11 +00:00
Emily Martins
2963649b29 [CK_TILE] Remove Old CK Tile Stream-K Artifacts (#3202)
* Remove old CK Tile Stream-K implementation

The original CK Stream-K implementation was based on old CK's Stream-K
block to C tile map. However, this implementation did not align with the
original Stream-K paper. Thus, we implemented a new tile partitioner and
associated Stream-K kernel, which was placed in the reboot namespace.

Now that the new Stream-K implementation is ready, this change removes
all artifacts of the old implementation. Specifically, the following
changes were made:
- Removes old Stream-K tile partitioner from CK Tile
- Removes the reboot namespace such that the new implementation resides
  in the ck_tile namespace only.
- Adds tests for bf8 and fp8 using the new implementation
- Removes tests for the old implementation
- Removes the v2 suffix from the new CK Tile Tile Partitioner
  derived classes.
- Updates Stream-K Kernel ops file to use /** commenting style.

* Remove v2 from tile partitioner validation function names

[ROCm/composable_kernel commit: 2e4b8a8fc4]
2025-11-20 09:32:32 -07:00
assistant-librarian[bot]
b2e58aec1a Merge commit '5adaa201eda9337553459bc4321b11695e380832' into develop 2025-11-20 16:14:36 +00:00
asleepzzz
d115b3be4a Revert "Add attn sink (#2892)" (#3250)
This reverts commit cb7f05a8d3.

[ROCm/composable_kernel commit: 5adaa201ed]
2025-11-20 07:55:15 -08:00
assistant-librarian[bot]
d1c35e8426 Merge commit '9fa4e8d5ab0b80855b5aeafb2e7907302c1c004d' into develop 2025-11-20 12:20:10 +00:00
Linjun-AMD
cb7f05a8d3 Add attn sink (#2892)
* enable attn sink

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update attn_sink script

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix some errors

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* clang-format

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd_kernel's mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update block_fmha_pipeline_qr_ks_vs.hpp

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix ci error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix format error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_bwd_pipeline_default_policy.hpp

* Update fmha_fwd_runner.hpp

* Update block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* update splitkv_pipeline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update splitkv&pagedkv pipeline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* add sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update attn_sink result log

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update smoke_test_fwd_sink.sh

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test file

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test script

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_splitkv_pipeline_qr_ks_vs.hpp

* use constexpr kHasSink for sink in fmha pipeline

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* update by pre-commit

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/kernel/fmha_fwd_pagedkv_kernel.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fmha_fwd.py

* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd_splitkv.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_fwd_splitkv_pipeline_nwarp_sshuffle_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove causal mask setting logic from mask.hpp

Removed the mask setting logic for causal masks.

* fix CI error: some lambda usages are not supported in C++17

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update remod.py

* add smoke sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update fmha_pagedkv_prefill.py

* Update FmhaFwdPipeline parameters in fmha_fwd.py

* update block_fmha_pipeline_qr_ks_vs_async_trload.hpp

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix C++17 unsupported error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

* Fix formatting of sink_seq_end assignment

* Fix indentation for sink_seq_end assignment

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

---------

Signed-off-by: JL-underdog <Jun.Lin@amd.com>
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: 9fa4e8d5ab]
2025-11-20 19:24:05 +08:00
assistant-librarian[bot]
fb2d106a2f Merge commit '84540edff312c18ba50c01e995774da37faa0a29' into develop 2025-11-20 05:13:08 +00:00
Illia Silin
c36a154c66 fix typo (#3244)
[ROCm/composable_kernel commit: 84540edff3]
2025-11-19 20:23:09 -08:00
assistant-librarian[bot]
809c1ead72 Merge commit '47e2ed838e3547bba1b48d3f559f20f46fd07b87' into develop 2025-11-20 02:43:03 +00:00
Yi DING
e27e760d5a [CK_TILE] Add Flatmm MX FP8 (#3208)
* Use async for flatmm mxfp4

* Fix preshuffle

* Add flatmm mxfp8

* Thanks, Copilot

* Thanks Copilot again~

[ROCm/composable_kernel commit: 47e2ed838e]
2025-11-20 10:35:15 +08:00
AviralGoelAMD
d9f0bdd5e3 chore(copyright): update copyright header for test directory
[ROCm/composable_kernel commit: 4e49e0228b]
2025-11-19 17:43:28 -07:00
linqunAMD
0739113989 [ck_tile] enable test grouped_gemm_quant and gemm_streamk on gfx12 (#3196)
1. Enable grouped_gemm_quant and gemm_streamk on gfx12
- test_ck_tile_streamk_smoke is kept on gfx9, since it looks like someone is still working on it.
2. Update warp tile size in grouped_gemm_quant and gemm_streamk unit test
3. Reduce gemm tile size to pass the build on gfx12 in test_gemm_streamk_reboot_types.hpp

[ROCm/composable_kernel commit: d2e32b4305]
2025-11-20 08:40:27 +08:00
assistant-librarian[bot]
ca48bf3b98 Merge commit 'cd8af997e6d1fde6bc4397bd6ab4fca46510e776' into develop 2025-11-19 21:11:39 +00:00
Michal Kulikowski
dd53cdad01 [CK] s_prefetch unit test fixes.
Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: cd8af997e6]
2025-11-19 21:54:50 +01:00
Michal Kulikowski
6c23879329 [CK] Added s_prefetch unit test.
-added s_buffer_load_b32/64 assembly
-added amd_s_buffer_load_impl

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: f3ef7acca0]
2025-11-19 21:54:50 +01:00
kabrahamAMD
2e71ebe0b1 [CK_Builder] fixed accidental drop of get_elementwise_operation during merge and added usage of get_elementwise_operation() to other builder instances (#3238)
Fixed issues encountered during merge of #3192

* fixed accidental drop of get_elementwise_operation during merge and added call to get_elementwise_op to 4 other builders

* run clang-format

---------

Co-authored-by: Kevin Abraham <kevin.abraham@streamhpc.com>

[ROCm/composable_kernel commit: 964f8e1f60]
2025-11-19 12:31:05 -08:00
assistant-librarian[bot]
7ed276c492 Merge commit 'e6e2e04edbd5766afb388fc4ba64d57a9b52452e' into develop 2025-11-19 18:15:38 +00:00
Max Podkorytov
7098fd7442 [Inductor] Copy logic for ck-tile gemm instance configuration in Inductor max-autotune integration and test it (#2910)
* add op, gen_instances and test

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: e6e2e04edb]
2025-11-19 09:38:02 -08:00
assistant-librarian[bot]
29d162bec7 Merge commit '7fe7aa76f52ad7bdb0bb08c1f2d1de468cc8c070' into develop 2025-11-19 17:12:38 +00:00
Robin Voetter
e0bdf4511d [CK_BUILDER] fixes (#3222)
* ck-builder: some miscellaneous fixes
* ck-builder: fix InstanceSet.FromFactory test

The exact syntax that the instance string functionality
returns has changed. This commit updates the test to expect
the right string.

[ROCm/composable_kernel commit: 7fe7aa76f5]
2025-11-19 09:05:25 -08:00
assistant-librarian[bot]
abf4a7ea2f Merge commit '9837ba5af2d9a9fad3b5e7eddd871101c7402487' into develop 2025-11-19 16:14:27 +00:00
Aviral Goel
eceaba0da4 chore(copyright): update copyright header for tutorial directory (#3230)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

* chore(copyright): update copyright header for docs directory

* chore(copyright): update copyright header for tutorial directory

[ROCm/composable_kernel commit: 9837ba5af2]
2025-11-19 07:20:53 -08:00
Illia Silin
c2247fa9a5 Refactor Jenkinsfile (#3229)
* allow using alternative compiler in all CI stages

* get rid of some redundancies in jenkinsfile

* clean up jenkinsfile a bit more

* further clean up jenkinsfile

* do not force user jenkins in ci dockers

[ROCm/composable_kernel commit: 3e8e6f7e4f]
2025-11-19 07:20:25 -08:00
assistant-librarian[bot]
2e77585ae1 Merge commit '1eb26460aa621028c5d5a8a20cf593ed8a3a3cc5' into develop 2025-11-19 15:13:07 +00:00
Yashvardhan Agarwal
5f7f81660d [ck_tile] Pooling example - Improved tile sizes (#3233)
* improved tile sizes

- modified tile sizes for improved example performance

* Update example/ck_tile/36_pooling/pool3d.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 1eb26460aa]
2025-11-19 15:30:18 +01:00