3879 Commits

Author SHA1 Message Date
assistant-librarian[bot]
1829bc6596 Merge commit 'f38c3de9f9047e72429c796fd0445f36eceb142b' into develop 2025-11-21 03:31:42 +00:00
John Shumway
3f33037f60 Fix copyright messages in experimental/builder. (#3253)
Our copyright notices were mostly correct, but we inconsistently used (C) instead of (c) like the rest of the CK code. This PR fixes that (using lowercase c) and adds a missing copyright header to one file.

[ROCm/composable_kernel commit: f38c3de9f9]
2025-11-20 17:40:55 -08:00
assistant-librarian[bot]
967480c146 Merge commit 'c8563f2101d864ed0cc1f68f02763ee4ec6aa59d' into develop 2025-11-21 01:40:40 +00:00
Aviral Goel
7cee27c4a2 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

* chore(copyright): update copyright header for test directory

[ROCm/composable_kernel commit: c8563f2101]
2025-11-20 20:36:57 -05:00
Aviral Goel
ad1f388f7f chore(copyright): update copyright header for cmake directory (#3254)
[ROCm/composable_kernel commit: a960c9950b]
2025-11-20 20:36:37 -05:00
lalala-sh
ba44e7b7a4 fix static assert (#3178)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: f58bd56e6b]
2025-11-20 17:27:05 -08:00
yinglu
03b79e2264 fix:bf16x3:enable all instances on gfx950 (#3248)
* fix:bf16x3:enable all instances on gfx950

* fix clang-format fail

* fix clang-format fail

* fix:modified wrong params previously

[ROCm/composable_kernel commit: 4155eb24f9]
2025-11-20 17:09:43 -08:00
assistant-librarian[bot]
084d063087 Merge commit '938b8ed3bf40741176adbc897b66095c5453d15d' into develop 2025-11-20 19:11:49 +00:00
spolifroni-amd
ff54ec9463 Spolifroni amd/update changelog 711 (#3211)
* Update CHANGELOG.md with 7.1.1 information

* Update CHANGELOG.md

[ROCm/composable_kernel commit: 938b8ed3bf]
2025-11-20 10:51:18 -08:00
Yi DING
ac4f4ffb79 [CK_TILE] Refine FP32 => FP16/BF16 Conversion (#3215)
* [CK_TILE] Refine FP32 => FP16/BF16 Conversion

* Thank you Copilot

* Rename fix

* Fix example

* Fix accu checking

* Fix

* Fix

[ROCm/composable_kernel commit: 8b284a63a4]
2025-11-20 10:50:26 -08:00
Gavin Zhao
d80f38f77f Add support for RDNA1 GPUs (#3220)
* Allow compilation for RDNA1 (__gfx101__)

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* More RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* Even more RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* cmake: skip build quantization for unsupported arches

* add gfx10-1-generic support as well

* add gfx1013 and complete gfx10-1-generic

* fix clang format

* enable DL kernels on gfx101x

---------

Signed-off-by: Gavin Zhao <git@gzgz.dev>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 07314ac543]
2025-11-20 10:45:57 -08:00
Robin Voetter
eb3ebe3b38 ck-builder: add remaining ck factory tests (#3223)
Now that the remaining reflection has been implemented, we
can add the remaining factory tests too. This is the complete set
of instances for forward grouped conv currently in CK.

[ROCm/composable_kernel commit: bb155ef678]
2025-11-20 10:42:36 -08:00
Robin Voetter
fe6bb0e811 ck-builder: group transfer operations per tensor (#3217)
Grouping transfer operations per tensor makes it easier to
constrain on and operate with the transfer operations. As an
example, we can now deduplicate the logic for translating
the transfer operations from the ck-builder interface to the old
ck interface for the A and B tensors.

[ROCm/composable_kernel commit: 245c6011cf]
2025-11-20 10:40:48 -08:00
Aviral Goel
635cf8df6c chore(copyright): update copyright header for library directory (#3239)
[ROCm/composable_kernel commit: fb43760c66]
2025-11-20 10:36:05 -08:00
Aviral Goel
ef107dac80 chore(copyright): update copyright header for test directory (#3243)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

[ROCm/composable_kernel commit: 7dfc46d73d]
2025-11-20 10:33:34 -08:00
assistant-librarian[bot]
8b93b58bcd Merge commit '2e4b8a8fc455a14ad5cf89f7f750060ff20c40bb' into develop 2025-11-20 17:12:11 +00:00
Emily Martins
4aa8d64c9a [CK_TILE] Remove Old CK Tile Stream-K Artifacts (#3202)
* Remove old CK Tile Stream-K implementation

The original CK Stream-K implementation was based on old CK's Stream-K
block to C tile map. However, this implementation did not align with the
original Stream-K paper. Thus, we implemented a new tile partitioner and
associated Stream-K kernel, which was placed in the reboot namespace.

Now that the new Stream-K implementation is ready, this change removes
all artifacts of the old implementation. Specifically, the following
changes were made:
- Removes old Stream-K tile partitioner from CK Tile
- Removes the reboot namespace such that the new implementation resides
  in the ck_tile namespace only.
- Adds tests for bf8 and fp8 using the new implementation
- Removes tests for the old implementation
- Removes the v2 suffix from the new CK Tile Tile Partitioner
  derived classes.
- Updates Stream-K Kernel ops file to use /** commenting style.

* Remove v2 from tile partitioner validation function names

[ROCm/composable_kernel commit: 2e4b8a8fc4]
2025-11-20 09:32:32 -07:00
assistant-librarian[bot]
b2e58aec1a Merge commit '5adaa201eda9337553459bc4321b11695e380832' into develop 2025-11-20 16:14:36 +00:00
asleepzzz
06d2e609cd Revert "Add attn sink (#2892)" (#3250)
This reverts commit bbe1d3a917ee92655224c0f1528ace3a7b0e82a8.

[ROCm/composable_kernel commit: 5adaa201ed]
2025-11-20 07:55:15 -08:00
assistant-librarian[bot]
d1c35e8426 Merge commit '9fa4e8d5ab0b80855b5aeafb2e7907302c1c004d' into develop 2025-11-20 12:20:10 +00:00
Linjun-AMD
f4ba63deb7 Add attn sink (#2892)
* enable attn sink

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update attn_sink script

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix some error

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* clang-format

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd_kernel's mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update block_fmha_pipeline_qr_ks_vs.hpp

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix ci error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix format error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_bwd_pipeline_default_policy.hpp

* Update fmha_fwd_runner.hpp

* Update block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* update splitkv_pipeline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update splitkv&pagedkv pipeline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* add sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update attn_sink result log

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update smoke_test_fwd_sink.sh

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test file

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test script

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_splitkv_pipeline_qr_ks_vs.hpp

* use constexpr kHasSink for sink in fmha pipeline

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* update by pre-commit

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/kernel/fmha_fwd_pagedkv_kernel.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fmha_fwd.py

* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd_splitkv.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_fwd_splitkv_pipeline_nwarp_sshuffle_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove causal mask setting logic from mask.hpp

Removed the mask setting logic for causal masks.

* fix ci error: some usages of lambda are not supported in c++17

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update remod.py

* add smoke sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update fmha_pagedkv_prefill.py

* Update FmhaFwdPipeline parameters in fmha_fwd.py

* update block_fmha_pipeline_qr_ks_vs_async_trload.hpp

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix c++17 unsupported error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

* Fix formatting of sink_seq_end assignment

* Fix indentation for sink_seq_end assignment

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

---------

Signed-off-by: JL-underdog <Jun.Lin@amd.com>
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: 9fa4e8d5ab]
2025-11-20 19:24:05 +08:00
assistant-librarian[bot]
fb2d106a2f Merge commit '84540edff312c18ba50c01e995774da37faa0a29' into develop 2025-11-20 05:13:08 +00:00
Illia Silin
38605f5091 fix typo (#3244)
[ROCm/composable_kernel commit: 84540edff3]
2025-11-19 20:23:09 -08:00
assistant-librarian[bot]
809c1ead72 Merge commit '47e2ed838e3547bba1b48d3f559f20f46fd07b87' into develop 2025-11-20 02:43:03 +00:00
Yi DING
0d9f230577 [CK_TILE] Add Flatmm MX FP8 (#3208)
* Use async for flatmm mxfp4

* Fix preshuffle

* Add flatmm mxfp8

* Thanks, Copilot

* Thanks Copilot again~

[ROCm/composable_kernel commit: 47e2ed838e]
2025-11-20 10:35:15 +08:00
AviralGoelAMD
158fec303c chore(copyright): update copyright header for test directory
[ROCm/composable_kernel commit: 4e49e0228b]
2025-11-19 17:43:28 -07:00
linqunAMD
ac0fb4fec5 [ck_tile] enable test grouped_gemm_quant and gemm_streamk on gfx12 (#3196)
1. Enable grouped_gemm_quant and gemm_streamk on gfx12
- test_ck_tile_streamk_smoke is kept on gfx9, since it looks like someone is still working on it.
2. Update warp tile size in grouped_gemm_quant and gemm_streamk unit test
3. Reduce gemm tile size to pass the build on gfx12 in test_gemm_streamk_reboot_types.hpp

[ROCm/composable_kernel commit: d2e32b4305]
2025-11-20 08:40:27 +08:00
assistant-librarian[bot]
ca48bf3b98 Merge commit 'cd8af997e6d1fde6bc4397bd6ab4fca46510e776' into develop 2025-11-19 21:11:39 +00:00
Michal Kulikowski
4a5e7d098d [CK] s_prefetch unit test fixes.
Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: cd8af997e6]
2025-11-19 21:54:50 +01:00
Michal Kulikowski
8fc5eca798 [CK] Added s_prefetch unit test.
-added s_buffer_load_b32/64 assembly
-added amd_s_buffer_load_impl

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: f3ef7acca0]
2025-11-19 21:54:50 +01:00
kabrahamAMD
b9ee41c660 [CK_Builder] Fixed accidental drop of get_elementwise_operation during merge and added usage of get_elementwise_operation() to other builder instances (#3238)
Fixed issues encountered during merge of #3192

* fixed accidental drop of get_elementwise_operation during merge and added call to get_elementwise_op to 4 other builders

* run clang-format

---------

Co-authored-by: Kevin Abraham <kevin.abraham@streamhpc.com>

[ROCm/composable_kernel commit: 964f8e1f60]
2025-11-19 12:31:05 -08:00
assistant-librarian[bot]
7ed276c492 Merge commit 'e6e2e04edbd5766afb388fc4ba64d57a9b52452e' into develop 2025-11-19 18:15:38 +00:00
Max Podkorytov
8bcd57d8d4 [Inductor] Copy logic for ck-tile gemm instance configuration in Inductor max-autotune integration and test it (#2910)
* add op, gen_instances and test

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: e6e2e04edb]
2025-11-19 09:38:02 -08:00
assistant-librarian[bot]
29d162bec7 Merge commit '7fe7aa76f52ad7bdb0bb08c1f2d1de468cc8c070' into develop 2025-11-19 17:12:38 +00:00
Robin Voetter
2dabd450c1 [CK_BUILDER] fixes (#3222)
* ck-builder: some miscellaneous fixes
* ck-builder: fix InstanceSet.FromFactory test

The exact syntax that the instance string functionality
returns has changed. This commit updates the test to expect
the right string.

[ROCm/composable_kernel commit: 7fe7aa76f5]
2025-11-19 09:05:25 -08:00
assistant-librarian[bot]
abf4a7ea2f Merge commit '9837ba5af2d9a9fad3b5e7eddd871101c7402487' into develop 2025-11-19 16:14:27 +00:00
Aviral Goel
34c1fc5ae7 chore(copyright): update copyright header for tutorial directory (#3230)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

* chore(copyright): update copyright header for docs directory

* chore(copyright): update copyright header for tutorial directory

[ROCm/composable_kernel commit: 9837ba5af2]
2025-11-19 07:20:53 -08:00
Illia Silin
3c389eb3f1 Refactor Jenkinsfile (#3229)
* allow using alternative compiler in all CI stages

* get rid of some redundancies in jenkinsfile

* clean up jenkinsfile a bit more

* further clean up jenkinsfile

* do not force user jenkins in ci dockers

[ROCm/composable_kernel commit: 3e8e6f7e4f]
2025-11-19 07:20:25 -08:00
assistant-librarian[bot]
2e77585ae1 Merge commit '1eb26460aa621028c5d5a8a20cf593ed8a3a3cc5' into develop 2025-11-19 15:13:07 +00:00
Yashvardhan Agarwal
94b3569da0 [ck_tile] Pooling example - Improved tile sizes (#3233)
* improved tile sizes

- modified tile sizes for improved example performance

* Update example/ck_tile/36_pooling/pool3d.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 1eb26460aa]
2025-11-19 15:30:18 +01:00
assistant-librarian[bot]
055f643f4f Merge commit 'ad57f6ef0bcaeef7988bfd3954aac06554f12afb' into develop 2025-11-19 11:12:09 +00:00
John Shumway
b5e2f26808 [CK_BUILDER] Put global CK functions in the CK namespace (#3232)
* Wrap ck host utilities in CK namespace.

The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace.

Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace.

There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our two code bases (CK and CKTile) so that they can both coexist. Moreover, establishing this compatibility is essential if we are going to allow the builder to instantiate kernels from either template library.

* Add using declarations to test code.

After moving some of the utilities into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compilation units for functions that were previously in the global namespace.

* Add using declarations to client examples.

[ROCm/composable_kernel commit: ad57f6ef0b]
2025-11-19 11:23:02 +01:00
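
The namespace change described in b5e2f26808 can be illustrated with a minimal sketch. Everything here is hypothetical: `device_status` and `check_status` are stand-ins invented for this illustration (the commit itself names `hip_check_error`, whose real signature is not reproduced here), and only the namespace names `ck` and `ck_tile` come from the log. The sketch assumes that the helper moves from the global namespace into `ck`, and that test code then restores the old unqualified spelling with a using-declaration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a device status code (not the real HIP error type).
enum class device_status { success, failure };

namespace ck {
// After the change, a host utility like this lives inside the library namespace
// instead of the global namespace.
inline void check_status(device_status s)
{
    if(s != device_status::success)
        throw std::runtime_error("device call failed");
}
} // namespace ck

namespace ck_tile {
// A second library can define a helper with the same name without clashing.
inline bool check_status(device_status s) { return s == device_status::success; }
} // namespace ck_tile

// In a test or example source file, a using-declaration restores the
// unqualified spelling that the code used before the move.
using ck::check_status;

// Returns true when the unqualified call (resolved to ck::check_status) does not throw.
inline bool call_succeeds(device_status s)
{
    try
    {
        check_status(s);
        return true;
    }
    catch(const std::runtime_error&)
    {
        return false;
    }
}

// Small usage function exercising both helpers.
inline void usage()
{
    assert(call_succeeds(device_status::success));
    assert(!call_succeeds(device_status::failure));
    assert(!ck_tile::check_status(device_status::failure)); // qualified call to the other library
}
```

A using-declaration per source file keeps the change local: only files that opt in see the unqualified name, so the two libraries can still be included side by side.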
assistant-librarian[bot]
05f83b643f Merge commit 'd7b31978692a6747f5fc232e2ac424566e40b0b8' into develop 2025-11-19 06:16:01 +00:00
Anton Gorenko
44936cfdec [CK_TILE] FMHA Reduce register spilling in fwd with dropout (workaround for CI failures with clang-22) (#3221)
* Use vectorized stores for dropout randvals

With no kPadSeqLenK the kernel uses 2 buffer_store_dwordx2 instead of
16 buffer_store_byte. This requires less registers and reduces spilling.

* Calculate dropout randvals for storing and applying only once

Even though it may add a small overhead when storing is not required,
it uses significantly less registers and hence no spilling.

[ROCm/composable_kernel commit: d7b3197869]
2025-11-19 10:40:12 +05:00
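
The first point of 44936cfdec (two `buffer_store_dwordx2` instead of 16 `buffer_store_byte`) can be pictured with a host-side sketch. This is not the kernel code: it only shows, under the assumption that 16 one-byte random values are written contiguously, how the same bytes can be emitted as two 8-byte stores. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte-wise variant: 16 separate one-byte stores.
inline void store_bytewise(const std::array<std::uint8_t, 16>& vals, std::uint8_t* dst)
{
    assert(dst != nullptr);
    for(std::size_t i = 0; i < vals.size(); ++i)
        dst[i] = vals[i];
}

// Vectorized variant: the same 16 bytes written as two 8-byte (64-bit) stores.
inline void store_vectorized(const std::array<std::uint8_t, 16>& vals, std::uint8_t* dst)
{
    assert(dst != nullptr);
    for(std::size_t chunk = 0; chunk < 2; ++chunk)
    {
        std::uint64_t packed = 0;
        // Pack eight consecutive bytes into one 64-bit value...
        std::memcpy(&packed, vals.data() + 8 * chunk, sizeof(packed));
        // ...and write it out with a single 8-byte copy.
        std::memcpy(dst + 8 * chunk, &packed, sizeof(packed));
    }
}

// Helper for checking that both variants produce identical memory contents.
inline bool variants_agree(const std::array<std::uint8_t, 16>& vals)
{
    std::array<std::uint8_t, 16> a{};
    std::array<std::uint8_t, 16> b{};
    store_bytewise(vals, a.data());
    store_vectorized(vals, b.data());
    return a == b;
}
```

Both variants leave the destination in the same state; the difference the commit relies on is the number of store instructions and the registers needed to feed them.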
assistant-librarian[bot]
751e5d85a6 Merge commit 'e91ee8578cc9e493f12ee01055a35a405571effc' into develop 2025-11-18 19:12:13 +00:00
Aviral Goel
b6c966df35 chore(copyright): update copyright header for docs & include directory (#3226)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

* chore(copyright): update copyright header for docs directory

[ROCm/composable_kernel commit: e91ee8578c]
2025-11-18 10:23:14 -08:00
Aviral Goel
902250eab3 chore(copyright): update copyright header for include directory (#3224)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

[ROCm/composable_kernel commit: f5ac3ee359]
2025-11-18 10:17:18 -08:00
Max Podkorytov
3774b900d1 [CK-Tile] Remove usage of tile partitioner's full gemm shape (#3204)
The gemm shape should be taken from the pipeline instead (which gets it from a problem description struct).

[ROCm/composable_kernel commit: a3a4eb12bd]
2025-11-18 09:56:40 -08:00
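
The idea behind 3774b900d1 can be sketched as follows. All type and member names here are hypothetical and are not the CK-Tile API; the sketch only assumes what the commit message says: the problem description owns the gemm shape, the pipeline re-exports it, and the kernel reads it from the pipeline rather than from the tile partitioner.

```cpp
#include <cassert>

// Hypothetical tile shape carrying the per-block tile sizes.
template <int M, int N, int K>
struct TileShape
{
    static constexpr int kM = M;
    static constexpr int kN = N;
    static constexpr int kK = K;
};

// The problem description struct is the single owner of the shape.
template <typename Shape_>
struct GemmProblem
{
    using BlockGemmShape = Shape_;
};

// The pipeline re-exports the shape it received from its problem description.
template <typename Problem_>
struct GemmPipeline
{
    using BlockGemmShape = typename Problem_::BlockGemmShape;
};

// The partitioner no longer carries the full gemm shape; tile sizes are passed in.
struct TilePartitioner
{
    static constexpr int tiles_along(int extent, int tile) { return (extent + tile - 1) / tile; }
};

// The kernel queries the pipeline for the shape, not the partitioner.
template <typename Partitioner_, typename Pipeline_>
struct GemmKernel
{
    using Shape = typename Pipeline_::BlockGemmShape;

    static constexpr int grid_size(int m, int n)
    {
        return Partitioner_::tiles_along(m, Shape::kM) * Partitioner_::tiles_along(n, Shape::kN);
    }
};

using DemoKernel =
    GemmKernel<TilePartitioner, GemmPipeline<GemmProblem<TileShape<128, 64, 32>>>>;

// Small usage function: a 256 x 128 output with 128 x 64 tiles needs 2 * 2 blocks.
inline void usage()
{
    assert(DemoKernel::grid_size(256, 128) == 4);
}
```

Keeping one owner for the shape avoids the case where partitioner and pipeline are instantiated with different tile sizes.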
assistant-librarian[bot]
86a4127e31 Merge commit 'ac70206b2c8b43447e46ad382057fe56dc639803' into develop 2025-11-18 15:13:30 +00:00
Aviral Goel
a07cd6bc71 feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle… (#3225)
* feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle kernel(s) along with unit test

* docs: Update CHANGELOG.MD

[ROCm/composable_kernel commit: ac70206b2c]
2025-11-18 09:32:27 -05:00