Commit Graph

2642 Commits

Author SHA1 Message Date
Linjun-AMD
f4ba63deb7 Add attn sink (#2892)
* enable attn sink

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update attn_sink script

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix some error

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* clang-format

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd_kernel'mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update block_fmha_pipeline_qr_ks_vs.hpp

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix ci error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix format error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_bwd_pipeline_default_policy.hpp

* Update fmha_fwd_runner.hpp

* Update block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* update splitkv_pipline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update splitkv&pagedkv pipeline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* add sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update attn_sink result log

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update smoke_test_fwd_sink.sh

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test file

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test script

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_splitkv_pipeline_qr_ks_vs.hpp

* use constexpr kHasSink for sink in fmha pipeline

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* update by pre-commit

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/kernel/fmha_fwd_pagedkv_kernel.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fmha_fwd.py

* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd_splitkv.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_fwd_splitkv_pipeline_nwarp_sshuffle_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove causal mask setting logic from mask.hpp

Removed the mask setting logic for causal masks.

* fix ci error that some usage of lamada not support in c++17

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update remod.py

* add smoke sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update fmha_pagedkv_prefill.py

* Update FmhaFwdPipeline parameters in fmha_fwd.py

* update block_fmha_pipeline_qr_ks_vs_async_trload.hpp

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix c++17 unsupprot error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

* Fix formatting of sink_seq_end assignment

* Fix indentation for sink_seq_end assignment

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

---------

Signed-off-by: JL-underdog <Jun.Lin@amd.com>
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: 9fa4e8d5ab]
2025-11-20 19:24:05 +08:00
Illia Silin
38605f5091 fix typo (#3244)
[ROCm/composable_kernel commit: 84540edff3]
2025-11-19 20:23:09 -08:00
Yi DING
0d9f230577 [CK_TILE] Add Flatmm MX FP8 (#3208)
* Use async for flatmm mxfp4

* Fix preshuffle

* Add flatmm mxfp8

* Thanks, Copilot

* Thanks Copilot again~

[ROCm/composable_kernel commit: 47e2ed838e]
2025-11-20 10:35:15 +08:00
AviralGoelAMD
158fec303c chore(copyright): update copyright header for test directory
[ROCm/composable_kernel commit: 4e49e0228b]
2025-11-19 17:43:28 -07:00
linqunAMD
ac0fb4fec5 [ck_tile] enable test grouped_gemm_quant and gemm_streamk on gfx12 (#3196)
1. Enable grouped_gemm_quant and gemm_streamk on gfx12
- test_ck_tile_streamk_smoke is kept on gfx9, since it looks someone is still working on it.
2. Update warp tile size in grouped_gemm_quant and gemm_streamk unit test
3. Reduce gemm tile size to pass the build on gfx12 in test_gemm_streamk_reboot_types.hpp

[ROCm/composable_kernel commit: d2e32b4305]
2025-11-20 08:40:27 +08:00
Michal Kulikowski
4a5e7d098d [CK] s_prefetch unit test fixes.
Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: cd8af997e6]
2025-11-19 21:54:50 +01:00
Michal Kulikowski
8fc5eca798 [CK] Added s_prefetch unit test.
-added s_buffer_load_b32/64 assembly
-added amd_s_buffer_load_impl

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: f3ef7acca0]
2025-11-19 21:54:50 +01:00
kabrahamAMD
b9ee41c660 [CK_Builder ]fixed accidental drop of get_elementwise_operation during merge and added usage of get_elementwise_operation() to other builder instances (#3238)
Fixed issues encountered during merge of #3192

* fixed accidental drop of get_elementwise_operation during merge and added call to get_elementwise_op to 4 other builders

* run clang-format

---------

Co-authored-by: Kevin Abraham <kevin.abraham@streamhpc.com>

[ROCm/composable_kernel commit: 964f8e1f60]
2025-11-19 12:31:05 -08:00
Max Podkorytov
8bcd57d8d4 [Inductor] Copy logic for ck-tile gemm instance configuration in Inductor max-autotune integration and test it (#2910)
* add op, gen_instances and test

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: e6e2e04edb]
2025-11-19 09:38:02 -08:00
Robin Voetter
2dabd450c1 [CK_BUILDER] fixes (#3222)
* ck-builder: some miscellaneous fixes
* ck-builder: fix InstanceSet.FromFactory test

The exact syntax that the instance string functionality
returns has changed. This commit updates the test to expect
the right string.

[ROCm/composable_kernel commit: 7fe7aa76f5]
2025-11-19 09:05:25 -08:00
Aviral Goel
34c1fc5ae7 chore(copyright): update copyright header for tutorial directory (#3230)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

* chore(copyright): update copyright header for docs directory

* chore(copyright): update copyright header for tutorial directory

[ROCm/composable_kernel commit: 9837ba5af2]
2025-11-19 07:20:53 -08:00
Illia Silin
3c389eb3f1 Refactor Jenkinsfile (#3229)
* allow using alternative compiler in all CI stages

* get rid of some redundancies in jenkinsfile

* clean up jenkinsfile a bit more

* further clean up jenkinsfile

* do not force user jenkins in ci dockers

[ROCm/composable_kernel commit: 3e8e6f7e4f]
2025-11-19 07:20:25 -08:00
Yashvardhan Agarwal
94b3569da0 [ck_tile] Pooling example - Improved tile sizes (#3233)
* improved tile sizes

- modified tile sizes for improved example performance

* Update example/ck_tile/36_pooling/pool3d.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 1eb26460aa]
2025-11-19 15:30:18 +01:00
John Shumway
b5e2f26808 [CK_BUILDER] Put global CK functions in an the CK namespace (#3232)
* Wrap ck host utitlies in CK namespace.

The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace.

Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace.

There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our to code bases (CK and CKTile) so that they can both coexist. Moreover, estabilishing this compatiblity is essential if we are going to allow the builder to instantiate  kernels from either template library.

* Add using declarations to test code.

After moving some of the untils into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compute units for functions that were previously in the global namespace.

* Add using declarations to client examples.

[ROCm/composable_kernel commit: ad57f6ef0b]
2025-11-19 11:23:02 +01:00
Anton Gorenko
44936cfdec [CK_TILE] FMHA Reduce register spilling in fwd with dropout (workaround for CI failures with clang-22) (#3221)
* Use vectorized stores for dropout randvals

With no kPadSeqLenK the kernel uses 2 buffer_store_dwordx2 instead of
16 buffer_store_byte. This requires less registers and reduces spilling.

* Calculate dropout randvals for storing and applying only once

Even though it may add a small overhead when storing is not required,
it uses significantly less registers and hence no spilling.

[ROCm/composable_kernel commit: d7b3197869]
2025-11-19 10:40:12 +05:00
Aviral Goel
b6c966df35 chore(copyright): update copyright header for docs & include directory (#3226)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

* chore(copyright): update copyright header for docs directory

[ROCm/composable_kernel commit: e91ee8578c]
2025-11-18 10:23:14 -08:00
Aviral Goel
902250eab3 chore(copyright): update copyright header for include directory (#3224)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

[ROCm/composable_kernel commit: f5ac3ee359]
2025-11-18 10:17:18 -08:00
Max Podkorytov
3774b900d1 [CK-Tile] Remove usage of tile partitioner's full gemm shape (#3204)
gemm shape should be used from the pipeline instead (where it gets from a problem description struct)

[ROCm/composable_kernel commit: a3a4eb12bd]
2025-11-18 09:56:40 -08:00
Aviral Goel
a07cd6bc71 feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle… (#3225)
* feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle kernel(s) along with unit test

* docs: Update CHANGELOG.MD

[ROCm/composable_kernel commit: ac70206b2c]
2025-11-18 09:32:27 -05:00
Sami Remes
acb3b43bc0 [CK_TILE] Non-K Major from old CK to CK-Tile - fix reverted PR (#3199)
* Reapply "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)

This reverts commit 1cda0c4c95e5f15f3fcbb9a5edf118ea85bcccd2.

* WIP

* take Y2 as the AK1/BK1 value, that is the 'vector size' after shuffle

* use get_n_lds_banks()

* clang-format

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 3ede8e2a6e]
2025-11-18 10:17:02 +02:00
Yi DING
7336398fb6 [CK_TILE] MX Flatmm Split kernel instances (#3207)
* [CK_TILE] MX Flatmm Split kernel instances

* Fix flatmm example compile

[ROCm/composable_kernel commit: b6720531de]
2025-11-18 13:46:30 +08:00
kabrahamAMD
cad9d98976 [CK_Builder] removed direction and elementwise_operation from required parameters … (#3192)
Removed direction and elementwise operation from default values required for convolution signature concept. Added constexpr helpers to set default values. Add compile-time tests.

[ROCm/composable_kernel commit: 92498464f6]
2025-11-17 15:23:48 -08:00
Aviral Goel
41ef9a10f5 chore(copyright): update copyright header for include directory (#3219)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

[ROCm/composable_kernel commit: 22a934a229]
2025-11-17 08:57:45 -08:00
Illia Silin
f8ec330b69 Disable DL kernels on all architectures except gfx103x. (#3218)
* disable dl kernels on all archs except gfx103

* add gfx10-3-generic target to cmake

[ROCm/composable_kernel commit: b38bb492a1]
2025-11-14 17:39:50 -08:00
Aviral Goel
0577b5dd78 chore(copyright): update copyright header for profiler directory (#3205)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

[ROCm/composable_kernel commit: 0aadb4b2c4]
2025-11-14 11:19:25 -08:00
Aviral Goel
90503f7e3d chore(copyright): update copyright header for python directory (#3200)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

[ROCm/composable_kernel commit: 3aa883b9ff]
2025-11-14 08:21:36 -08:00
jefyang1
72dbbc7d77 Add new gemm multiply multiply instances on gfx950 (#3213)
[ROCm/composable_kernel commit: d30babbd00]
2025-11-14 08:20:41 -08:00
John Afaganis
b49e30206f 7.2 version bump (#3210)
* 7.2 version bump

* Update CHANGELOG.md

* Update Jenkinsfile

* Update CHANGELOG.md

* Update CMakeLists.txt

* Update Jenkinsfile

[ROCm/composable_kernel commit: caadb896f1]
2025-11-13 21:04:03 -08:00
BingYuan.Zhou
807c297a17 fix build error (#3195)
Co-authored-by: root <root@hjbog-srdc-39.amd.com>

[ROCm/composable_kernel commit: 4d629cd2b0]
2025-11-14 09:46:13 +08:00
Yi DING
fda95832b0 [CK_TILE] Improve device printing (#3198)
* [CK_TILE] Improve device printing

* fix host gtest build

* clean

[ROCm/composable_kernel commit: 4a8b17d1a4]
2025-11-14 09:46:06 +08:00
yinglu
bdbe3e4eb9 Simulate TF32 with BF16x3 (#3142)
* tf32:bf16x3:use bf16x3 emulate tf32 gemm

* change blockwiseGemm to demo bf16x3

* temp push

* self review

* self review

* fix multi-device compile error

* bug fix

* code refactor

* limit to gfx950

* enhance gemm gfx942 threshold

* lower change from blockwise to warpwise

* refact codes

* refact codes

* error fix

* change threshold

* bug fix

* fix threshold error

* change host reference implement to same as device

* bug fix

* bug fix

* code refact

* fix clang-format fail

* code refine

[ROCm/composable_kernel commit: 2a73eb3bc0]
2025-11-13 16:21:09 -08:00
SamiAario-AMD
d49eb1d431 Remove "basic" and universal GEMM tests, and incorporate their test cases into the GEMM pipeline tests (#3094)
* Add missing copyright statements

* Use ck_tile::host_tensor_descriptor instead of a custom lambda

* Refactor use of check_data_type in test classes

* Use TEST_SUITE_NAME with TYPED_TEST_SUITE

* Remove an unused namespace

* Make dim3 const

* Add BF8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add F8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add BF16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add BF16 x BF16 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add BF8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add F8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add F16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Skip failing tests of F16 x I4 for CompV3 with K == 2 * K_Tile

* Add missing precision type combinations to CompV4 from CompV3

* Move the INT8 tests around for consistency with KernelTypesCompV3Wmma

* Add missing precision type combinations to CompV3Wmma from CompV3

* Remove the basic and universal tests and their dependencies

* On __gfx950__, avoid using transposed loading of A with datatype pk_int4_t of B

* Use ADataType and BDataType instead of ComputeDataType for WarpGemm

* Explicitly set some return types to void

* Use more general typenames in InterleavedPKTypeLoader

* Add load_interleaved_pk_type.hpp to common.hpp

* Use std::is_same_v in load_int4_tile

* Add handling of LoadTranspose to load_int4_tile

* Factor out common code in several places using load_int4_tile

* Add support for pk_int4_t using load_int4_tile

* Fix formatting

[ROCm/composable_kernel commit: f2cfc6b94e]
2025-11-13 11:01:27 -08:00
Ville Pietilä
547165ce4c [CK_BUILDER] Forward convolution builder improvements (#3179)
Proposed changes
Improve the forward convolution builder implementation and addressed leftover feedback left from PR #3138. Main changes

Refactored tests such that they reflect better the builder pattern. The templates and types for the convolution algorithm concepts are created via factory that facilitates programmatic creation of the device op instances.
Moved tests into anonymous namespace.
The convolution factory had lot of if-else constructs when CK Builder types were converted into CK library types. I had initially trouble in using static_assert in the default branch of switch as the static_assert was evaluated at compile time even for valid types. However, if we change the static_assert to throw "<error message>", it will result in a compile-time error only if the default branch is actually hit. This assumes that the function is consteval. Hence, changed all conversions in the convolution factory to use switch, which is more intuitive.
Removed the explicit device op definition from convolution signature and the corresponding predicate file. The device ops are defined by the corresponding concepts. This allowed to remove lot of boilerplate code from the convolution factory.
Adde inheritance and convolution algorithm specialization to handle device ops that are specialization of a more generic ones. The large tensor support is more naturally expressed by this pattern.
Added support for the FP8 data type.

* WIP: Builder for expected test results.

* Improve ckb fwd conv instance tests.

* clang-format

* Change if-else statements into switch in conv factory.

* Fix clang-formatting.

* Removed unnecessary includes.

* Added missing copyright.

* Remove explicit device op flag from from convolution signature.

* Add missing concept.

* Fix build.

* clang-format

* Add test for building conv fwd FP8 instances.

* Add missing header to instance traits.

* Clean-up recently added instances.

* Introduce inheritance and specialization.

* Use builder to build conv algorithm templates and types.

* clang-format

* Fix conv description tests.

---------

Co-authored-by: John Shumway <john.shumwayjr@gmail.com>

[ROCm/composable_kernel commit: 7d57bc169f]
2025-11-13 08:47:25 -08:00
jefyang1
2a6a163e7d Fix test_gemm_multiply_multiply_wp_xdl_fp8 on gfx950 (#3191)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: ca2ee0eb8a]
2025-11-13 09:32:54 -06:00
Yi DING
f5eb722fbe [CK_TILE] Improve F8F6F4 Scaled WarpGemm (#3197)
* [CK_TILE] Improve F8F6F4 Scaled WarpGemm

* Thanks, Copilot

[ROCm/composable_kernel commit: 8d50001b93]
2025-11-13 20:22:05 +08:00
Khushbu Agarwal
1ec766b17d fixing ambiguous shuffle definitions (#3175)
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: fb41a7b73b]
2025-11-12 23:44:12 -08:00
Cong Ma
fec8b3228b [CK TILE GEMM] Refactor block_scale_gemm examples (#3181)
* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.

[ROCm/composable_kernel commit: 6fd8ddabe7]
2025-11-12 23:43:40 -08:00
Thrupti Raj Lakshmana Gowda
5c19f34cb4 Ck tile engine commons (#3166)
* Moving Preshuffle to commons

* Fixing Common Validations

* Addressing Review Comments

* Partial Rebasing

* Partial Rebasing

* Partial Rebasing

* Rebasing Complete

[ROCm/composable_kernel commit: 9af30f04b6]
2025-11-13 00:56:18 -06:00
Aviral Goel
4c43e89a84 chore(copyright): update copyright header for test_data directory (#3194)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

[ROCm/composable_kernel commit: 797ddfa41e]
2025-11-12 16:07:28 -08:00
John Afaganis
97dba6f3c5 Add C++17 deprecation warning to CHANGELOG.md (#3203)
* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

[ROCm/composable_kernel commit: 9342365713]
2025-11-12 16:05:53 -08:00
Illia Silin
fbab772ad4 add permissions for /tmp folder (#3201)
[ROCm/composable_kernel commit: 3784c0e7c3]
2025-11-12 11:47:07 -08:00
Enrico Degregori
e00db44d0c Wmma support for gemm_reduce (#3145)
* Initial implementation GEMM+Reduce:

 - device struct
 - epilogue struct

* Fix tests, improve profiler and add initial instances

* Add instances

* Fix compilation error

* Address review comments

* Fix logging

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 7414a0f4d4]
2025-11-12 11:23:54 -08:00
Yashvardhan Agarwal
c8c5a7e1c6 [CK_Tile] Pooling example readme update (#3174)
* pooling example readme update

- The updated readme explains the transformations of the pooling kernel
using a mermaid diagram

* Update example/ck_tile/36_pooling/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* resolve comments

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: 299c9bca1b]
2025-11-12 07:30:20 -08:00
Po Yen Chen
97cb3abf33 [CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905)
* Allow sharing partition index across threads

* Fix typo PartitoinIndex -> PartitionIndex

* Remove C++20 'requires' usages

* Add missing template arguments

* Fix load_tile() overload ambiguity issue

* Use SFINAE to exclude invalid arguments

* Add additional offset parameter to the async_load_tile()

* Remove async_load_tile() default argument to avoid ambiguity

* Extract tile_window coordinate compute logic as method

* Use warp-shared LDS base address in tile_window::async_load()

* Add constraint to tile_window::load() templates

* Fix wrong type traits is_class_v<> usages

* Add missing constraint to async_load_tile()

* Add missing tile_window::load() overload

* Add more constraint to avoid load_tile() call ambiguity

* Rename ParitionIndex as ReplacementPartitionIndex

* Update pre_computed_warp_coords_ in move_extended()

* Fix inconsistency between template parameters and documentation

* Allow specifying pre-computed parition index

* Add type straits is_sequence<> & is_tile_distribution<>

* Add type straits is_tensor_view<>

* Add type constraints to make_tile_window() templates

* Allow passing partition_index to set_tile_if()

* Allow specifying partition_index to store_tile()

* Add missing template parameter of replace_bottom_tensor_view()

* Allow passing partition_index to Default2DEpilogue

* Make get_partition_index() public

* Add _with_offset() postfix to avoid resolution error

* Remove ReplacementPartitionIndex template param

* Add missing comments

* Add load_tile_transpose_with_offset() overload

[ROCm/composable_kernel commit: 40d2ed0f2a]
2025-11-12 10:26:14 +08:00
Bartłomiej Kocot
a2a69e7649 [CK_BUILDER] Add grouped conv fwd ck tile traits (#3183)
* [CK BUILDER] Add grouped conv fwd ck tile traits

* Update instance_traits_tile_grouped_convolution_forward.hpp

* Update grouped_convolution_forward_kernel.hpp


[ROCm/composable_kernel commit: 92c1f4981a]
2025-11-11 13:55:33 -08:00
Aviral Goel
f01853cf46 Add CK Tile Tutorials Folder with GEMM and COPY Kernel (#3038)
* feat: add tutorial folder with gemm tutorial

* chore: move copy kernel from examples folder to tutorial

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: remove handdrawn images

* docs: add write ups to explain the gemm kernel

* docs: add about block level pipeline and static distributed tensors

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: b145a5fe80]
2025-11-11 14:15:49 -06:00
Aviral Goel
a8d2ecc971 docs: update ckProfiler readme with selective building option (#3140)
* docs: update ckProfiler readme with selective building option

* docs: add list of operations for ckProfiler

[ROCm/composable_kernel commit: c54ecd905b]
2025-11-11 14:27:33 -05:00
Aviral Goel
9ec4b67288 chore(copyright): update copyright header for script directory (#3184)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

---------

Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>

[ROCm/composable_kernel commit: ab68c9d384]
2025-11-11 11:26:01 -08:00
linqunAMD
13cf0bd17f [CK_TILE] Fix gemm_quant (#3186)
[ROCm/composable_kernel commit: 1b1c46e508]
2025-11-11 08:23:57 -08:00
Aviral Goel
c1b5372db3 chore(copyright): update copyright header for tile_engine directory (#3180)
[ROCm/composable_kernel commit: 88e3212fcc]
2025-11-11 08:17:24 -08:00