Commit Graph

2682 Commits

Author SHA1 Message Date
Aviral Goel
ab0101c59c chore(copyright): update copyright header for library directory (#3274)
* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for library directory

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-11-24 18:10:26 -08:00
Aviral Goel
d85f065b15 chore(copyright): update copyright header for example directory (#3273)
* chore(copyright): update copyright header for codegen directory

* chore(copyright): update copyright header for example directory
2025-11-24 18:02:41 -08:00
rocking
229d43ea0c Fix batch prefill compile fail in aiter (#3279)
* Fix batch prefill aiter compile fail

* Fix compile error
2025-11-25 09:46:32 +08:00
Thomas Ning
de6a9590ab Reorganize of KPack in GEMM (#3247)
* add the reorganize of KPack

* fix the compilation error

* fix the compilation error
2025-11-24 12:38:59 -08:00
Aviral Goel
e95337c58c chore(copyright): update copyright header for codegen directory (#3266) 2025-11-24 10:12:40 -08:00
John Shumway
1bc7529977 Guard a builder test to avoid gfx11 and gfx12 (#3268)
We're getting a compile error on gfx11 and gfx12 for an I8 test that doesn't have a supported WMMA implementation. We'll need to build architecture support into the builder, but to get things green I'm just adding an ifndef guard around the test.
2025-11-24 10:10:09 -08:00
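The guard described in commit 1bc7529977 can be sketched roughly as follows. This is a minimal illustration, not the actual builder test: the macro `EXAMPLE_NO_I8_WMMA` and both functions are hypothetical names, and a real guard would more likely test whatever target macros the toolchain or CK defines for gfx11 and gfx12.

```cpp
#include <cstdint>

// Hypothetical macro: a build for a target without an I8 WMMA implementation
// would define it (for example with -DEXAMPLE_NO_I8_WMMA on the command line).
// It is left undefined here, so the guarded test code is compiled.

#ifndef EXAMPLE_NO_I8_WMMA
// Compiled only when the macro is absent, so unsupported targets never see
// code that would fail to compile for them.
inline std::int32_t i8_dot4(const std::int8_t* a, const std::int8_t* b)
{
    std::int32_t acc = 0;
    for(int i = 0; i < 4; ++i)
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    return acc;
}

inline bool i8_test_compiled() { return true; }
#else
// On guarded targets the test body disappears entirely.
inline bool i8_test_compiled() { return false; }
#endif
```

The trade-off, as the commit message notes, is that the guard only hides the test; proper architecture support still has to be built into the builder.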
Christopher Millette
b9c6cb1452 First look at mfma / wmma unification (#2704)
* First look at mfma / wmma unification

* Refactor

* Re-org file structure

* Restructure transform selection and WaveWiseMma class

* Update license files. Add missing gfx1151 support. Change wave size for HOST to 1. Update datatypes naming consistency

* Fixes default MmaSelector implementation

* Adds unit tests for amdgcn_mma and arch

* Consolidate common arch id checks to constexpr functions. Strongly type ids as amdgcn_target_arch_id object.

* Refactor is_any_value_of

* Fixes mma_selector logic

* Fix typo

* Add mma selector test for tile decomposition

* Fix compilation of mma.hpp

* Revert back to c++17 compatibility

* Fix compiler error by returning index_t from get_warp_size()

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fixes compiler error for missing is_wave32() function

* Fixes compiler error for host wave_size() should be 64

* Fixes compiler errors where __cpp_concepts is not defined

* Fixes compiler errors where __cpp_concepts is not defined

* Fix test failure for host is wave64 by default

---------

Co-authored-by: Chris Millette <you@example.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-24 09:39:59 -08:00
Khushbu Agarwal
8111572785 [CK_Tile] Support for preshuffle weight(B) quant tensor for block scale gemm (#3165)
* formatted

* formatted

* formatting

* formatting

* formatting

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* enable prefill shapes

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* adding preshuffle quant as new parameter and its associated new files

* remove debugging statements

* adding test

* enable preshuffle quant with permuteN

* updating readme and corresponding gemmconfigs

* updating cmake file

* fixing CI failures for grouped quant gemm

* addressing review comments

* fixing CI issue

* addressing review comments

* formatting

* formatting

* fixing aquant operator overloading

* formatting

---------

Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-11-24 07:48:42 -08:00
Illia Silin
e857e26bf6 disable CI on gfx1010 by default (#3280) 2025-11-24 07:06:41 -08:00
Qianfeng
81042ea574 Fix a bug for qr_ks_vs_async_trload pipeline (#3271) 2025-11-24 21:31:48 +08:00
rocking
5948dbffe4 Support fp8 dynamic quantization for fmha (#3206)
* Support qscale for dynamic quant, remove static quant

* Support hdim=256

* Remove bias test case for fp8

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
2025-11-24 16:28:25 +08:00
Johannes Graner
096f0a3b23 [CK Tile] Fix example for conv fwd + bias + clamp (#3235)
* Fix clamp not being applied correctly

* Apply group offsets to D tensors

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-11-24 07:36:26 +01:00
Aviral Goel
f6c999bddb chore(copyright): update copyright header for test directory (#3265) 2025-11-22 19:38:27 -05:00
Emily Martins
02ab76c2cb Fix CK Tile DP + 2 Tile Stream-K Validation Errors (#3269)
When there are multiple workgroups contributing to a tile, when using
atomics, there may be round off error in cases where the accumulator
type is not the same as the C type. To compute an error tolerance for
test validation, the Stream-K Tile Partitioner has a function called
estimate_num_wgs_per_tile to estimate the number of workgroups per tile.
That said, this function only provides an estimate. In some cases for
DP+2TSK, the function returns 1 rather than the more accurate value of
2.

Thus, this change updates the estimate_num_wgs_per_tile function to
explicitly return the value of 2 in cases for DP+2TSK to ensure that we
have a better error tolerance to avoid test failures due to round-off
error.
2025-11-21 20:29:47 -07:00
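A rough sketch of the idea in commit 02ab76c2cb, under stated assumptions: the struct, both function names, and the linear scaling of the tolerance are hypothetical and only illustrate why the workgroup count matters. They are not the Stream-K Tile Partitioner's actual interface or the tolerance formula used by the tests.

```cpp
#include <cstdint>

// Hypothetical summary of what a partitioner might report for one problem.
struct TileEstimate
{
    std::uint32_t estimated_wgs_per_tile; // heuristic estimate, may be too low
    bool is_dp_plus_2tsk;                 // true for the DP + 2 Tile Stream-K case
};

// For DP+2TSK, use 2 explicitly instead of trusting the heuristic estimate,
// mirroring the change described in the commit message.
inline std::uint32_t wgs_per_tile_for_tolerance(const TileEstimate& e)
{
    return e.is_dp_plus_2tsk ? 2u : e.estimated_wgs_per_tile;
}

// Assumption: each contributing workgroup adds one rounding step when its
// partial result is converted to the C type, so the tolerance is scaled
// linearly with the number of contributors.
inline double scaled_tolerance(double base_tolerance, std::uint32_t wgs_per_tile)
{
    return base_tolerance * static_cast<double>(wgs_per_tile);
}
```

With an estimate of 1, a DP+2TSK run would be validated against the single-workgroup tolerance and could fail on round-off alone; returning 2 widens the bound to match the two contributors.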
Illia Silin
21ae743acd Enable daily builds on gfx1010 (#3258)
* add build/test on gfx1010

* only build and run on gfx1010 once daily
2025-11-21 07:22:01 -08:00
John Shumway
ea6e4fcbbc Fix builder errors. (#3260)
There were four errors to fix:
1. The checks for defaulted direction were not implemented in the predicate concept.
2. Had to delete an obsolete and undefined operation enum.
3. A factory was passing a boolean in place of an integer.
4. Some of the factory tests are not compiling correctly when linking in the full source (with CK_EXPERIMENTAL_BUILDER=ON), so I commented them out.
2025-11-21 15:25:45 +01:00
John Shumway
f38c3de9f9 Fix copyright messages in experimental/builder. (#3253)
Our copyright messages were mostly correct, but we inconsistently used (C) instead of (c) like the rest of the CK code. This PR fixes that (using lowercase c) and adds a missing copyright header to one file.
2025-11-20 17:40:55 -08:00
Aviral Goel
c8563f2101 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

* chore(copyright): update copyright header for test directory
2025-11-20 20:36:57 -05:00
Aviral Goel
a960c9950b chore(copyright): update copyright header for cmake directory (#3254) 2025-11-20 20:36:37 -05:00
lalala-sh
f58bd56e6b fix static assert (#3178)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-11-20 17:27:05 -08:00
yinglu
4155eb24f9 fix:bf16x3:enable all instances on gfx950 (#3248)
* fix:bf16x3:enable all instances on gfx950

* fix clang-format fail

* fix clang-format fail

* fix:modified wrong params previously
2025-11-20 17:09:43 -08:00
spolifroni-amd
938b8ed3bf Spolifroni amd/update changelog 711 (#3211)
* Update CHANGELOG.md with 7.1.1 information

* Update CHANGELOG.md
2025-11-20 10:51:18 -08:00
Yi DING
8b284a63a4 [CK_TILE] Refine FP32 => FP16/BF16 Conversion (#3215)
* [CK_TILE] Refine FP32 => FP16/BF16 Conversion

* Thank you Copilot

* Rename fix

* Fix example

* Fix accu checking

* Fix

* Fix
2025-11-20 10:50:26 -08:00
Gavin Zhao
07314ac543 Add support for RDNA1 GPUs (#3220)
* Allow compilation for RDNA1 (__gfx101__)

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* More RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* Even more RDNA1 changes

Signed-off-by: Gavin Zhao <git@gzgz.dev>

* cmake: skip build quantization for unsupported arches

* add gfx10-1-generic support as well

* add gfx1013 and complete gfx10-1-generic

* fix clang format

* enable DL kernels on gfx101x

---------

Signed-off-by: Gavin Zhao <git@gzgz.dev>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-11-20 10:45:57 -08:00
Robin Voetter
bb155ef678 ck-builder: add remaining ck factory tests (#3223)
Now that the remaining reflection has been implemented, we
can add the remaining factory tests too. This is the complete set
of instances for forward grouped conv currently in CK.
2025-11-20 10:42:36 -08:00
Robin Voetter
245c6011cf ck-builder: group transfer operations per tensor (#3217)
Grouping transfer operations per tensor makes it easier to
constrain on and operate with the transfer operations. As an
example, we can now deduplicate the logic for translating
the transfer operations from the ck-builder interface to the old
ck interface for the A and B tensors.
2025-11-20 10:40:48 -08:00
Aviral Goel
fb43760c66 chore(copyright): update copyright header for library directory (#3239) 2025-11-20 10:36:05 -08:00
Aviral Goel
7dfc46d73d chore(copyright): update copyright header for test directory (#3243)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for test directory
2025-11-20 10:33:34 -08:00
Emily Martins
2e4b8a8fc4 [CK_TILE] Remove Old CK Tile Stream-K Artifacts (#3202)
* Remove old CK Tile Stream-K implementation

The original CK Stream-K implementation was based on old CK's Stream-K
block to C tile map. However, this implementation did not align with the
original Stream-K paper. Thus, we implemented a new tile partitioner and
associated Stream-K kernel, which was placed in the reboot namespace.

Now that the new Stream-K implementation is ready, this change removes
all artifacts of the old implementation. Specifically, the following
changes were made:
- Removes old Stream-K tile partitioner from CK Tile
- Removes the reboot namespace such that the new implementation resides
  in the ck_tile namespace only.
- Adds tests for bf8 and fp8 using the new implementation
- Removes tests for the old implementation
- Remove the v2 suffix from the new CK Tile Tile Partitioner
derived classes.
- Updates Stream-K Kernel ops file to use /** commenting style.

* Remove v2 from tile partitioner validation function names
2025-11-20 09:32:32 -07:00
asleepzzz
5adaa201ed Revert "Add attn sink (#2892)" (#3250)
This reverts commit 9fa4e8d5ab.
2025-11-20 07:55:15 -08:00
Linjun-AMD
9fa4e8d5ab Add attn sink (#2892)
* enable attn sink

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update attn_sink script

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix some error

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* clang-format

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update fmha_bwd_kernel's mask

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* update block_fmha_pipeline_qr_ks_vs.hpp

Signed-off-by: JL-underdog <Jun.Lin@amd.com>

* fix ci error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix format error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_bwd_pipeline_default_policy.hpp

* Update fmha_fwd_runner.hpp

* Update block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* Update fmha_fwd_runner.hpp

* update splitkv_pipeline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update splitkv&pagedkv pipeline

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* add sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update attn_sink result log

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update smoke_test_fwd_sink.sh

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test file

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* update test script

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_splitkv_pipeline_qr_ks_vs.hpp

* use constexpr kHasSink for sink in fmha pipeline

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* update by pre-commit

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/kernel/fmha_fwd_pagedkv_kernel.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fmha_fwd.py

* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd_splitkv.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update include/ck_tile/ops/fmha/pipeline/block_fmha_fwd_splitkv_pipeline_nwarp_sshuffle_qr_ks_vs.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove causal mask setting logic from mask.hpp

Removed the mask setting logic for causal masks.

* fix CI error: some usages of lambda are not supported in C++17

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update remod.py

* add smoke sink test

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update fmha_pagedkv_prefill.py

* Update FmhaFwdPipeline parameters in fmha_fwd.py

* update block_fmha_pipeline_qr_ks_vs_async_trload.hpp

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* fix C++17 unsupported error

Signed-off-by: LJ-underdog <Jun.Lin@amd.com>

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

* Fix formatting of sink_seq_end assignment

* Fix indentation for sink_seq_end assignment

* Update block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

---------

Signed-off-by: JL-underdog <Jun.Lin@amd.com>
Signed-off-by: LJ-underdog <Jun.Lin@amd.com>
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-20 19:24:05 +08:00
Illia Silin
84540edff3 fix typo (#3244) 2025-11-19 20:23:09 -08:00
Yi DING
47e2ed838e [CK_TILE] Add Flatmm MX FP8 (#3208)
* Use async for flatmm mxfp4

* Fix preshuffle

* Add flatmm mxfp8

* Thanks, Copilot

* Thanks Copilot again~
2025-11-20 10:35:15 +08:00
AviralGoelAMD
4e49e0228b chore(copyright): update copyright header for test directory 2025-11-19 17:43:28 -07:00
linqunAMD
d2e32b4305 [ck_tile] enable test grouped_gemm_quant and gemm_streamk on gfx12 (#3196)
1. Enable grouped_gemm_quant and gemm_streamk on gfx12
- test_ck_tile_streamk_smoke is kept on gfx9, since it looks like someone is still working on it.
2. Update warp tile size in grouped_gemm_quant and gemm_streamk unit test
3. Reduce gemm tile size to pass the build on gfx12 in test_gemm_streamk_reboot_types.hpp
2025-11-20 08:40:27 +08:00
Michal Kulikowski
cd8af997e6 [CK] s_prefetch unit test fixes.
Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>
2025-11-19 21:54:50 +01:00
Michal Kulikowski
f3ef7acca0 [CK] Added s_prefetch unit test.
-added s_buffer_load_b32/64 assembly
-added amd_s_buffer_load_impl

Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>
2025-11-19 21:54:50 +01:00
kabrahamAMD
964f8e1f60 [CK_Builder] fixed accidental drop of get_elementwise_operation during merge and added usage of get_elementwise_operation() to other builder instances (#3238)
Fixed issues encountered during merge of #3192

* fixed accidental drop of get_elementwise_operation during merge and added call to get_elementwise_op to 4 other builders

* run clang-format

---------

Co-authored-by: Kevin Abraham <kevin.abraham@streamhpc.com>
2025-11-19 12:31:05 -08:00
Max Podkorytov
e6e2e04edb [Inductor] Copy logic for ck-tile gemm instance configuration in Inductor max-autotune integration and test it (#2910)
* add op, gen_instances and test

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-11-19 09:38:02 -08:00
Robin Voetter
7fe7aa76f5 [CK_BUILDER] fixes (#3222)
* ck-builder: some miscellaneous fixes
* ck-builder: fix InstanceSet.FromFactory test

The exact syntax that the instance string functionality
returns has changed. This commit updates the test to expect
the right string.
2025-11-19 09:05:25 -08:00
Aviral Goel
9837ba5af2 chore(copyright): update copyright header for tutorial directory (#3230)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

* chore(copyright): update copyright header for docs directory

* chore(copyright): update copyright header for tutorial directory
2025-11-19 07:20:53 -08:00
Illia Silin
3e8e6f7e4f Refactor Jenkinsfile (#3229)
* allow using alternative compiler in all CI stages

* get rid of some redundancies in jenkinsfile

* clean up jenkinsfile a bit more

* further clean up jenkinsfile

* do not force user jenkins in ci dockers
2025-11-19 07:20:25 -08:00
Yashvardhan Agarwal
1eb26460aa [ck_tile] Pooling example - Improved tile sizes (#3233)
* improved tile sizes

- modified tile sizes for improved example performance

* Update example/ck_tile/36_pooling/pool3d.cpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-11-19 15:30:18 +01:00
John Shumway
ad57f6ef0b [CK_BUILDER] Put global CK functions in the CK namespace (#3232)
* Wrap ck host utilities in CK namespace.

The CK and CK-Tile source code bases are incompatible because CK is not properly using namespaces everywhere. In particular, we need to put hip_check_error in the ck namespace.

Move all functions in include/ck_/host_utility that were in global namespace into the ck namespace.

There may be additional namespace problems like this, and it's possible we'll have namespace clashes. But it is good design to properly guard our two code bases (CK and CKTile) so that they can both coexist. Moreover, establishing this compatibility is essential if we are going to allow the builder to instantiate kernels from either template library.

* Add using declarations to test code.

After moving some of the utils into the ck namespace, most examples and a few tests had to be updated to recognize the new namespace declarations. We add using declarations to individual compute units for functions that were previously in the global namespace.

* Add using declarations to client examples.
2025-11-19 11:23:02 +01:00
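The namespace change in commit ad57f6ef0b can be illustrated with a small sketch. All names below are hypothetical stand-ins: the real helpers (such as hip_check_error) take HIP types and are not reproduced here, and `example_test` merely plays the role of a test or example source file.

```cpp
#include <string>

// Once each library keeps its helper inside its own namespace, two helpers
// with the same name can coexist in one program.
namespace ck {
inline std::string check_error_origin() { return "ck"; }
} // namespace ck

namespace ck_tile {
inline std::string check_error_origin() { return "ck_tile"; }
} // namespace ck_tile

namespace example_test {
// A using declaration restores the unqualified call for code that previously
// relied on the function living in the global namespace.
using ck::check_error_origin;

inline std::string call_unqualified() { return check_error_origin(); }
} // namespace example_test
```

A using declaration pulls in one name only, which keeps the rest of the other namespace out of scope; that is presumably why the commit prefers it over a blanket using directive.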
Anton Gorenko
d7b3197869 [CK_TILE] FMHA Reduce register spilling in fwd with dropout (workaround for CI failures with clang-22) (#3221)
* Use vectorized stores for dropout randvals

With no kPadSeqLenK the kernel uses 2 buffer_store_dwordx2 instead of
16 buffer_store_byte. This requires fewer registers and reduces spilling.

* Calculate dropout randvals for storing and applying only once

Even though it may add a small overhead when storing is not required,
it uses significantly fewer registers and hence no spilling.
2025-11-19 10:40:12 +05:00
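The arithmetic behind the first change in commit d7b3197869 (16 one-byte stores replaced by 2 eight-byte stores) can be shown with a host-side sketch. This is only an illustration of the packing idea in plain C++; the function names are hypothetical and the code is not the kernel's store path.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Pack 16 one-byte random values into two 64-bit words. Each 64-bit word
// corresponds to the 8 bytes a single dwordx2 store would write, so two
// stores cover what would otherwise take 16 byte stores.
inline std::array<std::uint64_t, 2> pack_randvals(const std::array<std::uint8_t, 16>& bytes)
{
    std::array<std::uint64_t, 2> words{};
    std::memcpy(words.data(), bytes.data(), bytes.size());
    return words;
}

// Inverse operation, used here only to check that no byte is lost.
inline std::array<std::uint8_t, 16> unpack_randvals(const std::array<std::uint64_t, 2>& words)
{
    std::array<std::uint8_t, 16> bytes{};
    std::memcpy(bytes.data(), words.data(), bytes.size());
    return bytes;
}
```

Packing before storing means the values are held as two wide words rather than 16 separate byte-sized operands, which is consistent with the lower register use the commit reports.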
Aviral Goel
e91ee8578c chore(copyright): update copyright header for docs & include directory (#3226)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory

* chore(copyright): update copyright header for docs directory
2025-11-18 10:23:14 -08:00
Aviral Goel
f5ac3ee359 chore(copyright): update copyright header for include directory (#3224)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

* chore(copyright): update copyright header for include directory
2025-11-18 10:17:18 -08:00
Max Podkorytov
a3a4eb12bd [CK-Tile] Remove usage of tile partitioner's full gemm shape (#3204)
The GEMM shape should be taken from the pipeline instead (which gets it from a problem description struct).
2025-11-18 09:56:40 -08:00
Aviral Goel
ac70206b2c feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle… (#3225)
* feat: add support for bf16 for grouped_gemm & grouped_gemm_preshuffle kernel(s) along with unit test

* docs: Update CHANGELOG.MD
2025-11-18 09:32:27 -05:00
Sami Remes
3ede8e2a6e [CK_TILE] Non-K Major from old CK to CK-Tile - fix reverted PR (#3199)
* Reapply "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)

This reverts commit e4298e55c7.

* WIP

* take Y2 as the AK1/BK1 value, that is the 'vector size' after shuffle

* use get_n_lds_banks()

* clang-format

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-11-18 10:17:02 +02:00