241 Commits

Author SHA1 Message Date
Illia Silin
90a2126d1f add tabulate package to aiter docker (#3519)
[ROCm/composable_kernel commit: 2ffbf7f476]
2026-01-06 09:36:54 -08:00
Bartłomiej Kocot
e26a264f70 Fix large tensor grouped conv bwd data test (#3513)
[ROCm/composable_kernel commit: bbf0b1a3b3]
2026-01-05 09:42:02 -08:00
Thrupti Raj Lakshmana Gowda
d65cd6d0fa [CK TILE ENGINE] CI configuration with basic cases (#3475)
* [CK TILE ENGINE] Adding GEMM BASIC TEST in Jenkins

* fix RUN_TILE_ENGINE_BASIC_TESTS name typo

* [CK Tile Engine] Updating basic CI

* Resolving merging issues

* Resolving merging issues

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 62a8ec155f]
2025-12-24 10:45:56 -08:00
Bartłomiej Kocot
035f6acf3f Fix jenkinsfile for large tensor conv test (#3478)
[ROCm/composable_kernel commit: 44f1b5c5de]
2025-12-21 17:39:30 -08:00
Thrupti Raj Lakshmana Gowda
a571bf9e3a [TILE ENGINE] Restructure to Base class of GEMM (#3434)
[ROCm/composable_kernel commit: e22622f0ec]
2025-12-19 23:53:56 +08:00
andrew clark
4b5c3e24ef Supporting Custom Build Trace File Names (#3443)
* Removing hard-coded trace filename

* Including stage name in notification

* Simplifying capture setup and tagging file names with arch

* Removed test property from notification message

* Fixing regex to get arch name

* Fixing error in notification and modified regex

[ROCm/composable_kernel commit: e77a7ca2bc]
2025-12-18 12:15:33 -08:00
andrew clark
2de39368c2 Adding sccache stats monitoring (#3428)
* Adding additional sccache and redis logging to each build

* Removing custom workspace

* Removing script reference

* Logging complete sccache stats

* Ensuring monitor is stopped if build fails

* Including additional sccache logging

* Removing build duration log

* Fixing groovy syntax error

* Fixing syntax

* Modifying logging statements

* Fixing syntax

* Modifying logging

* Modifying logging

* Including additional logging

* Fixing logging message

* Logging build path

* Testing

* Testing workspace path logs

* Adding additional logging to monitor

* Modifying comments

* Adding copyright info

* Cleaning unnecessary logs

* Removing build time logs

* Merge branch 'develop' into aick-457

[ROCm/composable_kernel commit: e67cd7edeb]
2025-12-17 09:15:27 -07:00
Illia Silin
f35e7b59cc Add build trace diagnostics to CI. (#3432)
* generate and visualize build traces for all archs

* generate build traces in all cases

* fix jenkins logic

* fix typo

* use more threads for parsing dependency map

* add script to parse ninja traces and issue warnings

* fix python script syntax and header

* fix python syntax one more time

* fix python syntax

[ROCm/composable_kernel commit: 3dfa794fab]
2025-12-16 08:22:52 -08:00
Johannes Graner
2fe4c8acec Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380)
* Parallelization in dataset generation

* Parallelizable tests for fwd, bwd data, bwd weight with datasets

* .gitignore generated datasets

* Test parallelization script with round-robin GPU scheduling

* Parallelization updates to test generation and running

* Dataset paths relative to executable

* Update output from test generation

* Default to one GPU in test generation

* Add small dataset tests to Jenkins

* Update copyright lines

* Update test_data/generate_test_dataset.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Move trap disable

* Common get path function

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: fe35ba5dac]
2025-12-15 13:38:25 +01:00
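The round-robin GPU scheduling mentioned in commit 2fe4c8acec can be sketched as below. This is a minimal illustration, not the script from the PR: the function name `assign_round_robin`, the test names, and the GPU ids are all hypothetical.

```python
from itertools import cycle


def assign_round_robin(tests, gpu_ids):
    """Pair each test with a GPU id, cycling through gpu_ids in order."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    return [(test, gpu) for test, gpu in zip(tests, cycle(gpu_ids))]


# Hypothetical test names and GPU ids, for illustration only.
tests = ["t0", "t1", "t2", "t3", "t4"]
two_gpu_schedule = assign_round_robin(tests, [0, 1])

# With a single GPU (the default the commit message mentions for test
# generation), every test lands on the same device.
one_gpu_schedule = assign_round_robin(tests, [0])
```

Round-robin keeps the per-GPU test counts within one of each other without needing any runtime estimate per test, which is presumably why it is a reasonable first choice here.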
Illia Silin
ee0d92f8fc use hipTensor from monorepo for daily builds (#3386)
[ROCm/composable_kernel commit: 934ba1208a]
2025-12-09 14:39:08 -08:00
Illia Silin
5f4c14b336 temporarily disable daily builds on gfx1010 and gfx908 (#3384)
[ROCm/composable_kernel commit: 0d8259affd]
2025-12-09 10:37:13 -08:00
Illia Silin
cdacf1d5f5 Upgrade to ROCm7.1.1 as default compiler. (#3370)
* upgrade to rocm7.1.1 as new default compiler

* fix jenkinsfile

[ROCm/composable_kernel commit: 7582c9e73f]
2025-12-09 07:35:32 -08:00
Illia Silin
38076077ab build latest hipblaslt in ck_pytorch docker (#3347)
[ROCm/composable_kernel commit: cd21e20ae7]
2025-12-04 06:58:42 -08:00
Cong Ma
fa1c7bc6ba Tile engine for streamk (#3157)
* [CK TILE STREAMK] Introduce initial support for tile engine in streamk GEMM.

- This commit lays the groundwork for integrating the tile engine into streamk GEMM.
  It focuses on creating benchmark executables for streamk GEMM.
- Additional scripts like test_benchmark.sh and gemm_benchmark.py will be added once
  the streamk implementation reaches stability.

* [CK TILE STREAMK] Enable CI to execute tile engine benchmarks for StreamK GEMM

* [CK TILE STREAMK] Refactor: Extract common utility functions.

* [CK TILE STREAMK] Revise tile engine of streamk to align with the updated implementation

* Add pre-commit

* [CK TILE STREAMK] Add 'dp_persistent' and 'reduction_strategy' in output of CK TILE STREAMK

* [CK TILE STREAMK] Fix a bug about value of 'dp_persistent' of CK TILE STREAMK

* [CK TILE STREAMK] Update Jenkinsfile

* [CK TILE Engine] Update StreamK tile engine help message

Remove default value messages as they are automatically printed

* [CK TILE Engine] Update StreamK tile engine

- Remove namespace reboot

* [CK TILE Engine] Update StreamK tile engine

- Fix merge error

[ROCm/composable_kernel commit: 30727c48fc]
2025-11-27 15:49:57 -07:00
andrew clark
d790a9f9de Automated Perfetto UI Notifications (#3255)
* Testing visualization generation

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Adding dummy test data

* Update Jenkinsfile

* Update Jenkinsfile

* Adding notifications

* Testing

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Image compression

* Update Jenkinsfile

* Moving capture logic to main Jenkins file

* Testing generation

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Fixing curl request

* Update Jenkinsfile

* Clean up

* Fix

* Fixing notification

* Testing message creation

* Adjusting message payload

* Testing notification generation

* Updating main jenkinsfile

* Fixing cleanup call

* Removing test pipeline code

* Comment clean up

* Testing pipeline

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

* Moving archive

Moving trace archive to safe location before source checkout

* Removing test pipeline

* Testing pipeline with unique file names

* Update Jenkinsfile

* Removing test files

Updated main pipeline

[ROCm/composable_kernel commit: 40d7217ac7]
2025-11-26 16:27:27 -07:00
Illia Silin
b80f571425 Enable ck_builder in CI. (#3296)
* build and run ck_builder tests

* add test_ckb_all to targets

* fix syntax

* fix test path

* Update CMake targets for builder testing in CI (#3290)

Our existing CMake only had build targets. Update CMakeLists.txt to have CTEST targets:
* smoke-builder
* regression-builder
* check-builder

Co-authored-by: John Shumway <jshumway@amd.com>

* use check-builder target

* get rid of test_ckb_all target

* call ninja check-builder separately

---------

Co-authored-by: John Shumway <jshumway@amd.com>

[ROCm/composable_kernel commit: a54f7b1138]
2025-11-25 17:45:59 -08:00
Illia Silin
a1651a1b10 disable CI on gfx1010 by default (#3280)
[ROCm/composable_kernel commit: e857e26bf6]
2025-11-24 07:06:41 -08:00
Illia Silin
d43b58b3cc Enable daily builds on gfx1010 (#3258)
* add build/test on gfx1010

* only build and run on gfx1010 once daily

[ROCm/composable_kernel commit: 21ae743acd]
2025-11-21 07:22:01 -08:00
Illia Silin
38605f5091 fix typo (#3244)
[ROCm/composable_kernel commit: 84540edff3]
2025-11-19 20:23:09 -08:00
Illia Silin
3c389eb3f1 Refactor Jenkinsfile (#3229)
* allow using alternative compiler in all CI stages

* get rid of some redundancies in jenkinsfile

* clean up jenkinsfile a bit more

* further clean up jenkinsfile

* do not force user jenkins in ci dockers

[ROCm/composable_kernel commit: 3e8e6f7e4f]
2025-11-19 07:20:25 -08:00
Illia Silin
bb4b6e5961 Initialize new variable to prevent c++17 compiler error (#3156)
* initialize new variable to prevent c++17 compiler error

* build for gfx90a using -std=c++17 flag

[ROCm/composable_kernel commit: 930423ab3b]
2025-11-04 18:54:14 -08:00
Thrupti Raj Lakshmana Gowda
27dc4d9833 [CK TILE ENGINE] GEMM Multi D Restructure (#3121)
* Renaming old code

* Adding GEMM code with new Architecture

* Partial Progress : Errors

* Partial Progress : Working code

* Changes to element wise function

* Removing Debugging statements

* Working GEMM Multi D code

* Removing Stale Code

* Address Copilot review comments

* Address Copilot review comments

* Changes to validation file

* Changes to common code snippets

* Creating common folder

* Removing duplicate files

* Pointing to right common file

* Pointing to right common file

* Pointing to right common file

* Changing to VERBOSE

* Changing CMAKE messages to verbose

* Updating Cmake with right layout datatype configs

* Working code for GEMM Multi D

[ROCm/composable_kernel commit: a33d98f8e2]
2025-10-31 12:02:46 -07:00
andrew clark
d2474f5396 Adding new alert failure patterns (#3122)
* Adding GPU not found pattern

Also, failurePatterns does not need to be global. Moved variable to live in the failure notifications function scope.

* Testing new failure type

* Testing failure

* Removing the forced failure test

* Adding an additional failure pattern

[ROCm/composable_kernel commit: 1977e4b96a]
2025-10-31 07:38:31 -07:00
Anton Gorenko
220bd7a9bb [CK_TILE] Support WMMA (gfx12) in FMHA (#2528)
* Pass hdim to tile_example_fmha_fwd in fp8 tests

* Add WMMA support to fwd FMHA pipelines

* Tune tile sizes a bit for less spilling

fp16 256 is still quite slow

* Fix Q grad tile distribution for warp size = 32 and hdim >= 256

With AccDataType = float and warp size = 32, K0 becomes 0, so K repeat is required to correctly distribute the tile.

* Use code based on BlockDropout in BlockDropoutBwd

* Fix split KV combine kernel for gfx12 (warp size 32) and make it more universal

* Fix LSE LDS tensor descriptors: kMaxSplits and kM0 were swapped; it worked on gfx9
  because they are both equal to 8, while on gfx12 they are 8 and 4;
* Fix Oacc LDS tensor descriptor: it was transposed even though its shape=[4 * kM0, kN1];
  it worked on gfx9 because 4 * kM0 == kN1 == 32;
* Removing these hidden dependencies allows supporting:
    * any number of warps (power-of-2), not only 4;
    * kN1 = 16, not only 32;
    * any number of splits;

* Rename ids like o_acc_4 and Oacc4 to eliminate confusion: kNumWarps doesn't have to be 4 now

* Replace hard-coded kN1 in dispatch code with the requested tile size

* Add gfx12-specific tile sizes for split KV

* Pass GPU architecture to kernel generation scripts

This is still a temporary solution.

* Build and run FMHA CI tests for gfx12

* Fix issue after merging

* Fix bwd tile sizes

The current pipelines always read only one K and V tile, this
requires bk0 == bhdq and bk2 == bhdv (kK0 == kQKHeaddim and
kK2 == kVHeaddim).

* Use hardware f32->f8 on gfx12, remove v_perm

__builtin_amdgcn_perm is not needed because
__builtin_amdgcn_cvt_pk_fp8_f32 allows specifying which word (16 bits of a
32-bit dword) is used to store the results (two f8 values).

* Update changelog

* Add WMMA support to pagedkv

* Fix scripts after rebasing

* Support 16x16 (MFMA, WMMA) and 32x32 (MFMA) tiles in fwd and bwd BlockDropout

Add comments with dropout implementation details

Fix performance regression of fwd+dropout

    * Remove some usage of type punning (reinterpret_cast with ref or ptr) in Philox;
    * "scalarize" seed and offset, they may come either from kernel args or from device memory
      (presumably loaded with vector loads).

    These changes help the compiler to produce more optimal code and reduce register spilling.

Use WarpGemmDispatcher instead of explicit WarpGemmMfma... to get CWarpDstrEncoding

Use code based on BlockDropout in BlockDropoutBwd

Refactor BlockDropout (fwd)

Implement BlockDropout (fwd) for WMMA

    Originally BlockDropout only supported 32x32 tiles (IsWG32 = true),
    this version supports 16x16 tiles.
    If MPerBlock > MWarp * 16, it can generate numbers for two 16x16 tiles, similarly
    to BlockDropoutBwd.

Implement BlockDropoutBwd for WMMA

Remove MakeRandValLds* functions unused in BlockDropoutBwd

Remove unused Run overload from BlockDropoutBwd

* Fix regression with philox seed and offset when they exceed 32-bit int

__builtin_amdgcn_readfirstlane works with 32-bit values, seed and offset
are 64-bit so they get truncated.

* Fix names after cherry-picking

* Fix selection of a fallback tile based on bm0

The assumption that the largest bm0 == 128 is not always true for
current fp32 tiles.

* Do not use filters related to qr_async_trload

They disable tiles/pipelines which are valid for gfx12.

* Use different dstr encoding when C is transposed

* Do not call GetQKBlockGemm (and hence WarpGemmDispatcher) in host code

Some WarpGemmDispatcher instantiations are defined only
for specific archs and undefined on host.
Calculations related to sched barriers are moved from Pipeline's public
fields into pipeline's operator().

* Fix incorrect name WarpGemmMfmaFp8Fp8F32M32N32K16SwizzleBTransposedCDistribution

Correct name is WarpGemmMfmaFp8Fp8F32M32N32K32SwizzleBTransposedCDistribution
because it's 32x32x16 with IterateK = 2 so K = 32, also all tiles used
in codegen scripts are 32, 32, 32.

* Generalize usages of WarpGemmDispatcher for MFMA and WMMA

WarpGemmMfmaFp8Fp8F32M32N32K32SwizzleBTransposedCDistribution is still
used explicitly because of swizzle factor = 4.

* Mark has_load_tr as maybe_unused

There is no transpose loading for RDNA.

* Remove CK_TILE_USE_MFMA/WMMA from fmha-related code

* Detect BlockSize on host based on warp size of the current device

If kBlockSize == kNumWarps * get_warp_size(), the kernel is launched with
kBlockSize / 2 because on host get_warp_size() == 64 always.

* Fix calculation of grid size for combine kernel with warp size = 32

* Add missing includes and header

* Support multiple archs in one binary for fwd

* Support multiple archs in one binary for fwd_splitkv, fwd_appendkv, pagedkv_prefill

* Support multiple archs in one binary for bwd

* trload kernels are compiled only for gfx950;
* instances with padding are checked after instances without padding so
  they can be used as fallbacks (similarly to fwd);

* Extract common code from register_traits

* Revert "Fix regression with philox seed and offset when they exceed 32-bit int"

To simplify merging; the proper fix is already in develop.

* Support new numerical d paddings in trait ordering checks

* Build fp32 tests only on gfx9

* Do not use hardcoded M0 = 64 for dot bwd kernel

* Use textwrap.indent from standard library

* Make fp8 pipelines on gfx12 consistent with gfx9

* Update tests for current pipelines

* Make ninja check more responsive in CI

ninja buffers output, so this job looks like it is hanging.

* Support fp8fp32 by limiting O vector size

The fp32 output type requires storing 8 * sizeof(float) = 32 bytes,
which is not implemented (here 8 is the number of C values per lane for
v_wmma_f32_16x16x16...).

* Remove unused cmake options

* Unify including amd_buffer_addressing.hpp/_builtins.hpp

* Temporarily use amd_buffer_addressing.hpp on >=gfx10

amd_buffer_addressing_builtins.hpp uses inline asm for loads/stores
which is not compatible with >=gfx10:
 * 1 scalar for exec masks instead of 2,
 * gfx12 uses different instruction names etc.

* Update asm in bf16 conversions to work with warp 32

* Do not generate splitkv/appendkv with vlayout=col for consistency with fwd

* Add arch tags to kernels/host funcs, compile for each arch separately

* Add kM0 to fmha_bwd_dot_do_o kernel name to match filename

* Add workaround for miscompilation of bwd with padded hdim

SWDEV-559729: v_wmma instructions can be incorrectly placed in divergent
branches used to store padded tensors (when some lanes are inactive due
to padding). Inline asm with dummy dependencies on VGPRs of the tensors
prevents the compiler from doing this.

* Fix add_gtest_executable for absolute paths

Some tests (like gemm_tile_engine) pass absolute paths to source files.
In CI the branch name is a part of the root dir, and if the branch name
contains "wmma", "xdl" etc., files can be incorrectly excluded.

* Run only hdim 128 smoke tests for fp8fp32

There are no instances for hdim 64 and 256.

* Format py with ruff to simplify merging develop

* Fix incorrect var name

* Codegen for gfx9,gfx950 when --targets is not specified

Aiter and Pytorch require changes for passing their targets to the codegen scripts.
With this temporary solution the files are generated but not all of them
have to be really built (depending on the used --offload-arch=).

* Combine arch-related values into ArchTrait

This more centralized approach removes duplication of various formatting templates.

* Try a workaround for Jenkins error "groovyjarjarasm.asm.MethodTooLargeException: Method too large"

Some code is extracted into a function.

[ROCm/composable_kernel commit: 1e77695fe8]
2025-10-29 13:31:08 -07:00
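The `add_gtest_executable` fix described in commit 220bd7a9bb (absolute source paths being excluded because the CI root directory contains a branch name such as "wmma") can be illustrated with a small sketch. This is Python rather than the CMake actually involved, and the keyword list, function name, and paths are hypothetical; it only shows why matching should be done on the path relative to the source root.

```python
from pathlib import PurePosixPath

# Hypothetical keyword list; the commit message names "wmma" and "xdl" as examples.
EXCLUDED_KEYWORDS = ("wmma", "xdl")


def is_excluded(source, source_root, keywords=EXCLUDED_KEYWORDS):
    """Return True if the path *inside the source tree* contains a keyword.

    Absolute paths are first made relative to source_root, so a branch name
    embedded in the checkout directory cannot trigger a false match.
    """
    path = PurePosixPath(source)
    if path.is_absolute():
        try:
            path = path.relative_to(PurePosixPath(source_root))
        except ValueError:
            pass  # not under source_root: fall back to the path as given
    return any(keyword in str(path) for keyword in keywords)


# Hypothetical CI checkout whose directory name contains the branch name.
root = "/ci/workspace/fix-wmma-branch/ck"
absolute_source = root + "/test/gemm_tile_engine/test_gemm.cpp"

naive_match = any(k in absolute_source for k in EXCLUDED_KEYWORDS)
fixed_match = is_excluded(absolute_source, root)
real_match = is_excluded("test/wmma_op/test_wmma.cpp", root)
```

Here the naive substring check flags the file only because of the branch name in the directory, while the relative-path check does not; a file whose own path contains the keyword is still excluded.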
andrew clark
df90bcbfd0 Added failure pattern check (#3111)
[ROCm/composable_kernel commit: aa22da07be]
2025-10-29 08:19:56 -06:00
Illia Silin
78d7289839 Add option to build ckProfiler packages for individual architectures. (#3105)
* refactor package generation, add dedicated switch

* allow building packages not only on gfx9

* enable last stage to post packages

* stash packages from different arch into separate stashes

* build packages daily automatically

[ROCm/composable_kernel commit: 155d63f4fe]
2025-10-28 09:48:11 -07:00
Thrupti Raj Lakshmana Gowda
c1c7bc9368 Ck tile engine gemm (#2982)
* Partial Progress : CK Tile Engine GEMM

* Partial Progress : CK Tile Engine GEMM

* Partial Progress : Working GEMM Code

* Partial Progress : Working GEMM Code

* Changing jenkins to remove preshuffle

* Partial Progress : CK TILE ENGINE GEMM Debugging

* Partial Progress : Removing changes that are not GEMM

* Partial Progress : Validation of full block size in GEMM

* Changes in Jenkins to run only fp16 and bf16

* Addressing Review Comments

* Partial Progress : Addressing CI issues

* Partial Progress - Running GEMM for fp16,bf16 and rcr

* Clang

* Adding fp8 and bf8

* Adding fp8 and bf8

* Adding additional architecture

* Limited datatypes and layouts

* Adding k_block_per_cu in test config

* Changes to failing CI errors

* Changes to failing CI errors

* Validation for GEMM

* Adding Layout support

* Adding Validations

* Adding layout in jenkins

* Update on Jenkins

* Distribution validation for GEMM

* Resolving merge conflicts

* Solving merge conflicts

[ROCm/composable_kernel commit: 7fc0a38e90]
2025-10-27 21:11:13 -05:00
andrew clark
742af334f4 Jenkins Alerts Notifications (#3086)
* Testing minimal pipeline

* Update Jenkinsfile

* Testing webhook

* Testing webhook

* Testing webhook

* Testing build log output

* Testing log retrieval

* Testing

* Testing pattern matching

* Fixing regex

* Testing error detection

* Testing log formatting

Including additional context around log failure.

* Testing notification message format

* Update Jenkinsfile

* Notification formatting

* Testing secure interpolation

* Testing string interpolation

* Notification format

* Fixing markdown

* Testing markdown

* Testing markdown

* Revert "Testing markdown"

This reverts commit adeb6d2d55.

* Testing different markdown format

* Revert "Testing different markdown format"

This reverts commit bf5406a1cd.

* Testing markdown

* Testing markdown

* Testing markdown

* Testing markdown

* Testing markdown

* Testing notification

* Testing notification

* Testing notification

* Testing failure mode

* Testing failure mode

* Adding new patterns and tests

* Commenting

* Stage name fix

* Moving to notification on failure only

* Fixing notification format

* Testing env vars

* Testing build url redirect

* Testing no log errors

* Testing no errors case

* Integrating into primary jenkinsfile

* Updating notification message

Removed emoji from message

[ROCm/composable_kernel commit: a1ce64374f]
2025-10-27 08:24:36 -06:00
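The log pattern matching and "additional context around log failure" mentioned in commit 742af334f4 could look roughly like the sketch below. The real logic lives in the Jenkinsfile (Groovy); this Python version, its pattern table, and the sample log are assumptions made for illustration. Only the "GPU not found" pattern name comes from the commit log itself.

```python
import re

# Hypothetical pattern table, for illustration only.
FAILURE_PATTERNS = {
    "GPU not found": re.compile(r"GPU not found", re.IGNORECASE),
    "Out of disk space": re.compile(r"No space left on device"),
}


def find_failures(log_text, patterns=FAILURE_PATTERNS, context=2):
    """Return one record per matching log line, with surrounding context lines."""
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        for name, pattern in patterns.items():
            if pattern.search(line):
                start = max(0, index - context)
                end = min(len(lines), index + context + 1)
                hits.append(
                    {
                        "pattern": name,
                        "line": index + 1,  # 1-based line number in the log
                        "excerpt": lines[start:end],
                    }
                )
    return hits


# Hypothetical build log.
sample_log = "\n".join(
    ["configuring", "building", "error: GPU not found on node", "aborting", "done"]
)
failures = find_failures(sample_log, context=1)
```

Keeping the pattern table as a parameter mirrors the note in commit d2474f5396 that the patterns do not need to be global.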
Thrupti Raj Lakshmana Gowda
78ba0358bd Ck tile engine preshuffle (#2919)
* Partial Progress : Preshuffle working code for datatype

* Partial Progress : Preshuffle Cleanup

* Working code for default config with min max step

* Partial Progress : PermuteN implemented in validation

* Partial Progress : PermuteN changes in Preshuffle

* CK Tile Engine Preshuffle Complete

* CK TILE ENGINE : Preshuffle Layout validation

* CK Tile Engine Preshuffle Validation

* Preshuffle Validation check

* CK Tile Engine Preshuffle : Fixing Validation Cases

* Addressing PR review Comments

* Changes in config

* Addressing Review Comments

* Adding additional architecture in Jenkins

* Partial Progress : Selective Datatype and layouts

* Limited datatypes and layouts

* Addressing CI errors

* Datatype updates

* Datatype updates

* Datatype changes to Preshuffle

* Addressing Review Comments

* Addressing Review Comments

* Datatype changes

* Changes to Cmake

* Update on Jenkins

* Formatting with precommit

* Ruff Formatting

[ROCm/composable_kernel commit: 8b185e872e]
2025-10-27 09:15:34 -05:00
andrew clark
07d67497ff Fixing Run CI Check for Changed Files (#3072)
* Fixing check for changed files

* Testing CI skip behavior

* Testing CI Trigger

This should skip CI

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 775b96ea6a]
2025-10-24 07:52:43 -07:00
Illia Silin
17261b3fb8 disable aiter test gemm_a8w8_blockscale (#3049)
[ROCm/composable_kernel commit: d88ea05c84]
2025-10-17 19:52:22 -07:00
Illia Silin
57a340cc4c use branch develop to test hipTensor (#3034)
[ROCm/composable_kernel commit: 87d0a3ac17]
2025-10-15 15:40:34 -07:00
Illia Silin
fd12e33f27 re-enable clang-format by default (#3030)
* re-enable clang-format by default

* fix clang format

[ROCm/composable_kernel commit: 3348f01e6f]
2025-10-15 07:43:11 -07:00
andrew clark
433b969e7d CI Skip and Status Checks Fix (#2952)
* Update Jenkinsfile

Adding logic to skip CI checks when a commit contains changes to non-relevant files like docs, .md, licenses, and .github workflow files.

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

Testing skip env var

* Update Jenkinsfile

Fixing syntax

* Update Jenkinsfile

Simplifying CI check logic

* Update Jenkinsfile

Testing skipping logic on stages.

* Update Jenkinsfile

Removing post block. The status for skipped stages is already reported.

* Testing Docs

Testing that modifications to files in the docs folder do not trigger the build and test stages.

* Testing Multifile Trigger

Removed Jenkinsfile from the skip patterns. Reversed change to docs file. This test should not skip CI checks.

* Clean code

Renamed setup stage to be more descriptive.
Added pipeline env variable for consistency.
Moved performance test results stage conditional up a level so the parent stage appropriately reports the status if it is skipped.

* Fixing syntax error

* Updated CRON Flags

Added the FORCE_CI flag to the CRON instructions. This will ensure CI does not skip the job.

* Updating logging

Making logs more explicit.

* Comment update

Cleaning comments.

* Update Jenkinsfile

Reverting performance reports when condition.

* Parallel Test

Testing stage status with parallel stages

* Update Jenkinsfile

* Update Jenkinsfile

Removing stages for quick testing

* Update Jenkinsfile

* Testing skipped parallel stages

Testing the addition of a coordination stage to always pass and give an update to skipped parent stages with parallel sub-stages.

* Testing parallel stages

Adding coordination stage to test if parent check status is correctly updated.

* Simplified performance results stage

Removed parent stage as there are no other parallel stages to execute (yet).

* Testing final clean up stage

* Testing check status update

Testing - forcing status to update after a stage skip.

* Testing results stage skip

* Removing test stage

* Testing pipeline

* Testing post status updates

* Process Test Results Post Event Update

The stage will report success when it skips or is successful.

* Testing non-relevant file change

This should skip build and test in CI

* Reverting test

updating regex file patterns to use strings instead of regex literal syntax.

* Fixing file matching regex

* Testing docs modification

* Fixing default env var value

* Correcting env var assignment

* Pipeline test

Updating docs file. Should skip ci.

* Testing Pipeline

Setting default run ci state.

* Adding debugging

* Removing debugging

* Pipeline test

Should skip pipeline

* Pipeline Test

Mixed files to trigger a CI run

* Adding additional status updates

The parent stage sometimes remains in pending even if the child stage completes when skipped. Added an additional status update for the parent stage.

* Fixing variable name

* Moving stage names

Moved the performance stage names to a single location because they are referenced multiple times. This reduces errors with typos in the future.

* Revert "Moving stage names"

This reverts commit 7cf6743e54.

* Update Jenkinsfile

Handle both truly empty arrays and arrays containing only empty strings.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: 0a4c45b4d3]
2025-10-08 15:48:08 -06:00
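The CI skip check described in commit 433b969e7d (skip when only non-relevant files such as docs, .md files, licenses, and .github workflow files changed; never skip when FORCE_CI is set; tolerate empty arrays and arrays of empty strings) can be sketched as follows. The pattern strings, the function name, and the choice to run CI when no file information is available are assumptions, not the Jenkinsfile's actual values.

```python
import re

# Hypothetical skip patterns, written as plain strings and compiled on use.
SKIP_PATTERNS = [r"^docs/", r"\.md$", r"^LICENSE", r"^\.github/"]


def should_run_ci(changed_files, skip_patterns=SKIP_PATTERNS, force_ci=False):
    """Return True if build and test stages should run for this change set."""
    if force_ci:
        return True  # e.g. scheduled jobs that must never be skipped

    # Treat [] and ["", "  "] the same way: no usable file information.
    files = [name.strip() for name in changed_files if name and name.strip()]
    if not files:
        return True  # assumption: default to running CI when nothing is known

    compiled = [re.compile(pattern) for pattern in skip_patterns]
    # Run CI as soon as one changed file matches none of the skip patterns.
    return any(not any(p.search(name) for p in compiled) for name in files)


docs_only = should_run_ci(["docs/index.rst", "README.md"])
mixed = should_run_ci(["docs/index.rst", "include/ck/foo.hpp"])
```

A mixed change set runs CI because a single relevant file is enough, which matches the "Mixed files to trigger a CI run" test in the commit message.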
Illia Silin
390a427be6 increase time limit for AITER tests (#2948)
[ROCm/composable_kernel commit: 35e116f5c0]
2025-09-29 13:11:42 -07:00
John Afaganis
867351e019 Use git ls-files to select candidate files for clang format
This change ensures that the files being selected for clang format validation are exactly the ones tracked by the git repo we are testing. This protects against a known issue where the repo being tested contained "stray files" from a previous test.

[ROCm/composable_kernel commit: e8842e3c1f]
2025-09-27 15:47:31 -06:00
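A minimal sketch of the idea in commit 867351e019: take the candidate list from `git ls-files` so that untracked stray files never reach clang-format. The suffix list and function names are hypothetical, and the actual CI step is presumably a shell pipeline rather than Python.

```python
import subprocess

# Hypothetical set of suffixes that should be checked with clang-format.
CLANG_FORMAT_SUFFIXES = (".cpp", ".hpp", ".h", ".inc")


def list_tracked_files(repo_dir="."):
    """Return the paths tracked by git in repo_dir (NUL-separated output)."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return [path for path in result.stdout.split("\0") if path]


def select_format_candidates(tracked_files, suffixes=CLANG_FORMAT_SUFFIXES):
    """Keep only tracked files whose suffix is subject to clang-format."""
    return [path for path in tracked_files if path.endswith(suffixes)]


# Hypothetical tracked-file list; in CI this would come from list_tracked_files().
candidates = select_format_candidates(
    ["include/ck/a.hpp", "README.md", "test/b.cpp", "Jenkinsfile"]
)
```

Because the list starts from tracked files only, a leftover file from an earlier job in the same workspace cannot cause a formatting failure.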
Illia Silin
e94b2f02ac Enable CI on gfx1100 (#2930)
* run CI on different versions of gfx11

* do not use gfx1151 systems

[ROCm/composable_kernel commit: ec4d16b991]
2025-09-25 16:10:54 -07:00
Illia Silin
768e496178 use default docker for build/test on gfx950 (#2928)
[ROCm/composable_kernel commit: 8c1a959913]
2025-09-25 10:40:45 -07:00
Illia Silin
ff8105704b Add AITER test_mha_varlen (#2927)
* add aiter test_mha_varlen

* don't fail until all aiter tests run

* use the original way to run tests, just add new test

[ROCm/composable_kernel commit: 64e61b8647]
2025-09-25 10:00:20 -07:00
Jobbins
62d78c7bba [Jenkins] Remove 'Jenkins - ' prefix (#2920)
The prefix is causing the status updates from
gitStatusWrapper to be distinct from the status updates that
are created by the Jenkins server, which creates duplicates

[ROCm/composable_kernel commit: 929291741d]
2025-09-25 09:08:29 -06:00
Illia Silin
c143f0305c Upgrade to ROCm7.0.1 compiler. (#2909)
* upgrade default docker to rocm7.0.1

* turn on build and test on gfx950 by default

* use rocm-dev instead of rocm

* link libhiprtc for codegen targets

* resolving codegen compilation errors: removed calls to other std functions, resolved issues with int32_t: needed the correct header, put use of e8m0 into header guards

---------

Co-authored-by: Astha Rai <astha.rai713@gmail.com>

[ROCm/composable_kernel commit: 8fe3838c65]
2025-09-24 10:00:53 -07:00
pmaybank
3644370ec1 [CK_TILE] Add support for gfx12 in tile_engine for GEMM benchmarking (#2802)
* initial work on adding support of gfx12 in tile_engine for GEMM benchmarking
* add stage("Run TILE_ENGINE_GEMM Tests on gfx1201") to Jenkins config
* make tile_[m/n/k] validation arch dependent

[ROCm/composable_kernel commit: 592d73ad73]
2025-09-17 17:59:01 +01:00
Illia Silin
14d52a943c Enable FMHA and AITER tests on gfx950. (#2812)
* enable aiter and fmha test stages on gfx950

* use newer compiler for gfx950

* make sure gfx950 runs correct docker

* fix typo

* upgrade base docker for aiter

* change base docker for aiter tests

* do not add group render to ck_aiter image

* add group irc in ck_aiter docker

* do not fix the irc group id to 39

* do not set jenkins uid and gid

* skip group irc for aiter tests

* fix syntax error in dockerfile

* change the base docker for aiter tests

* add irc group back to ck_aiter docker

[ROCm/composable_kernel commit: b9d69d32a8]
2025-09-12 12:20:32 -07:00
Thrupti Raj Lakshmana Gowda
fcc5e31bff [CK TILE ENGINE] Adding GEMM Preshuffle to CK Tile Engine (#2712)
* Partial Progress : Completed ListBlob

* Additional changes in ListBlob

* Partial Progress : Generate Blobs Completed

* Partial Progress : Added Host side code for Preshuffle

* Working code for Preshuffle before Cleanup

* Partial Progress : Cleanup

* Partial Progress : Datatype Validation

* Partial Progress : Warptiles for preshuffle changed from hardcoding to take from config

* Partial Progress : Cleanup

* Partial Progress : Code Cleanup

* Partial Progress : Passing all valid tiles failing for unsupported tiles

* Partial Progress : Working code, testing pending for edge cases

* Partial Progress for testing

* Completed Code

* kBlockPerCu as tunable parameter from config

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm_preshuffle/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Partial Progress : Working listkernels

* Partial Progress : Cleanup Working listkernels

* Partial Progress : Single instance

* Partial Progress : Working single instance code

* Partial Progress : Working generate individual instance code

* Partial Progress : Working revamped code for given config file, needs validation and edge case testing

* Partial Progress : Working Code, testing pending

* Removing LOGS file

* Working code

* Minor changes to GEMM Preshuffle : Restructured

* Minor Changes in Preshuffle

* Changes to Jenkins File

* Changes to Jenkins file to consider new architecture

* Changes to Jenkins file for fixing CI

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: f6ba94fb5c]
2025-09-12 11:50:19 -07:00
Illia Silin
913846d28d build and test on gfx942 by default (#2830)
[ROCm/composable_kernel commit: bca99a499d]
2025-09-11 14:02:21 -07:00
Anton Gorenko
2ec8ebff31 [CK_TILE] Add gtests for FMHA (#2744)
* Improve random number generation

* use different seed for each input (Q, K, V...);
* use deterministic generation of:
  * seqstart_q/k (for group mode);
  * block_table (for paged-kvcache);
  * cache_batch_idx (for kvcache);

* Extract arg_parser-related code from run functions to use them as tests

* Split examples into main programs and fmha runners, build instances separately

* Add dummy tests that use instances and runners

* Fix a missed corner case of f32->f8 conversion

When the value is < min f8 denormal but > min f8 denormal / 2, it must be
rounded to min f8 denormal (i.e. 0b1), not to 0.
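
The corner case above can be illustrated with a minimal sketch, assuming round-to-nearest-even onto the f8 subnormal grid. The helper name and the default smallest denormal of 2**-9 (what an e4m3 layout with exponent bias 7 would give) are illustrative assumptions, not the CK conversion code.

```python
def round_to_f8_subnormal(x: float, min_denormal: float = 2.0 ** -9) -> float:
    """Hypothetical helper: round a small non-negative value to the
    nearest multiple of min_denormal (ties go to the even multiple)."""
    # Python's built-in round() rounds exact halves to the even integer.
    steps = round(x / min_denormal)
    return steps * min_denormal


MIN_DENORMAL = 2.0 ** -9  # assumed value, see the note above

# Strictly above half of the smallest denormal: rounds up to it, not to 0.
above_half = round_to_f8_subnormal(0.75 * MIN_DENORMAL)
# Strictly below half: rounds down to 0.
below_half = round_to_f8_subnormal(0.25 * MIN_DENORMAL)
```

A conversion that truncates everything below the smallest denormal would return 0 for `above_half`, which is the behaviour this commit item describes as incorrect.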

* Fix incorrect fp8 scales for P and O in validation code

DataTypeConfig was incorrectly compared with fp8_t.

* Add host generation of dropout random values and use it for validation

Previously host validation (reference_batched_dropout) used random
numbers generated by BlockDropout of the kernel, meaning that incorrect
generation on device (bad distribution, repeated numbers, too many zeros,
etc.) would not trigger any validation errors.

* Implement tests from smoke_test_bwd.sh

* Return result as enum to distinguish failure and missing instance
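
A minimal sketch of the idea, with hypothetical names (`RunResult`, `to_test_outcome`) that do not come from the CK sources: a three-valued result lets a test skip a configuration that has no kernel instance instead of reporting it as a failure.

```python
from enum import Enum, auto


class RunResult(Enum):
    """Hypothetical result type for a runner invocation."""
    SUCCESS = auto()      # kernel ran and validation passed
    FAILURE = auto()      # kernel ran but validation failed
    NO_INSTANCE = auto()  # no kernel instance matches the requested config


def to_test_outcome(result: RunResult) -> str:
    # A missing instance is reported as "skipped", not as a failure.
    if result is RunResult.NO_INSTANCE:
        return "skipped"
    return "passed" if result is RunResult.SUCCESS else "failed"
```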

* Add tests for bwd features: bias, alibi, dropout

* Implement tests from smoke_test_fwd.sh

* Pass seqlen_q/k as vectors to fwd and bwd runners

* Add tests for fwd features: bias, alibi, dropout

* Add tests for pagedkv and splitkv

* Fix conditions when to use splitkv and pagedkv kernels

splitkv was executed only when use_kvcache was set, where use_kvcache == (need_append_kvcache || use_cache_batch_idx || 0 < page_block_size).
In the SplitKV tests: the regular fwd kernel was executed if use_cache_batch_idx was not requested even when num_splitkv > 1.
In the AppendKV tests: the pagedkv kernel was executed but it often failed to find an instance.

* Add tests for appendkv

* Use is_v_rowmajor = true because there are no instances with column layout anymore

* Split public and private compile options for instances

Tests and examples need to know only about CK_TILE_FMHA_FWD_*_API.

* Improve parsing validation in bias and mask

* Pass bias as string for consistency with mask

* Catch parsing and other exceptions

* Add bwd test for deterministic flag

* Initialize fp8 tensors (-init=ufq) similarly to uf

* Fix splitkv/pagedkv invocation: use padded sk when seqlen_k_ptr is not null

seqlen_k cannot be used to determine padding when seqlen_k_ptr is
provided. The actual seqlen_k is taken from seqlen_k_ptr[b].
Even seqlen_k values (% bn0 == 0) use padded seqlen_k while seqlen_k_ptr
may contain arbitrary values.
In the example or tests this produces incorrect results with appendkv
(for example, -d=32 -s=1 -s_k=64 -s_knew=7 -vlayout=c -b=8).

* Fix use_pagedkv value when kvcache = true but page_block_size = 0

In this case block_table_ptr is nullptr which is accessed in the kernel.

* Clean up bwd tests

* Unify fwd tests for f16/bf16 and fp8

* Use better explicit instantiation declaration for fmha_bwd<2>

* Use the same seed for all tests, allow overriding it with an env variable

* Undo clang-format of one irrelevant file

For some reason my local clang-format-18 and the one in CI work differently.

* Do not build instances and tests on unsupported archs

* Build instance libraries as OBJECT library

* CI: Enable sccache for HIP

There are source files with LANGUAGE HIP, they need
-DCMAKE_HIP_COMPILER_LAUNCHER=sccache

* Add tests to REGRESSION_TESTS

* Fix OOB accesses in deterministic bwd due to incorrectly assumed kN0

The runner assumes kN0 = (hdim_q <= 128) ? 128 : 64 but there are
smaller tiles (for tr_load or fp32). This can create too small dq_acc_buf.
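
A rough sketch of why the assumed kN0 matters. It assumes (this is not stated in the commit message) that the deterministic path sizes dq_acc_buf with one slice per kN0-wide tile of seqlen_k. The function names and the example tile size of 32 are hypothetical.

```python
def assumed_kn0(hdim_q: int) -> int:
    # The runner's assumption as quoted in the commit message.
    return 128 if hdim_q <= 128 else 64


def num_slices(seqlen_k: int, kn0: int) -> int:
    # Ceiling division: number of kN0-wide tiles needed to cover seqlen_k.
    return (seqlen_k + kn0 - 1) // kn0


seqlen_k, hdim_q = 256, 64
allocated = num_slices(seqlen_k, assumed_kn0(hdim_q))  # sized from the assumed tile
required = num_slices(seqlen_k, 32)                    # a kernel with a smaller tile
```

Under these assumptions the buffer is allocated for fewer slices than the smaller-tile kernel writes, which matches the out-of-bounds access described above.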

* Pass CK_TILE_FMHA_FWD_*_API as INTERFACE compile options

The instances don't actually depend on them, only examples and tests do.
Passing these definitions as INTERFACE allows changing FMHA_FWD_ENABLE_APIS
without recompiling instances that are already in ccache.

* Fix formatting and names

[ROCm/composable_kernel commit: ec006bb8e0]
2025-09-10 08:06:14 +05:00
Vidyasagar Ananthan
cff1d9a5e2 Fixing path for tile engine tests. (#2794)
[ROCm/composable_kernel commit: 5224d2ead3]
2025-09-05 17:34:48 -07:00
Illia Silin
e2e8e0b61c set number of cpu threads in CI to min(nproc,64) (#2793)
[ROCm/composable_kernel commit: 4e4a784d53]
2025-09-05 17:26:13 -07:00
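
The min(nproc,64) rule from the commit title above can be sketched as follows. `ci_thread_count` is a hypothetical name, and `os.cpu_count()` stands in for `nproc` here; the actual CI change lives in the Jenkins configuration and is not reproduced.

```python
import os


def ci_thread_count(cap: int = 64) -> int:
    """Hypothetical helper: number of build threads, capped at `cap`."""
    # os.cpu_count() may return None, so fall back to a single thread.
    return min(os.cpu_count() or 1, cap)
```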
Vidyasagar Ananthan
2a8d24efd4 Fixing tile engine tests after recent refactoring. (#2791)
* Fixing tile engine tests after recent refactoring.

* Fixing line break error.

[ROCm/composable_kernel commit: 60ea94f4fe]
2025-09-05 14:57:59 -07:00
Illia Silin
c217c0fa93 Fix latest AITER failure and add more AITER tests in CK CI. (#2782)
* add aiter tests and move json_dump header

* remove example/include path from cmake

* extend time for aiter and pytorch stages

[ROCm/composable_kernel commit: ef6c28e989]
2025-09-04 13:44:00 -07:00