composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-11 17:00:18 +00:00

Author	SHA1	Message	Date
Illia Silin	8705fdcb0c	add aiter test_batch_prefill and simplify jenkins file a bit (#3570)	2026-01-14 14:07:47 -08:00
Thrupti Raj Lakshmana Gowda	51027474af	[CK TILE ENGINE] CI fix for Basic Tile Engine (#3554 ) * memory op changes * memory op changes * Fixing TILE_ENGINE_BASIC in Tile Engine * Removing gfx90a from Tile Engine Run * [CK TILE ENGINE] increasing ci configs for BASIC case * Setting RUN_TILE_ENGINE_BASIC_TESTS to ON by default --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>	2026-01-13 16:20:30 -08:00
Thomas Ning	00c46785a8	Shuffle fix for gfx950 (#3491 ) * solve compiler issue * solve the gfx950 mfma shuffle regression * refactor jenkinsfile to handle arch name better * [CK TILE] set divisor to count of thread along k dimension * fix the compiler error * solve degradation * Finish the multiplies fix * fix the scales * solve compilation error * solve the composes * solve the error of tile sweeper * fix the test and example * fix for gfx950 --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com> Co-authored-by: Cong Ma <congma13@amd.com>	2026-01-13 09:21:29 -08:00
Illia Silin	2ffbf7f476	add tabulate package to aiter docker (#3519)	2026-01-06 09:36:54 -08:00
Bartłomiej Kocot	bbf0b1a3b3	Fix large tensor grouped conv bwd data test (#3513)	2026-01-05 09:42:02 -08:00
Thrupti Raj Lakshmana Gowda	62a8ec155f	[CK TILE ENGINE] CI configuration with basic cases (#3475 ) * [CK TILE ENGINE] Adding GEMM BASIC TEST in Kenkins * fix RUN_TILE_ENGINE_BASIC_TESTS name typo * [CK Tile Engine] Updating basic CI * Resolving merging issues * Resolving merging issues --------- Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-12-24 10:45:56 -08:00
Bartłomiej Kocot	44f1b5c5de	Fix jenkinsfile for large tensor conv test (#3478)	2025-12-21 17:39:30 -08:00
Thrupti Raj Lakshmana Gowda	e22622f0ec	[TILE ENGINE] Restructure to Base class of GEMM (#3434)	2025-12-19 23:53:56 +08:00
andrew clark	e77a7ca2bc	Supporting Custom Build Trace File Names (#3443 ) * Removing hard-coded trace filename * Including stage name in notification * Simplifying capture setup and tagging file names with arch * Removed test property from notification message * Fixing regex to get arch name * Fixing error in notification and modified regex	2025-12-18 12:15:33 -08:00
andrew clark	e67cd7edeb	Adding sscache stats monitoring (#3428 ) * Adding additional sccache and redis logging to each build * Removing custom workspace * Removing script reference * Logging complete sccache stats * Ensuring monitor is stopped if build fails * Including additional sccache logging * Removing build duration log * Fixing groovy syntax error * Fixing syntax * Modifying logging statements * Fixing syntax * Modifying logging * Modifying logging * Including additional logging * Fixing logging message * Logging build path * Testing * Testing workspace path logs * Adding additonal logging to monitor * Modifying comments * Adding copyright info * Cleaning unnecessary logs * Removing build time logs * Merge branch 'develop' into aick-457	2025-12-17 09:15:27 -07:00
Illia Silin	3dfa794fab	Add build trace diagnostics to CI. (#3432 ) * generate and visualize build traces for all archs * generate build traces in all cases * fix jenkins logic * fix typo * use more threads for parsing dependency map * add script to parse ninja traces and issue warnings * fix python script syntax and header * fix python syntax one more time * fix python syntax	2025-12-16 08:22:52 -08:00
Johannes Graner	fe35ba5dac	Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380 ) * Parallelization in dataset generation * Parallelizable tests for fwd, bwd data, bwd weight with datasets * .gitignore generated datasets * Test parallelization script with round-robin GPU scheduling * Parallelization updates to test generation and running * Dataset paths relative to executable * Update output from test generation * Default to one GPU in test generation * Add small dataset tests to Jenkins * Update copyright lines * Update test_data/generate_test_dataset.sh Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Move trap disable * Common get path function --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-15 13:38:25 +01:00
Illia Silin	934ba1208a	use hipTensor from monorepo for daily builds (#3386)	2025-12-09 14:39:08 -08:00
Illia Silin	0d8259affd	temporarily disable daily builds on gfx1010 and gfx908 (#3384)	2025-12-09 10:37:13 -08:00
Illia Silin	7582c9e73f	Upgrade to ROCm7.1.1 as default compiler. (#3370) * upgrade to rocm7.1.1 as new default compiler * fix jenkinsfile	2025-12-09 07:35:32 -08:00
Illia Silin	cd21e20ae7	build latest hipblaslt in ck_pytorch docker (#3347)	2025-12-04 06:58:42 -08:00
Cong Ma	30727c48fc	Tile engine for streamk (#3157 ) * [CK TILE STREAMK] Introduce initial support for tile engine in streamk GEMM. - This commit lays the groundwork for integrating the tile engine into streamk GEMM. It focuses on creating benchmark executables for streamk GEMM. - Additional scripts like test_benchmark.sh and gemm_benchmark.py will be added once the streamk implementation reaches stability. * [CK TILE STREAMK] Enable CI to execute tile engine benchmarks for StreamK GEMM * [CK TILE STREAMK] Refactor: Extract common utility functions. * [CK TILE STREAMK] Revise tile engine of streamk to align with the updated implementation * Add pre-commit * [CK TILE STREAMK] Add 'dp_persistent' and 'reduction_strategy' in output of CK TILE STREAMK * [CK TILE STREAMK] Fix a bug about value of 'dp_persistent' of CK TILE STREAMK * [CK TILE STREAMK] Update Jenkinsfile * [CK TILE Engine] Update StreamK tile engine help message Remove default value messages as they are automatically printed * [CK TILE Engine] Update StreamK tile engine - Remove namespace reboot * [CK TILE Engine] Update StreamK tile engine - Fix merge error	2025-11-27 15:49:57 -07:00
andrew clark	40d7217ac7	Automated Perfetto UI Notifications (#3255 ) * Testing visualization generation * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Adding dummy test data * Update Jenkinsfile * Update Jenkinsfile * Adding notifications * Testing * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Image compression * Update Jenkinsfile * Moving capture logic to main Jenkins file * Testing generation * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Fixing curl request * Update Jenkinsfile * Clean up * Fix * Fixing notification * Testing message creation * Adjusting message payload * Testing notification generation * Updating main jenkinsfile * Fixing cleanup call * Removing test pipeline code * Comment clean up * Testing pipeline * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile * Moving archive Moving trace archive to safe location before source checkout * Removing test pipeline * Testing pipeline with unique file names * Update Jenkinsfile * Removing test files Updated main pipeline	2025-11-26 16:27:27 -07:00
Illia Silin	a54f7b1138	Enable ck_builder in CI. (#3296 ) * build and run ck_builder tests * add test_ckb_all to targets * fix syntax * fix test path * Update CMake targets for builder testing in CI (#3290) Our existing CMake only had build targets. Update CMakeLists.txt to have CTEST targets: * smoke-builder * regression-builder * check-builder Co-authored-by: John Shumway <jshumway@amd.com> * use check-builder target * get rid of test_ckb_all target * call ninja check-builder separately --------- Co-authored-by: John Shumway <jshumway@amd.com>	2025-11-25 17:45:59 -08:00
Illia Silin	e857e26bf6	disable CI on gfx1010 by default (#3280)	2025-11-24 07:06:41 -08:00
Illia Silin	21ae743acd	Enable daily builds on gfx1010 (#3258) * add build/test on gfx1010 * only build and run on gfx1010 once daily	2025-11-21 07:22:01 -08:00
Illia Silin	84540edff3	fix typo (#3244)	2025-11-19 20:23:09 -08:00
Illia Silin	3e8e6f7e4f	Refactor Jenkinsfile (#3229) * allow using alternative compiler in all CI stages * get rid of some redundancies in jenkinsfile * clean up jenkinsfile a bit more * further clean up jenkinsfile * do not force user jenkins in ci dockers	2025-11-19 07:20:25 -08:00
Illia Silin	930423ab3b	Initialize new variable to prevent c++17 compiler error (#3156) * initialize new variable to prevent c++17 compiler error * build for gfx90a using -std=c++17 flag	2025-11-04 18:54:14 -08:00
Thrupti Raj Lakshmana Gowda	a33d98f8e2	[CK TILE ENGINE] GEMM Multi D Restructure (#3121 ) * Renaming old code * Adding GEMM code with new Architecture * Partial Progress : Errors * Partial Progress : Working code * Changes to element wise function * Removing Debugging statements * Working GEMM Multi D code * Removing Stale Code * Address Copilot review comments * Address Copilot review comments * Changes to validation file * Changes to common code snippets * Creating common folder * Removing duplicate files * Pointing to right common file * Pointing to right common file * Pointing to right common file * Changing to VERBOSE * Changing CMAKE messages to verbose * Updating Cmake with right layout datatype configs * Working code for GEMM Multi D	2025-10-31 12:02:46 -07:00
andrew clark	1977e4b96a	Adding new alert failure patterns (#3122 ) * Adding GPU not found pattern Also, failurePatterns does not need to be global. Moved variable to live in the failure notifications function scope. * Testing new failure type * Testing failure * Removing the forced failure test * Adding an additional failure pattern	2025-10-31 07:38:31 -07:00
Anton Gorenko	1e77695fe8	[CK_TILE] Support WMMA (gfx12) in FMHA (#2528 ) * Pass hdim to tile_example_fmha_fwd in fp8 tests * Add WMMA support to fwd FMHA pipelines * Tune tile sizes a bit for less spilling fp16 256 is still quite slow * Fix Q grad tile distribution for warp size = 32 and hdim >= 256 With AccDataType = float and warp size = 32, K0 becomes 0, K repeat is required to correcty distribute the tile. * Use code based on BlockDropout in BlockDropoutBwd * Fix split KV combine kernel for gfx12 (warp size 32) and make it more universal * Fix LSE LDS tensor descriptors: kMaxSplits and kM0 were swapped, it worked on gfx9 because they both equal to 8 while on gfx12 they are 8 and 4; * Fix Oacc LDS tensor descriptor: it was transposed even though its shape=[4 * kM0, kN1], it worked on gfx9 because 4 * kM == kN1 == 32; * Removing these hidden dependecies allows to support: * any number of warps (power-of-2), not only 4; * kN1 = 16, not only 32; * any number of splits; * Rename ids like o_acc_4 and Oacc4 to eliminate confusion: kNumWarps doesn't have to be 4 now * Replace hard-coded kN1 in dispatch code with the requested tile size * Add gfx12-specific tile sizes for split KV * Pass GPU architecture to kernel generation scripts This is still a temporary solution. * Build and run FMHA CI tests for gfx12 * Fix issue after merging * Fix bwd tile sizes The current pipelines always read only one tile K and V tile, this requires bk0 == bhdq and bk2 == bhdv (kK0 == kQKHeaddim and kK2 == kVHeaddim). * Use hardware f32->f8 on gfx12, remove v_perm __builtin_amdgcn_perm is not needed because __builtin_amdgcn_cvt_pk_fp8_f32 allows to specify which word (16 bit of 32-bit dword) is used to store results (two f8 values). 
* Update changelog * Add WMMA support to pagedkv * Fix scripts after rebasing * Support 16x16 (MFMA, WMMA) and 32x32 (MFMA) tiles in fwd and bwd BlockDropout Add comments with dropout implementation details Fix performance regression of fwd+dropout * Remove some usage of type punning (reinterpret_cast with ref or ptr) in Philox; * "scalarize" seed and offset, they may come either from kernel args or from device memory (presumably loaded with vector loads). These changes help the compiler to procude more optimal code and reduce register spilling. Use WarpGemmDispatcher instead of explicit WarpGemmMfma... to get CWarpDstrEncoding Use code based on BlockDropout in BlockDropoutBwd Refactor BlockDropout (fwd) Implement BlockDropout (fwd) for WMMA Originally BlockDropout only supported 32x32 tiles (IsWG32 = true), this version supports 16x16 tiles. If MPerBlock > MWarp * 16, it can generate numbers for two 16x16 tiles, similarly to BlockDropoutBwd. Implement BlockDropoutBwd for WMMA Remove MakeRandValLds* functions unused in BlockDropoutBwd Remove unused Run overload from BlockDropoutBwd * Fix regression with philox seed and offset when they exceed 32-bit int __builtin_amdgcn_readfirstlane works with 32-bit values, seed and offset are 64-bit so they get truncated. * Fix names after cherry-picking * Fix selection of a fallback tile based on bm0 The assumption that the largest bm0 == 128 is not always true for current fp32 tiles. * Do not use filters related to qr_async_trload They disable tiles/pipelines which are valid for gfx12. * Use different dstr encoding when C is transposed * Do not call GetQKBlockGemm (and hence WarpGemmDispatcher) in host code Some WarpGemmDispatcher instantiations are defined only for specific archs and undefined on host. Calculations related to sched barriers are moved from Pipeline's public fields into pipeline's operator(). 
* Fix incorrect name WarpGemmMfmaFp8Fp8F32M32N32K16SwizzleBTransposedCDistribution Correct name is WarpGemmMfmaFp8Fp8F32M32N32K32SwizzleBTransposedCDistribution because it's 32x32x16 with IterateK = 2 so K = 32, also all tiles used in codegen scripts are 32, 32, 32. * Generalize usages of WarpGemmDispatcher for MFMA and WMMA WarpGemmMfmaFp8Fp8F32M32N32K32SwizzleBTransposedCDistribution is still used explicitly becaus of swizzle factor = 4. * Mark has_load_tr as maybe_unused There are no transpose loading for RDNA. * Remove CK_TILE_USE_MFMA/WMMA from fmha-related code * Detect BlockSize on host based on warp size of the current device If kBlockSize == kNumWarps * get_warp_size(), the kernel is launched with kBlockSize / 2 because on host get_warp_size() == 64 always. * Fix calculation of grid size for combine kernel with warp size = 32 * Add missing includes and header * Support multiple archs in one binary for fwd * Support multiple archs in one binary for fwd_splitkv, fwd_appendkv, pagedkv_prefill * Support multiple archs in one binary for bwd * trload kernels are compiled only for gfx950; * instances with padding are checked after instances without padding so they can be used as fallbacks (similarly to fwd); * Extract common code from register_traits * Revert "Fix regression with philox seed and offset when they exceed 32-bit int" To simplify merging , the proper fix is in develop already. * Support new numerical d paddings in trait ordering checks * Build fp32 tests only on gfx9 * Do not use hardcoded M0 = 64 for dot bwd kernel * Use textwrap.indent from standard library * Make fp8 pipelines on gfx12 consistent with gfx9 * Update tests for current pipelines * Make ninja check more responsive in CI ninja buffers output so this job looks hanging. * Support fp8fp32 by limiting O vector size The fp32 output type requires storing 8 * sizeof(float) = 32 bytes, which is not implemented (here 8 is the number of C values per lane for v_wmma_f32_16x16x16...). 
* Remove unused cmake options * Unify including amd_buffer_addressing.hpp/_builtins.hpp * Temporarily use amd_buffer_addressing.hpp on >=gfx10 amd_buffer_addressing_builtins.hpp uses inline asm for loads/stores which is not compatible with >=gfx10: * 1 scalar for exec masks instead of 2, * gfx12 uses different instruction names etc. * Update asm in bf16 conversions to work with warp 32 * Do not generate splitkv/appendkv with vlayout=col for consistency with fwd * Add arch tags to kernels/host funcs, compile for each arch separately * Add kM0 to fmha_bwd_dot_do_o kernel name to match filename * Add workaround for miscompilation of bwd with padded hdim SWDEV-559729: v_wmma instructions can be incorrectly placed in divergent branches used to store padded tensors (when some lanes are inactive due to padding). Inline asm with dummy dependencies on VGPRs of the tensors prevents the compiler doing this. * Fix add_gtest_executable for absolute paths Some tests (like gemm_tile_engine) pass absolute paths to source files. In CI the branch name is a part of the root dir, and if the branch name contains "wmma", "xdl" etc., files can be incorrectly excluded. * Run only hdim 128 smoke tests for fp8fp32 There are no instances for hdim 64 and 256. * Format py with ruff to simplify merging develop * Fix incorrect var name * Codegen for gfx9,gfx950 when --targets is not specified Aiter and Pytorch require changes for passing their targets to the codegen scripts. With this temporary solution the files are generated but not all of them have to be really built (depending on the used --offload-arch=). * Combine arch-related values into ArchTrait This more centralized approach removes duplication of various formatting templates. * Try a workaround for Jenkins error "groovyjarjarasm.asm.MethodTooLargeException: Method too large" Some code is extracted into a function.	2025-10-29 13:31:08 -07:00
andrew clark	aa22da07be	Added failure pattern check (#3111)	2025-10-29 08:19:56 -06:00
Illia Silin	155d63f4fe	Add option to build ckProfiler packages for individual architectures. (#3105) * refactor package generation, add dedicated switch * allow building packages not only on gfx9 * enable last stage to post packages * stash packages from different arch into separate stashes * build packages daily automatically	2025-10-28 09:48:11 -07:00
Thrupti Raj Lakshmana Gowda	7fc0a38e90	Ck tile engine gemm (#2982 ) * Partial Progress : CK Tile Engine GEMM * Partial Progress : CK Tile Engine GEMM * Partial Progress : Working GEMM Code * Partial Progress : Working GEMM Code * Changinf jenkins to remove preshuffle * Partial Progress : CK TILE ENGINE GEMM Debugging * Partial Progress : Removing changes that are not GEMM * Partial Progress : Validation of full block size in GEMM * Changes in Jenkins to run only fp16 and bf16 * Addressing Review Comments * Partial Progress : Addressing CI issues * Partial Progress - Runing GEMM for fp16,bf16 and rcr * Clang * Adding fp8 and bf8 * Adding fp8 and bf8 * Adding additional architrcture * Limited datatypes and layouts * Adding k_block_per_cu in test config * Changes to faling CI errors * Changes to faling CI errors * Validation for GEMM * Adding Layout support * Adding Validations * Adding layout in jenkins * Update on Jenkins * Distribution validation for GEMM * Resolving merge conflicts * Solving merge conflicts	2025-10-27 21:11:13 -05:00
andrew clark	a1ce64374f	Jenkins Alerts Notifications (#3086 ) * Testing minimal pipeline * Update Jenkinsfile * Testing webhook * Testing webhook * Testing webhook * Testing build log output * Testing log retrieval * Testing * Testing pattern matching * Fixing regex * Testing error detection * Testing log formatting Including additional context around log failure. * Testing notification message format * Update Jenkinsfile * Notification formatting * Testing secure interpolation * Testing string interpolation * Notification format * Fixing markdown * Testing markdown * Testing markdown * Revert "Testing markdown" This reverts commit `adeb6d2d55`. * Testing different markdown format * Revert "Testing different markdown format" This reverts commit `bf5406a1cd`. * Testing markdown * Testing markdown * Testing markdown * Testing markdown * Testing markdown * Testing notification * Testing notification * Testing notification * Testing failure mode * Testing failure mode * Adding new patterns and tests * Commenting * Stage name fix * Moving to notification on failure only * Fixing notification format * Testing env vars * Testing build url redirect * Testing no log errors * Testing no errors case * Integrating into primary jenkinsfile * Updating notification message Removed emoji from message	2025-10-27 08:24:36 -06:00
Thrupti Raj Lakshmana Gowda	8b185e872e	Ck tile engine preshuffle (#2919 ) * Partial Progress : Preshuffle working code for datatype * Partial Progress : Preshuffle Cleanup * Working code for default config with min max step * Partial Progress : PermuteN implemented in validation * Partial Progress : PermuteN changes in Preshuffle * CK Tile Engine Preshuffle Complete * CK TILE ENGINE : Preshuffle Layout validation * CK Tile Engine Preshuffle Validation * Preshuffle Validation check * CK Tile Engine Preshuffle : Fixing Validation Cases * Addressing PR review Comments * Changes in config * Addressing Review Comments * Adding additional architecture in Jenkins * Partial Progress : Selective Datatype and layouts * Limited datatypes and layouts * Addressing CI errors * Datatype updates * Datatype updates * Datatype changes to Preshuffle * Addressing Review Comments * Addressing Review Comments * Datatype changes * Changes to Cmake * Update on Jenkins * Formatting with precommit * Ruff Formatting	2025-10-27 09:15:34 -05:00
andrew clark	775b96ea6a	Fixing Run CI Check for Changed Files (#3072 ) * Fixing check for changed files * Testing CI skip behavior * Testing CI Trigger This should skip CI --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-10-24 07:52:43 -07:00
Illia Silin	d88ea05c84	disable aiter test gemm_a8w8_blockscale (#3049)	2025-10-17 19:52:22 -07:00
Illia Silin	87d0a3ac17	use branch develop to test hipTensor (#3034)	2025-10-15 15:40:34 -07:00
Illia Silin	3348f01e6f	re-enable clang-format by default (#3030) * re-enable clang-format by default * fix clang format	2025-10-15 07:43:11 -07:00
andrew clark	0a4c45b4d3	CI Skip and Status Checks Fix (#2952 ) * Update Jenkinsfile Adding logic to skip CI checks when a commit contains changes to non-relevant files like docs, .md, licenses, and .github workflow files. * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile Testing skip env var * Update Jenkinsfile Fixing syntax * Update Jenkinsfile Simplifying CI check logic * Update Jenkinsfile Testing skipping logic on stages. * Update Jenkinsfile Removing post block. The status for skipped stages are already reported. * Testing Docs Testing modifications to files in the docs folder do not trigger a the build and test stages. * Testing Multifile Trigger Removed Jenkinsfile from the skip patterns. Reversed change to docs file. This test should not skip CI checks. * Clean code Renamed setup stage to be more descriptive. Added pipeline env variable for consistency. Moved performance test results stage conditional up a level so the parent stage appropriate reports the status if it is skipped. * Fixing syntax error * Updated CRON Flags Added the FORCE_CI flag to the CRON instructions. This will ensure CI does not skip the job. * Updating logging Making logs more explicit. * Comment update Cleaning comments. * Update Jenkinsfile Reverting performance reports when condition. * Parallel Test Testing stage status with parallel stages * Update Jenkinsfile * Update Jenkinsfile Removing stages for quick testing * Update Jenkinsfile * Testing skipped parallel stages Testing the addition of a coordination stage to always pass and give an update to skipped parent stages with parallel sub-stages. * Testing parallel stages Adding coordination stage to test if parent check status is correctly updated. * Simplified performance results stage Removed parent stage as there are no other parallel stages to execute (yet). * Testing final clean up stage * Testing check status update Testing - forcing status to update after a stage skip. 
* Testing results stage skip * Removing test stage * Testing pipeline * Testing post status updates * Process Test Results Post Event Update The stage will report success when it skips or is successful. * Testing non-relevant file change This should skip build and test in CI * Reverting test updating regex file patterns to use strings instead of regex literal syntax. * Fixing file matching regex * Testing docs modification * Fixing default env var value * Correcting env var assignment * Pipeline test Updating docs file. Should skip ci. * Testing Pipeline Setting default run ci state. * Adding debugging * Removing debugging * Pipeline test Should skip pipeline * Pipeline Test Mixed files to trigger a CI run * Adding additional status updates The parent stage sometimes remains in pending even if the child stage completes when skipped. Added an additional status update for the parent stage. * Fixing variable name * Moving stage names Moved the performance stage names to a single location because they are referenced multiple times. This reduces errors with typos in the future. * Revert "Moving stage names" This reverts commit `7cf6743e54`. * Update Jenkinsfile Handle both truly empty arrays and arrays containing only empty strings. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-08 15:48:08 -06:00
Illia Silin	35e116f5c0	increase time limit for AITER tests (#2948)	2025-09-29 13:11:42 -07:00
John Afaganis	e8842e3c1f	Use git ls-files to select candidate files for clang format This change ensures that the files being selected for clang format validation are exactly the ones tracked by the git repo we are testing. This protects against an known issue where the repo being tested contained "stray files" from a previous test.	2025-09-27 15:47:31 -06:00
Illia Silin	ec4d16b991	Enable CI on gfx1100 (#2930) * run CI on different versions of gfx11 * do not use gfx1151 systems	2025-09-25 16:10:54 -07:00
Illia Silin	8c1a959913	use default docker for build/test on gfx950 (#2928)	2025-09-25 10:40:45 -07:00
Illia Silin	64e61b8647	Add AITER test_mha_varlen (#2927) * add aiter test_mha_varlen * don't fail until all aiter test run * use the original way to run tests, just add new test	2025-09-25 10:00:20 -07:00
Jobbins	929291741d	[Jenkins] Remove 'Jenkins - ' prefix (#2920 ) The prefix is causing the status updates from gitStatusWrapper to be unique to the status updates that are created by the Jenkins server, which creates duplicates	2025-09-25 09:08:29 -06:00
Illia Silin	8fe3838c65	Upgrade to ROCm7.0.1 compiler. (#2909 ) * upgrade default docker to rocm7.0.1 * turn on build and test on gfx950 by default * use rocm-dev instead of rocm * link libhiprtc for codegen targets * resolving codegen compilation errors: removed calls to other std functions, resolved issues with int32_t: needed the correct header, put use of e8m0 into header guards --------- Co-authored-by: Astha Rai <astha.rai713@gmail.com>	2025-09-24 10:00:53 -07:00
pmaybank	592d73ad73	[CK_TILE] Add support for gfx12 in tile_engine for GEMM benchmarking (#2802 ) * initial work on adding support of gfx12 in tile_engine for GEMM benchmarking * add stage("Run TILE_ENGINE_GEMM Tests on gfx1201") to Jenkins config * make tile_[m/n/k] validation arch dependent	2025-09-17 17:59:01 +01:00
Illia Silin	b9d69d32a8	Enable FMHA and AITER tests on gfx950. (#2812 ) * enable aiter and fmha test stages on gfx950 * use newer compiler for gfx950 * make sure gfx950 runs correct docker * fix typo * upgrade base docker for aiter * change base docker for aiter tests * do not add group render to ck_aiter image * add group irc in ck_aiter docker * do not fix the irc group id to 39 * do not set jenkins uid and gid * skip group irc for aiter tests * fix syntax error in dockerfile * change the base docker for aiter tests * add irc group back to ck_aiter docker	2025-09-12 12:20:32 -07:00
Thrupti Raj Lakshmana Gowda	f6ba94fb5c	[CK TILE ENGINE] Adding GEMM Preshuffle to CK Tile Engine (#2712 ) * Partial Progress : Completed ListBlob * Additional changes in Listbob * Partial Progress : Generate Blobs Completed * Partial Progress : Added Host side code for Preshuffle * Working code for Preshuffle before Cleanup * Partial Progress : Cleanup * Partial Progress : Datatype Validation * Partial Progress : Warptiles for preshuffle changed from hardcoding to take from config * Partial Progress : Cleanup * Partial Progress : Code Cleanup * Partial Progress : Passing all valid tiles failing for unsupported tiles * Partial Progress : Working code, testing pending for edge cases * Partial Progress for testing * Completed Code * kBlockPerCu as tunable parameter from config * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update 
tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update tile_engine/ops/gemm_preshuffle/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Partial Progress : Working listkernels * Partial Progress : Cleanup Working listkernels * Partial Progress : Single instance * Partial Progress : Working single instance code * Partial Progress : Working generate individual instance code * Partial Progress : Working rewamped code for given config file needed validation and edge case testing * Partial Progress : Working Code, testing pending * Removing LOGS file * Working code * Minor changes to GEMM Preshuffle : Restructured * Minor Changes in Preshuffle * Changes to Jenkins File * Changes to Jenkins file to consider new architecture * Changes to Jenkins file for fixing CI --------- Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>	2025-09-12 11:50:19 -07:00
Illia Silin	bca99a499d	build and test on gfx942 by default (#2830)	2025-09-11 14:02:21 -07:00
Anton Gorenko	ec006bb8e0	[CK_TILE] Add gtests for FMHA (#2744 ) * Improve random number generation * use different seed for each input (Q, K, V...); * use deterministic generation of: * seqstart_q/k (for group mode); * block_table (for paged-kvcahe); * cache_batch_idx (for kvcache); * Extract arg_parser-related code from run functions to use them as tests * Split examples into main programs and fmha runners, build instances separately * Add dummy tests that use instances and runners * Fix a missed corner case of f32->f8 conversion When value if < min f8 denormal but > min f8 denormal / 2, it must be rounded to min f8 denormal (i.e. 0b1), not to 0. * Fix incorrect fp8 scales for P and O in validation code DataTypeConfig was incorrectly compared with fp8_t. * Add host generation of dropout random values and use it for validation Previously host validation (reference_batched_dropout) used random numbers generated by BlockDropout of the kernel, meaning that incorrect generation on device (bad distribution, repeated numbers, too many zeros, etc.) would not trigger any validation errors. * Implement tests from smoke_test_bwd.sh * Return result as enum to distinguish failure and missing instance * Add tests for bwd features: bias, alibi, dropout * Implement tests from smoke_test_fwd.sh * Pass seqlen_q/k as vectors to fwd and bwd runners * Add tests for fwd features: bias, alibi, dropout * Add tests for pagedkv and splitkv * Fix conditions when to use splitkv and pagedkv kernels splitkv was executed only when use_kvcache which == (need_append_kvcache \|\| use_cache_batch_idx \|\| 0 < page_block_size). In the SplitKV tests: the regular fwd kernel was executed if use_cache_batch_idx was not requested even when num_splitkv > 1. In the AppendKV tests: the pagedkv kernel was executed but it often failed to find an instance. 
* Add tests for appendkv * Use is_v_rowmajor = true because there are no instances with column layout anymore * Split public and private compile options for instances Tests and examples need to know only about CK_TILE_FMHA_FWD__API. Improve parsing validation in bias and mask * Pass bias as string for consistency with mask * Catch parsing and other exceptions * Add bwd test for deterministic flag * Initialize fp8 tensors (-init=ufq) similarly to uf * Fix splitkv/pagedkv invocation: use padded sk when seqlen_k_ptr is not null seqlen_k cannot be used to determine padding when seqlen_k_ptr is provided. The actual seqlen_k is taken from seqlen_k_ptr[b]. Even seqlen_k values (% bn0 == 0) use padded seqlen_k while seqlen_k_ptr may contain arbitrary values. In the example or tests this produces incorrect results with appendkv (for example, -d=32 -s=1 -s_k=64 -s_knew=7 -vlayout=c -b=8). * Fix use_pagedkv value when kvcache = true but page_block_size = 0 In this case block_table_ptr is nullptr which is accessed in the kernel. * Clean up bwd tests * Unify fwd tests for f16/bf16 and fp8 * Use better explicit instantiation declaration for fmha_bwd<2> * Use the same seed for all tests, allow to override it with env variable * Undo clang-format of one irrelevant file For some reason my local clang-format-18 and the one in CI work differently. * Do not build instances and tests on unsupported archs * Build instance libraries as OBJECT library * CI: Enable sccache for HIP There are source files with LANGUAGE HIP, they need -DCMAKE_HIP_COMPILER_LAUNCHER=sccache * Add tests to REGRESSION_TESTS * Fix OOB accesses in deterministic bwd due to incorrectly assumed kN0 The runner assumes kN0 = (hdim_q <= 128) ? 128 : 64 but there are smaller tiles (for tr_load or fp32). This can create too small dq_acc_buf. * Pass CK_TILE_FMHA_FWD__API as INTERFACE compile options The instances don't actually depend on them, only examples and tests do. 
Passing these definitions as INTERFACE allows to change FMHA_FWD_ENABLE_APIS without recompiling instances that are already in ccache. Fix formatting and names	2025-09-10 08:06:14 +05:00
Vidyasagar Ananthan	5224d2ead3	Fixing path for tile engine tests. (#2794)	2025-09-05 17:34:48 -07:00

244 Commits