composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-03 21:21:22 +00:00

Author	SHA1	Message	Date
yinglu	d460ab35b6	[rocm-libraries] ROCm/rocm-libraries#4302 (commit e62bd8a) [CK_TILE] add tf32 support ## Proposed changes TF32 is added in CK on gfx942 and gfx950. This PR is to initiate tf32 in CK_TILE on gfx942 and gfx950. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [ ] I have added inline documentation which enables the maintainers with understanding the motivation - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [x] I have run `clang-format` on all changed files - [ ] Any dependent changes have been merged ## Discussion	2026-03-19 09:19:06 +00:00
Enrico Degregori	eb033ef208	[rocm-libraries] ROCm/rocm-libraries#4964 (commit 3271d9a) [CK Tile] Eight Waves pipeline GEMM ## Motivation Eight waves pipeline was added for ABQuant. The goal of this PR is to enable it also for GEMM ## Technical Details Summary: - Block: - Create block struct for GEMM using eight warps specific distribution encodings - Use this block struct in ABQuant for encodings - Pipeline: - Create impl pipeline for eight waves which can be used by GEMM and ABQuant as base (and for AQuant and BQuant in the future) - Create eight waves pipeline for GEMM (this can not be easily integrated in the existing async pipeline) - Pipeline policy: - Extract GEMM specific parts in the ABQuant policy to define GEMM policy (then ABQuant use it as base and add Quant specific methods) - Minor: naming was inconsistent between warp/wave, everything is now referred to as eight waves So overall we have: - block struct directly used by GEMM -> ABQuant derived struct to implement operator - Impl base pipeline with general implementation -> GEMM and ABQuant pipelines use it to avoid code duplication but still define their own pipelines - pipeline policy struct directly used by GEMM -> ABQuant derived policy struct for Quant specific parts ## Test Plan Added new tests for GEMM pipeline: `test_ck_tile_gemm_pipeline_comp_async_eight_waves` (only gfx950 supports it). Note: K padding test is disabled for this pipeline because it's not implemented yet ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-16 08:31:56 +00:00
Aviral Goel	f00ec5afd9	[rocm-libraries] ROCm/rocm-libraries#4301 (commit 0821c9f) test: Add umbrella test targets for CK Tile operations (#4301) ## Proposed changes Adds operation-specific umbrella test targets for CK Tile to enable running all tests for a specific operation without running the entire test suite. This improves the development workflow by allowing faster iteration when working on specific operations. ## Motivation Previously, developers working on CK Tile operations could only: - Run individual test executables one at a time - Run global labels (, , ) which test the entire codebase - Build all tests for an operation but had no simple way to run them all This made it cumbersome to validate changes to a specific operation (e.g., GEMM quantization) without either running tests individually or running the entire test suite. ### Documentation - - Comprehensive testing guide with usage examples and implementation details ## Usage Examples # Run all GEMM tests with 256 parallel jobs ninja -j256 ck_tile_gemm_tests # Run all GEMM block scale (quantization) tests ninja -j256 ck_tile_gemm_block_scale_tests # Run all GEMM StreamK tests ninja -j256 ck_tile_gemm_streamk_tests ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [x] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [x] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [x] I have added inline documentation which enables the maintainers with understanding the motivation - [x] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [x] I have run `clang-format` on all changed files - [x] Any dependent changes have been merged ## Discussion If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered	2026-03-03 15:40:50 +00:00
Aviral Goel	004784ef98	chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 ) * chore(copyright) update library wide CMakeLists.txt files copyright header template * Fix build --------- Co-authored-by: Sami Remes <samremes@amd.com>	2025-11-28 13:49:54 -08:00
SamiAario-AMD	f2cfc6b94e	Remove "basic" and universal GEMM tests, and incorporate their test cases into the GEMM pipeline tests (#3094 ) * Add missing copyright statements * Use ck_tile::host_tensor_descriptor instead of a custom lambda * Refactor use of check_data_type in test classes * Use TEST_SUITE_NAME with TYPED_TEST_SUITE * Remove an unused namespace * Make dim3 const * Add BF8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add F8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add BF16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add BF16 x BF16 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add BF8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add F8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add F16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Skip failing tests of F16 x I4 for CompV3 with K == 2 * K_Tile * Add missing precision type combinations to CompV4 from CompV3 * Move the INT8 tests around for consistency with KernelTypesCompV3Wmma * Add missing precision type combinations to CompV3Wmma from CompV3 * Remove the basic and universal tests and their dependencies * On __gfx950__, avoid using transposed loading of A with datatype pk_int4_t of B * Use ADataType and BDataType instead of ComputeDataType for WarpGemm * Explicitly set some return types to void * Use more general typenames in InterleavedPKTypeLoader * Add load_interleaved_pk_type.hpp to common.hpp * Use std::is_same_v in load_int4_tile * Add handling of LoadTranspose to load_int4_tile * Factor out common code in several places using load_int4_tile * Add support for pk_int4_t using load_int4_tile * Fix formatting	2025-11-13 11:01:27 -08:00
Manish Kumar	d5746dd120	[CK-Tile] Add gtests for compiler CI for faster testing (#3123 ) * Add gtests for compiler CI for faster testing * Add changes to have a custom target * Add a gtest suite for gemm kernel for running CI tests with compiler mode * Fix Clang error (EOL) * Removed compiler subfolder from CMake * Add gtest suite for gemm kernel * Disable failed tests * Fix build errors * Resolved PR comments * Update shape for persistent gemm kernel test * Seperated types by H/W archs * Made changes to persistent types * Fix persistent build failure issue --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-11-10 10:42:23 -08:00
SamiAario-AMD	254bce9346	Lwpck 3550: Implement and test fixed precision fp8 x bf8 (#2963 ) * HasHotLoop is a constexpr * Remove an unused function * Remove some unused include statements * Add implementation and tests for fp8 x bf8 weight preshuffle GEMM * Add implementation and tests for fp8 x bf8 in CK Tile basic and universal GEMMs * Remove two barrier calls that HotLoopScheduler already calls * No need to suppress a variable that hasn't been declared * Replace six arg_parser arguments with constexpr literals * Simplify run_gemm_test_prec_type * The strides don't need to be passed via arg_parser as we use their default values * The layouts don't need to be passed as arguments twice * Pass M N and K as regular arguments, not using the argument parser * We can now remove the argument parser * Add a common file for precision types to be used in testing * Convert basic and universal GEMM tests to use gtest * Make GemmConfig a test parameter, and form test cases as the cartesian product GemmConfigs x PrecTypes * Add GemmConfigComputeV4 to the GEMM configs to run the universal tests on * Added a changelog entry * Add missing copyright statements * ifndef-define-endif is not needed with pragma once * Fix a comment * Add F8 x BF8 tests for CompV4 in test_gemm_pipeline_kernel_types.hpp * Disable the unreliable test MoeSortingCase4 --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-10-30 13:36:10 +01:00
aledudek	634634f5c0	[CK_TILE] Blockwise GEMM pipeline v6 - port of v5 from old CK (#2955 ) * First checkpoint * Second checkpoint - hot loop scheduler * Third checkpoint - init main operator * Fourth checkpoint - main loop ready * Fifth checkpoint - main loop fix * Sixth checkpoint - ReadWritecompFunc * Seventh checkpoint - Tail finished * [CK_TILE] Blockwise gemm pipeline v5 complete * Working * Working fixes 2 * Rename v5 to v77 temporarily * Data type adjustment * Data type adjustment 2 * [CK_TILE] Blockwise Gemm pipeline v5 add tests * [CK_TILE] Fix calculation error * TEMP: check pipeline * Fix name to V6 * naming and documentation changes * WIP dump * Try fixing v1 * Failing tests v5 * Debugging * Changes v2 * F16 tests working great * Working BlockwiseGemmPipelineV5 as V6 * Cleanup and format * Merging changes part1 * [CK_TILE] Blockwise Gemm Pipeline Comp V5/V6 * Remove commented code * Fix gfx950 build issues * Fix file formatting * Review changes, more concat info, add bf16 bf8 tests * Fix formatting * Add bf16 and bf8 tests --------- Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2025-10-13 13:57:37 +02:00
Max Podkorytov	a7da3c68b9	Add a new gemm pipeline based on ComputeV4 which utilizes async copy API (#2949 ) * check in pipeline and policy for async load in mi350, need to make sure TileAccessPattern is warp_raked or block_raked solve merge conflicts * fix cmakelists * make it build * fix? buffer async fence * relax fences; it appears it only is needed between pairs of ping-pongs * remove fences * remove fences * cleanup and reformat * add steps annotations * comment all pipeline steps / remove unexplainable syncs * clang-format * add comment * cleanup kernel types for test * fix comment * fix hardcoded warp size * faithfully copy block gemm from compute v4 policy to async policy * make async test gfx950 only * fix cmake logic * set separate compile options for async * refine comment in policy * try update hotloop scheduler * cleanup comments * test more K block sizes * unhardcode Ks, sort of * add large odd test case * fix build for quant * add comment to hot loop scheduler and rename enum * reformat * reword the pipeline description * reformat * address review / add static asserts / typo fix * update changelog	2025-10-01 15:38:07 -07:00
linqunAMD	b0ee317d83	[CK_TILE] Enable ck_tile tests on gfx11 and gfx12 (#2821 ) * [CK_TILE] Enable ck_tile test on gfx11 & gfx12 * revert an unnecessary change * enable pk_int4 on gfx11 & gfx12 * revert .pre-commit-config.yaml	2025-09-12 12:45:14 -07:00
msaffari-amd	b951416cdb	Ck tile gemm low prec data types int4 int8 unit tests (#2718 ) * add gemm unit tests for int4, int8 datatypes * minor changes based on reviews --------- Co-authored-by: msaffari-amd <msaffari@banff-cyxtera-s78-2.ctr.dcgpu>	2025-08-28 10:47:16 +02:00
Tianyuan Wu	68134b60e4	[CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 (#2466 ) * WMMA GEMM F16 Implementation Signed-off-by: root <tianyuwu@amd.com> * Self-review Signed-off-by: root <tianyuwu@amd.com> * ASIC check minor tweak Signed-off-by: root <tianyuwu@amd.com> * add missing include file * Set GPU_TARGETS to gfx11/12 generic Signed-off-by: root <tianyuwu@amd.com> * INT8 GFX12 Signed-off-by: root <tianyuwu@amd.com> * add int8x16 branch * Fix CI script Signed-off-by: root <tianyuwu@amd.com> * Fix typo Signed-off-by: root <tianyuwu@amd.com> * Add CK_Tile WMMA example Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> * Fix CI Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> * fix clang format * Set M/N_Warp Back to Constant Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> * Use GemmConfigComputeV3 by default Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12 Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Remove CK_Tile wmma gemm examples from the CI list Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Add atomic add fallback method for gfx11 Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Fix typo Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Omit copyright year Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Support non-square cases Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Fix CI Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Add get_device_ip() Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Revert "Add atomic add fallback method for gfx11" This reverts commit `07a79e797d`. Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> * Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12" This reverts commit `ceee918007`. * Revise method name and typos Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> * clang-format Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Try fix CI Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Revert "Try fix CI" This reverts commit `7a7241085e`. 
* clang-format Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> * Fix typo caused by merge Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> * Fix typo caused by merging Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> --------- Signed-off-by: root <tianyuwu@amd.com> Signed-off-by: Tianyuan Wu <tianyuwu@amd.com> Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com> Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com> Co-authored-by: joye <joye@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-08-15 16:22:27 -07:00
Cong Ma	f102eedfb3	[CK_TILE] Migrate CK Tile examples to Tests to autorun on CI (#2421 ) [CK_TILE] Add new ck tile unit test * Add new ck tile unit test smoke-gemm-universal * Add new ck tile unit test smoke-gemm-basic * Add new ck tile unit test topk_softmax * Add new ck tile unit test add_rmsnorm2d_rdquant_fwd	2025-07-22 08:15:18 -06:00
Thomas Ning	df6023e305	fix the mi350 error (#2378 )	2025-06-20 12:50:13 -07:00
Aviral Goel	aed0f5880c	Label CMakeLists message() as DEBUG or STATUS for clean build output (#2301 ) * - elevate important build messages to log level STATUS - comment out the rest (temporarily) * - marked all low importance build messages as log_level=DEBUG	2025-06-10 10:46:47 -07:00
Sami Remes	1c6f83df6c	[CK_TILE] Tileloop persistent gemm - resubmit (#2299 ) * Reapply "[CK_TILE] Tile loop persistent gemm kernel (#2191)" (#2293) This reverts commit `233e274077`. * Add missing header for kentry --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-06-06 14:18:49 -07:00
Illia Silin	233e274077	Revert "[CK_TILE] Tile loop persistent gemm kernel (#2191 )" (#2293 ) This reverts commit `ffb52783d0`.	2025-06-05 09:24:00 -07:00
Sami Remes	ffb52783d0	[CK_TILE] Tile loop persistent gemm kernel (#2191 ) * Implement tile loop persistent gemm kernel * Enable timing * Add tests for persistent gemm * Fix formatting * Fix gemm_basic * Rename True/False to Persistent/NonPersistent * Use only one set of layouts for persistent tests * Fix gemm example persistent template parameter * Fix formatting	2025-06-04 11:46:28 +03:00
Sami Remes	9bd01b624e	Remove extra if from CMakeLists.txt of gemm tests (#2213 )	2025-05-28 15:25:09 +02:00
Thomas Ning	50d1f8ff90	Add the MI355 support for CK TILE GEMM (#2046 ) * Get the root cause of the ck tile gemm failing on mi355 * Fix the ck tile gemm on MI355 * delete the debug info	2025-04-03 11:48:54 -07:00
Illia Silin	d4a6d69643	disable tests that take too long to build for gfx90a (#1975 )	2025-03-12 17:54:03 -07:00
kylasa	66c5f5b0b6	Addressing (Post Merge) code review comments for PR 1845 (#1883 ) * Addressing code review comments. * Addressing code review comments. * Reorganized code for better readability. * add ck_tile gemms for new types in CI * fix jenkins syntax * fix script syntax * Add the test cases back * Address the review comments * Address review comments * clang format * Solve the merging issues * Addressed the comments * clang format --------- Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-03-06 11:40:30 -08:00
jakpiase	627a27bda3	Added unit tests for CK Tile compute bound gemm pipeline (#1728 )	2024-12-17 14:25:22 +01:00
Adam Osewski	24d996aae1	[CK-Tile] Universal gemm memory bound pipeline (#1558 ) * CK-Tile GEMM with memory bound pipeline. * Memory bound gemm pipeline. * Fix not closed namespace. * Block gemm mem pipeline draft. * Do not use ck_tile:: within ck_tile namespace. * Refactoring & Move Layout info to pipeline problem. * Get hot loop and TailNum information before lunching kernel. * Fixes in pipeline. * Add comment to load_tile_raw and change variable naming style. * Few small changes & formatting. * Do not use macro. * Add gtests. * Use AccDataType for Output of MFMA instruction. * Formatting. * Refactor gemm examples. * Switch over to current block gemm. * Use currently available pipeline policy. * Refactoring and review comment.s * Fixes after merge. * Add missing include. * Add load tile overload which accepts output tensor as parameter. * This give 8% perf boost at the cost of using more registers. * Rename example. * Small changes. * Fix compilation err and lower K. * Support different layouts for A/B * Fix vector size for different layouts. * Rename Alignment into VectorSize * Unblock tests.	2024-10-30 10:05:15 +01:00

24 Commits