composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-01 20:27:42 +00:00

Author	SHA1	Message	Date
Ville Pietilä	76ffa1bf0a	Add more instances.	2025-10-16 11:33:06 +00:00
Ville Pietilä	044bcfcb1e	Take universal GEMM pipeline into use for grouped convolutions.	2025-10-16 11:03:14 +00:00
Ville Pietilä	e99b5a8c28	Merge remote-tracking branch 'origin/develop' into vpietila/ck-vs-ck-tile-conv-benchmarking	2025-10-16 07:33:08 +00:00
Ville Pietilä	9b3c61cac2	Add more instances.	2025-10-16 07:32:52 +00:00
Ville Pietilä	19fac39880	Enable vector loads in grouped conv bwd weight kernels.	2025-10-16 07:17:12 +00:00
Haocong WANG	013ba3c737	Enable storelse for fmha_fwd_trload kernel (#3023 )	2025-10-16 13:51:23 +08:00
Emily Martins	0dbd173500	Fix compiler noreturn error for ck tile permute test (#3036 )	2025-10-15 19:42:02 -07:00
Aviral Goel	232523d9fa	docs: add quant mode comparison to readme (#3032 ) * docs: add quant mode comparison to readme * Update example/ck_tile/38_block_scale_gemm/README.md Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com> --------- Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>	2025-10-15 18:35:06 -07:00
Illia Silin	87d0a3ac17	use branch develop to test hipTensor (#3034 )	2025-10-15 15:40:34 -07:00
Illia Silin	3348f01e6f	re-enable clang-format by default (#3030 ) * re-enable clang-format by default * fix clang format	2025-10-15 07:43:11 -07:00
Ville Pietilä	a5b60ed2f2	Add more instances.	2025-10-15 14:33:01 +00:00
Christopher Millette	bde5f26db3	Disable streamk extended regression tests for now (#3016 )	2025-10-15 09:05:47 -05:00
Ville Pietilä	96a7c26a0b	Better split-K handling in the template instantiation.	2025-10-15 13:47:04 +00:00
Ville Pietilä	bbe13f4635	Add more instances.	2025-10-15 13:23:55 +00:00
Ville Pietilä	23aa650172	Add min blocks per CU to invoker name.	2025-10-15 13:21:29 +00:00
Ville Pietilä	57dbd2f4a4	Remove unnecessary compilations.	2025-10-15 13:20:58 +00:00
Ville Pietilä	3c08ce1e64	Improve the grouped conv kernel name generation in CK Tile.	2025-10-15 11:02:21 +00:00
felix	4c826abfff	Felix/opt sorting (#2902 ) * merge felix/sorting * opt moe sorting (#2822) * opt moe storing for 2k --------- Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com> Co-authored-by: coderfeli <coderfeli@163.com>	2025-10-15 09:24:03 +08:00
AviralGoelAMD	ca1ab083a7	test(grouped_gemm_multi_d): add unit test for bf16 support	2025-10-14 18:00:43 -04:00
AviralGoelAMD	8d8b49dec2	feat(grouped_gemm_multi_d): add support for bf16	2025-10-14 18:00:43 -04:00
Geo Min	706c2b281c	fixing group id (#3002 )	2025-10-14 08:51:52 -07:00
joyeamd	b9d74e7746	update s_barrier's logic in gfx12 architecture (#3003 ) change s_waitcnt's logic in gfx1250 change s_waitcnt's logic in gfx1250 update comment	2025-10-14 08:49:34 -07:00
Illia Silin	e4298e55c7	Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442 )" (#3017 ) This reverts commit `d2bbca3eca`.	2025-10-14 08:43:14 -07:00
Ville Pietilä	3d0db2ca63	Fix transferring data back to host for validation.	2025-10-14 15:02:51 +00:00
jakpiase	6deaaa92cc	[CK_TILE] Switch into universal gemms for conv bwds (#2981 ) * switch into universal gemms for conv bwds * some fixes and support universal gemm in conv fwd * add reviewer comments	2025-10-14 16:09:16 +02:00
Ville Pietilä	bbed3a62dc	Fully functional CK Tile profiler.	2025-10-14 13:35:37 +00:00
msaffari-amd	589e242eda	Fix: Handle JSON boolean values (pad_m, pad_n, pad_k and persistent) in gemm_instance_builder (#3008 )	2025-10-14 13:20:25 +02:00
Ville Pietilä	0f6bf78caa	Add empty instance factory.	2025-10-14 07:13:20 +00:00
Ville Pietilä	eaf9ba4e45	Rename CK Tile grouped conv factory.	2025-10-14 06:31:34 +00:00
ClementLinCF	e1b0bdfbfa	[CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm (#2540 ) * [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm * Update rmsnorm host reference * Update tree reduction of rmsnorm for reference host * Fix cross warp for m > 1 cases * Add RMSNorm model selectable option for host reference * Fix save_unquant cases * Update reference rmsnorm forward function to use enum for model sensitivity * Update reference rmsnorm calculation for model sensitivity * Fix m warp for layernorm * Adjust parameter of reference for twoPass * Fix clang format * Run clang-format-overwrite.sh to fix formating issue * fix clang format --------- Co-authored-by: MHYang <mengyang@amd.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-10-13 11:52:37 -07:00
Ville Pietilä	fc6a9e3931	Create invoker for the kernel and a factory for creating invokers.	2025-10-13 15:22:50 +00:00
John Shumway	fc2a121c44	Enable GMock and improve gtest configuration (#2976 ) Our current cmake/gtest.cmake file does not enable gmock. Gmock is needed for matchers that are needed for more readable unit tests. This PR enables gmock and does a little cleanup in gtest.cmake: * Enable BUILD_GMOCK by default (was previously disabled) * Patch gtest-src/googlemock/CMakeLists.txt for broken include path. * Add configuration to gmock if the target is used. No other changes in this PR, but I've verified I can use gmock matchers correctly once I include these changes in other code.	2025-10-13 08:11:51 -07:00
Ville Pietilä	a60dab521e	Added a placeholder conv bwd instance factory for CK Tile profiler.	2025-10-13 14:32:20 +00:00
Ville Pietilä	6dcee56fee	WIP: CK Tile conv bwd profiler.	2025-10-13 13:03:21 +00:00
Sami Remes	d2bbca3eca	[CK_TILE] Non-K Major from old CK to CK-Tile (#2442 ) * Enable the adapted LDS B layout for Row-Major * fix formatting * Implement specialized col-major A LDS block descriptor * Fix formatting * Use VecLoadSize for AK1/BK1 * Fix some thread access pattern values * Use GetVectorSizeA for A * Fix formatting * Add extra condition to avoid division by zero * disable layout for wave32 * remove extra else * fix formatting * Fix formatting * Rename one remaining TileDistributionEncodingPattern2D * Use integer ceil division * revert remod.py changes * also revert utility.hpp * use getA/BTileAccessPattern everywhere * use integer_divide_ceil for AK0 too --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2025-10-13 14:27:02 +02:00
aledudek	634634f5c0	[CK_TILE] Blockwise GEMM pipeline v6 - port of v5 from old CK (#2955 ) * First checkpoint * Second checkpoint - hot loop scheduler * Third checkpoint - init main operator * Fourth checkpoint - main loop ready * Fifth checkpoint - main loop fix * Sixth checkpoint - ReadWritecompFunc * Seventh checkpoint - Tail finished * [CK_TILE] Blockwise gemm pipeline v5 complete * Working * Working fixes 2 * Rename v5 to v77 temporarily * Data type adjustment * Data type adjustment 2 * [CK_TILE] Blockwise Gemm pipeline v5 add tests * [CK_TILE] Fix calculation error * TEMP: check pipeline * Fix name to V6 * naming and documentation changes * WIP dump * Try fixing v1 * Failing tests v5 * Debugging * Changes v2 * F16 tests working great * Working BlockwiseGemmPipelineV5 as V6 * Cleanup and format * Merging changes part1 * [CK_TILE] Blockwise Gemm Pipeline Comp V5/V6 * Remove commented code * Fix gfx950 build issues * Fix file formatting * Review changes, more concat info, add bf16 bf8 tests * Fix formatting * Add bf16 and bf8 tests --------- Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2025-10-13 13:57:37 +02:00
aledudek	3021604213	[CK_TILE] Batched Gemm Kernel IsSupported function checks (#2860 ) * Add valid check batched gemm part1 * [CK_TILE] Add batched gemm kernel IsSupported func checks * revert broken pre-commit hook changes * revert broken pre-commit hook changes v2 * Clarify error messages	2025-10-13 13:55:23 +02:00
Ville Pietilä	d62f34348a	Skeleton for the ckTileProfiler.	2025-10-13 11:40:31 +00:00
damien-lejeune	46c10c316d	Update include path to break the remod's cyclic dep issue (#2978 ) * Update include path to break the cyclic dep issue * Use ck_tile::permute_vectors_i4x4_b in tile engine --------- Co-authored-by: Damien Lejeune <damien.lejeune@amd.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-10-13 13:24:47 +02:00
msaffari-amd	e9f0cc83a8	[CK Tile] contraction multi d - kernel & example (#2901 ) * Initial commit. create batched_contraction_kernel file * initial problem definition * implement initial example to launch kernel * add universal gemm to contraction. initial phase * complete implementation for special case all Dims are 1 and no Ds * clean code * initial changes to support multi dimensional G * more progress in implementing multiple G * tmp commit * manage dynamic NumDimG in kernel * improving example for multi M,N,K,G handling. start generalizing kernel. it is a temporary commit * implement the example for general Multi dimension G M N K and test different reference calculation algorithms * 2 functions for reference using multi dimensional and flat indexing * clean the code for muti dimentional G, M, N, K contraction and add some logs * Add Make descriptor function in kernel for merging Ms, Ns, Ks for A, B, E * some cleaning on kernel * clean the code for calculating the offsets from flatten batch number * Start adding MultiD support to kernel and example * more changes to manage multi D in kernel and example * manage passing multi d to kernel and testing. * complete multi D support in kernel. modify example code to support it * Correct algorithm to calc the correct offset values for D tensor batches and some code cleaning * Minor fix * Generalize example code for variable NumD tensors and apply cleanup based on review feedback * Refactored code and addressed review feedback * refactoring, cleaning, add documents, in kernel side and example codes * Optimize batch offset calculation in kernel * Inline CalculateBatchOffset in batched contraction kernel, update CHANGELOG.md --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-10-13 12:30:28 +02:00
Ville Pietilä	94569f3991	Build only grouped conv profilers.	2025-10-13 10:01:42 +00:00
Yi DING	95bdc7410c	[CK_TILE] FMHA BWD Add Instance for D48 on GFX950 (#2866 ) Co-authored-by: asleepzzz <hanwen.chang@amd.com>	2025-10-13 15:03:46 +08:00
Christopher Millette	f5708882a3	Streamk functional tests (#2974 ) * Add initial fp16_mem_128x128x32_2x2x1_32x32x16_NonPersistent test suite * Account for stride when computing K offsets for A and B tensor This change ensures that the correct stride is used when computing the K offsets into the A and B tensors in the Stream-K Kernel's operator() function. This ensures that the kernel executes correct regardless of whether A and B are row or column major. * Move helper code to test_gemm_streamk_util.hpp * Separate tests into smoke/regression/extended. Add bf16 datatype * Run clang-format * Refactor combinatorial macro expansion and naming * Adjust the initialization values to account for better tolerance on bf16 * Correct BF16 datatypes in comments * Move the extended tests under the REGRESSION_TESTS label * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Emily Martins <emily.martins@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-11 07:53:40 -05:00
John Shumway	0843815db7	Fix GCC 7 CTAD compilation error in test_fmha_bwd.cpp (#3001 ) Fixes compilation error on SLES15 with GCC 7 for gfx942 builds: error: 'vector' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported] Changes: - Explicitly specify template argument for `std::vector<mode_enum>` instead of relying on C++17 CTAD - Maintains compatibility with both older (GCC 7) and newer compilers	2025-10-10 19:13:34 -07:00
Khushbu Agarwal	3c39d279ab	supporting prefill shapes for preshuffle block scale gemm (#2975 ) * debugging * debugging for prefill shapes * comment unused code * fix for prefill shapes * clearing up the code * add int4 to universal gemm example * clang formatted * adding test for prefill shapes in block scale gemm * lil improv on the block pipeline * Address Review Comment --------- Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-10-10 15:36:24 -07:00
Max Podkorytov	9d060d3e3c	[CK-Tile] functional support for transposed inputs in compute-bound double-lds-buffer pipeline with async loads from global memory to LDS (#2984 ) * reuse local prefetch logic from compute v4 pipeline add single-tile test explicit lambda capture reuse lds block descriptors from base policy for the transposed case match the test case kernel configuration with compute v4 * add comments	2025-10-10 12:57:50 -07:00
yinglu	fada1a3cae	Conv:TF32: add more instances - 2 (#2879 ) * add instances of device_grouped_conv_fwd_xdl_f32_comp_instances * add instances of device_grouped_conv_fwd_xdl_f32_tf32_mem_instances * add instances of device_grouped_conv_fwd_xdl_large_tensor_f32_tf32_instances * tf32:conv:add instances for base class DeviceConvFwd * tf32:conv:add instances for base class DeviceGroupedConvBwdDataMultipleD * tf32:conv:add instances for base class DeviceGroupedConvBwdWeight * add tf32 in profiler * remove gnhwc/ngchw/ngcdhw instances * remove non-ndhwgc/nhwgc/nhwc instances * add check in IsSupportedArgument()	2025-10-10 15:28:17 +08:00
Bartłomiej Kocot	ad7a215aba	Fix splitK for grouped conv bwd data (#2991 )	2025-10-10 09:24:21 +02:00
Yi DING	b6036bc76a	[CK_TILE] FMHA Tests Enhancement (#2945 ) * fmha-gtest-wip * Thanks Copilot!	2025-10-10 11:34:47 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	fb66b4f5e4	[CK_TILE] fix pk_fp4 compilation for non-gfx950 GPUs (#2983 ) See build error log from https://github.com/ROCm/composable_kernel/issues/2271#issuecomment-3150218542 This PR make vector element access constexpr-safe by avoiding operator[] on ext_vector_type(2) and replace those sites in the pk_fp4 conversions so they can be used in constant expressions, as The operator[] on ext_vector_type(2) isn't allowed in constant expressions, which caused "constexpr function never produces a constant expression" with a note at x[0]. Using `bit_cast` to a trivial array representation keeps it constexpr-compatible. Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-09 07:43:41 -07:00

1 2 3 4 5 ...

2488 Commits