composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-01 04:07:56 +00:00

Author	SHA1	Message	Date
Ville Pietilä	7722f901df	Fix validation.	2025-10-17 13:07:06 +00:00
Ville Pietilä	6789c219c1	Add missing header.	2025-10-17 12:43:49 +00:00
Ville Pietilä	a92c965667	Fix fwd layouts.	2025-10-17 11:07:39 +00:00
Ville Pietilä	ef3e871e6e	Add grouped conv fwd direction profiling into CK Tile profiler.	2025-10-17 10:47:23 +00:00
Ville Pietilä	0e0fb54b9f	Rename conv factory.	2025-10-17 06:26:41 +00:00
Ville Pietilä	a708b177fc	Add double smem buffer instances.	2025-10-17 06:24:11 +00:00
Ville Pietilä	c0b68c8a85	Add more instances.	2025-10-16 14:18:40 +00:00
Ville Pietilä	6c5531a4ae	Disqualify benchmarking results from kernels that do not pass validation.	2025-10-16 12:22:51 +00:00
Ville Pietilä	76ffa1bf0a	Add more instances.	2025-10-16 11:33:06 +00:00
Ville Pietilä	044bcfcb1e	Take universal GEMM pipeline into use for grouped convolutions.	2025-10-16 11:03:14 +00:00
Ville Pietilä	e99b5a8c28	Merge remote-tracking branch 'origin/develop' into vpietila/ck-vs-ck-tile-conv-benchmarking	2025-10-16 07:33:08 +00:00
Ville Pietilä	9b3c61cac2	Add more instances.	2025-10-16 07:32:52 +00:00
Ville Pietilä	19fac39880	Enable vector loads in grouped conv bwd weight kernels.	2025-10-16 07:17:12 +00:00
Haocong WANG	013ba3c737	Enable storelse for fmha_fwd_trload kernel (#3023)	2025-10-16 13:51:23 +08:00
Emily Martins	0dbd173500	Fix compiler noreturn error for ck tile permute test (#3036)	2025-10-15 19:42:02 -07:00
Aviral Goel	232523d9fa	docs: add quant mode comparison to readme (#3032) * docs: add quant mode comparison to readme * Update example/ck_tile/38_block_scale_gemm/README.md Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com> --------- Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>	2025-10-15 18:35:06 -07:00
Illia Silin	87d0a3ac17	use branch develop to test hipTensor (#3034)	2025-10-15 15:40:34 -07:00
Illia Silin	3348f01e6f	re-enable clang-format by default (#3030) * re-enable clang-format by default * fix clang format	2025-10-15 07:43:11 -07:00
Ville Pietilä	a5b60ed2f2	Add more instances.	2025-10-15 14:33:01 +00:00
Christopher Millette	bde5f26db3	Disable streamk extended regression tests for now (#3016)	2025-10-15 09:05:47 -05:00
Ville Pietilä	96a7c26a0b	Better split-K handling in the template instantiation.	2025-10-15 13:47:04 +00:00
Ville Pietilä	bbe13f4635	Add more instances.	2025-10-15 13:23:55 +00:00
Ville Pietilä	23aa650172	Add min blocks per CU to invoker name.	2025-10-15 13:21:29 +00:00
Ville Pietilä	57dbd2f4a4	Remove unnecessary compilations.	2025-10-15 13:20:58 +00:00
Ville Pietilä	3c08ce1e64	Improve the grouped conv kernel name generation in CK Tile.	2025-10-15 11:02:21 +00:00
felix	4c826abfff	Felix/opt sorting (#2902) * merge felix/sorting * opt moe sorting (#2822) * opt moe storing for 2k --------- Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com> Co-authored-by: coderfeli <coderfeli@163.com>	2025-10-15 09:24:03 +08:00
AviralGoelAMD	ca1ab083a7	test(grouped_gemm_multi_d): add unit test for bf16 support	2025-10-14 18:00:43 -04:00
AviralGoelAMD	8d8b49dec2	feat(grouped_gemm_multi_d): add support for bf16	2025-10-14 18:00:43 -04:00
Geo Min	706c2b281c	fixing group id (#3002)	2025-10-14 08:51:52 -07:00
joyeamd	b9d74e7746	update s_barrier's logic in gfx12 architecture (#3003) change s_waitcnt's logic in gfx1250 change s_waitcnt's logic in gfx1250 update comment	2025-10-14 08:49:34 -07:00
Illia Silin	e4298e55c7	Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017) This reverts commit `d2bbca3eca`.	2025-10-14 08:43:14 -07:00
Ville Pietilä	3d0db2ca63	Fix transferring data back to host for validation.	2025-10-14 15:02:51 +00:00
jakpiase	6deaaa92cc	[CK_TILE] Switch into universal gemms for conv bwds (#2981) * switch into universal gemms for conv bwds * some fixes and support universal gemm in conv fwd * add reviewer comments	2025-10-14 16:09:16 +02:00
Ville Pietilä	bbed3a62dc	Fully functional CK Tile profiler.	2025-10-14 13:35:37 +00:00
msaffari-amd	589e242eda	Fix: Handle JSON boolean values (pad_m, pad_n, pad_k and persistent) in gemm_instance_builder (#3008)	2025-10-14 13:20:25 +02:00
Ville Pietilä	0f6bf78caa	Add empty instance factory.	2025-10-14 07:13:20 +00:00
Ville Pietilä	eaf9ba4e45	Rename CK Tile grouped conv factory.	2025-10-14 06:31:34 +00:00
ClementLinCF	e1b0bdfbfa	[CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm (#2540) * [CK_TILE] Correct BlockWarps calculation and fix smoke-test in rmsnorm * Update rmsnorm host reference * Update tree reduction of rmsnorm for reference host * Fix cross warp for m > 1 cases * Add RMSNorm model selectable option for host reference * Fix save_unquant cases * Update reference rmsnorm forward function to use enum for model sensitivity * Update reference rmsnorm calculation for model sensitivity * Fix m warp for layernorm * Adjust parameter of reference for twoPass * Fix clang format * Run clang-format-overwrite.sh to fix formating issue * fix clang format --------- Co-authored-by: MHYang <mengyang@amd.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com> Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-10-13 11:52:37 -07:00
Ville Pietilä	fc6a9e3931	Create invoker for the kernel and a factory for creating invokers.	2025-10-13 15:22:50 +00:00
John Shumway	fc2a121c44	Enable GMock and improve gtest configuration (#2976) Our current cmake/gtest.cmake file does not enable gmock. Gmock is needed for matchers that are needed for more readable unit tests. This PR enables gmock and does a little cleanup in gtest.cmake: * Enable BUILD_GMOCK by default (was previously disabled) * Patch gtest-src/googlemock/CMakeLists.txt for broken include path. * Add configuration to gmock if the target is used. No other changes in this PR, but I've verified I can use gmock matchers correctly once I include these changes in other code.	2025-10-13 08:11:51 -07:00
Ville Pietilä	a60dab521e	Added a placeholder conv bwd instance factory for CK Tile profiler.	2025-10-13 14:32:20 +00:00
Ville Pietilä	6dcee56fee	WIP: CK Tile conv bwd profiler.	2025-10-13 13:03:21 +00:00
Sami Remes	d2bbca3eca	[CK_TILE] Non-K Major from old CK to CK-Tile (#2442) * Enable the adapted LDS B layout for Row-Major * fix formatting * Implement specialized col-major A LDS block descriptor * Fix formatting * Use VecLoadSize for AK1/BK1 * Fix some thread access pattern values * Use GetVectorSizeA for A * Fix formatting * Add extra condition to avoid division by zero * disable layout for wave32 * remove extra else * fix formatting * Fix formatting * Rename one remaining TileDistributionEncodingPattern2D * Use integer ceil division * revert remod.py changes * also revert utility.hpp * use getA/BTileAccessPattern everywhere * use integer_divide_ceil for AK0 too --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2025-10-13 14:27:02 +02:00
aledudek	634634f5c0	[CK_TILE] Blockwise GEMM pipeline v6 - port of v5 from old CK (#2955) * First checkpoint * Second checkpoint - hot loop scheduler * Third checkpoint - init main operator * Fourth checkpoint - main loop ready * Fifth checkpoint - main loop fix * Sixth checkpoint - ReadWritecompFunc * Seventh checkpoint - Tail finished * [CK_TILE] Blockwise gemm pipeline v5 complete * Working * Working fixes 2 * Rename v5 to v77 temporarily * Data type adjustment * Data type adjustment 2 * [CK_TILE] Blockwise Gemm pipeline v5 add tests * [CK_TILE] Fix calculation error * TEMP: check pipeline * Fix name to V6 * naming and documentation changes * WIP dump * Try fixing v1 * Failing tests v5 * Debugging * Changes v2 * F16 tests working great * Working BlockwiseGemmPipelineV5 as V6 * Cleanup and format * Merging changes part1 * [CK_TILE] Blockwise Gemm Pipeline Comp V5/V6 * Remove commented code * Fix gfx950 build issues * Fix file formatting * Review changes, more concat info, add bf16 bf8 tests * Fix formatting * Add bf16 and bf8 tests --------- Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>	2025-10-13 13:57:37 +02:00
aledudek	3021604213	[CK_TILE] Batched Gemm Kernel IsSupported function checks (#2860) * Add valid check batched gemm part1 * [CK_TILE] Add batched gemm kernel IsSupported func checks * revert broken pre-commit hook changes * revert broken pre-commit hook changes v2 * Clarify error messages	2025-10-13 13:55:23 +02:00
Ville Pietilä	d62f34348a	Skeleton for the ckTileProfiler.	2025-10-13 11:40:31 +00:00
damien-lejeune	46c10c316d	Update include path to break the remod's cyclic dep issue (#2978) * Update include path to break the cyclic dep issue * Use ck_tile::permute_vectors_i4x4_b in tile engine --------- Co-authored-by: Damien Lejeune <damien.lejeune@amd.com> Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-10-13 13:24:47 +02:00
msaffari-amd	e9f0cc83a8	[CK Tile] contraction multi d - kernel & example (#2901) * Initial commit. create batched_contraction_kernel file * initial problem definition * implement initial example to launch kernel * add universal gemm to contraction. initial phase * complete implementation for special case all Dims are 1 and no Ds * clean code * initial changes to support multi dimensional G * more progress in implementing multiple G * tmp commit * manage dynamic NumDimG in kernel * improving example for multi M,N,K,G handling. start generalizing kernel. it is a temporary commit * implement the example for general Multi dimension G M N K and test different reference calculation algorithms * 2 functions for reference using multi dimensional and flat indexing * clean the code for muti dimentional G, M, N, K contraction and add some logs * Add Make descriptor function in kernel for merging Ms, Ns, Ks for A, B, E * some cleaning on kernel * clean the code for calculating the offsets from flatten batch number * Start adding MultiD support to kernel and example * more changes to manage multi D in kernel and example * manage passing multi d to kernel and testing. * complete multi D support in kernel. modify example code to support it * Correct algorithm to calc the correct offset values for D tensor batches and some code cleaning * Minor fix * Generalize example code for variable NumD tensors and apply cleanup based on review feedback * Refactored code and addressed review feedback * refactoring, cleaning, add documents, in kernel side and example codes * Optimize batch offset calculation in kernel * Inline CalculateBatchOffset in batched contraction kernel, update CHANGELOG.md --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>	2025-10-13 12:30:28 +02:00
Ville Pietilä	94569f3991	Build only grouped conv profilers.	2025-10-13 10:01:42 +00:00
Yi DING	95bdc7410c	[CK_TILE] FMHA BWD Add Instance for D48 on GFX950 (#2866) Co-authored-by: asleepzzz <hanwen.chang@amd.com>	2025-10-13 15:03:46 +08:00

2496 Commits