composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-01 12:17:00 +00:00

Author	SHA1	Message	Date
Enrico Degregori	0f1bb0e817	Fix gridwise ab scale	2025-12-12 10:14:13 +00:00
Enrico Degregori	4a3c949753	Fix gridwise common	2025-12-12 10:11:42 +00:00
Enrico Degregori	29743bc0f4	Fix explicit conv bwd weight struct	2025-12-12 10:06:09 +00:00
Enrico Degregori	0c67e9731a	Address review comments	2025-12-12 09:49:01 +00:00
Enrico Degregori	3ea94e540b	Merge branch 'develop' into streamhpc/conv_bwd_weight_wmma	2025-12-12 08:42:36 +00:00
Enrico Degregori	ffad9c3e8f	Fix copyright	2025-12-12 08:40:44 +00:00
dependabot[bot]	8d7a4e0c73	Bump rocm-docs-core[api_reference] from 1.31.0 to 1.31.1 in /docs/sphinx (#3410 ) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.31.0 to 1.31.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.31.0...v1.31.1) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-version: 1.31.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-12-11 21:09:40 -08:00
Max Podkorytov	4011dbfec3	[CK-Tile] fixup codegen for tile engine ops gemm multid and gemm preshuffle (#3383 ) * fixup gemm multi-d and preshuffle in tile engine codegen --------- Co-authored-by: Thrupti Raj Lakshmana Gowda <thruptiraj.lakshmanagowda@amd.com>	2025-12-11 14:23:43 -08:00
Aviral Goel	ff194a4271	build: Hot fix to reduce massive build time by just disabling the instances (#3408 ) Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-12-11 10:39:20 -08:00
Aviral Goel	45c4ea510c	chore: add copyright to pass the CI (#3407 )	2025-12-11 10:34:15 -08:00
Aviral Goel	4dcc3e59c1	chore: update copyright header for misc files (#3402 ) * chore: update copyright header for misc files * fix: typo in kernel resulting in ci failure	2025-12-11 08:25:29 -08:00
Enrico Degregori	0566c90f66	Merge branch 'develop' into streamhpc/conv_bwd_weight_wmma	2025-12-11 16:13:05 +00:00
Illia Silin	b2925ee207	Fix compilation errors with latest clang22 version. (#3396 ) * remove target attributes from deduction guides * switch CK_TILE_HOST_DEVICE_EXTERN based on clang version	2025-12-11 08:09:29 -08:00
eliotwang	715671e419	Bf16fp4 gemm (#2801 ) support bf16mxfp4 gemm rebase bf16fp4 example to develop branch Clean up commented debug code in GEMM kernel * rename example folder * support bf16mxfp4 gemm rebase bf16fp4 example to develop branch Clean up commented debug code in GEMM kernel * rename example folder * rebase to new develop * fix clang format * update code according to reviewer's comment * Update README.md * update code according to reviewer's comment * update code according to reviewer's comment * Update CMakeLists.txt * Update README.md * Update CMakeLists.txt * Delete files * Delete files * Add unit tests * Update test_gemm_quant_base.hpp * merge bf16fp4 example to develop branch fix clang format * fix clang format * Update CMakeLists.txt * fix ci test * fix clang format * resolve conflicts --------- Co-authored-by: eliotwang <charyang@smci355-ccs-aus-m10-29.cs-aus.dcgpu> Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-12-11 07:20:29 -08:00
Enrico Degregori	ce99cab605	Wmma support for gemm_ab_scale (#3314 ) * Support gemm_ab_scale: - Add tests - Integrate scaling implementation in multiple D - Generalize existing b_scale for ab_scale - Add instances - Generalize implementation for ScaleBlockM, ScaleBlockN, ScaleBlockK - Add support for all layouts supported by xdl - Fix splitk xdl * Fix copyright * Wmma support for gemm_blockscale_wp (#3315) * Support for preshuffle with ab scale - add support for b preshuffle in GridwiseGemm_wmma_cshuffle_v3_ab_scale - add support for AScaleLayout and BScaleLayout (can be different from ALayout and BLayout, respectively) - add Run method in v1 pipeline to support preshuffle + scaling - add support for preshuffle gemms in common invoker - Add splitk support * Fix copyright header	2025-12-11 09:06:20 +01:00
Ville Pietilä	d66e5f667c	[CK_BUILDER] Improve CK Builder and CK Builder tests (#3382 ) * Remove stale documentation. * Add placeholder for conv algorithm design description. Add link to conv factory description. * Improve testing transfer parameters. * Python script to check the block tilings. * Improve tests and conv types serialization. * Change representation of boolean values from 1/0 to true/false in instance strings. * Change representation of boolean values from 1/0 to true/false in conv algorithm types. * Test code improvements. * Improve conv descriptions tests. * Improve conv signature definition in conv fwd builder tests. * clang-format. * Remove obsolete script. * Revert StaticAssertTypeEq changes in conv layout tests. * Remove obsolete using declaration. --------- Co-authored-by: Ville Pietilä <>	2025-12-11 09:50:00 +02:00
Aviral Goel	6d25525adc	feat(precommit-hooks): add check for correct copyright header (#3302 ) * chore(copyright): update copyright header for left files * feat(copyright): add copyright check to precommit hooks * chore(copyright): update copyright header for include/ck_tile directory * chore(copyright): update copyright header for example directory * chore(copyright): update copyright header for .github directory * refactor: copyright_check script with better if else handling * chore(copyright): update copyright header for remaining files * feat: add script to automate copyright addition	2025-12-10 22:50:43 -08:00
Aviral Goel	fbbdd36ea8	docs: add notes on tile distribution and inline comments (#3297 ) * docs: add notes on tile distribution and inline comments * Apply suggestions from code review Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> --------- Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>	2025-12-10 22:47:19 -08:00
Geo Min	8270900d60	[ci] Bumping TheRock commit hash (#3385 ) * Bumping TheRock commit hash * new docker hash * Using new runner name	2025-12-10 17:34:41 -08:00
John Shumway	15ed65db35	Improve sequence sorting and add unit tests (#3376 ) Old sequence sort code was showing up on build profiles. Convert it to constexpr functions for much more efficient build-time execution. The sorting is still O(N^2), but our sequences are small enough that it executes quickly. This reduced compilation time of a small convolution by more than 10% and overall time spent in the compiler on a narrow build by 6%.	2025-12-10 12:25:23 -08:00
Po Yen Chen	b15df37255	fix: python 3.8 compatibility in fmha codegen (#3388 )	2025-12-10 07:08:41 -08:00
kiefer	8d0951f5e2	Fix clang format for Two Stage implementation	2025-12-10 11:09:53 +00:00
Ville Pietilä	fc22320d78	[CK_TILE] Split-K autodeduction (#3351 ) * First version of split-K autodeduction. * Fix circular dependency and kernel construction. * Fix tolerance calculation for bwd weight example. * Simplify kernel construction. * Fix kernel launching bug for split-K autodeduce. * Add split-K autodeduction support for the two stage example. * Fix a corner case. * Fix clang-format. * Fix clang-format for inc files. * Add missing header. * Prevent too large split-K values. * Fix formatting. * Add unit tests for IsSupportedArgument in grouped bwd conv. * clang-format. * Fix merge conflicts. * Address feedback from code review. * clang-format * Fix new tests after merge. --------- Co-authored-by: Ville Pietilä <>	2025-12-10 09:30:30 +02:00
Zzz9990	1aa93ef551	[CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377 ) * update coherence --------- Co-authored-by: Zzz9990 <Zzz9990>	2025-12-10 10:03:28 +08:00
Illia Silin	934ba1208a	use hipTensor from monorepo for daily builds (#3386 )	2025-12-09 14:39:08 -08:00
Illia Silin	0d8259affd	temporarily disable daily builds on gfx1010 and gfx908 (#3384 )	2025-12-09 10:37:13 -08:00
Illia Silin	7582c9e73f	Upgrade to ROCm7.1.1 as default compiler. (#3370 ) * upgrade to rocm7.1.1 as new default compiler * fix jenkinsfile	2025-12-09 07:35:32 -08:00
dependabot[bot]	50ca3f83eb	Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374 ) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.20.1 to 1.31.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.20.1...v1.31.0) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-version: 1.31.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-12-09 07:10:34 -08:00
lalala-sh	6f0966e1e9	fix a16w4 moe bugs (#3373 ) * fix valid mask bug * update format	2025-12-09 17:54:55 +08:00
kiefer	616ad45cef	Print number of valid instances in profiler and tests.	2025-12-09 09:13:31 +00:00
kiefer	d201572ae4	Actually print the reason when a device implementation is not supported.	2025-12-09 09:13:31 +00:00
kiefer	1a822947eb	Fix bug in various bwd wei device implementations / profiler where the occupancy based split_k value could not be found because the Argument did not derive from ArgumentSplitK, leading to incorrect error tolerances.	2025-12-09 09:13:31 +00:00
kiefer	4cf3e61954	Grab device and gridwise files from bkp branch, this should enable splitK support for convolution and also we no longer ForceThreadTileTransfer for explicit gemm. Also grab some updates from 7e7243783008b11e904f127ecf1df55ef95e9af2 to fix building on clang20.	2025-12-09 09:13:31 +00:00
kiefer	3e27e627bb	Always ForceThreadTileTransfer for now, WaveTileTransfer does not work for convolution yet.	2025-12-09 09:13:31 +00:00
Enrico Degregori	29265aa82f	Fix add_test_executable	2025-12-09 09:13:30 +00:00
Enrico Degregori	4c09ae57bc	Disable splitk for 2stage xdl on rdna (bug to be fixed)	2025-12-09 09:13:30 +00:00
kiefer	9eece6c0c4	Revert "Adapt all grouped conv bwd weight vanilla Xdl instances to 16x16. MRepeat doubled for all but 12 of them (some static assert failure). Also added custom reduced profiler target for building grouped conv bwd weight vanilla only profiler. Verified with gtest test." This reverts commit `d20c869d3d`.	2025-12-09 09:08:53 +00:00
Yi DING	c1c2e41a03	[CK_TILE] Generate random tensor values with multiple threads (#3324 )	2025-12-09 11:02:33 +08:00
Sami Remes	c363a98d41	[CK_TILE] Support more layouts for BQuant GEMM (#3349 ) * WIP: preparing to add transpose bq support * WIP: handle both row/col layout for BQ windows/tile dstr * Fix build * WIP: adding some test, debugging numerical errors * Fix all but pkint4 tests * Remove test_gemm_quant_typed.cpp again * update disabled tests * add conversion from pkint4 for b matrix * fix formatting * fix formatting * Fix tr_load and use override b datatype for clarity * fix formatting * make bquant preshuffle tests bqlayout column-major	2025-12-08 13:05:56 -08:00
Erwin Terpstra	fe07b5a1bf	[CK Tile] Grouped GEMM aquant mode and non-persistent kernel (#3337 ) * wip: add aquant to grouped gemm quant example * fix: properly handle hot loop count in aquant pipeline * fix: add separate GemmConfig structs for AQuant, automatically select the correct one * feat: finish support for a non-persistent kernel invocation for grouped gemm quant, and add support code to example * refactor: cleaned up grouped gemm quant example a bit by reusing pipeline selection logic * chore: add warp gemm dispatchers for a couple of TransposeC K=32 variants * feat: add quant grouped gemm tests cases for aquant (regular and transpose C) and non-persistent kernel * fix: update base pipeline classes according to changes in develop branch * Revert "chore: add warp gemm dispatchers for a couple of TransposeC K=32 variants" This reverts commit `b3fd4d326d`. * feat: remove aquant config from grouped gemm quant example, update to add persistency as runtime parameter * chore: removed work-around for aquant bug that has been fixed * chore: fix typo in command-line parameters * fix: correct K warp tile size for gfx950 * chore: incorrect warp tile configuration on gfx942	2025-12-08 12:19:22 -08:00
Anton Gorenko	ca6143f0b2	Add a workaround for a compiler issue for bwd on gfx90a and ROCm 7.1.1 (#3369 ) Sometimes there are not enough wait-states between v_mfma_f32... and v_accvgpr_read_b32 instructions if they are separated by s_cbranch. The workaround is to read accvgprs to vgpr before branching.	2025-12-08 07:44:17 -08:00
Yi DING	878b4e7f46	[CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287 ) * [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 * typo	2025-12-08 19:20:44 +08:00
Bartłomiej Kocot	04612c30ce	[CK_BUILDER] Ck Tile Grouped convolution factory (#3352 ) * [BUILDER] Ck Tile Grouped convolution factory * Part 2 * Fixes after rebase * Remove leftovers	2025-12-08 10:32:56 +01:00
yinglu	8fec8054b2	ck: add tf32 in `DTYPES` to control instances build(#3317 )	2025-12-08 16:24:20 +08:00
Thomas Ning	86a84ae611	Add the gfx1011 support on CK Tile with the SGPR builtin reading protection (#3350 ) * Finish the fixes * add the gfx1010 support macro * Fix the compilation error	2025-12-05 14:18:30 -08:00
Khushbu Agarwal	6b1bceca7b	[CK_Tile] Enable PreshuffleB for 2d block scale Gemm (#3298 ) * formatted * formatted * formatting * formatting * formatting * [CK TILE GEMM] Refactor block_scale_gemm examples - Split cpp file to reduce building time - Support multiple GemmConfig * [CK TILE GEMM] Refactor block_scale_gemm examples - Update Readme * enable prefill shapes * [CK TILE GEMM] Refactor block_scale_gemm examples - Add support for rowcol and tensor GEMM operations * [CK TILE GEMM] Refactor block_scale_gemm examples - Update README * adding preshuffle quant as new parameter and its associated new files * remove debugging statements * adding test * enable preshuffle quant with permuteN * updating readme and correcponding gemmconfigs * updating cmake file * fixing CI failures for grouped quant gemm * debugging permuteN * debugging * debugging PermuteN * initial commit * resolving merge conflicts * adding test cases * fixing bq tensor calculation --------- Co-authored-by: Cong Ma <congma13@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-12-05 09:57:52 -08:00
Illia Silin	608232ce82	do not build hipblaslt for gfx90a to save time and disc space (#3362 )	2025-12-05 08:39:18 -08:00
Cong Ma	ed080f5a56	Congma/ck tile/aquant mem pipeline (#3346 ) * [CK TILE GEMM QUANT] Fix the bug in HotLoopTail of memory pipeline	2025-12-05 09:35:27 -07:00
John Shumway	7541d9b5b0	Ignore .cmake-format.yaml (#3356 ) We don't want to add cmake formatting until we are in the super repo, but its handy if developers want to experiment with formatting. For now we should ignore .cmake-format.yaml.	2025-12-05 08:26:00 -08:00
Bartłomiej Kocot	82f796a1f0	Profile resnet layout fixes (#3360 )	2025-12-05 08:20:46 -08:00


2818 Commits