kiefer
c9c05bfc7d
Remove some instances that give incorrect results (f16 NHWGC)
2025-12-15 08:56:49 +00:00
kiefer
9d5810942a
Replace the cshuffle non-v3 instance lists with v3 lists, making sure there are no duplicates. Also remove stride1pad0 support for NHWGC, since the explicit implementation can be used for those cases.
2025-12-15 08:56:49 +00:00
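Why the explicit path can absorb these cases: assuming "stride1pad0" here refers to the 1x1-filter, stride-1, pad-0 specialization, the backward-weight convolution for one group collapses into a single GEMM over the N*H*W positions. The host-side sketch below illustrates that reduction; the function name and the flat row-major buffers are hypothetical, not CK code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only (assumptions: 1x1 filter, stride 1, pad 0, one group,
// input X laid out as [N*H*W][C], output gradient dY laid out as [N*H*W][K]).
// Backward-weight convolution then reduces to a plain GEMM:
//   dW[k][c] = sum over p in [0, N*H*W) of dY[p][k] * X[p][c]
// i.e. an explicit GEMM with M = K, N = C and reduction length N*H*W.
std::vector<float> bwd_weight_1x1_as_gemm(const std::vector<float>& x,  // [NHW][C]
                                          const std::vector<float>& dy, // [NHW][K]
                                          int nhw, int C, int K)
{
    std::vector<float> dw(static_cast<std::size_t>(K) * C, 0.0f); // [K][C]
    for (int k = 0; k < K; ++k)
        for (int c = 0; c < C; ++c)
        {
            float acc = 0.0f;
            for (int p = 0; p < nhw; ++p)
                acc += dy[p * K + k] * x[p * C + c];
            dw[k * C + c] = acc;
        }
    return dw;
}
```

No im2col step is needed in this case, because every output position maps to exactly one input position per filter tap.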
kiefer
4fe6c7ddcb
Add two-stage instances based on the parameters of the tuned cshuffle V3 instances. CShuffleBlockTransferScalarPerVector is set to 4, and mergegroups is fixed to 1 for now. No more special instance lists.
2025-12-15 08:56:49 +00:00
kiefer
8e88660834
Add explicit oddMN support with custom tuned instances
2025-12-15 08:56:49 +00:00
kiefer
133f5538e3
Reduce instances to only the tuned wmma V3 ones for implicit v1 intra and explicit v1 intra pad/nopad.
2025-12-15 08:56:49 +00:00
kiefer
7a516b8e99
Adapt all vanilla grouped conv bwd weight Xdl instances to 16x16. MRepeat is doubled for all but 12 of them (those hit a static assert failure). Also add a custom reduced profiler target for building a profiler with only the vanilla grouped conv bwd weight instances. Verified with the gtest test.
2025-12-15 08:56:49 +00:00
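The reason MRepeat doubles can be shown with a little arithmetic. Assuming the usual CK tiling relation MPerBlock = MPerXdl * MXdlPerWave * MWaves (MXdlPerWave being the "MRepeat" above) and assuming the instances previously used 32x32 XDL tiles, halving the XDL tile to 16 requires doubling MRepeat to keep the same block tile. The helper below is hypothetical, not a CK API.

```cpp
#include <cassert>

// Hypothetical helper (not a CK API): rows of the block tile covered when each
// of m_waves waves issues m_xdl_per_wave (MRepeat) XDL tiles of m_per_xdl rows.
constexpr int m_per_block(int m_per_xdl, int m_xdl_per_wave, int m_waves)
{
    return m_per_xdl * m_xdl_per_wave * m_waves;
}

// A 32x32 instance with MRepeat = 2 and 2 waves along M covers 128 rows...
static_assert(m_per_block(32, 2, 2) == 128, "32x32 baseline");
// ...and with 16x16 tiles MRepeat must double to cover the same 128 rows.
static_assert(m_per_block(16, 4, 2) == 128, "16x16 with doubled MRepeat");
```

Instances whose doubled MRepeat violates other tiling constraints would fail at compile time, which is consistent with the static assert failures mentioned above.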
Enrico Degregori
b5ccc070a8
Fix splitk ab scale
2025-12-15 08:19:21 +00:00
Enrico Degregori
e1694a9547
Fix splitk
2025-12-14 12:04:58 +00:00
Enrico Degregori
f4419af2c5
Fix typo
2025-12-12 13:54:32 +00:00
Enrico Degregori
e41b818b9f
Fix gridwise gemm
2025-12-12 11:41:54 +00:00
Enrico Degregori
9d87cfec15
Fix gridwise common
2025-12-12 11:40:20 +00:00
Enrico Degregori
9c7f272a6b
Fix compilation error
2025-12-12 11:40:00 +00:00
Enrico Degregori
df75061576
Restore example tolerance calculation
2025-12-12 11:17:31 +00:00
Enrico Degregori
a87256a676
Remove autodeduce 1 stage
2025-12-12 10:35:30 +00:00
Enrico Degregori
0f1bb0e817
Fix gridwise ab scale
2025-12-12 10:14:13 +00:00
Enrico Degregori
4a3c949753
Fix gridwise common
2025-12-12 10:11:42 +00:00
Enrico Degregori
29743bc0f4
Fix explicit conv bwd weight struct
2025-12-12 10:06:09 +00:00
Enrico Degregori
0c67e9731a
Address review comments
2025-12-12 09:49:01 +00:00
Enrico Degregori
3ea94e540b
Merge branch 'develop' into streamhpc/conv_bwd_weight_wmma
2025-12-12 08:42:36 +00:00
Enrico Degregori
ffad9c3e8f
Fix copyright
2025-12-12 08:40:44 +00:00
dependabot[bot]
8d7a4e0c73
Bump rocm-docs-core[api_reference] from 1.31.0 to 1.31.1 in /docs/sphinx (#3410)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.31.0 to 1.31.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.31.0...v1.31.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.31.1
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-11 21:09:40 -08:00
Max Podkorytov
4011dbfec3
[CK-Tile] fixup codegen for tile engine ops gemm multi-d and gemm preshuffle (#3383)
* fixup gemm multi-d and preshuffle in tile engine codegen
---------
Co-authored-by: Thrupti Raj Lakshmana Gowda <thruptiraj.lakshmanagowda@amd.com>
2025-12-11 14:23:43 -08:00
Aviral Goel
ff194a4271
build: hotfix to reduce the excessive build time by disabling the instances (#3408)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-12-11 10:39:20 -08:00
Aviral Goel
45c4ea510c
chore: add copyright to pass the CI (#3407)
2025-12-11 10:34:15 -08:00
Aviral Goel
4dcc3e59c1
chore: update copyright header for misc files (#3402)
* chore: update copyright header for misc files
* fix: typo in kernel resulting in ci failure
2025-12-11 08:25:29 -08:00
Enrico Degregori
0566c90f66
Merge branch 'develop' into streamhpc/conv_bwd_weight_wmma
2025-12-11 16:13:05 +00:00
Illia Silin
b2925ee207
Fix compilation errors with the latest clang22 version (#3396)
* remove target attributes from deduction guides
* switch CK_TILE_HOST_DEVICE_EXTERN based on clang version
2025-12-11 08:09:29 -08:00
eliotwang
715671e419
Bf16*fp4 gemm (#2801)
* support bf16*mxfp4 gemm
* rebase bf16*fp4 example to develop branch
* Clean up commented debug code in GEMM kernel
* rename example folder
* support bf16*mxfp4 gemm
* rebase bf16*fp4 example to develop branch
* Clean up commented debug code in GEMM kernel
* rename example folder
* rebase to new develop
* fix clang format
* update code according to reviewer's comment
* Update README.md
* update code according to reviewer's comment
* update code according to reviewer's comment
* Update CMakeLists.txt
* Update README.md
* Update CMakeLists.txt
* Delete files
* Delete files
* Add unit tests
* Update test_gemm_quant_base.hpp
* merge bf16*fp4 example to develop branch
* fix clang format
* fix clang format
* Update CMakeLists.txt
* fix ci test
* fix clang format
* resolve conflicts
---------
Co-authored-by: eliotwang <charyang@smci355-ccs-aus-m10-29.cs-aus.dcgpu>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-12-11 07:20:29 -08:00
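For context on the bf16*mxfp4 path: per the OCP Microscaling (MX) specification, an MXFP4 operand stores 4-bit E2M1 elements plus one shared E8M0 scale per block of 32 elements. The host-side sketch below decodes a single element; it illustrates the number format only and is not the kernel's dequantization code. The function names are hypothetical, and the E8M0 NaN encoding (0xFF) is ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// FP4 E2M1: 1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1.
// The 3 magnitude bits (exponent:mantissa) index these 8 representable values.
float decode_e2m1(std::uint8_t nibble)
{
    static const float magnitudes[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    const float mag = magnitudes[nibble & 0x7];
    return (nibble & 0x8) ? -mag : mag;
}

// E8M0 shared scale: a pure power of two, 2^(e - 127). NaN (e == 0xFF) not handled.
float decode_mxfp4(std::uint8_t nibble, std::uint8_t e8m0_scale)
{
    return std::ldexp(decode_e2m1(nibble), static_cast<int>(e8m0_scale) - 127);
}
```

In a bf16*mxfp4 GEMM, each B element would be decoded this way (or an equivalent fused form) before being multiplied with the bf16 A element.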
Enrico Degregori
ce99cab605
Wmma support for gemm_ab_scale (#3314)
* Support gemm_ab_scale:
- Add tests
- Integrate scaling implementation in multiple D
- Generalize existing b_scale for ab_scale
- Add instances
- Generalize implementation for ScaleBlockM, ScaleBlockN, ScaleBlockK
- Add support for all layouts supported by xdl
- Fix splitk xdl
* Fix copyright
* Wmma support for gemm_blockscale_wp (#3315)
* Support for preshuffle with ab scale
- add support for b preshuffle in GridwiseGemm_wmma_cshuffle_v3_ab_scale
- add support for AScaleLayout and BScaleLayout (which can differ from ALayout and BLayout, respectively)
- add Run method in v1 pipeline to support preshuffle + scaling
- add support for preshuffle gemms in common invoker
- Add splitk support
* Fix copyright header
2025-12-11 09:06:20 +01:00
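To make the ScaleBlockM / ScaleBlockN / ScaleBlockK generalization concrete, here is a host-side reference sketch of a block-scaled GEMM: each A element is scaled by the factor of its (ScaleBlockM x ScaleBlockK) block and each B element by the factor of its (ScaleBlockK x ScaleBlockN) block. The row-major layouts, scale-tensor shapes and function name are assumptions for illustration, not the kernel's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference block-scaled GEMM (sketch):
//   C[m][n] = sum_k A[m][k] * B[k][n] * a_scale[m / SBM][k / SBK] * b_scale[k / SBK][n / SBN]
// A is [M][K], B is [K][N], a_scale is [ceil(M/SBM)][ceil(K/SBK)],
// b_scale is [ceil(K/SBK)][ceil(N/SBN)], all row-major (assumed).
std::vector<float> gemm_ab_scale_ref(const std::vector<float>& a, const std::vector<float>& b,
                                     const std::vector<float>& a_scale,
                                     const std::vector<float>& b_scale,
                                     int M, int N, int K, int SBM, int SBN, int SBK)
{
    const int num_kb = (K + SBK - 1) / SBK; // scale blocks along K
    const int num_nb = (N + SBN - 1) / SBN; // scale blocks along N
    std::vector<float> c(static_cast<std::size_t>(M) * N, 0.0f);
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
            {
                const float sa = a_scale[(m / SBM) * num_kb + k / SBK];
                const float sb = b_scale[(k / SBK) * num_nb + n / SBN];
                acc += a[m * K + k] * b[k * N + n] * sa * sb;
            }
            c[m * N + n] = acc;
        }
    return c;
}
```

A real kernel would apply the scales once per K block to partial sums rather than per element; the per-element form above is just the simplest correct reference.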
Ville Pietilä
d66e5f667c
[CK_BUILDER] Improve CK Builder and CK Builder tests (#3382)
* Remove stale documentation.
* Add placeholder for conv algorithm design description. Add link to conv factory description.
* Improve testing transfer parameters.
* Python script to check the block tilings.
* Improve tests and conv types serialization.
* Change representation of boolean values from 1/0 to true/false in instance strings.
* Change representation of boolean values from 1/0 to true/false in conv algorithm types.
* Test code improvements.
* Improve conv description tests.
* Improve conv signature definition in conv fwd builder tests.
* clang-format.
* Remove obsolete script.
* Revert StaticAssertTypeEq changes in conv layout tests.
* Remove obsolete using declaration.
---------
Co-authored-by: Ville Pietilä <>
2025-12-11 09:50:00 +02:00
Aviral Goel
6d25525adc
feat(precommit-hooks): add check for correct copyright header (#3302)
* chore(copyright): update copyright header for left files
* feat(copyright): add copyright check to precommit hooks
* chore(copyright): update copyright header for include/ck_tile directory
* chore(copyright): update copyright header for example directory
* chore(copyright): update copyright header for .github directory
* refactor: copyright_check script with better if else handling
* chore(copyright): update copyright header for remaining files
* feat: add script to automate copyright addition
2025-12-10 22:50:43 -08:00
Aviral Goel
fbbdd36ea8
docs: add notes on tile distribution and inline comments (#3297)
* docs: add notes on tile distribution and inline comments
* Apply suggestions from code review
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
---------
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-12-10 22:47:19 -08:00
Geo Min
8270900d60
[ci] Bumping TheRock commit hash (#3385)
* Bumping TheRock commit hash
* new docker hash
* Using new runner name
2025-12-10 17:34:41 -08:00
John Shumway
15ed65db35
Improve sequence sorting and add unit tests (#3376)
The old sequence sort code was showing up in build profiles. Convert it to constexpr functions for much more efficient build-time execution. The sorting is still O(N^2), but our sequences are small enough that it executes quickly. This reduced the compilation time of a small convolution by more than 10% and the overall time spent in the compiler on a narrow build by 6%.
2025-12-10 12:25:23 -08:00
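The idea of moving sorting into constexpr functions can be sketched as follows: instead of sorting through recursive template instantiations, a plain C++17 constexpr function sorts a std::array and the compiler's constant evaluator runs it. This is a minimal illustration of the technique, not the actual CK sequence code; `constexpr_sort` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Insertion sort evaluated at compile time (C++17). O(N^2) comparisons,
// which is cheap for the short index sequences used in tile descriptors.
template <typename T, std::size_t N>
constexpr std::array<T, N> constexpr_sort(std::array<T, N> values)
{
    for (std::size_t i = 1; i < N; ++i)
    {
        T key         = values[i];
        std::size_t j = i;
        while (j > 0 && key < values[j - 1])
        {
            values[j] = values[j - 1]; // shift larger elements one slot right
            --j;
        }
        values[j] = key;
    }
    return values;
}

// Fully evaluated by the compiler: no recursive template instantiations needed.
constexpr auto sorted = constexpr_sort(std::array<int, 5>{3, 1, 4, 1, 5});
static_assert(sorted[0] == 1 && sorted[1] == 1 && sorted[2] == 3 && sorted[3] == 4 &&
                  sorted[4] == 5,
              "sorted at compile time");
```

The sorted array can then be turned back into a compile-time sequence type, so only the final result costs a template instantiation.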
Po Yen Chen
b15df37255
fix: Python 3.8 compatibility in fmha codegen (#3388)
2025-12-10 07:08:41 -08:00
kiefer
8d0951f5e2
Fix clang format for Two Stage implementation
2025-12-10 11:09:53 +00:00
Ville Pietilä
fc22320d78
[CK_TILE] Split-K autodeduction (#3351)
* First version of split-K autodeduction.
* Fix circular dependency and kernel construction.
* Fix tolerance calculation for bwd weight example.
* Simplify kernel construction.
* Fix kernel launching bug for split-K autodeduce.
* Add split-K autodeduction support for the two stage example.
* Fix a corner case.
* Fix clang-format.
* Fix clang-format for inc files.
* Add missing header.
* Prevent too large split-K values.
* Fix formatting.
* Add unit tests for IsSupportedArgument in grouped bwd conv.
* clang-format.
* Fix merge conflicts.
* Address feedback from code review.
* clang-format
* Fix new tests after merge.
---------
Co-authored-by: Ville Pietilä <>
2025-12-10 09:30:30 +02:00
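The commit list does not spell out the deduction heuristic, so the following is only a hypothetical illustration of what an occupancy-based split-K choice can look like: pick enough K splits that the launched work-groups roughly fill every compute unit at the given occupancy, but never more splits than there are K tiles (echoing "Prevent too large split-K values"). The function name and inputs are assumptions, not CK_TILE's implementation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical occupancy-based split-K deduction (illustration only).
//   num_cus          : compute units on the device
//   occupancy        : resident work-groups per CU for this kernel
//   num_output_tiles : work-groups launched without split-K
//   k, k_per_block   : reduction length and K tile size
int deduce_split_k(int num_cus, int occupancy, int num_output_tiles, int k, int k_per_block)
{
    const int slots     = num_cus * occupancy;                 // concurrently resident work-groups
    const int k_tiles   = (k + k_per_block - 1) / k_per_block; // K iterations available to split
    const int wanted    = (slots + num_output_tiles - 1) / num_output_tiles;
    const int max_split = std::max(k_tiles, 1);                // each split needs >= 1 K tile
    return std::clamp(wanted, 1, max_split);
}
```

Capping at the number of K tiles keeps every split non-empty; a production heuristic would likely also weigh the cost of the final reduction across splits.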
Zzz9990
1aa93ef551
[CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
* update coherence
---------
Co-authored-by: Zzz9990 <Zzz9990>
2025-12-10 10:03:28 +08:00
Illia Silin
934ba1208a
use hipTensor from monorepo for daily builds (#3386)
2025-12-09 14:39:08 -08:00
Illia Silin
0d8259affd
temporarily disable daily builds on gfx1010 and gfx908 (#3384)
2025-12-09 10:37:13 -08:00
Illia Silin
7582c9e73f
Upgrade to ROCm 7.1.1 as the default compiler (#3370)
* upgrade to rocm7.1.1 as new default compiler
* fix jenkinsfile
2025-12-09 07:35:32 -08:00
dependabot[bot]
50ca3f83eb
Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.20.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.20.1...v1.31.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.31.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-09 07:10:34 -08:00
lalala-sh
6f0966e1e9
fix a16w4 moe bugs (#3373)
* fix valid mask bug
* update format
2025-12-09 17:54:55 +08:00
kiefer
616ad45cef
Print number of valid instances in profiler and tests.
2025-12-09 09:13:31 +00:00
kiefer
d201572ae4
Actually print the reason when a device implementation is not supported.
2025-12-09 09:13:31 +00:00
kiefer
1a822947eb
Fix a bug in various bwd weight device implementations and the profiler where the occupancy-based split_k value could not be found because the Argument did not derive from ArgumentSplitK, which led to incorrect error tolerances.
2025-12-09 09:13:31 +00:00
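The failure mode described above can be reproduced with a simplified, hypothetical model of the types: the profiler holds only a base-class pointer and cross-casts it to find the split-K value, so an Argument that does not derive from ArgumentSplitK silently falls back to a default. The struct layouts and `lookup_split_k` are assumptions for illustration, not CK's real definitions.

```cpp
#include <cassert>

// Hypothetical, simplified types modelled on the commit message.
struct BaseArgument
{
    virtual ~BaseArgument() = default; // polymorphic, so dynamic_cast can cross-cast
};
struct ArgumentSplitK
{
    int k_batch = 1;
};

// Before the fix: the device Argument stores a split-K value but does not
// derive from ArgumentSplitK, so it cannot be discovered through the base pointer.
struct ArgumentWithoutSplitK : BaseArgument
{
    int k_batch = 4;
};
// After the fix: deriving from ArgumentSplitK makes the value discoverable.
struct ArgumentWithSplitK : BaseArgument, ArgumentSplitK
{
};

// Profiler-side lookup: cross-cast from the base to the split-K interface.
int lookup_split_k(const BaseArgument* arg)
{
    if (const auto* sk = dynamic_cast<const ArgumentSplitK*>(arg))
        return sk->k_batch;
    return 1; // fallback: tolerances get computed as if there were no split-K
}
```

With the fallback, a kernel that actually accumulated over several K splits is checked against tolerances meant for a single split, which matches the incorrect error tolerances described in the fix.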
kiefer
4cf3e61954
Take the device and gridwise files from the bkp branch. This should enable splitK support for convolution, and explicit gemm no longer uses ForceThreadTileTransfer. Also take some updates from 7e7243783008b11e904f127ecf1df55ef95e9af2 to fix building on clang20.
2025-12-09 09:13:31 +00:00
kiefer
3e27e627bb
Always use ForceThreadTileTransfer for now; WaveTileTransfer does not work for convolution yet.
2025-12-09 09:13:31 +00:00
Enrico Degregori
29265aa82f
Fix add_test_executable
2025-12-09 09:13:30 +00:00
Enrico Degregori
4c09ae57bc
Disable splitk for 2stage xdl on rdna (bug to be fixed)
2025-12-09 09:13:30 +00:00