Commit Graph

2832 Commits

Author SHA1 Message Date
kiefer
c9c05bfc7d Remove some instances that give incorrect results (f16 NHWGC) 2025-12-15 08:56:49 +00:00
kiefer
9d5810942a Replace cshuffle non-v3 lists with v3 lists, making sure to not have duplications. Also removing stride1pad0 support for NHWGC since we can use explicit for those cases. 2025-12-15 08:56:49 +00:00
kiefer
4fe6c7ddcb Add two stage instances based on the parameters from the tuned cshuffle V3 instances. CShuffleBlockTransferScalarPerVector adapted to 4, and mergegroups fixed to 1 for now. No more special instance lists. 2025-12-15 08:56:49 +00:00
kiefer
8e88660834 Add explicit oddMN support with custom tuned instances 2025-12-15 08:56:49 +00:00
kiefer
133f5538e3 Reduce instances to only the tuned wmma V3 ones for implicit v1 intra and explicit v1 intra pad/nopad. 2025-12-15 08:56:49 +00:00
kiefer
7a516b8e99 Adapt all grouped conv bwd weight vanilla Xdl instances to 16x16. MRepeat doubled for all but 12 of them (some static assert failure). Also added custom reduced profiler target for building grouped conv bwd weight vanilla only profiler. Verified with gtest test. 2025-12-15 08:56:49 +00:00
Enrico Degregori
b5ccc070a8 Fix splitk ab scale 2025-12-15 08:19:21 +00:00
Enrico Degregori
e1694a9547 Fix splitk 2025-12-14 12:04:58 +00:00
Enrico Degregori
f4419af2c5 Fix typo 2025-12-12 13:54:32 +00:00
Enrico Degregori
e41b818b9f Fix gridwise gemm 2025-12-12 11:41:54 +00:00
Enrico Degregori
9d87cfec15 Fix gridwise common 2025-12-12 11:40:20 +00:00
Enrico Degregori
9c7f272a6b Fix compilation error 2025-12-12 11:40:00 +00:00
Enrico Degregori
df75061576 Restore example tolerance calculation 2025-12-12 11:17:31 +00:00
Enrico Degregori
a87256a676 Remove autodeduce 1 stage 2025-12-12 10:35:30 +00:00
Enrico Degregori
0f1bb0e817 Fix gridwise ab scale 2025-12-12 10:14:13 +00:00
Enrico Degregori
4a3c949753 Fix gridwise common 2025-12-12 10:11:42 +00:00
Enrico Degregori
29743bc0f4 Fix explicit conv bwd weight struct 2025-12-12 10:06:09 +00:00
Enrico Degregori
0c67e9731a Address review comments 2025-12-12 09:49:01 +00:00
Enrico Degregori
3ea94e540b Merge branch 'develop' into streamhpc/conv_bwd_weight_wmma 2025-12-12 08:42:36 +00:00
Enrico Degregori
ffad9c3e8f Fix copyright 2025-12-12 08:40:44 +00:00
dependabot[bot]
8d7a4e0c73 Bump rocm-docs-core[api_reference] from 1.31.0 to 1.31.1 in /docs/sphinx (#3410)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.31.0 to 1.31.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.31.0...v1.31.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-11 21:09:40 -08:00
Max Podkorytov
4011dbfec3 [CK-Tile] fixup codegen for tile engine ops gemm multid and gemm preshuffle (#3383)
* fixup gemm multi-d and preshuffle in tile engine codegen

---------

Co-authored-by: Thrupti Raj Lakshmana Gowda <thruptiraj.lakshmanagowda@amd.com>
2025-12-11 14:23:43 -08:00
Aviral Goel
ff194a4271 build: Hot fix to reduce massive build time by just disabling the instances (#3408)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-12-11 10:39:20 -08:00
Aviral Goel
45c4ea510c chore: add copyright to pass the CI (#3407) 2025-12-11 10:34:15 -08:00
Aviral Goel
4dcc3e59c1 chore: update copyright header for misc files (#3402)
* chore: update copyright header for misc files

* fix: typo in kernel resulting in ci failure
2025-12-11 08:25:29 -08:00
Enrico Degregori
0566c90f66 Merge branch 'develop' into streamhpc/conv_bwd_weight_wmma 2025-12-11 16:13:05 +00:00
Illia Silin
b2925ee207 Fix compilation errors with latest clang22 version. (#3396)
* remove target attributes from deduction guides

* switch CK_TILE_HOST_DEVICE_EXTERN based on clang version
2025-12-11 08:09:29 -08:00
eliotwang
715671e419 Bf16*fp4 gemm (#2801)
* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* rebase to new develop

* fix clang format

* update code according to reviewer's comment

* Update README.md

* update code according to reviewer's comment

* update code according to reviewer's comment

* Update CMakeLists.txt

* Update README.md

* Update CMakeLists.txt

* Delete files

* Delete files

* Add unit tests

* Update test_gemm_quant_base.hpp

* merge bf16*fp4 example to develop branch

* fix clang format

* fix clang format

* Update CMakeLists.txt

* fix ci test

* fix clang format

* resolve conflicts

---------

Co-authored-by: eliotwang <charyang@smci355-ccs-aus-m10-29.cs-aus.dcgpu>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-12-11 07:20:29 -08:00
Enrico Degregori
ce99cab605 Wmma support for gemm_ab_scale (#3314)
* Support gemm_ab_scale:

 - Add tests
 - Integrate scaling implementation in multiple D
 - Generalize existing b_scale for ab_scale
 - Add instances
 - Generalize implementation for ScaleBlockM, ScaleBlockN, ScaleBlockK
 - Add support for all layouts supported by xdl
 - Fix splitk xdl

* Fix copyright

* Wmma support for gemm_blockscale_wp (#3315)

* Support for preshuffle with ab scale

 - add support for b preshuffle in GridwiseGemm_wmma_cshuffle_v3_ab_scale
 - add support for AScaleLayout and BScaleLayout (can be different
   from ALayout and BLayout, respectively)
 - add Run method in v1 pipeline to support preshuffle + scaling
 - add support for preshuffle gemms in common invoker
 - Add splitk support

* Fix copyright header
2025-12-11 09:06:20 +01:00
Ville Pietilä
d66e5f667c [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382)
* Remove stale documentation.

* Add placeholder for conv algorithm design description. Add link to conv factory description.

* Improve testing transfer parameters.

* Python script to check the block tilings.

* Improve tests and conv types serialization.

* Change representation of boolean values from 1/0 to true/false in instance strings.

* Change representation of boolean values from 1/0 to true/false in conv algorithm types.

* Test code improvements.

* Improve conv descriptions tests.

* Improve conv signature definition in conv fwd builder tests.

* clang-format.

* Remove obsolete script.

* Revert StaticAssertTypeEq changes in conv layout tests.

* Remove obsolete using declaration.

---------

Co-authored-by: Ville Pietilä <>
2025-12-11 09:50:00 +02:00
Aviral Goel
6d25525adc feat(precommit-hooks): add check for correct copyright header (#3302)
* chore(copyright): update copyright header for left files

* feat(copyright): add copyright check to precommit hooks

* chore(copyright): update copyright header for include/ck_tile directory

* chore(copyright): update copyright header for example directory

* chore(copyright): update copyright header for .github directory

* refactor: copyright_check script with better if else handling

* chore(copyright): update copyright header for remaining files

* feat: add script to automate copyright addition
2025-12-10 22:50:43 -08:00
Aviral Goel
fbbdd36ea8 docs: add notes on tile distribution and inline comments (#3297)
* docs: add notes on tile distribution and inline comments

* Apply suggestions from code review

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-12-10 22:47:19 -08:00
Geo Min
8270900d60 [ci] Bumping TheRock commit hash (#3385)
* Bumping TheRock commit hash

* new docker hash

* Using new runner name
2025-12-10 17:34:41 -08:00
John Shumway
15ed65db35 Improve sequence sorting and add unit tests (#3376)
Old sequence sort code was showing up on build profiles. Convert it to constexpr functions for much more efficient build-time execution. The sorting is still O(N^2), but our sequences are small enough that it executes quickly. This reduced compilation time of a small convolution by more than 10% and overall time spent in the compiler on a narrow build by 6%.
2025-12-10 12:25:23 -08:00
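Illustration for the sequence sorting entry above: a minimal sketch of what an O(N^2) sort written as a constexpr function can look like, assuming C++17 and std::array. The helper name sorted_copy is hypothetical and is not taken from the repository; the actual change operates on the library's own sequence types, which are not shown in this log.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not the library's implementation): insertion sort
// that the compiler can evaluate entirely at build time (requires C++17).
template <typename T, std::size_t N>
constexpr std::array<T, N> sorted_copy(std::array<T, N> values)
{
    for(std::size_t i = 1; i < N; ++i)
    {
        T key         = values[i];
        std::size_t j = i;
        // Shift larger elements one slot to the right: O(N^2) in the worst case.
        while(j > 0 && values[j - 1] > key)
        {
            values[j] = values[j - 1];
            --j;
        }
        values[j] = key;
    }
    return values;
}

// The sort runs during compilation; no runtime code is needed for the result.
constexpr std::array<int, 5> input{4, 1, 3, 0, 2};
constexpr auto output = sorted_copy(input);
static_assert(output[0] == 0 && output[4] == 4, "sorted at compile time");
```

A plain loop in a constexpr function is evaluated directly by the compiler's constant evaluator, which is typically cheaper than recursive template instantiation for the same work; the quadratic cost stays acceptable for short sequences.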
Po Yen Chen
b15df37255 fix: python 3.8 compatibility in fmha codegen (#3388) 2025-12-10 07:08:41 -08:00
kiefer
8d0951f5e2 Fix clang format for Two Stage implementation 2025-12-10 11:09:53 +00:00
Ville Pietilä
fc22320d78 [CK_TILE] Split-K autodeduction (#3351)
* First version of split-K autodeduction.

* Fix circular dependency and kernel construction.

* Fix tolerance calculation for bwd weight example.

* Simplify kernel construction.

* Fix kernel launching bug for split-K autodeduce.

* Add split-K autodeduction support for the two stage example.

* Fix a corner case.

* Fix clang-format.

* Fix clang-format for inc files.

* Add missing header.

* Prevent too large split-K values.

* Fix formatting.

* Add unit tests for IsSupportedArgument in grouped bwd conv.

* clang-format.

* Fix merge conflicts.

* Address feedback from code review.

* clang-format

* Fix new tests after merge.

---------

Co-authored-by: Ville Pietilä <>
2025-12-10 09:30:30 +02:00
Zzz9990
1aa93ef551 [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
* update coherence
---------

Co-authored-by: Zzz9990 <Zzz9990>
2025-12-10 10:03:28 +08:00
Illia Silin
934ba1208a use hipTensor from monorepo for daily builds (#3386) 2025-12-09 14:39:08 -08:00
Illia Silin
0d8259affd temporarily disable daily builds on gfx1010 and gfx908 (#3384) 2025-12-09 10:37:13 -08:00
Illia Silin
7582c9e73f Upgrade to ROCm7.1.1 as default compiler. (#3370)
* upgrade to rocm7.1.1 as new default compiler

* fix jenkinsfile
2025-12-09 07:35:32 -08:00
dependabot[bot]
50ca3f83eb Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.20.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.20.1...v1.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-09 07:10:34 -08:00
lalala-sh
6f0966e1e9 fix a16w4 moe bugs (#3373)
* fix valid mask bug

* update format
2025-12-09 17:54:55 +08:00
kiefer
616ad45cef Print number of valid instances in profiler and tests. 2025-12-09 09:13:31 +00:00
kiefer
d201572ae4 Actually print the reason when a device implementation is not supported. 2025-12-09 09:13:31 +00:00
kiefer
1a822947eb Fix bug in various bwd wei device implementations / profiler where the occupancy based split_k value could not be found because the Argument did not derive from ArgumentSplitK, leading to incorrect error tolerances. 2025-12-09 09:13:31 +00:00
kiefer
4cf3e61954 Grab device and gridwise files from bkp branch, this should enable splitK support for convolution and also we no longer ForceThreadTileTransfer for explicit gemm. Also grab some updates from 7e7243783008b11e904f127ecf1df55ef95e9af2 to fix building on clang20. 2025-12-09 09:13:31 +00:00
kiefer
3e27e627bb Always ForceThreadTileTransfer for now, WaveTileTransfer does not work for convolution yet. 2025-12-09 09:13:31 +00:00
Enrico Degregori
29265aa82f Fix add_test_executable 2025-12-09 09:13:30 +00:00
Enrico Degregori
4c09ae57bc Disable splitk for 2stage xdl on rdna (bug to be fixed) 2025-12-09 09:13:30 +00:00