1963 Commits

Author SHA1 Message Date
Astha Rai
2f1baf2074 changes for parallel reduction debug 2025-09-22 19:12:16 +00:00
Astha Rai
e8d68bb29c working version with atomics, reduction still failing 2025-09-22 18:57:35 +00:00
Astha Rai
8a02a52d76 changing back to updated block_to_ctile_map with support for multiple tiles: getting a validation error. Made other minor changes as well 2025-07-08 08:40:46 +00:00
Astha Rai
9e74f1e138 formatting fix 2025-06-23 18:49:32 +00:00
Astha Rai
62e4a80be5 resolved floating point error for some instances 2025-06-23 18:42:55 +00:00
Astha Rai
694c2eaadb added argument to examples 2025-06-23 18:42:20 +00:00
Astha Rai
3dc5dcb8a1 rework 2lds run 2025-06-18 21:23:17 +00:00
Astha Rai
8a3ce560c6 fixing merge conflict 2025-06-17 23:45:53 +00:00
Astha Rai
fcf50b211d changes to profiler for streamk 2025-06-17 23:41:56 +00:00
ozturkosu
50f448b291 CkProfiler Fix 2025-06-12 01:06:23 -04:00
Astha Rai
6e4ddf54ef cleaning up errors for different blockwise gemm pipelines: fp8 works for all pipelines, bf16 has validation errors for v2 atomic and memory issues v1 reduction 2025-06-11 22:27:00 +00:00
Astha Rai
566310e8dd comment debug prints 2025-06-05 00:22:38 +00:00
Astha Rai
64451ec381 made changes to cshuffle datatype to resolve compilation error: reduction is also working for bf16, fp8 2025-06-04 20:42:32 +00:00
Astha Rai
5f9d447669 changes to grid dims: got atomic for bf16 and fp8 running, validation errors for fp16 2025-06-04 20:41:15 +00:00
Astha Rai
eb40781348 changes to relevant files to integrate old version of streamk - compiles, but runs into HIP runtime error 2025-06-02 21:13:56 +00:00
Khushbu Agarwal
2e38eb4f1c Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"

This reverts commit bbdaf79a52.

* fix regression
2025-06-02 10:25:01 -07:00
dependabot[bot]
cffe8fa2a4 Bump rocm-docs-core[api_reference] from 1.19.1 to 1.20.0 in /docs/sphinx (#2272)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.1 to 1.20.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.1...v1.20.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-02 06:44:10 -07:00
valarLip
0fdbf6bcd1 extend buffer load for fp16/bf16x16 (#2270)
* extend buffer load for fp16/bf16x16

* format
2025-06-02 10:29:54 +08:00
Kiefer van Teutem
2215a9edf0 Explicitly set the LINKER_LANGUAGE for the gemm_template_instances target to avoid Ninja build config failure. (#2265)
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>
2025-05-30 13:32:28 -07:00
Illia Silin
654956bb02 Add a daily CI build on GFX950. (#2261)
* add CI build for gfx950

* make sure gfx950 CI always uses special docker and compiler

* enable codegen tests by default
2025-05-30 12:50:08 -07:00
Mirza Halilčević
fbce6c7bb6 Define CHAR_BIT during hipRTC (#2264)
* Fix failing codegen tests.

* fix clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-05-30 08:23:44 -07:00
dependabot[bot]
61e6c382c6 Bump rocm-docs-core[api_reference] from 1.19.0 to 1.19.1 in /docs/sphinx (#2263)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.19.0 to 1.19.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.19.0...v1.19.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.19.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-30 05:56:59 -07:00
slippedJim
57f497452a remove restriction of group mode hd192 no lse (#2252)
Co-authored-by: Jim <jimguo12@amd.com>
2025-05-30 10:14:21 +08:00
Illia Silin
4e561af18c Revert "add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)" (#2260)
This reverts commit fd6a859b44.
2025-05-29 16:22:16 -07:00
Paul Fultz II
306f4c537e Export codegen targets (#2259) 2025-05-29 11:03:51 -07:00
joyeamd
fd6a859b44 add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)
* add cshuffle's mxdlperwavepershuffle support, not finished

* add epilogue functions

* add cshuffle's mxdlperwavepershuffle support, not finished

* add epilogue functions

* update cshuffle logic

* update cshuffle_logics

* add some change within review

* update some codes following the code review

* update epilogue logic

* remove from problem

* update codes following review.

* fix some issues
2025-05-29 14:31:14 +02:00
Po Yen Chen
28cd0dffc9 [CK_TILE] FMHA forward batch_prefill optimization for low CU utilization (#2251)
* Add constraint on traits/tile/pipeline

* Use kM0=128 if max_seqlen_q == 8192

* Re-format codegen script

* Remove redundant attr name postfix

* Fix import error: default field in dataclass

* Use kK0=64 & kK1=64 to hide latency

* Use CU utilization to decide tile size
2025-05-29 18:36:33 +09:00
Bartłomiej Kocot
e7906dd644 Change relu to clamp for grouped conv fwd instances (#2249) 2025-05-29 00:51:25 +02:00
Adam Dickin
6df1c56ad6 Changes to allow MIOpen to build CK as part of its build. (#2247)
* tweaks to the miopen specific build. add way to skip clang-tidy checks and a way to skip some custom build targets MIOpen also has.

* move the tidy if statement

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-05-28 13:51:15 -07:00
BrianHarrisonAMD
e91be7d96a Add option to disable offload compress for CK builds (#2250)
* Add option to disable offload compress for CK builds

* Remove gemm exe offload compress flag conditional
2025-05-28 13:47:56 -07:00
Casey-Shi
29574f05f7 change from ninja to make (#2253) 2025-05-28 09:25:05 -07:00
Illia Silin
bbdaf79a52 Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit 99857e10e6.
2025-05-28 09:46:52 -06:00
Casey-Shi
4286eae09a fix type hint (#2254) 2025-05-28 08:43:58 -07:00
Sami Remes
9bd01b624e Remove extra if from CMakeLists.txt of gemm tests (#2213) 2025-05-28 15:25:09 +02:00
Khushbu Agarwal
99857e10e6 [CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm

* adding changes in tile_engine

* Updated code to merge kernel_launch

* removing comments

* Enable rotating buffer changes to flatmm

* Created diff launch_kernel function for rotating buffer

* Simplified calculation using macros

* merge code with new changes in tile_engine

* clang formatted

* Redefine macros
2025-05-27 23:00:58 -07:00
Aviral Goel
c52649ad57 Add catch blocks in example GEMM apps to enable better error handling (Issue: 1928) (#2234)
* added catch statements to examples

* clang format
2025-05-27 22:32:42 -07:00
dependabot[bot]
132bd5b874 Bump rocm-docs-core[api_reference] from 1.18.4 to 1.19.0 in /docs/sphinx (#2237)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.18.4 to 1.19.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.4...v1.19.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-27 06:53:12 -07:00
Casey-Shi
128f5a1eab [Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark

* only support profile

* fix

* fix doc

* add default config

* add ci

* fix cmake

* tmp save for gen blobs

* fix bug

* merge

* range config

* test success

* fix

* fix

* move struct

* remove config property

* fix config

* remove comment

* add cmake option & modify

* add changelog

* fix

* format

* add pydantic module to the docker image

* fix

* add benchmark for cold and warm up

* python format

* add asm cache control

* fix README

* remove pydantic module

* modify changelog

* fix config

* recover benchmark_gemm and fix

* format python

* refactor profiler

* fix csv bug

* fix codegen bug

* add kernel instance object

* add benchmark gemm executable

* fix jenkins & delete extra header

* disable warning output & enable default config

* Disable sparsity for invalid warp tile combinations

* fix gemm host template func

* refactor gemm profiler

* filter out some instances

* default config test & fix codegen bug

* add sparse flag to gen more instances

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-05-26 22:32:36 -07:00
Po Yen Chen
c42b957d65 [CK_TILE] For FMHA forward kernels, assign block indices reversely if using mask (#2209)
* Assign block indices reversely if kHasMask=true

* Assign block indices reversely for splitkv kernel
2025-05-27 10:58:58 +08:00
Yi DING
5727af98d1 Add operator/instance filters to ckProfiler (#2233) 2025-05-27 09:51:20 +08:00
Bartłomiej Kocot
b1ed92b131 Revert "Remove not needed bwd wei merged groups instances (#2218)" (#2235)
This reverts commit 4583aeffad.
2025-05-26 23:26:04 +02:00
Bartłomiej Kocot
4583aeffad Remove not needed bwd wei merged groups instances (#2218)
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups

* Fix

* fix

* Revert "Restore oddc instances (#2201)"

This reverts commit 6342f6b5e8.

* fix

---------

Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-05-26 22:46:18 +02:00
Bartłomiej Kocot
037764bbc6 Fix grid size calc for bwd wei (#2226) 2025-05-26 16:51:09 +02:00
Zzz9990
ece38b9d7a [VLLM V1] Add chunked prefill for FA to pass seq with small seqlen_q (#2221)
* fix splitkv compiler issue since lse is used to select kernel instances

* bypass seqlen == 1

* add chunked prefill into mha varlen

This reverts commit aa9847e42d.

* skip compile when receipt 2-4 and add comments

* fix

---------

Co-authored-by: fsx950223 <fsx950223@outlook.com>
2025-05-26 19:17:18 +08:00
Illia Silin
8146e471f1 fix the buffer intrinsic names for clang >=20 (#2228) 2025-05-23 14:58:25 -07:00
Illia Silin
1b846143c6 Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
This reverts commit 58f9e9ffbc.
2025-05-22 15:41:17 -07:00
Illia Silin
bc2551ac3b disable building device_mha_operations by default (#2225) 2025-05-22 14:03:04 -07:00
Adam Dickin
417a6b65b6 Add MIOPEN_REQ_LIBS_ONLY option for cmake to build only the libs MIOpen requires (#2224)
* cut out anything we don't need for MIOpen to test

* refactor exclusion code to be more streamlined.
2025-05-22 11:14:33 -07:00
Aviral Goel
534d4594d0 Refactor tile_window.hpp, tile_window_linear.hpp into a CK Tile Hierarchy (#2214)
* window_origin variable now in base class

* abstracted more functions

* consolidated tile_window_static_distribution and tile_window_static_lengths

* clang format

* skeleton code for tile_window and tile_window_linear consolidation

* more abstraction

* moved variables from child to parent

* clang format

* removed comments

* removed debug code

* removed debug code

* abstracting traits WIP

* consolidated traits

* removed comments and clang formatted
2025-05-21 23:28:00 -07:00
Bartłomiej Kocot
ebc5a6ef87 Grouped conv bwd wei add for larger filter and Merge Groups optimization (#2197)
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups

* Fix

* fix

* Restore removed instances

---------

Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-05-21 22:47:34 +02:00