Yanxing-Shi
17a9aa2787
fix change log
2025-05-28 09:10:33 +00:00
Yanxing-Shi
926bd2b985
fix conflict
2025-05-28 09:08:42 +00:00
Khushbu Agarwal
99857e10e6
[CK_tile] Add rotating buffer feature for universal gemm (#2200)
...
* Add rotating buffer feature for universal gemm
* adding changes in tile_engine
* Updated code to merge kernel_launch
* removing comments
* Enable rotating buffer changes to flatmm
* Created a different launch_kernel function for rotating buffer
* Simplified calculation using macros
* merge code with new changes in tile_engine
* clang formatted
* Redefine macros
2025-05-27 23:00:58 -07:00
Aviral Goel
c52649ad57
Add catch blocks in example GEMM apps to enable better error handling (Issue: 1928) (#2234)
...
* added catch statements to examples
* clang format
2025-05-27 22:32:42 -07:00
dependabot[bot]
132bd5b874
Bump rocm-docs-core[api_reference] from 1.18.4 to 1.19.0 in /docs/sphinx (#2237)
...
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.18.4 to 1.19.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.4...v1.19.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-27 06:53:12 -07:00
Yanxing-Shi
dd0248aedb
fix cmake for sqlite3
2025-05-27 10:24:52 +00:00
Yanxing-Shi
cd89d1746d
add header
2025-05-27 09:50:23 +00:00
Yanxing-Shi
e28422bd89
fix changelog
2025-05-27 09:33:06 +00:00
Yanxing-Shi
b88be7fff3
merge upstream
2025-05-27 09:31:20 +00:00
Casey-Shi
128f5a1eab
[Tile Engine] Add benchmark for tile engine gemm. (#2193)
...
* initial commit -m benchmark
* only support profile
* fix
* fix doc
* add default config
* add ci
* fix cmake
* tmp save for gen blobs
* fix bug
* merge
* range config
* test success
* fix
* fix
* move struct
* remove config property
* fix config
* remove comment
* add cmake option & modify
* add changelog
* fix
* format
* add pydantic module to the docker image
* fix
* add benchmark for cold and warm up
* python format
* add asm cache control
* fix README
* remove pydantic module
* modify changelog
* fix config
* recover benchmark_gemm and fix
* format python
* refactor profiler
* fix csv bug
* fix codegen bug
* add kernel instance object
* add benchmark gemm executable
* fix jenkins & delete extra header
* disable warning output & enable default config
* Disable sparsity for invalid warp tile combinations
* fix gemm host template func
* refactor gemm profiler
* filter out some instances
* default config test & fix codegen bug
* add sparse flag to gen more instances
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-05-26 22:32:36 -07:00
Po Yen Chen
c42b957d65
[CK_TILE] For FMHA forward kernels, assign block indices reversely if using mask (#2209)
...
* Assign block indices reversely if kHasMask=true
* Assign block indices reversely for splitkv kernel
2025-05-27 10:58:58 +08:00
Yi DING
5727af98d1
Add operator/instance filters to ckProfiler (#2233)
2025-05-27 09:51:20 +08:00
Bartłomiej Kocot
b1ed92b131
Revert "Remove not needed bwd wei merged groups instances (#2218)" (#2235)
...
This reverts commit 4583aeffad.
2025-05-26 23:26:04 +02:00
Bartłomiej Kocot
4583aeffad
Remove not needed bwd wei merged groups instances (#2218)
...
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups
* Fix
* fix
* Revert "Restore oddc instances (#2201)"
This reverts commit 6342f6b5e8.
* fix
---------
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-05-26 22:46:18 +02:00
Bartłomiej Kocot
037764bbc6
Fix grid size calc for bwd wei (#2226)
2025-05-26 16:51:09 +02:00
Zzz9990
ece38b9d7a
[VLLM V1] Add chunked prefill for FA to pass seq with small seqlen_q (#2221)
...
* fix splitkv compiler issue since lse is used to select kernel instances
* bypass seqlen == 1
* add chunked prefill into mha varlen
This reverts commit aa9847e42d.
* skip compile when receipt 2-4 and add comments
* fix
---------
Co-authored-by: fsx950223 <fsx950223@outlook.com>
2025-05-26 19:17:18 +08:00
Yanxing-Shi
7549e2b2e6
fix readme
2025-05-26 06:45:36 +00:00
Yanxing-Shi
68fe3876e4
fix dockerfile
2025-05-26 06:35:17 +00:00
Yanxing-Shi
ec1e45609b
merge support_engine_benchmark branch
2025-05-26 06:30:15 +00:00
Casey-Shi
c04c3107c1
Merge branch 'develop' into support_engine_benchmark
2025-05-26 14:07:47 +08:00
Yanxing-Shi
00d10075d6
add sparse flag to gen more instances
2025-05-26 06:06:16 +00:00
Yanxing-Shi
9510d3df1f
default config test & fix codegen bug
2025-05-26 04:33:44 +00:00
Illia Silin
8146e471f1
fix the buffer intrinsic names for clang >=20 (#2228)
2025-05-23 14:58:25 -07:00
Thomas Ning
9aef288ea9
Merge branch 'develop' into support_engine_benchmark
2025-05-22 16:42:24 -07:00
Illia Silin
1b846143c6
Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
...
This reverts commit 58f9e9ffbc.
2025-05-22 15:41:17 -07:00
khuagarw
2ce19b36ef
filter out some instances
2025-05-22 21:26:23 +00:00
Illia Silin
bc2551ac3b
disable building device_mha_operations by default (#2225)
2025-05-22 14:03:04 -07:00
Adam Dickin
417a6b65b6
Add MIOPEN_REQ_LIBS_ONLY option for cmake to build only the libs MIOpen requires (#2224)
...
* cut out anything we don't need for MIOpen to test
* refactor exclusion code to be more streamlined.
2025-05-22 11:14:33 -07:00
Yanxing-Shi
43ac895597
fix query for insert
2025-05-22 16:28:19 +00:00
Yanxing-Shi
ecf403a430
initial commit, but result=0 bug
2025-05-22 10:11:05 +00:00
Casey-Shi
7504e6929a
Merge branch 'develop' into support_engine_benchmark
2025-05-22 15:59:33 +08:00
Yanxing-Shi
40cd09a93d
refactor gemm profiler
2025-05-22 07:58:10 +00:00
Yanxing-Shi
365e80638a
fix gemm host template func
2025-05-22 07:14:37 +00:00
Aviral Goel
534d4594d0
Refactor tile_window.hpp, tile_window_linear.hpp into a CK Tile Hierarchy (#2214)
...
* window_origin variable now in base class
* abstracted more functions
* consolidated tile_window_static_distribution and tile_window_static_lengths
* clang format
* skeleton code for tile_window and tile_window_linear consolidation
* more abstraction
* moved variables from child to parent
* clang format
* removed comments
* removed debug code
* removed debug code
* abstracting traits WIP
* consolidated traits
* removed comments and clang formatted
2025-05-21 23:28:00 -07:00
khuagarw
baf923da13
Disable sparsity for invalid warp tile combinations
2025-05-21 22:11:28 +00:00
Bartłomiej Kocot
ebc5a6ef87
Grouped conv bwd wei add for larger filter and Merge Groups optimization (#2197)
...
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups
* Fix
* fix
* Restore removed instances
---------
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-05-21 22:47:34 +02:00
Aviral Goel
fa39c4e798
Add Doxygen Documentation for HostTensor, HostTensorDescriptor, DeviceMem, FillUniformDistribution (#2160)
...
* added documentation for HostTensorDescriptor
* added documentation for DeviceMem and FillUniformDistribution
* fixed merging error
* fixed host_tensor_descriptor error
* clang format
2025-05-21 10:34:30 -07:00
Yanxing-Shi
a0f1615c09
Merge remote-tracking branch 'upstream/develop' into support_engine_benchmark
2025-05-21 09:48:23 +00:00
Yanxing-Shi
bb66c2af3e
disable warning output & enable default config
2025-05-21 09:47:57 +00:00
Aviral Goel
990d645578
added gemm universal example in readme (#2216)
2025-05-20 15:35:07 -07:00
SamiAario-AMD
380bca2b85
Fix 11_add_rmsnorm2d_rdquant (#2207)
2025-05-20 15:15:28 -07:00
Thomas Ning
1386924749
Add the instances for small sized GEMM in preshuffle and improve CMake Flag (#2212)
...
* Add small instance, add the bug fix, & improve the example CMake
* clang format
2025-05-20 15:05:08 -07:00
Yanxing-Shi
1bd07d12fc
Merge remote-tracking branch 'upstream/develop' into support_engine_benchmark
2025-05-20 16:09:19 +00:00
Yanxing-Shi
4dcbc7e3d8
fix jenkins & delete extra header
2025-05-20 16:08:30 +00:00
Yanxing-Shi
3506722e6a
add benchmark gemm executable
2025-05-20 15:41:19 +00:00
Sami Remes
d1e6f0982d
[CK_TILE] Grouped GEMM tile loop (#2146)
...
* Add trait to use a persistent kernel and split the entrypoints in grouped gemm
* Some helper functions for persistent kernel case
* Get max occupancy grid using device properties
* Implement tile loop in main entry point to grouped gemm
* Enable GridSize() on device
* Handle offset tile index using real current block index
* Add persistent kernel choice to grouped gemm example
* Use a for-loop for iterating over the group
* Reduce VGPR spills by early-exit
* Enable persistent kernel choice in grouped_gemm example
* Add persistent kernel option to grouped_gemm test
* Fix formatting with remod.py
* Remove GridUpdateBlocks as blocks are now iteratively computed
* Add comment about VGPR spilling
* Fix formatting
* Use CK_TILE_HOST instead of __host__
* Enable all Row/Col combinations in grouped gemm unit test
* Add some KBatch=2 cases to grouped gemm tests
* Fix SplitK for grouped gemm
* Enable pipeline hotloop/tailnumber selection in-kernel for grouped gemm
* Add type traits
* Split examples to regular and tileloop
* Formatting
* Use hipExtStreamGetCUMask to get current active CUs for the given stream
* Align test and example kernel config, and disable validation for splitk repeats
* Remove debug options from CMakeLists.txt
* Separate the code paths for persistent/non-persistent in test
* Fix formatting
* Address review comments
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-05-20 17:18:57 +03:00
Yanxing-Shi
3e66716bd0
merge develop
2025-05-20 09:01:24 +00:00
Yanxing-Shi
ee6b7f9246
add kernel instance object
2025-05-20 08:57:48 +00:00
Aviral Goel
c4929225f6
remove debug statements from CMakeLists (#2204)
2025-05-19 17:31:04 -07:00
Jan Patrick Lehr
0970f22221
[CMake] Disable newly added compiler warning -Wnrvo (#2210)
...
Clang recently added a warning that fires when copy elision does not
happen on return. The warning breaks our CK build, so this change
disables it.
2025-05-19 17:30:15 -07:00