Commit Graph

1981 Commits

Author SHA1 Message Date
Yanxing-Shi
68fe3876e4 fix dockerfile 2025-05-26 06:35:17 +00:00
Yanxing-Shi
ec1e45609b merge support_engine_benchmark branch 2025-05-26 06:30:15 +00:00
Casey-Shi
c04c3107c1 Merge branch 'develop' into support_engine_benchmark 2025-05-26 14:07:47 +08:00
Yanxing-Shi
00d10075d6 add sparse flag to gen more instances 2025-05-26 06:06:16 +00:00
Yanxing-Shi
9510d3df1f default config test & fix codegen bug 2025-05-26 04:33:44 +00:00
Illia Silin
8146e471f1 fix the buffer intrinsic names for clang >=20 (#2228) 2025-05-23 14:58:25 -07:00
Thomas Ning
9aef288ea9 Merge branch 'develop' into support_engine_benchmark 2025-05-22 16:42:24 -07:00
Illia Silin
1b846143c6 Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
This reverts commit 58f9e9ffbc.
2025-05-22 15:41:17 -07:00
khuagarw
2ce19b36ef filter out some inmstances 2025-05-22 21:26:23 +00:00
Illia Silin
bc2551ac3b disable building device_mha_operations by default (#2225) 2025-05-22 14:03:04 -07:00
Adam Dickin
417a6b65b6 Add MIOPEN_REQ_LIBS_ONLY option for cmake to build only the libs MIOpen requires (#2224)
* cut out anything we dont need for MIOpen to test

* refactor exclusion code to be more streamlined.
2025-05-22 11:14:33 -07:00
Yanxing-Shi
43ac895597 fix query for insert 2025-05-22 16:28:19 +00:00
Yanxing-Shi
ecf403a430 initial commit, but resul=0 bug 2025-05-22 10:11:05 +00:00
Casey-Shi
7504e6929a Merge branch 'develop' into support_engine_benchmark 2025-05-22 15:59:33 +08:00
Yanxing-Shi
40cd09a93d refactor gemm profiler 2025-05-22 07:58:10 +00:00
Yanxing-Shi
365e80638a fix gemm host template func 2025-05-22 07:14:37 +00:00
Aviral Goel
534d4594d0 Refactor tile_window.hpp, tile_window_linear.hpp into a CK Tile Hierarchy (#2214)
* window_origin variable now in base class

* abstracted more functions

* consolidated tile_window_static_distribution and tile_window_static_lengths

* clang format

* skeleton code for tile_window and tile_window_linear consolidation

* more abstraction

* moved variables from child to parent

* clang format

* removed comments

* removed debug code

* removed debug code

* abstracting traits WIP

* consolidated traits

* removed comments and clang formatted
2025-05-21 23:28:00 -07:00
khuagarw
baf923da13 Disable sparsity for invalid warp tile combinations 2025-05-21 22:11:28 +00:00
Bartłomiej Kocot
ebc5a6ef87 Grouped conv bwd wei add for larger filter and Merge Groupes optimization (#2197)
* Grouped conv bwd wei add two stage instances for larger filter and Merge Groups

* Fix

* fix

* Restore removed instances

---------

Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
2025-05-21 22:47:34 +02:00
Aviral Goel
fa39c4e798 Add Doxygen Documentation for HostTesnor, HostTensorDescriptor, DeviceMem, FillUniformDistribution (#2160)
* added documentation for HostTensorDescriptor

* added documentation for DeviceMem and FillUniformDistribution

* fixed merging error

* fixed host_tensor_descriptor error

* clang format
2025-05-21 10:34:30 -07:00
Yanxing-Shi
a0f1615c09 Merge remote-tracking branch 'upstream/develop' into support_engine_benchmark 2025-05-21 09:48:23 +00:00
Yanxing-Shi
bb66c2af3e disable warning output & enable default config 2025-05-21 09:47:57 +00:00
Aviral Goel
990d645578 added gemm universal example in readme (#2216) 2025-05-20 15:35:07 -07:00
SamiAario-AMD
380bca2b85 Fix 11_add_rmsnorm2d_rdquant (#2207) 2025-05-20 15:15:28 -07:00
Thomas Ning
1386924749 Add the instances for small sized GEMM in preshuffle and improve CMake Flag (#2212)
* Add small instance, add the bug fix, & improve the example CMake

* clang format
2025-05-20 15:05:08 -07:00
Yanxing-Shi
1bd07d12fc Merge remote-tracking branch 'upstream/develop' into support_engine_benchmark 2025-05-20 16:09:19 +00:00
Yanxing-Shi
4dcbc7e3d8 fix jenkins & delete extra header 2025-05-20 16:08:30 +00:00
Yanxing-Shi
3506722e6a add benchmark gemm executable 2025-05-20 15:41:19 +00:00
Sami Remes
d1e6f0982d [CK_TILE] Grouped GEMM tile loop (#2146)
* Add trait to use a persistent kernel and split the entrypoints in grouped gemm

* Some helper functions for persistent kernel case

* Get max occupancy grid using device properties

* Implement tile loop in main entry point to grouped gemm

* Enable GridSize() on device

* Handle offset tile index using real current block index

* Add persistent kernel choice to grouped gemm example

* Use a for-loop for iterating over the group

* Reduce VGPR spills by early-exit

* Enable persistent kernel choice in grouped_gemm example

* Add persistent kernel option to grouped_gemm test

* Fix formatting with remod.py

* Remove GridUpdateBlocks as blocks are now iteratively computed

* Add comment about VGPR spilling

* Fix formatting

* Use CK_TILE_HOST instead of __host__

* Enable all Row/Col combinations in grouped gemm unit test

* Add some KBatch=2 cases to grouped gemm tests

* Fix SplitK for grouped gemm

* Enable pipeline hotloop/tailnumber selection in-kernel for grouped gemm

* Add type traits

* Split examples to regular and tileloop

* Formatting

* Use hipExtStreamGetCUMask to get current active CUs for the given stream

* Align test and example kernel config, and disable validation for splitk repeats

* Remove debug options from CMakeLists.txt

* Separate the code paths for persistent/non-persistent in test

* Fix formatting

* Address review comments

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-05-20 17:18:57 +03:00
Yanxing-Shi
3e66716bd0 merge develop 2025-05-20 09:01:24 +00:00
Yanxing-Shi
ee6b7f9246 add kernel instance object 2025-05-20 08:57:48 +00:00
Aviral Goel
c4929225f6 remove debug statements from CMakeLists (#2204) 2025-05-19 17:31:04 -07:00
Jan Patrick Lehr
0970f22221 [CMake] Disable newly added compiler warning -Wnrvo (#2210)
Recently a new warning was added to Clang to warn when no copy-elision
on return happens. That prevents our CK build. This disables the
warning.
2025-05-19 17:30:15 -07:00
jefyang1
f18170064d Use new mfma instructions for FP8 on gfx950 (#2202)
* Add logic to use new mfma instructions for fp8 bf8

* Fix example_gemm_xdl_fp8_pk_i4_bpreshuffle_v3 on gfx950 and run clang format

* Update include/ck/tensor_operation/gpu/warp/xdlops_gemm.hpp

Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>

* Fix intrin_mfma f8 calls due to merge mistake

---------

Co-authored-by: Andriy Roshchenko <107577548+andriy-ca@users.noreply.github.com>
2025-05-19 17:29:51 -07:00
Andriy Roshchenko
57e0f5df29 MX GEMM - Expand MX MFMA Testing to BF8, FP6, and BF6 Data Types (#2199)
* Unify test interface for different layouts.

* WIP: Introducing FP4/FP6/FP8 abstractions

* WIP: Introducing packed storage abstraction

* WIP: Introducing packed storage abstraction

* WIP: Improved support for FP6 data type

* Refactor packed storage for f6_t

* WIP: FP6 MFMA test

* Test if we correctly represent all FP6/FP4 numbers

* Additional output for failed FP4 test.

* More failing conversion tests

* Even more failing conversion tests

* Working FP6 MFMA tests

* Expand MX MFMA testing to BF8/6

* Update and verify MX MFMA test for packed types

* Fix fp4 and fp6 conversions on host

* Working MX MFMA tests for FP8/6/4

* Cleanup

* Add missing type

* Cleanup

* Final cleanup

* Restrict FP6/4 values output to CK_LOGGING=1

* Use CHAR_BIT instead of number 8

* Fix typo

* Remove FP6 and FP4 from the list of native types

---------

Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
2025-05-19 16:52:51 -05:00
jefyang1
b8b12bb81e Fix example_grouped_gemm_multiple_d_xdl_fp16 on gfx950 (#2203)
* Fix example_grouped_gemm_multiple_d_xdl_fp16 on gfx950

* Run clang format
2025-05-19 14:25:50 -07:00
Yanxing-Shi
5b83f76eb0 fix codegen bug 2025-05-19 14:03:16 +00:00
Yanxing-Shi
b3caa67694 fix csv bug 2025-05-19 11:44:26 +00:00
Yanxing-Shi
9897410acf refactor profiler 2025-05-19 10:42:57 +00:00
Bartłomiej Kocot
6342f6b5e8 Restore oddc instances (#2201) 2025-05-16 18:42:02 -07:00
arai713
5b3430b868 Narrowing error fix for codegen compilation (#2194)
* removed comment with special characters

* fix for arg/template change after merge from develop

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-05-16 11:11:54 -07:00
Illia Silin
40668c9a99 Build and store CK library deb package for all targets daily. (#2196)
* generate and store library package for all targets

* use ninja to build packages for all targets

* make sure to use ftime-trace when using ninja

* make sure build trace only runs on gfx9

* archive lib package and stash only library package
2025-05-16 07:40:53 -07:00
Yanxing-Shi
c821b1253a format python 2025-05-16 10:41:30 +00:00
Yanxing-Shi
adeb051095 Merge remote-tracking branch 'upstream/develop' into support_engine_benchmark 2025-05-16 10:38:59 +00:00
Yanxing-Shi
012c77125a recover benchmark_gemm and fix 2025-05-16 10:37:59 +00:00
Mateusz Ozga
fa3c6811d8 Disable conv for Filter1x1Stride1Pad0 when K or C is even (#2186) 2025-05-16 10:18:47 +02:00
Po Yen Chen
791802b381 [CK_TILE] fMHA batch_prefill block index & logits soft-capping optimizations (#2198)
* Write soft-sign in inline asm

* Change tile idx computation

* Add macro to turn off soft-sign asm opt

* Use simple for loop to avoid register spill

* Only do block id transform for masking cases
2025-05-16 15:14:46 +08:00
Yanxing-Shi
fb63cd5923 merge 2025-05-16 06:55:32 +00:00
Po Yen Chen
8cb0474b3d Use only qr_async pipeline for batch_prefill (#2195) 2025-05-15 11:47:29 -07:00
Khushbu Agarwal
3d8d6e75e4 Adding validation for tile sizes in Tile Engine (#2189)
* Adding validation for tile sizes

* Add architecture in config, and shuffle lines of code in warp_gemm.hpp

* Enable MFMA for gfx950, and invalid tile handling
2025-05-15 10:28:31 -07:00