Illia Silin
4cfc157a28
Add compiler flags for ROCm versions 6.2+ ( #1429 )
...
* add compiler flags to fix compiler issues
* fix typo.
* disable test_smfmac_op on all devices except gfx942
* specify full path to compiler in CI
[ROCm/composable_kernel commit: d311c95396 ]
2024-08-01 08:27:52 -07:00
Sam Wu
962232deb9
Update doc requirements ( #1423 )
...
[ROCm/composable_kernel commit: 6648fd3b04 ]
2024-07-31 07:42:42 -07:00
zjing14
b7c617b665
[HotFix] Fixed a typo in profile_gemm_multiply_multiply ( #1425 )
...
* fixed a typo
* clean
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
[ROCm/composable_kernel commit: f31e8dfa80 ]
2024-07-31 07:19:17 -07:00
arai713
603f44b82c
Codegen: isSupportedArgument check ( #1417 )
...
* added isSupportedArgument check into codegen device op
* adding function call
* remove commented code
[ROCm/composable_kernel commit: d32997a792 ]
2024-07-31 07:12:15 -07:00
carlushuang
3fdf17947f
workaround rocm-6.2 compiler issue ( #1421 )
...
[ROCm/composable_kernel commit: b3f86e79dd ]
2024-07-31 16:03:59 +08:00
Illia Silin
47f355e6fe
add docker for rocm6.2_rc4 compiler ( #1424 )
...
[ROCm/composable_kernel commit: b527cad4a5 ]
2024-07-30 11:55:33 -07:00
Bartłomiej Kocot
aa83424e9c
Revert Revert Support access per groups and filter3x3 in grouped conv fwd ( #1382 ) ( #1406 ) ( #1415 )
...
[ROCm/composable_kernel commit: 33b399cc15 ]
2024-07-30 18:36:04 +02:00
dependabot[bot]
e48dea2908
Bump rocm-docs-core from 1.6.0 to 1.6.1 in /docs/sphinx ( #1420 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.6.0...v1.6.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: b9ba5b2676 ]
2024-07-26 14:47:19 -07:00
trixirt
059252be31
Introduce cmake USE_GLIBCXX_ASSERTIONS option ( #1404 )
...
A standard option in Fedora packaging that checks that C++ code
uses the C++ standard library correctly.
Signed-off-by: Tom Rix <trix@redhat.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 733f33af78 ]
2024-07-25 19:28:17 -07:00
zjing14
87e7be2845
Add rotating buff for gemm_multi_d ( #1411 )
...
* add rotating_buff for gemm_multi_d
* format
* Update flush_cache.hpp
* Update gtest.cmake
---------
Co-authored-by: Jing Zhang <jizhan@fb.com >
Co-authored-by: Haocong WANG <haocwang@amd.com >
[ROCm/composable_kernel commit: 105bd708c7 ]
2024-07-25 23:21:21 +08:00
dependabot[bot]
7d27dd11b8
Bump rocm-docs-core from 1.5.1 to 1.6.0 in /docs/sphinx ( #1416 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.5.1 to 1.6.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.1...v1.6.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 1208082e53 ]
2024-07-24 22:56:29 -07:00
Andriy Roshchenko
df929c14be
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. ( #1412 )
...
* Add CMakePresets configurations.
* Add binary elementwise ConvScaleAdd and an example.
* Numerical verification of results.
Observed significant irregularities in F8 to F32 type conversions:
```log
ConvScaleAdd: float=145.000000 f8_t=160.000000 e=144.000000
ConvScaleAdd: float=97.000000 f8_t=96.000000 e=104.000000
ConvScaleAdd: float=65.000000 f8_t=64.000000 e=72.000000
```
* Implemented ConvScaleAdd + Example.
* Add ConvScale+Bias Instances
* Add Client Example for ConvScale+Bias
* Fix number of bytes in an example.
* Cleanup.
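The conversion figures logged above for #1412 can be put in context with a rough sketch of round-to-nearest on a reduced-mantissa grid. This is illustrative only and is not CK's conversion routine: the helper name `round_to_mantissa_bits` is hypothetical, and exponent range, subnormals, saturation and NaN handling are deliberately ignored.

```python
import math


def round_to_mantissa_bits(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with `mantissa_bits` explicit mantissa bits.

    Hypothetical helper for illustration; it ignores exponent limits,
    subnormals, saturation and NaN/Inf, so it is not a full FP8 conversion.
    """
    if x == 0.0:
        return 0.0
    # frexp returns (m, e) with x = m * 2**e and 0.5 <= |m| < 1.
    _, exponent = math.frexp(x)
    # Spacing between neighboring representable values in this binade.
    step = 2.0 ** (exponent - 1 - mantissa_bits)
    # Python's round() resolves ties to even, like IEEE round-to-nearest-even.
    return round(x / step) * step


# Inputs taken from the log above; columns: input, 3 mantissa bits, 2 mantissa bits.
for value in (145.0, 97.0, 65.0):
    print(value, round_to_mantissa_bits(value, 3), round_to_mantissa_bits(value, 2))
```

With three mantissa bits the representable neighbors of 145 are 144 and 160, so nearest rounding gives 144; with two mantissa bits the spacing doubles and 145 rounds to 160. The sketch makes no claim about which format or rounding mode produced each column of the log.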
[ROCm/composable_kernel commit: 4a8a1befd5 ]
2024-07-24 15:49:55 -05:00
Bartłomiej Kocot
5bb30d4077
Add support for half_t and bfloat to reduction operations ( #1395 )
...
* Add support for half_t and bfloat to reduction operations
* Fix bhalf convert
* Next fix bf16
[ROCm/composable_kernel commit: ffabd70a15 ]
2024-07-24 12:12:37 -05:00
dependabot[bot]
fed6f87e5c
Bump rocm-docs-core from 1.5.0 to 1.5.1 in /docs/sphinx ( #1414 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.5.0 to 1.5.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.5.0...v1.5.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 33b2a2bdf5 ]
2024-07-24 07:10:50 -07:00
Haocong WANG
2596f81b45
disable bad instance ( #1410 )
...
[ROCm/composable_kernel commit: d22713a719 ]
2024-07-23 09:05:03 -07:00
Bartłomiej Kocot
ae1cdf5056
Revert Support access per groups and filter3x3 in grouped conv fwd ( #1382 ) ( #1406 )
...
[ROCm/composable_kernel commit: 5d8c3d8190 ]
2024-07-22 14:21:24 +02:00
Haocong WANG
be31f1ddf3
[GEMM] F8 GEMM, performance optimized. ( #1384 )
...
* add ab_scale init support
* enabled interwave
* add scale type; update isSupport
* adjust example
* clean
* enable f8 pure gemm rcr ckprofiler
* Add gemm_multiply_multiply instances
* clang format
* Optimize for ScaleBlockMNK=128
* enable abscale f8 gemm ck profiler
* Add pure f8 gemm test suite
* Reverting to the state of project at f60fd77
* update copyright
* clang format
* update copyright
---------
Co-authored-by: root <jizhan@amd.com >
[ROCm/composable_kernel commit: 8c90f25be3 ]
2024-07-19 22:06:52 +08:00
ltqin
b12dae8c68
Universal gemm splitk using reduce (with multi-d) ( #1341 )
...
* init for reduce_threadwise multi_d
* add reduce_threadwise_multi_d
* add reduce_multi_d
* clean
* start adding another splitk device op
* add reduce template parameter to SplitKBatchOffset
* add reduce c matrix
* clean up code
* change example data type to bf16
* add bf16Ai8B example
* remove reduce template parameter
* add splitk atomic status to v4
* example add multi d parameters
* device op add multi-d parameters
* add multi-d to reduce
* fix kbatch=1 bug
* change B layout to col in bf16Ai8B example
* remove float adding struct
* change multi-d interface
* change file and class name
* remove multi-d of bf16Ai8B example
* change IsReduce function to IsReduceAdd
* change example layout to RRR from RCR
* set ds stride according to layout
* reset parameter layout
* add gemm universal reduce instance
* add reduce factory
* add profile_gemm_universal_reduce
* add reduce to profiler
* fix reduce instance
* fix profiler reduce compiling bug
* format
* format library instance code
* add mem instance for reduce library
* fix call instance names
* add workspace for reduce in ckProfiler
* format
* add mnpadding to reduce library instance
* add fp16 instance to reduce of profiler
* change copyright time
* restore profiler cmake file
* add reduce text to instances
* add DsLayout and DsDataType to instances template parameter
* fixed gemm_reduce_multi_d
* add an example without multi_d
* Update common.hpp
* Update gtest.cmake
* Update gemm_xdl_splitk_reduce_bf16.cpp
* clean
* Update gtest.cmake
* format
* fix api
* format
* default parameter change to RRR
* add vector_len for multi_d
* format
* Update gtest.cmake
* fix bf16Ai8B elementwiseop
* add ReduceDataType
* move ReduceDataType to end position
* format
* remove googletest git method address
* fix copyright time
* update init data
---------
Co-authored-by: root <jizhan@amd.com >
Co-authored-by: letaoqin <letaoqin@amd.com >
Co-authored-by: Jing Zhang <jizhan@meta.com >
Co-authored-by: zjing14 <zhangjing14@gmail.com >
[ROCm/composable_kernel commit: c544eb4da0 ]
2024-07-19 22:01:22 +08:00
Bartłomiej Kocot
321353f5a7
Refactor transform conv to gemm fwd ( #1391 )
...
* Refactor transform conv to gemm fwd
* fixes codegen
* wmma fixes
* fix wmma
* Fix copyright
[ROCm/composable_kernel commit: 70a814f163 ]
2024-07-19 09:29:25 +02:00
Illia Silin
a2f40a3276
add docker for rocm6.2_rc3 ( #1401 )
...
[ROCm/composable_kernel commit: ab250afda0 ]
2024-07-18 09:41:33 -07:00
Qianfeng
03fb9b9435
Replace the use of __expf with __ocml_exp_f32 to work around the test_softmax_rank4 failure ( #1394 )
...
[ROCm/composable_kernel commit: ee768148f0 ]
2024-07-17 09:15:05 -07:00
Mateusz Ozga
118c9e7c28
An option whether to colorize output during build ( #1390 )
...
[ROCm/composable_kernel commit: 9cac282793 ]
2024-07-16 09:52:44 -07:00
Illia Silin
132beacaa6
[ASAN builds] Modify the list of default targets for ASAN builds. ( #1389 )
...
* add a build parameter to build only XNACK targets
* use ENABLE_ASAN_PACKAGING flag to set targets for ASAN builds
---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: 4c3107fdcb ]
2024-07-16 09:19:23 -07:00
Andriy Roshchenko
ece7edc492
Adding more instances of grouped convolution 3d forward for FP8 with ConvScale element-wise operation and ReLU activation. ( #1386 )
...
* Add CMakePresets configurations.
* Add ConvScale+ReLU Functor and an Example
* Account for ReLU FLOPs.
* Add instances of 3D convolutions with ConvscaleRelu operation.
* Implement Client Example
* Cleanup
[ROCm/composable_kernel commit: 802a8a1df1 ]
2024-07-16 08:51:49 -07:00
Haocong WANG
8073a8d846
Disable failed instance in rocm6.2 rel ( #1388 )
...
[ROCm/composable_kernel commit: 1ff4f25138 ]
2024-07-16 08:46:48 -07:00
Illia Silin
fdb75dcc1b
add Rosty and Bartek to code owners ( #1392 )
...
[ROCm/composable_kernel commit: eca39050c6 ]
2024-07-16 23:44:46 +08:00
Bartłomiej Kocot
c885afdaae
Support access per groups and filter3x3 in grouped conv fwd ( #1382 )
...
* Support access per groups and filter3x3 in grouped conv fwd
* Fixes for large cases
* Fixes for large tensors
[ROCm/composable_kernel commit: 82e8a78a3f ]
2024-07-12 11:08:42 -07:00
zjing14
fcb0e225db
add gemm_bias_add example ( #1361 )
...
* add gemm_bias_add example
* changed strideD
* clang-format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: 13c1e64daa ]
2024-07-11 18:08:07 -07:00
Rostyslav Geyyer
31e30c0e4e
Add instances for grouped conv fwd 3d with ConvScale for bf8@fp8->fp8 ( #1369 )
...
* Add an example
* Add instances
* Add a client example
[ROCm/composable_kernel commit: 7a46a91c84 ]
2024-07-11 13:31:39 -07:00
Illia Silin
4488950d01
Add CK_TILE tests to daily CI builds. ( #1381 )
...
* add ck_tile tests to CI
* build and run ck_tile tests on gfx90a and gfx942 in parallel
* fix groovy syntax
* turn ck_tile tests OFF by default
* skip creating the build folder
* build ck_tile examples with 64 threads
* build ck_tile examples with cmake-ck-dev.sh script
* add video group to docker on mi300
* do not retry to rebuild the early CI stages
* help prevent jenkins false failure
* restore cron trigger
[ROCm/composable_kernel commit: 98a01bbc72 ]
2024-07-11 13:22:40 -07:00
Illia Silin
3b49b68660
[Jenkins] restore cron jobs ( #1380 )
...
* test the cron trigger
* fix the cron jobs
* restore the list of cron jobs
[ROCm/composable_kernel commit: f914c228c6 ]
2024-07-11 10:28:11 -07:00
Illia Silin
80960d5c55
[gfx12] add gfx12 to the default target list ( #1379 )
...
[ROCm/composable_kernel commit: a8eb872055 ]
2024-07-10 14:54:04 -07:00
Sam Wu
a7496b8dae
Update changelog release headers ( #1378 )
...
* Update doc codeowner syntax
* Add doc link to changelog
* Update changelog formatting for markdownlint
Also change headings for releases
[ROCm/composable_kernel commit: 860f957c22 ]
2024-07-10 09:36:10 -06:00
dependabot[bot]
6da72dc223
Bump rocm-docs-core from 1.4.1 to 1.5.0 in /docs/sphinx ( #1374 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.1...v1.5.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com >
[ROCm/composable_kernel commit: da42a88964 ]
2024-07-09 12:48:23 -07:00
carlushuang
d50bc4c39d
update owner ( #1377 )
...
* remove zjing14, add poyenc
* remove yigex
[ROCm/composable_kernel commit: ccfdc53022 ]
2024-07-09 20:30:07 +08:00
Illia Silin
8dab3d4d1e
Fix the cmake logic when building with INSTANCES_ONLY=ON. ( #1376 )
...
* fix the cmake logic when building for various targets
* another minor fix
[ROCm/composable_kernel commit: a328df25a1 ]
2024-07-08 21:21:16 -07:00
carlushuang
fca10f32d9
[CK_TILE] wa prec, remove sgpr offset for inline asm ( #1356 )
...
* wa prec, remove sgpr offset for inline asm
* macro for set tile
* ignore unused param if no kernel instances in host API
* fix more prec issue
* cache buffer resource
* fix
* support pre-nop
* clear tile by vector type members
* add workaround to reduce scratch memory
* conditionally enable workaround code
* enable workaround start from certain build version
* fallback set_tile() implementation from certain build version
* undo template argument changes
* put dummy asm in load_raw()
* fix comments, refactor s_nop inside buffer_load
---------
Co-authored-by: PoYen, Chen <PoYen.Chen@amd.com >
[ROCm/composable_kernel commit: 8182976c37 ]
2024-07-08 11:09:55 -07:00
Andriy Roshchenko
25aa95786a
Add ckProfiler support for forward 3D convolutions with OUT element-wise operations. ( #1354 )
...
[ROCm/composable_kernel commit: eb44e0472a ]
2024-07-08 10:55:54 -07:00
Harisankar Sadasivan
45802765e0
Universal streamk with atomics ( #1360 )
...
* universal streamk with atomics, with ckProfiler support. grid_size and the streamk strategy are tunable. A grid_size of -1 sets the number of workgroups to maximum occupancy x num_CUs. The implementation supports several streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. A streamk strategy of -1 selects the default streamk policy (4-tile).
* Update README.md
* fixing clang-format issues
* removed conflicts in struct members between streamk and universal streamk
* corrected arg parsing for streamk and universal streamk
* added stream-k policies for 3 tile and 4 tile
* fixed argument type issue with parsing cmd args
* made the changes suggested in PR review: removed comments and corrected copyright
* file permissions updated
* added default value support for grid_size and streamk-policy selection set to -1
* print messages for arguments
* print messages for arguments
* print messages for arguments1
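The defaults described in the first bullet of #1360 can be sketched as follows. This is a minimal illustration, not the ckProfiler argument parser: `resolve_streamk_args`, `max_occupancy` and `num_cus` are hypothetical names, the rejection of out-of-range values is an assumption, and the hardware numbers in the usage line are made up.

```python
DEFAULT_STREAMK_POLICY = 4          # 4-tile, the default named in the commit message
SUPPORTED_POLICIES = (1, 2, 3, 4)   # 1-tile through 4-tile


def resolve_streamk_args(grid_size, streamk_sel, max_occupancy, num_cus):
    """Return (number of workgroups, stream-k policy) after applying the -1 defaults.

    Hypothetical helper; raising on other invalid values is an assumption,
    not documented behavior.
    """
    if grid_size == -1:
        # -1 means: fill the device, one workgroup per occupancy slot per CU.
        grid_size = max_occupancy * num_cus
    elif grid_size <= 0:
        raise ValueError("grid_size must be positive or -1")

    if streamk_sel == -1:
        streamk_sel = DEFAULT_STREAMK_POLICY
    elif streamk_sel not in SUPPORTED_POLICIES:
        raise ValueError("streamk_sel must be -1 or one of 1, 2, 3, 4")

    return grid_size, streamk_sel


# Made-up device: occupancy 2 on 100 compute units.
print(resolve_streamk_args(-1, -1, max_occupancy=2, num_cus=100))  # (200, 4)
```

Explicit values pass through unchanged, so `resolve_streamk_args(120, 2, 2, 100)` would keep a grid of 120 workgroups and the 2-tile policy.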
[ROCm/composable_kernel commit: 75e622f02f ]
2024-07-05 21:40:30 -07:00
jakpiase
605ac804c4
Add structural sparsity xdlops ( #1363 )
...
* Implemented smfmac xdlops
* add reviewer comments
[ROCm/composable_kernel commit: eaa870a1ab ]
2024-07-04 12:00:14 +02:00
Jun Liu
fa73739812
Fix issue with multiple targets and remove smfmac tests from unsupported test targets ( #1372 )
...
[ROCm/composable_kernel commit: 959073842c ]
2024-07-03 23:34:38 -07:00
Illia Silin
35bbee7130
fix the optional ckProfiler grouped_gemm arguments ( #1368 )
...
[ROCm/composable_kernel commit: 497ccb872b ]
2024-06-28 06:50:46 -07:00
dependabot[bot]
84035df87e
Bump rocm-docs-core from 1.4.0 to 1.4.1 in /docs/sphinx ( #1367 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.4.0...v1.4.1 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/composable_kernel commit: 614ebd050a ]
2024-06-27 22:14:36 -07:00
Ruturaj Vaidya
7c52c2ca9b
Update CMakeLists.txt ( #1364 )
...
It is a good practice to check if the file CMakeLists.txt is in fact in the directory.
[ROCm/composable_kernel commit: 2525864fda ]
2024-06-27 12:34:25 -07:00
Illia Silin
bdb9044869
Adding a private docker for ROCm6.2 release candidate. ( #1365 )
...
* add private docker for rocm6.2_rc1
* update dockerfile
[ROCm/composable_kernel commit: fafa567b3c ]
2024-06-27 11:09:00 -07:00
alexxu-amd
0e3ca6a9b8
remove PR trigger for now due to high cost ( #1329 )
...
[ROCm/composable_kernel commit: 3bb0fe6c7e ]
2024-06-27 09:57:58 -04:00
jakpiase
b9523bfc3f
Add structural sparsity gemm instruction tests ( #1309 )
...
* first version of smfmac test
* add reviewer comments
* add reviewer suggestions
[ROCm/composable_kernel commit: ed21948bcd ]
2024-06-27 11:30:32 +02:00
Illia Silin
57ae3ae99f
Merging the gfx12 code into public repo. ( #1362 )
...
[ROCm/composable_kernel commit: 941d1f7ce0 ]
2024-06-27 00:33:34 -07:00
Po Yen Chen
03d122e88f
Replace hipDeviceSynchronize() by hipStreamSynchronize(stream) calls ( #1359 )
...
[ROCm/composable_kernel commit: a32b1bc647 ]
2024-06-26 22:04:52 +08:00
Po Yen Chen
87bdfdbf64
[CK_TILE] fmha forward split-kv + combine kernels ( #1338 )
...
* FA fwd dropout
* FA bwd
* epilogue reuse
* CMakeLists update
* [CK_TILE] support alibi (#1269 )
* add alibi support
* fix code
* update code based on comment
* Support more hdim
* fix fp8 bias
* support seqlen_k=0 case
* remove unused printf
* fix format
---------
Co-authored-by: rocking <ChunYu.Lai@amd.com >
* now fwd/bwd can build
* bwd alibi
* add bwd validation stream_config
* update generated filenames
* update bwd kernel launch
* CK_TILE_HOST_DEVICE in philox
* Transpose -> transpose
* format
* format
* format
* Generate the instance for FA required
* format
* fix error in WarpGemm
* Add num_splits option and dummy split-kv api method
* Generate fmha_fwd_splitkv()
* Add SplitKV kernel codegen logics
* Add SplitKV combine kernel codegen logics
* Fix mismatched return type
* Clean-up code
* Replace sentinel value before storing
* Fix wrong layout of LSE/LSEacc/Oacc
* Format codes
* Fix o_acc memory error
* Fix wrong kBlockSize used in policy
* Reduce # of combine kernels
* Fix split-kv combine kernel name
* Fix wrong LDS indexing logics
* Fix wrong loop counter step logic
* Undo vector size changes
* Remove no-longer used field
* Remove inconsistent comment
* Remove debug statements in example
* Remove more debug statements
* Add constness to local variables
* Clean up generate.py
* Fix unstable clang-format comment
* Remove unused include directive
* Use shorter template parameter name
* Enable non-split-kv blobs
* Update license date
* Print num_splits conditionally
* Undo disabling data types
* Remove unnecessary tile size for fp8
* Fix wrong pipeline args for fp8
* Fix example output format
* Remove more debug code in combine pipeline
* Add stride kernel arguments for LSE/O acc workspace
* Re-order split-kv pipeline call operator arguments
* Pass LSE/O strides in kernel argument
* Re-order pipeline call operator arguments
* Use tensor_descriptor to locate LSEacc elements
* Support providing invalid element for tensor view
* Set invalid element value for LSEacc tensor view
* Remove hand-written store_tile() code
* Remove necessary value-overwrite logic
* Add transposed lds descriptor
* Support load_tile() for tile_window_with_static_lengths<>
* Undo removing necessary value-overwrite logic
* Use read descriptor to locate lds elements
* Simplify pipeline source code
* Add constraint to kMaxSplits
* Default use kMaxSplits=64 in generate.py
* Revert "Add constraint to kMaxSplits"
This reverts commit 0a2132d758 .
* Revert "Default use kMaxSplits=64 in generate.py"
This reverts commit c7d9c80b77 .
* Decide alignment by the padding parameter
* Remove no-longer used utility functions
* Remove not-working code
* Add comment & remove no-longer used code
* Fix computation errors
* Add heuristic to override num_splits option
* Add constraint to kMaxSplits
* Fix compilation error
* Clean up pipeline code
* Wrap pointer access as lambda function
* Rename confusing methods
* Use kLogMaxSplits as template parameter
* Finish splitkv combine kernel codegen
* Update kMaxSplits limit
* Use smaller kM0 for splitkv combine kernel
* Ignore dropout flag in splitkv pipeline
* Unify flag usage
* Add back flag kStoreLSE
* Merge lambda calls in pipeline
* Fix compilation errors
* Avoid all empty splits
* Always check for empty loop in splitkv pipelines
* Re-order parameters
* Remove redundant p_drop option check
* Add traits/problem for fwd splitkv kernel
* Conditionally enable uneven split boundary checks
* Add comment for the splitkv traits field
* Change even split criteria
* Re-order statements
* Refine occupancy value for hdim=128&256
* Refine occupancy value for hdim=32&64
* Remove redundant kernel argument
* Separate fmha bwd codegen logics
* Separate fmha fwd codegen logics
* Remove redundant direction parameter in fwd&bwd codegen logics
* Support generate multiple APIs for an example
* Let 'api' be an alias of the 'direction' option
* Remove choices for the 'direction' option
* Use dictionary to config all the functions
* Move fmha splitkv codegen logics to other file
* Add fwd_splitkv api for tile_example_fmha_fwd
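For readers unfamiliar with the split-kv + combine scheme named in #1338, the sketch below shows the usual way partial attention results are merged using their log-sum-exp (LSE) values. It is a plain-Python illustration of the general technique, assuming every split is non-empty; the function names are hypothetical and nothing here mirrors the CK_TILE kernels, their tiling, or their memory layout.

```python
import math


def attend(query, keys, values):
    """Softmax attention for one query; returns (output vector, log-sum-exp)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, top + math.log(denom)


def combine_splits(partials):
    """Merge per-split (output, lse) pairs into the full-sequence result."""
    lse_max = max(lse for _, lse in partials)
    # Each split is weighted by its share of the total softmax denominator.
    scales = [math.exp(lse - lse_max) for _, lse in partials]
    total = sum(scales)
    dim = len(partials[0][0])
    out = [sum(s * o[d] for s, (o, _) in zip(scales, partials)) / total for d in range(dim)]
    return out, lse_max + math.log(total)


query = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full_out, full_lse = attend(query, keys, values)
# Two splits over the key/value sequence, processed independently.
partials = [attend(query, keys[:2], values[:2]), attend(query, keys[2:], values[2:])]
split_out, split_lse = combine_splits(partials)
print(full_out, split_out)
```

The two printed vectors should agree up to floating-point rounding, which is why each split only needs to hand its output and one LSE value per query to the combine step.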
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: carlushuang <carlus.huang@amd.com >
Co-authored-by: rocking <ChunYu.Lai@amd.com >
Co-authored-by: Jing Zhang <jizhan@amd.com >
[ROCm/composable_kernel commit: 0cb2e06ddc ]
2024-06-26 17:41:15 +08:00