PoYen, Chen
3850028bca
Make comment more clear
2024-08-20 14:10:16 +00:00
PoYen, Chen
8745f5f8bb
Fix kvcache seqlen_k generating logic
2024-08-20 14:04:03 +00:00
PoYen, Chen
6230a78c9e
Find executable from folder automatically
2024-08-20 07:54:37 +00:00
PoYen, Chen
d88ccc1a98
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-20 06:07:02 +00:00
PoYen, Chen
e44852bf2a
Re-format script
2024-08-20 06:02:34 +00:00
dependabot[bot]
f48529b511
Bump rocm-docs-core from 1.7.0 to 1.7.1 in /docs/sphinx (#1475)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.0 to 1.7.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.0...v1.7.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-19 23:02:07 -07:00
PoYen, Chen
e23e6b57ec
Fix compilation errors
2024-08-20 05:59:46 +00:00
PoYen, Chen
eb1c8a26fa
Merge branch 'develop' into feature/fmha-fwd-appendkv
2024-08-20 05:49:01 +00:00
PoYen, Chen
ee1445da23
Re-order seqlen_k_start adjustment logics
2024-08-19 20:14:45 +00:00
PoYen, Chen
60fe251b78
Revert "Only randomize kvcache seqlen_k if 1 < batch"
This reverts commit b9a4ab0d7e.
2024-08-19 20:10:36 +00:00
Bartłomiej Kocot
a6a7966505
Add script to convert MIOpen driver to ckProfiler (#1472)
* Add script to convert MIOpen driver to ckProfiler
* Fix
2024-08-19 08:24:56 -07:00
PoYen, Chen
40a4d96cf5
Return earlier if split is empty
2024-08-19 10:16:23 +00:00
PoYen, Chen
b9a4ab0d7e
Only randomize kvcache seqlen_k if 1 < batch
2024-08-19 09:28:52 +00:00
PoYen, Chen
8166aa58aa
Fix wrong uneven split checking logics
2024-08-19 08:14:45 +00:00
PoYen, Chen
3f0dab6a77
Revert "Avoid seqlen_k=0 for kvcache"
This reverts commit 21c4df89e4.
2024-08-19 07:50:50 +00:00
PoYen, Chen
21c4df89e4
Avoid seqlen_k=0 for kvcache
2024-08-19 07:13:57 +00:00
PoYen, Chen
3fb77a0ebb
Remove type argument
2024-08-18 19:29:40 +00:00
PoYen, Chen
f37cd416e3
Fix typo in comment
2024-08-18 19:07:36 +00:00
PoYen, Chen
e8cd975d6a
Fix compilation errors
2024-08-18 19:00:05 +00:00
PoYen, Chen
9d5c33da04
Add more comments
2024-08-18 18:56:37 +00:00
PoYen, Chen
90c2008fe5
Add comment
2024-08-18 18:42:31 +00:00
PoYen, Chen
8a856f57ab
Add seqlen_q & seqlen_k rules
2024-08-18 18:38:08 +00:00
PoYen, Chen
a93c5e820f
Rename parameter
2024-08-18 18:37:25 +00:00
PoYen, Chen
4cd3432361
Avoid using too small rotary_cos & rotary_sin
2024-08-18 18:27:37 +00:00
PoYen, Chen
e5db71cc59
Use randomized seqlen_k for kvcache
2024-08-18 17:42:32 +00:00
PoYen, Chen
996f46b0d1
Randomize seqlen_k if use kvcache
2024-08-18 17:31:51 +00:00
PoYen, Chen
3d3d73bee2
Fix wrong parameter name
2024-08-18 17:25:39 +00:00
PoYen, Chen
48b7a5bad2
Fix mode overriding logics
2024-08-18 14:44:28 +00:00
PoYen, Chen
05157bf3a3
Force batch mode when invoking appendkv & splitkv apis
2024-08-18 06:05:42 +00:00
PoYen, Chen
cc52587bcc
Remove macro checking
2024-08-18 05:50:51 +00:00
PoYen, Chen
6b361f5a4b
Clarify the case in warning message
2024-08-18 00:42:22 +00:00
PoYen, Chen
c30d7f9d29
Remove 0 < seqlen_knew constraint
2024-08-16 22:14:05 +00:00
PoYen, Chen
352f6d58b0
Merge branch 'feature/fmha-fwd-appendkv' of github.com:ROCm/composable_kernel into feature/fmha-fwd-appendkv
2024-08-16 22:13:02 +00:00
Illia Silin
c8b6b64240
Re-enable fp8 types for all architectures. (#1470)
* re-enable fp8 and bf8 for all targets
* restore the fp8 gemm instances
* re-enable conv_3d fp8 on all architectures
* disable several fp8 gemm instances on all architectures except gfx94
* clang format fix
2024-08-16 16:07:52 -06:00
Dan Yao
79a5d9c10c
[CK_TILE] FA bwd kernels optimization (#1397)
* tmp save
* fix batch deterministic bugs
* fix group deterministic bugs
* codegen update
* reorder files
* bias support
* hd256 bias support
* bwd smoke test update
* simplify convert dq
* fix hd256 dropout scratch
* do{}while() -> while(){}
* comments
* remove FmhaBwdTilePartitioner
* save clear_tile
* refactor dropout
* code cleanup
* code cleanup
* comments
* fix epilogue problem
* fix fwd dropout
* group convert_dq opt
* fix dq alignment
* Do not store storerandval in bwd for flash attention integration
* fix hd32 error and boost performance
* revert
* Remove duplicated WarpGemm definitions in the policy file
* dropout patch for mrepeat 16*16
* code sync up
* dq_acc stride
* dq_acc stride stuff
* codegen update
* fwd dropout revert
* fix hd128 scratches and boost performance
* receipt 3 for simplified smoke test
* more strides for fa integration
* fix hd64 scratches and boost performance
* non-iglp pipeline for headdim padding cases
* dpad same as dvpad for flash attention integration
* unpadded lse&d for group mode
* Support unpad layout for group lse
* Support unpad lse layout for splitkv
* Fix stride for splitkv kernel
* fix unpadded lse issue in fwd splitkv
* comment
* solve lds read&write conflicts
* rename
* bias rename
* tile index revert
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2024-08-16 13:40:10 -07:00
rocking
34fea2935a
Fix nonexistent attribute
2024-08-16 20:30:49 +00:00
PoYen, Chen
d52278a5ef
Add more case for appendkv
2024-08-16 18:23:55 +00:00
PoYen, Chen
d3fd64cd26
Add more appendkv test
2024-08-16 18:03:28 +00:00
PoYen, Chen
51062cae0b
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
2024-08-16 16:47:06 +00:00
PoYen, Chen
41fdf9b2bc
Fix compilation error
2024-08-16 16:39:11 +00:00
PoYen, Chen
43b8100b7f
Support cache_batch_idx in example
2024-08-16 16:27:56 +00:00
PoYen, Chen
9c904b0e4c
Pass cache_batch_idx to kernels
2024-08-16 15:32:24 +00:00
Bartłomiej Kocot
2581727d2a
Add performance and large tensor tests for grouped conv (#1456)
* Add performance and large tensor tests for grouped conv
* Resize tests
* Resize tests
* update the python script to parse the grouped_conv results
* Remove int8 tests
* change bwd wei layout
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-16 07:48:30 -07:00
PoYen, Chen
e6239e14f7
Re-organize bash functions
2024-08-16 12:46:16 +00:00
PoYen, Chen
2523c8e36c
Fix more format
2024-08-16 10:32:17 +00:00
PoYen, Chen
5728c0be65
Fix formatting
2024-08-16 10:25:46 +00:00
PoYen, Chen
095819a387
Remove options
2024-08-16 10:22:44 +00:00
PoYen, Chen
f2b3620511
Use meaningful options in smoke test
2024-08-16 10:18:14 +00:00
PoYen, Chen
aadd3ec63e
Fix wrong syntax in skcheck expr
2024-08-16 10:09:46 +00:00
PoYen, Chen
a4c6029a3d
Fix skcheck logic
2024-08-16 10:08:01 +00:00