PoYen, Chen
|
73378ff95d
|
Fix wrong knew/vnew appending logic on host
|
2024-08-20 23:58:47 +00:00 |
|
PoYen, Chen
|
3850028bca
|
Make comment more clear
|
2024-08-20 14:10:16 +00:00 |
|
PoYen, Chen
|
8745f5f8bb
|
Fix kvcache seqlen_k generating logic
|
2024-08-20 14:04:03 +00:00 |
|
PoYen, Chen
|
6230a78c9e
|
Find executable from folder automatically
|
2024-08-20 07:54:37 +00:00 |
|
PoYen, Chen
|
e44852bf2a
|
Re-format script
|
2024-08-20 06:02:34 +00:00 |
|
PoYen, Chen
|
eb1c8a26fa
|
Merge branch 'develop' into feature/fmha-fwd-appendkv
|
2024-08-20 05:49:01 +00:00 |
|
PoYen, Chen
|
60fe251b78
|
Revert "Only randomize kvcache seqlen_k if 1 < batch"
This reverts commit b9a4ab0d7e.
|
2024-08-19 20:10:36 +00:00 |
|
PoYen, Chen
|
b9a4ab0d7e
|
Only randomize kvcache seqlen_k if 1 < batch
|
2024-08-19 09:28:52 +00:00 |
|
PoYen, Chen
|
8166aa58aa
|
Fix wrong uneven split checking logic
|
2024-08-19 08:14:45 +00:00 |
|
PoYen, Chen
|
3f0dab6a77
|
Revert "Avoid seqlen_k=0 for kvcache"
This reverts commit 21c4df89e4.
|
2024-08-19 07:50:50 +00:00 |
|
PoYen, Chen
|
21c4df89e4
|
Avoid seqlen_k=0 for kvcache
|
2024-08-19 07:13:57 +00:00 |
|
PoYen, Chen
|
3fb77a0ebb
|
Remove type argument
|
2024-08-18 19:29:40 +00:00 |
|
PoYen, Chen
|
f37cd416e3
|
Fix typo in comment
|
2024-08-18 19:07:36 +00:00 |
|
PoYen, Chen
|
e8cd975d6a
|
Fix compilation errors
|
2024-08-18 19:00:05 +00:00 |
|
PoYen, Chen
|
9d5c33da04
|
Add more comments
|
2024-08-18 18:56:37 +00:00 |
|
PoYen, Chen
|
90c2008fe5
|
Add comment
|
2024-08-18 18:42:31 +00:00 |
|
PoYen, Chen
|
8a856f57ab
|
Add seqlen_q & seqlen_k rules
|
2024-08-18 18:38:08 +00:00 |
|
PoYen, Chen
|
a93c5e820f
|
Rename parameter
|
2024-08-18 18:37:25 +00:00 |
|
PoYen, Chen
|
4cd3432361
|
Avoid using too small rotary_cos & rotary_sin
|
2024-08-18 18:27:37 +00:00 |
|
PoYen, Chen
|
e5db71cc59
|
Use randomized seqlen_k for kvcache
|
2024-08-18 17:42:32 +00:00 |
|
PoYen, Chen
|
996f46b0d1
|
Randomize seqlen_k if using kvcache
|
2024-08-18 17:31:51 +00:00 |
|
PoYen, Chen
|
48b7a5bad2
|
Fix mode overriding logic
|
2024-08-18 14:44:28 +00:00 |
|
PoYen, Chen
|
05157bf3a3
|
Force batch mode when invoking appendkv & splitkv apis
|
2024-08-18 06:05:42 +00:00 |
|
PoYen, Chen
|
cc52587bcc
|
Remove macro checking
|
2024-08-18 05:50:51 +00:00 |
|
PoYen, Chen
|
6b361f5a4b
|
Clarify the case in warning message
|
2024-08-18 00:42:22 +00:00 |
|
PoYen, Chen
|
c30d7f9d29
|
Remove 0 < seqlen_knew constraint
|
2024-08-16 22:14:05 +00:00 |
|
PoYen, Chen
|
352f6d58b0
|
Merge branch 'feature/fmha-fwd-appendkv' of github.com:ROCm/composable_kernel into feature/fmha-fwd-appendkv
|
2024-08-16 22:13:02 +00:00 |
|
Dan Yao
|
79a5d9c10c
|
[CK_TILE] FA bwd kernels optimization (#1397)
* tmp save
* fix batch deterministic bugs
* fix group deterministic bugs
* codegen update
* reorder files
* bias support
* hd256 bias support
* bwd smoke test update
* simplify convert dq
* fix hd256 dropout scratch
* do{}while() -> while(){}
* comments
* remove FmhaBwdTilePartitioner
* save clear_tile
* refactor dropout
* code cleanup
* code cleanup
* comments
* fix epilogue problem
* fix fwd dropout
* group convert_dq opt
* fix dq alignment
* Do not store storerandval in bwd for flash attention integration
* fix hd32 error and boost performance
* revert
* Remove duplicated WarpGemm definitions in the policy file
* dropout patch for mrepeat 16*16
* code sync up
* dq_acc stride
* dq_acc stride stuff
* codegen update
* fwd dropout revert
* fix hd128 scratches and boost performance
* receipt 3 for simplified smoke test
* more strides for fa integration
* fix hd64 scratches and boost performance
* non-iglp pipeline for headdim padding cases
* dpad same as dvpad for flash attention integration
* unpadded lse&d for group mode
* Support unpad layout for group lse
* Support unpad lse layout for splitkv
* Fix stride for splitkv kernel
* fix unpadded lse issue in fwd splitkv
* comment
* solve lds read&write conflicts
* rename
* bias rename
* tile index revert
---------
Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
|
2024-08-16 13:40:10 -07:00 |
|
rocking
|
34fea2935a
|
Fix nonexistent attribute
|
2024-08-16 20:30:49 +00:00 |
|
PoYen, Chen
|
d52278a5ef
|
Add more cases for appendkv
|
2024-08-16 18:23:55 +00:00 |
|
PoYen, Chen
|
d3fd64cd26
|
Add more appendkv tests
|
2024-08-16 18:03:28 +00:00 |
|
PoYen, Chen
|
51062cae0b
|
Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv
|
2024-08-16 16:47:06 +00:00 |
|
PoYen, Chen
|
41fdf9b2bc
|
Fix compilation error
|
2024-08-16 16:39:11 +00:00 |
|
PoYen, Chen
|
43b8100b7f
|
Support cache_batch_idx in example
|
2024-08-16 16:27:56 +00:00 |
|
PoYen, Chen
|
9c904b0e4c
|
Pass cache_batch_idx to kernels
|
2024-08-16 15:32:24 +00:00 |
|
PoYen, Chen
|
e6239e14f7
|
Re-organize bash functions
|
2024-08-16 12:46:16 +00:00 |
|
PoYen, Chen
|
2523c8e36c
|
Fix more format
|
2024-08-16 10:32:17 +00:00 |
|
PoYen, Chen
|
5728c0be65
|
Fix formatting
|
2024-08-16 10:25:46 +00:00 |
|
PoYen, Chen
|
095819a387
|
Remove options
|
2024-08-16 10:22:44 +00:00 |
|
PoYen, Chen
|
f2b3620511
|
Use meaningful options in smoke test
|
2024-08-16 10:18:14 +00:00 |
|
PoYen, Chen
|
aadd3ec63e
|
Fix wrong syntax in skcheck expr
|
2024-08-16 10:09:46 +00:00 |
|
PoYen, Chen
|
a4c6029a3d
|
Fix skcheck logic
|
2024-08-16 10:08:01 +00:00 |
|
PoYen, Chen
|
5805f5aa73
|
Remove group mode from appendkv kernel
|
2024-08-16 10:04:48 +00:00 |
|
Haocong WANG
|
3049b5467c
|
[GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic
* fixed global_atomic_add
* added bf16 atomic_add
* format
* clang-format-12
* clean
* clean
* add guards
* Update gtest.cmake
* enabled splitk_gemm_multi_d
* format
* add ckProfiler
* format
* fixed naming
* format
* clean
* clean
* add guards
* fix clang format
* format
* add kbatch printout
* clean
* Add rocm6.2 related gemm optimization
* Limit bf16 atomic usage
* remove redundant RCR gemm_universal instance
* Add RRR fp8 gemm universal instance
* Bug fix
* Add GPU_TARGET guard to FP8/BF8 target
* bug fix
* update cmake
* remove all fp8/bf8 example if arch not support
* Enable fp8 RRR support in ckProfiler
* limit greedy-reverse flag to gemm_universal in ckProfiler
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
|
2024-08-14 10:42:30 +08:00 |
|
Mateusz Ozga
|
0606e5498e
|
Support large: 12d tensor size for reduction kernel (#1465)
|
2024-08-13 16:15:47 +02:00 |
|
PoYen, Chen
|
a8a2275aca
|
Fix wrong argument count
|
2024-08-13 08:42:23 +00:00 |
|
PoYen, Chen
|
d96752d0f5
|
Refine smoke_test_fwd.sh
|
2024-08-13 08:36:04 +00:00 |
|
Illia Silin
|
cbb6f2ab8c
|
Disable inapplicable xdl and mha instances for gfx12 (#1464)
|
2024-08-12 15:11:58 -07:00 |
|
PoYen, Chen
|
e8603dc21a
|
Add missing comment
|
2024-08-08 20:40:50 +00:00 |
|
PoYen, Chen
|
822d5dcd8e
|
Fix wrong seqlen for kvcache
|
2024-08-08 20:39:36 +00:00 |
|