Commit Graph

494 Commits

Author SHA1 Message Date
PoYen, Chen
3850028bca Make comment more clear 2024-08-20 14:10:16 +00:00
PoYen, Chen
8745f5f8bb Fix kvcache seqlen_k generating logic 2024-08-20 14:04:03 +00:00
PoYen, Chen
6230a78c9e Find executable from folder automatically 2024-08-20 07:54:37 +00:00
PoYen, Chen
e44852bf2a Re-format script 2024-08-20 06:02:34 +00:00
PoYen, Chen
eb1c8a26fa Merge branch 'develop' into feature/fmha-fwd-appendkv 2024-08-20 05:49:01 +00:00
PoYen, Chen
60fe251b78 Revert "Only randomize kvcache seqlen_k if 1 < batch"
This reverts commit b9a4ab0d7e.
2024-08-19 20:10:36 +00:00
PoYen, Chen
b9a4ab0d7e Only randomize kvcache seqlen_k if 1 < batch 2024-08-19 09:28:52 +00:00
PoYen, Chen
8166aa58aa Fix wrong uneven split checking logics 2024-08-19 08:14:45 +00:00
PoYen, Chen
3f0dab6a77 Revert "Avoid seqlen_k=0 for kvcache"
This reverts commit 21c4df89e4.
2024-08-19 07:50:50 +00:00
PoYen, Chen
21c4df89e4 Avoid seqlen_k=0 for kvcache 2024-08-19 07:13:57 +00:00
PoYen, Chen
3fb77a0ebb Remove type argument 2024-08-18 19:29:40 +00:00
PoYen, Chen
f37cd416e3 Fix typo in comment 2024-08-18 19:07:36 +00:00
PoYen, Chen
e8cd975d6a Fix compilation errors 2024-08-18 19:00:05 +00:00
PoYen, Chen
9d5c33da04 Add more comments 2024-08-18 18:56:37 +00:00
PoYen, Chen
90c2008fe5 Add comment 2024-08-18 18:42:31 +00:00
PoYen, Chen
8a856f57ab Add seqlen_q & seqlen_k rules 2024-08-18 18:38:08 +00:00
PoYen, Chen
a93c5e820f Rename parameter 2024-08-18 18:37:25 +00:00
PoYen, Chen
4cd3432361 Avoid using too small rotary_cos & rotary_sin 2024-08-18 18:27:37 +00:00
PoYen, Chen
e5db71cc59 Use randomized seqlen_k for kvcache 2024-08-18 17:42:32 +00:00
PoYen, Chen
996f46b0d1 Randomize seqlen_k if use kvcache 2024-08-18 17:31:51 +00:00
PoYen, Chen
48b7a5bad2 Fix mode overriding logics 2024-08-18 14:44:28 +00:00
PoYen, Chen
05157bf3a3 Force batch mode when invoking appendkv & splitkv apis 2024-08-18 06:05:42 +00:00
PoYen, Chen
cc52587bcc Remove macro checking 2024-08-18 05:50:51 +00:00
PoYen, Chen
6b361f5a4b Clarify the case in warning message 2024-08-18 00:42:22 +00:00
PoYen, Chen
c30d7f9d29 Remove 0 < seqlen_knew constraint 2024-08-16 22:14:05 +00:00
PoYen, Chen
352f6d58b0 Merge branch 'feature/fmha-fwd-appendkv' of github.com:ROCm/composable_kernel into feature/fmha-fwd-appendkv 2024-08-16 22:13:02 +00:00
Dan Yao
79a5d9c10c [CK_TILE] FA bwd kernels optimization (#1397)
* tmp save

* fix batch deterministic bugs

* fix group deterministic bugs

* codegen update

* reorder files

* bias support

* hd256 bias support

* bwd smoke test update

* simplify convert dq

* fix hd256 dropout scratch

* do{}while() -> while(){}

* comments

* remove FmhaBwdTilePartitioner

* save clear_tile

* refactor dropout

* code cleanup

* code cleanup

* comments

* fix epilogue problem

* fix fwd dropout

* group convert_dq opt

* fix dq alignment

* Do not store storerandval in bwd for flash attention integration

* fix hd32 error and boost performance

* revert

* Remove duplicated WarpGemm definitions in the policy file

* dropout patch for mrepeat 16*16

* code sync up

* dq_acc stride

* dq_acc stride stuff

* codegen update

* fwd dropout revert

* fix hd128 scratches and boost performance

* receipt 3 for simplified smoke test

* more strides for fa integration

* fix hd64 scratches and boost performance

* non-iglp pipeline for headdim padding cases

* dpad same as dvpad for flash attention integration

* unpadded lse&d for group mode

* Support unpad layout for group lse

* Support unpad lse layout for splitkv

* Fix stride for splitkv kernel

* fix unpadded lse issue in fwd splitkv

* comment

* solve lds read&write conflicts

* rename

* bias rename

* tile index revert

---------

Co-authored-by: danyao12 <danyao12>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
2024-08-16 13:40:10 -07:00
rocking
34fea2935a Fix nonexistent attribute 2024-08-16 20:30:49 +00:00
PoYen, Chen
d52278a5ef Add more case for appendkv 2024-08-16 18:23:55 +00:00
PoYen, Chen
d3fd64cd26 Add more appendkv test 2024-08-16 18:03:28 +00:00
PoYen, Chen
51062cae0b Merge remote-tracking branch 'origin/develop' into feature/fmha-fwd-appendkv 2024-08-16 16:47:06 +00:00
PoYen, Chen
41fdf9b2bc Fix compilation error 2024-08-16 16:39:11 +00:00
PoYen, Chen
43b8100b7f Support cache_batch_idx in example 2024-08-16 16:27:56 +00:00
PoYen, Chen
9c904b0e4c Pass cache_batch_idx to kernels 2024-08-16 15:32:24 +00:00
PoYen, Chen
e6239e14f7 Re-organize bash functions 2024-08-16 12:46:16 +00:00
PoYen, Chen
2523c8e36c Fix more format 2024-08-16 10:32:17 +00:00
PoYen, Chen
5728c0be65 Fix formatting 2024-08-16 10:25:46 +00:00
PoYen, Chen
095819a387 Remove options 2024-08-16 10:22:44 +00:00
PoYen, Chen
f2b3620511 Use meaningful options in smoke test 2024-08-16 10:18:14 +00:00
PoYen, Chen
aadd3ec63e Fix wrong syntax in skcheck expr 2024-08-16 10:09:46 +00:00
PoYen, Chen
a4c6029a3d Fix skcheck logic 2024-08-16 10:08:01 +00:00
PoYen, Chen
5805f5aa73 Remove group mode from appendkv kernel 2024-08-16 10:04:48 +00:00
Haocong WANG
3049b5467c [GEMM] gemm_universal related optimization (#1453)
* replace buffer_atomic with global_atomic

* fixed global_atomic_add

* added bf16 atomic_add

* format

* clang-format-12

* clean

* clean

* add guards

* Update gtest.cmake

* enabled splitk_gemm_multi_d

* format

* add ckProfiler

* format

* fixed naming

* format

* clean

* clean

* add guards

* fix clang format

* format

* add kbatch printout

* clean

* Add rocm6.2 related gemm optimization

* Limit bf16 atomic usage

* remove redundant RCR gemm_universal instance

* Add RRR fp8 gemm universal instance

* Bug fix

* Add GPU_TARGET guard to FP8/BF8 target

* bug fix

* update cmake

* remove all fp8/bf8 example if arch not support

* Enable fp8 RRR support in ckProfiler

* limit greedy-reverse flag to gemm_universal in ckProfiler

---------

Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: Jing Zhang <jizhan@meta.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2024-08-14 10:42:30 +08:00
Mateusz Ozga
0606e5498e Support large: 12d tensor size for reduction kernel (#1465) 2024-08-13 16:15:47 +02:00
PoYen, Chen
a8a2275aca Fix wrong argument count 2024-08-13 08:42:23 +00:00
PoYen, Chen
d96752d0f5 Refine smoke_test_fwd.sh 2024-08-13 08:36:04 +00:00
Illia Silin
cbb6f2ab8c Disable inapplicable xdl and mha instances for gfx12 (#1464) 2024-08-12 15:11:58 -07:00
PoYen, Chen
e8603dc21a Add missing comment 2024-08-08 20:40:50 +00:00
PoYen, Chen
822d5dcd8e Fix wrong seqlen for kvcache 2024-08-08 20:39:36 +00:00
PoYen, Chen
6a399ea47e Use generic lambda to init all the api traits/args 2024-08-08 19:22:53 +00:00