Input/output permutation for fused attention (#460)

* reopen masking attention instance now that CI is upgraded

* re-enable instances previously failed on 9110

* enable ksize-kpadding pair validity test

* add non-masked attention+permute test; expose masking boolean to attention kernel handles

* disable bench

* fix test

* move files

* bulk rename batched_gemm_masking_scale_softmax_gemm_permute to batched_gemm_softmax_gemm_permute

* format

* amend rename

* disable bench in test

* add mask/no-mask test for non-permute attention kernels

* disable broken kernel instance

* example working

add non-permuted problem statement

evaluating whether overhead comes from permutation or the extra kernel arg

* interface for bias addition without implementing it

* test and profiler running

* tidy

* mask type determined by enum class

* unify example code

* move masking specialization to its own header

* align formats

* extract helper functions

* experiment merging dims for attn w/ permute; shows perf parity with attn wo/ permute

* add tensor specialization to template args

since tensor spec packed shows perf parity when permutation isn't needed

remove redundant template args

comment on 'packed' tensor specialization

* grouped attention with input/output permute example

* format

* clean up

* refactor acc0 tile visitor

Co-authored-by: shaojiewang <wsjmessi@163.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
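
The commit message says the mask type is now "determined by enum class" instead of the masking boolean exposed earlier in the series. The following is a minimal sketch of that idea, not the library's actual code. The names `MaskType`, `Disabled`, `UpperTriangle`, and `apply_mask` are hypothetical and chosen only for illustration. Masking the upper triangle (entries with column index greater than row index) is assumed to be the masked variant.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical enum standing in for the "mask type" named in the commit message.
enum class MaskType { Disabled, UpperTriangle };

// Applies the mask selected at compile time to a row-major rows x cols score matrix.
// Masked entries are set to -infinity so that a following softmax maps them to zero.
template <MaskType Mask>
void apply_mask(std::vector<float>& scores, std::size_t rows, std::size_t cols)
{
    // Mask is a template argument, so this branch is resolved at compile time.
    if(Mask == MaskType::UpperTriangle)
    {
        for(std::size_t m = 0; m < rows; ++m)
        {
            for(std::size_t n = m + 1; n < cols; ++n)
            {
                scores[m * cols + n] = -std::numeric_limits<float>::infinity();
            }
        }
    }
}
```

An enum template argument leaves room for further mask kinds without adding one boolean per kind, which is presumably why the commit moves away from the boolean.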

Anthony Chang, 2022-10-28 04:58:20 +08:00, committed by GitHub

parent cd51732690

commit de37550f72

42 changed files with 2654 additions and 2196 deletions

									
include/ck/ck.hpp (5 lines changed)

@@ -159,6 +159,11 @@
// tuning parameter
#define CK_WORKAROUND_SWDEV_325164 0

// workaround: disable broken fused attention kernel instance that does not pass validation
// issue found on mi100/#10738 combo when irregular KPerBlock attention kernel has acc0 scaling
// enabled
#define CK_WORKAROUND_DISABLE_BROKEN_ATTN_KERNEL_INSTANCE 1

namespace ck {

enum struct InMemoryDataOperationEnum
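
The hunk above only defines the workaround macro. As a rough sketch of how such a macro might gate a failing kernel instance, the code below filters a list of instance descriptors at compile time. Everything except the macro name is an assumption: `AttnInstanceDesc`, its fields, `is_known_broken`, and `enabled_instances` are hypothetical and do not come from the library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: same name and value as the macro this commit adds to include/ck/ck.hpp.
#ifndef CK_WORKAROUND_DISABLE_BROKEN_ATTN_KERNEL_INSTANCE
#define CK_WORKAROUND_DISABLE_BROKEN_ATTN_KERNEL_INSTANCE 1
#endif

// Hypothetical descriptor of one fused attention kernel instance.
struct AttnInstanceDesc
{
    std::string name;
    bool irregular_k_per_block; // hypothetical flag for "irregular KPerBlock"
    bool acc0_scaling;          // hypothetical flag for "acc0 scaling enabled"
};

// Matches the combination the source comment describes as failing validation.
inline bool is_known_broken(const AttnInstanceDesc& d)
{
    return d.irregular_k_per_block && d.acc0_scaling;
}

// Returns the instances that remain available once the workaround is applied.
inline std::vector<AttnInstanceDesc> enabled_instances(const std::vector<AttnInstanceDesc>& all)
{
    std::vector<AttnInstanceDesc> out;
    for(const auto& d : all)
    {
#if CK_WORKAROUND_DISABLE_BROKEN_ATTN_KERNEL_INSTANCE
        if(is_known_broken(d))
            continue; // skip the instance that does not pass validation
#endif
        out.push_back(d);
    }
    return out;
}
```

Setting the macro to 0 would bring the skipped instance back, which makes it easy to re-test once the underlying issue is resolved.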
