mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
* add lower triangle bmm
* init code for tile skipping
* functionality correct with lower triangle mask
* add decoder lower triangular mask calculation
* use 7*13 group
* fix n2 compute error
* attention with lower triangle mask with tile skipping
* add template to distinguish masking kernel
* rename template and remove default template value
* remove lower triangle gemm reference struct
* add some comments on example
* add 10 instances for masking bmm + scale + softmax + bmm + permute kernels
* add test
* add test file
* add gtest for bmm masking scale softmax bmm permute
* clang-format
* fix compile error
* check left bottom corner for tile skipping
* fix error: check left bottom corner for tile skipping
* add k padding
* add test and instance for MNK padding
* passing a mask struct
* fix instances
* delete unused comments
* format
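
The tile-skipping check named in the entries above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit. It assumes a lower triangular (decoder) mask that keeps element (m, n) only when n <= m. The names `IsTileFullyMasked` and `CountVisitedTiles` are hypothetical. Under that assumption, a tile can be skipped when its left bottom corner is masked, because every other element in the tile sits in a row that is no lower and a column that is no further left.

```cpp
// Hypothetical sketch; names and signatures are not from composable_kernel.
// Assumed mask: element (m, n) is kept iff n <= m (lower triangle).

// Returns true when every element of the tile is masked out.
// m0, n0: row/column of the tile's top-left element; tile_m: tile height.
// Only the left bottom corner (m0 + tile_m - 1, n0) needs to be checked:
// it has the largest row and the smallest column in the tile, so if it is
// masked (n0 > bottom row), every other element of the tile is masked too.
constexpr bool IsTileFullyMasked(int m0, int n0, int tile_m)
{
    const int bottom_row = m0 + tile_m - 1;
    return n0 > bottom_row;
}

// Counts the tiles an M x N problem would still have to compute after
// skipping the fully masked ones.
constexpr int CountVisitedTiles(int M, int N, int tile_m, int tile_n)
{
    int visited = 0;
    for(int m0 = 0; m0 < M; m0 += tile_m)
    {
        for(int n0 = 0; n0 < N; n0 += tile_n)
        {
            if(!IsTileFullyMasked(m0, n0, tile_m))
            {
                ++visited;
            }
        }
    }
    return visited;
}

// 8 x 8 problem with 4 x 4 tiles: only the top-right tile is skipped.
static_assert(CountVisitedTiles(8, 8, 4, 4) == 3, "3 of 4 tiles visited");
```

Tiles that straddle the diagonal are still visited and need per-element masking. The corner check only removes tiles that contribute nothing.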
Co-authored-by: danyao12 <yaodan@dc-smc-13.amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
[ROCm/composable_kernel commit: ebab84b6f9]