composable_kernel/example/ck_tile/01_fmha/codegen
jian.wu 2bbff45dcb [CK_TILE][FMHA][Feature] Add support for large hdim
* root cause: fmha_bwd does not support hdim > 256 because its LDS usage exceeds the hardware limit.

* solution: 1. split the dqdkdv kernel into 2 kernels:
*              1) QGrad
*              2) KGrad & VGrad
*           2. reuse LDS memory:
*              1) K and K^T share the same LDS memory in the dq kernel
*              2) OGrad and OGrad^T share the same LDS memory in the dq kernel
*           3. to avoid or reduce VGPR spills, reorder the calculations and disable prefetch.
2025-08-15 16:23:32 +08:00
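
The LDS argument in the commit message can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the 64 KiB per-workgroup LDS limit, the 32-row tile, the 2-byte element size, and the helper names `tile_bytes` and `lds_usage` are all assumptions chosen for the example, not values or functions taken from the CK sources. It compares keeping K, K^T, OGrad and OGrad^T in four separate LDS buffers against letting each tensor share one buffer with its transpose.

```python
# Illustrative LDS budget arithmetic. All sizes below are assumptions,
# not values read from composable_kernel.
LDS_LIMIT_BYTES = 64 * 1024  # assumed per-workgroup LDS capacity


def tile_bytes(rows, cols, elem_bytes=2):
    """Bytes needed to stage one rows x cols tile (2-byte elements assumed)."""
    return rows * cols * elem_bytes


def lds_usage(hdim, tile_n=32, share_transposed=False):
    """Hypothetical LDS footprint of a dq-style kernel for a given hdim.

    Without sharing, K, K^T, OGrad and OGrad^T each get their own buffer.
    With sharing, a tensor and its transpose occupy the same buffer, so
    only the larger of the two counts towards the total.
    """
    k = tile_bytes(tile_n, hdim)
    k_t = tile_bytes(hdim, tile_n)
    ograd = tile_bytes(tile_n, hdim)
    ograd_t = tile_bytes(hdim, tile_n)
    if share_transposed:
        return max(k, k_t) + max(ograd, ograd_t)
    return k + k_t + ograd + ograd_t


if __name__ == "__main__":
    for hdim in (128, 256, 512):
        for share in (False, True):
            used = lds_usage(hdim, share_transposed=share)
            fits = used <= LDS_LIMIT_BYTES
            print(f"hdim={hdim} share={share} bytes={used} fits={fits}")
```

Under these assumed sizes, the footprint scales linearly with hdim, so doubling hdim from 256 to 512 doubles the LDS demand; sharing a buffer between a tensor and its transpose halves it again. The real kernels use different tile shapes and hold additional data in LDS, so the actual numbers will differ.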