Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-04-20 06:49:15 +00:00)
[CK_TILE] fmha: Add query padding support to backward pass (#3097)
* [CK_TILE] fmha: Add query padding support to backward pass

  Introduces support for query sequence padding (q_padding) in the FMHA backward pass kernels.

  - Passing `seqlen_q_ptr` to the backward kernels to distinguish logical from physical sequence lengths.
  - Updating `OGradDotO`, `ConvertQGrad`, and `DQDKDV` kernels to respect logical lengths and handle zero-length sequences.
  - Aligning LSE indexing in the forward kernel with the padded layout for consistency.
  - Adding a new GTest suite (`test_fmha_bwd_kernel_padding.cpp`) with comprehensive tests for various padding scenarios, including zero-length sequences and deterministic mode.

* fix clang format

* Adapt fmha_bwd_runner.cpp to new q, kv sequence padding

  Add backward q/kv sequence padding unit tests.

* [CK_TILE] fmha: Unify sequence length and padding handling

  Refactor the handling of sequence lengths and padding in the FMHA forward and backward kernels to provide a more unified and flexible interface.

  - Replaced `seqstart_padded_*_ptr` with a more robust system that uses `seqstart_*_ptr` for physical sequence lengths and introduces `seqlen_*_ptr` and `cu_seqlen_*_ptr` for logical (unpadded) lengths.
  - Established a clear order of precedence for determining sequence length: cumulative lengths (`cu_seqlen_*_ptr`) take priority, followed by per-sequence lengths (`seqlen_*_ptr`), and finally physical lengths derived from `seqstart_*_ptr`.
  - Clarified the distinction between "group mode" and "batch mode" and how sequence lengths are handled in each case.
  - Renamed `cu_seqlen_kv_ptr` to `cu_seqlen_k_ptr` for consistency.
  - Updated comments and documentation to reflect the new argument structure and usage.

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
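The precedence order described in the commit message can be sketched on the host side as follows. This is only an illustration in Python, not the kernel code: the function `logical_seqlen` is hypothetical, the argument names mirror the pointer names without the `_ptr` suffix, and the array layouts (prefix-sum arrays with one more entry than the number of sequences) are assumptions.

```python
from typing import Optional, Sequence


def logical_seqlen(
    i: int,
    seqstart: Sequence[int],
    seqlen: Optional[Sequence[int]] = None,
    cu_seqlen: Optional[Sequence[int]] = None,
) -> int:
    """Return the logical length of sequence i (hypothetical helper).

    Precedence, as stated in the commit message:
      1. cumulative logical lengths (cu_seqlen), if given
      2. per-sequence logical lengths (seqlen), if given
      3. physical lengths derived from seqstart
    """
    if cu_seqlen is not None:
        # Assumed layout: prefix sums, cu_seqlen[0] == 0.
        return cu_seqlen[i + 1] - cu_seqlen[i]
    if seqlen is not None:
        return seqlen[i]
    # Fall back to the physical (possibly padded) extent.
    return seqstart[i + 1] - seqstart[i]


# Two sequences, each occupying 8 physical slots.
seqstart = [0, 8, 16]
print(logical_seqlen(0, seqstart))                        # physical length
print(logical_seqlen(1, seqstart, seqlen=[5, 0]))         # zero-length sequence
print(logical_seqlen(1, seqstart, [5, 0], [0, 3, 7]))     # cu_seqlen wins
```

Note that a zero logical length is a valid result here, which matches the commit's mention of handling zero-length sequences.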
@@ -269,14 +269,14 @@ class FmhaFwdApiTrait:
             return "true/*group mode skpad always true*/"  # group mode only generate spad/skpad == true
         if self.pipeline_tag == "qr_async":
             if self.skpad == "t":
-                return f"(a.cu_seqlen_kv_ptr != nullptr) || (a.seqlen_k == 0 || a.seqlen_k % {self.bn0} != 0)"
+                return f"(a.cu_seqlen_k_ptr != nullptr) || (a.seqlen_k == 0 || a.seqlen_k % {self.bn0} != 0)"
             else:
-                return f"(a.cu_seqlen_kv_ptr == nullptr) && (a.seqlen_k != 0 && a.seqlen_k % {self.bn0} == 0)"
+                return f"(a.cu_seqlen_k_ptr == nullptr) && (a.seqlen_k != 0 && a.seqlen_k % {self.bn0} == 0)"
         elif self.pipeline_tag in ["qr", "qs"]:
             if self.skpad == "t":
                 return f"true /*a.seqlen_k % {self.bn0} != 0*/"  # TODO: order of get_pipelines() matters! (ugly)
             else:
-                return f"(a.cu_seqlen_kv_ptr == nullptr) && (a.seqlen_k != 0 && a.seqlen_k % {self.bn0} == 0)"
+                return f"(a.cu_seqlen_k_ptr == nullptr) && (a.seqlen_k != 0 && a.seqlen_k % {self.bn0} == 0)"
         elif self.pipeline_tag == "qr_async_trload":
             if self.skpad == "t":
                 return "true"
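To make the generated dispatch conditions in the hunk above easier to read, here is a minimal Python sketch. The first function reproduces the f-string from the `else` branches with a concrete tile size; the second evaluates the same logic as the `qr_async` / `skpad == "t"` branch in plain Python. Both function names, and the boolean `has_cu_seqlen_k` standing in for `a.cu_seqlen_k_ptr != nullptr`, are hypothetical and used only for illustration.

```python
def render_no_kpad_check(bn0: int) -> str:
    """Render the C++ predicate emitted when K padding is not required.

    The string is the f-string from the diff, with self.bn0 replaced by a
    plain argument.
    """
    return (
        f"(a.cu_seqlen_k_ptr == nullptr) && "
        f"(a.seqlen_k != 0 && a.seqlen_k % {bn0} == 0)"
    )


def needs_k_padding(seqlen_k: int, bn0: int, has_cu_seqlen_k: bool) -> bool:
    """Python equivalent of the padded-path predicate (qr_async, skpad == "t").

    Padding is selected when cumulative K lengths are supplied, when the
    sequence is empty, or when seqlen_k is not a multiple of the tile size.
    """
    return has_cu_seqlen_k or seqlen_k == 0 or seqlen_k % bn0 != 0


print(render_no_kpad_check(128))
print(needs_k_padding(256, 128, has_cu_seqlen_k=False))  # multiple of the tile
print(needs_k_padding(200, 128, has_cu_seqlen_k=False))  # ragged tail
```

The two predicates are exact complements of each other, so for the `qr_async` pipeline exactly one of the padded and unpadded variants matches any given argument set.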