[CK_TILE] FMHA Fix synchronization issue in FWD splitkv combine pipeline (#2934)

* Fix validation of rotary embedding with time_kernel_

When rotary embedding is used, the appendkv kernel modifies the q tensor
(multiple times when time_kernel_ is set). We need to reset the q buffer
and rerun all kernels.

* Fix synchronization issue in splitkv combine pipeline

Different warps can read and then rewrite the same values of lse_acc_lds.
Sometimes warps progress at different speeds, one warp can rewrite
values that are still being read by another warp.

Running the tests multiple times and, preferably, with multiple
processes on the same GPU helps to trigger this issue:

bin/test_ck_tile_fmha_fwd_fp16 --gtest_repeat=-1 --gtest_shuffle --gtest_throw_on_failure --gtest_filter="TestCkTileFmhaFwd/*KV*"
This commit is contained in:
Anton Gorenko
2025-09-27 09:16:10 +06:00
committed by GitHub
parent 2aa06fbd45
commit c6bfd97c2d
2 changed files with 24 additions and 10 deletions

View File

@@ -1,5 +1,5 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.
// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved.
#pragma once
@@ -223,6 +223,8 @@ struct BlockFmhaFwdSplitKVCombinePipeline
});
}
// sync before rewriting lse_acc_lds
block_sync_lds();
// store the lse scales in shared memory.
{
constexpr auto spans = decltype(lse_accum)::get_distributed_spans();