mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-14 02:02:46 +00:00)
[CK_TILE] FMHA Fix synchronization issues in BWD pipelines (#2876)
* Run ctest with --output-on-failure
* Fix synchronization issues in bwd pipelines
The bwd kernel reuses the same area of LDS for ds (SGrad), bias and
dbias (BiasGrad). This means that a block_sync_lds is required between
loading one tensor and storing another to the same area.

Heavy instructions like MFMA/WMMA and global loads are executed between
reuses of the same memory, so in MOST cases loading is finished by all
warps before storing is started. However, warps sometimes progress at
different speeds, and a fast warp can then start storing while a slower
warp is still loading.
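
The hazard described above can be illustrated with a small CPU-side analogy. This is only a sketch, not the kernel code: Python threads stand in for warps, a plain list stands in for the reused LDS area, and `threading.Barrier` plays the role of block_sync_lds. The function name `run_phase_pair` and its parameters are hypothetical.

```python
import threading


def run_phase_pair(num_workers, use_barrier):
    """Each worker reads the whole shared buffer, then overwrites its own slot."""
    # Stands in for the LDS area that holds one tensor and is then reused.
    shared = list(range(1, num_workers + 1))
    sums = [0] * num_workers
    barrier = threading.Barrier(num_workers)

    def worker(idx):
        # Phase 1: "load" the tensor currently stored in the shared area.
        sums[idx] = sum(shared)
        if use_barrier:
            # Analogue of block_sync_lds: no worker stores until all have loaded.
            barrier.wait()
        # Phase 2: "store" a different tensor into the same area.
        shared[idx] = 0

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sums


if __name__ == "__main__":
    # With the barrier every worker sees the original contents.
    print(run_phase_pair(4, use_barrier=True))
    # Without it the result depends on scheduling and may contain stale sums.
    print(run_phase_pair(4, use_barrier=False))
```

With the barrier in place every worker observes the same contents, whatever the scheduling. Without it, the result depends on how far each worker has progressed, which is why the problem shows up only occasionally.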
Running the tests multiple times and, preferably, with multiple
processes on the same GPU helps to trigger this issue:
bin/test_ck_tile_fmha_bwd_bf16 --gtest_repeat=-1 --gtest_shuffle --gtest_throw_on_failure
[ROCm/composable_kernel commit: 2aec38f9ec]
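
The reproduction advice above (repeat the tests, ideally from several processes sharing one GPU) could be scripted roughly as follows. This is a minimal sketch: the helper names `gtest_stress_command` and `run_in_parallel`, the process count and the finite repeat count are assumptions made for illustration. Only the binary path and the gtest flags come from the commit message.

```python
import subprocess
import sys


def gtest_stress_command(binary, repeat=-1):
    """Build the repeated, shuffled gtest invocation quoted in the commit message."""
    return [
        binary,
        f"--gtest_repeat={repeat}",  # a negative value repeats indefinitely
        "--gtest_shuffle",
        "--gtest_throw_on_failure",
    ]


def run_in_parallel(command, processes):
    """Start several copies of the command at once and collect their exit codes."""
    procs = [subprocess.Popen(command) for _ in range(processes)]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical settings: 4 concurrent processes, 100 repetitions each.
    cmd = gtest_stress_command("bin/test_ck_tile_fmha_bwd_bf16", repeat=100)
    codes = run_in_parallel(cmd, processes=4)
    sys.exit(0 if all(code == 0 for code in codes) else 1)
```

A finite repeat count is used in the usage example so that the script terminates when no failure occurs.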
@@ -49,7 +49,7 @@ with open('$TEST_FILE', 'r') as f:
 if tests:
     # Extract just the filename after the last '/'
     clean_tests = [os.path.basename(test) for test in tests]
-    print('ctest -R \"' + '|'.join(clean_tests) + '\"')
+    print('ctest --output-on-failure -R \"' + '|'.join(clean_tests) + '\"')
 else:
     print('# No tests to run')
 ")
@@ -57,5 +57,3 @@ with open('$TEST_FILE', 'r') as f:
 echo "$command"
 
 eval "$command"
-
-
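
For readers who want to try the command-building logic outside the shell script, the patched Python fragment can be sketched as a standalone function. The function name `build_ctest_command` and the sample test paths are hypothetical. In the script itself the code lives inside a shell string, which is why its quotes are escaped there.

```python
import os


def build_ctest_command(tests):
    """Return the ctest invocation for the given test paths, or a no-op comment."""
    if not tests:
        return "# No tests to run"
    # Keep just the filename after the last '/'.
    clean_tests = [os.path.basename(test) for test in tests]
    # --output-on-failure makes ctest print the output of failing tests.
    return 'ctest --output-on-failure -R "' + "|".join(clean_tests) + '"'


if __name__ == "__main__":
    print(build_ctest_command(["bin/test_a", "build/bin/test_b"]))
```

The `-R` argument is a regular expression, so joining the names with `|` selects any test whose name matches one of them.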