mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-05 06:01:23 +00:00
[CK] Skip fp16 dropout d256 batch tests for compiler VGPR aliasing bug (#6342)

## Summary
- Skip fp16 FMHA forward dropout tests that use the d256 tile in batch mode, gated on compiler version
- The AMDGPU compiler miscompiles these kernels due to VGPR aliasing of Philox RNG parameters under high register pressure (383 VGPRs)
- bf16 dropout tests are unaffected and cover the same code paths

## Root Cause
The compiler aliases `ph_seed` and `ph_head_offset` (Philox RNG state stored in VGPRs) with other live data during the softmax main loop. This causes corrupted `buffer_store_byte` writes for dropout randval on wave lanes 32-63, producing NaN in output and LSE tensors.

**Conditions:** fp16 + d256 tile + dropout + batch mode + `qr` pipeline + gfx90a

## Changes
- `include/ck_tile/core/config.hpp`: Add `CK_TILE_WORKAROUND_ROCM_7_12_FP16_DROPOUT_MISCOMPILE` macro
- `test/ck_tile/fmha/test_fmha_fwd.cpp`: Version-gated `GTEST_SKIP` in `TEST_P(Dropout, ...)`

## Test plan
- [x] ROCm 7.1.1 (clang 20): 168/168 fp16 dropout tests PASS (no skip active)
- [x] ROCm 7.12 (clang 22): 132 PASS, 36 SKIPPED, 0 FAILED
- [x] bf16 dropout tests: 168/168 PASS (unaffected by this change)
ck_tile/core
ck_tile/core contains all the basic functions and structures needed to create a GPU kernel with ck_tile. Users only need to include the single header ck_tile/core.hpp to get all of this functionality. Everything lives under the ck_tile namespace. The coding style in this folder should resemble that of std: snake_case for structures and functions, CamelCase for template types, and so on.
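The naming convention described above can be illustrated with a small, standalone sketch. It does not use ck_tile itself: the `demo` namespace, `static_array`, and `reduce_sum` are hypothetical names chosen only to show snake_case for structures and functions alongside CamelCase for template parameters.

```cpp
#include <cassert>
#include <cstddef>

namespace demo { // hypothetical namespace for illustration, not part of ck_tile

// snake_case structure name, CamelCase template parameters
template <typename DataType, std::size_t NumElements>
struct static_array
{
    DataType data[NumElements];

    constexpr std::size_t size() const { return NumElements; }
    constexpr const DataType& at(std::size_t i) const { return data[i]; }
};

// snake_case free function, CamelCase template parameters
template <typename DataType, std::size_t NumElements>
constexpr DataType reduce_sum(const static_array<DataType, NumElements>& a)
{
    DataType sum = DataType{};
    for(std::size_t i = 0; i < NumElements; ++i)
        sum += a.at(i);
    return sum;
}

} // namespace demo
```

In real code the only include would be ck_tile/core.hpp and the names would come from the ck_tile namespace; the sketch above merely mirrors the style so that new code in this folder reads like the rest of it.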
algorithm/
coordinate transforms and other reusable algorithms
arch/
basic device building blocks such as mma and buffer addressing
container/
basic container data structures: array, sequence, tuple, and so on
numeric/
data types and data-type-related math
tensor/
tensor descriptors and the tile-level API
utility/
other utility functions for both host and device