[CK] Fix RDNA3 FMHA tile-load paths
## Summary
Fix CK tile FMHA paths needed for RDNA3/RDNA4 targets.
## Details
This PR addresses RDNA-specific issues hit while enabling xFormers CK
FMHA on gfx11/gfx12:
- On RDNA3, update the FMHA P tile handling so that the layout consumed by
the second GEMM matches the layout the WMMA path expects.
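
For context, a minimal pure-Python sketch of why the P tile layout matters. This is not CK code. It assumes the usual FMHA structure (`S = Q·Kᵀ`, `P = softmax(S)`, `O = P·V`), and every name in it (`fmha_reference`, `p_layout_matches`, the helpers) is hypothetical. The WMMA register layout is not modeled: a plain transpose stands in for "the second GEMM reads P in a layout other than the one it was written in".

```python
import math


def matmul(a, b):
    """Plain row-major product of two matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(s):
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in s:
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def fmha_reference(q, k, v, p_layout_matches=True):
    """Single-head attention reference (hypothetical helper, not a CK API)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    # First GEMM: S = Q * K^T, scaled by 1/sqrt(head_dim).
    s = [[x * scale for x in row] for row in matmul(q, transpose(k))]
    p = softmax_rows(s)
    if not p_layout_matches:
        # Assumption: a transpose is only a stand-in for a layout mismatch
        # between the producer of P and the second GEMM.
        p = transpose(p)
    # Second GEMM: O = P * V, with P as the left-hand operand.
    return matmul(p, v)


q = [[1.0, 0.0], [0.0, 2.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

good = fmha_reference(q, k, v)
bad = fmha_reference(q, k, v, p_layout_matches=False)
```

With matching layouts, each output row is a convex combination of the rows of `v`. With the stand-in mismatch, the weights applied to `v` no longer sum to 1 per row, so `bad` differs from `good` even though both GEMMs are individually correct.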
## Testing
Validated downstream with xFormers CK/FMHA on gfx1201/gfx1151.
```text
pytest --import-mode=importlib -q \
tests/test_mem_eff_attention.py::test_forward \
tests/test_mem_eff_attention.py::test_backward \
tests/test_mem_eff_attention.py::test_dropout_ck
3844 passed, 5244 skipped, 26 warnings
```