Add tile size for FMHA batch prefill bf16 for MI308X
## Motivation
Add a tile size adapted to MI308X for FMHA batch prefill with the BF16
input type.
## Technical Details
N/A
## Test Plan
Benchmarked from the Aiter side with:
```
python3 op_tests/test_batch_prefill.py -s 8000 -p 1 -q 4 -k 1 --head_dim 256 -c true -d bf16 --input_dtype bf16 --quant_method none --kv_layout linear -t sglang -l 0.0 --return_lse false --profile
```
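For repeat runs, the command above can be turned into an argument list and re-issued with a single flag changed. The following is a minimal sketch and not part of this PR: `with_flag` is a hypothetical helper, and any value other than the ones in the command above is purely illustrative.

```python
import shlex

# Benchmark command copied verbatim from the test plan above.
BASE_CMD = (
    "python3 op_tests/test_batch_prefill.py -s 8000 -p 1 -q 4 -k 1 "
    "--head_dim 256 -c true -d bf16 --input_dtype bf16 --quant_method none "
    "--kv_layout linear -t sglang -l 0.0 --return_lse false --profile"
)


def with_flag(cmd: str, flag: str, value: str) -> list[str]:
    """Return the argv for `cmd` with `flag` set to `value`.

    Hypothetical helper: replaces the value that follows an existing flag,
    or appends the flag and value if the flag is absent.
    """
    argv = shlex.split(cmd)
    if flag in argv:
        argv[argv.index(flag) + 1] = value
    else:
        argv += [flag, value]
    return argv
```

Passing the returned list to `subprocess.run(argv, check=True)` from the Aiter checkout would launch the run; `shlex.join(argv)` gives back a copy-pasteable command line.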
## Test Result
The benchmark above shows an improvement with the new tile size on MI308X,
with PLT mode both OFF and ON.
## Submission Checklist
- [X] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>