Ye Wang
d7bb3b10cc
[fmha-bwd] Flat cu_id remap for arbitrary CTA_NUM + grid Y/Z override env
The persistent bwd group kernel remapped block to CU as bx/8 + (bx%8)*32,
which is only bijective for N=256, so gradients were wrong whenever
ROCM_FLASH_ATTN_CU_NUM was set to any other value. Replace it with the flat
blockIdx (bx + by*gridDim.x + bz*gridDim.x*gridDim.y), which is a bijection
for any 1D/3D grid. Also add get_persistent_grid_override(), which honors
ROCM_FLASH_ATTN_GRID_Y/Z to reshape the persistent grid for 1D-vs-3D
scheduling experiments.
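
The remap argument above can be checked on the host with a small sketch. This is not the kernel code: `Dim3`, `old_cu_id`, `flat_cu_id`, the two `*_is_bijective` helpers, and `reshape_grid` are hypothetical names introduced here for illustration. In particular, the reshape rule (X = total / (Y*Z), falling back to a 1D grid when the division is not exact) is an assumption, and the real get_persistent_grid_override() may behave differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Host-side stand-in for HIP's dim3 (hypothetical, illustration only).
struct Dim3 { unsigned x, y, z; };

// Old block->CU remap quoted in the commit message.
unsigned old_cu_id(unsigned bx) { return bx / 8 + (bx % 8) * 32; }

// Flat block index: bx + by*gridDim.x + bz*gridDim.x*gridDim.y.
unsigned flat_cu_id(Dim3 b, Dim3 g) {
    return b.x + b.y * g.x + b.z * g.x * g.y;
}

// True if old_cu_id maps [0, n) onto [0, n) with no collisions.
bool old_remap_is_bijective(unsigned n) {
    std::vector<bool> seen(n, false);
    for (unsigned bx = 0; bx < n; ++bx) {
        unsigned id = old_cu_id(bx);
        if (id >= n || seen[id]) return false;  // out of range or duplicate
        seen[id] = true;
    }
    return true;
}

// True if flat_cu_id hits every id in [0, g.x*g.y*g.z) exactly once.
bool flat_remap_is_bijective(Dim3 g) {
    unsigned n = g.x * g.y * g.z;
    std::vector<bool> seen(n, false);
    for (unsigned bz = 0; bz < g.z; ++bz)
        for (unsigned by = 0; by < g.y; ++by)
            for (unsigned bx = 0; bx < g.x; ++bx) {
                unsigned id = flat_cu_id(Dim3{bx, by, bz}, g);
                if (id >= n || seen[id]) return false;
                seen[id] = true;
            }
    return true;
}

// Parse an env-style string; missing or zero values default to 1.
unsigned parse_dim(const char* s) {
    if (!s) return 1;
    unsigned long v = std::strtoul(s, nullptr, 10);
    return v == 0 ? 1u : static_cast<unsigned>(v);
}

// ASSUMED semantics: keep the total CTA count and split it as
// (total / (Y*Z), Y, Z); fall back to 1D if Y*Z does not divide total.
Dim3 reshape_grid(unsigned total, const char* y_str, const char* z_str) {
    unsigned y = parse_dim(y_str), z = parse_dim(z_str);
    if (total % (y * z) != 0) return Dim3{total, 1, 1};
    return Dim3{total / (y * z), y, z};
}

// Reads the two variables named in the commit message.
Dim3 grid_from_env(unsigned total) {
    return reshape_grid(total,
                        std::getenv("ROCM_FLASH_ATTN_GRID_Y"),
                        std::getenv("ROCM_FLASH_ATTN_GRID_Z"));
}
```

With these helpers, `old_remap_is_bijective(256)` holds while counts such as 128 (ids fall outside [0, 128)) and 304 (two blocks collide on id 32) fail, whereas `flat_remap_is_bijective` holds for every grid shape tried, including one produced by `reshape_grid(304, "4", "2")`.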
2026-06-11 13:19:51 +00:00