Default Branch

acf3d65966 · [rocm-libraries] ROCm/rocm-libraries#7256 (commit 1fc20eb) · Updated 2026-05-13 09:42:28 +00:00

Branches

ce838e19e5 · [CK_TILE] FMHA BWD launcher: address PR #7331 review comments (round 2) · Updated 2026-05-13 06:52:49 +00:00

3266 behind · 4106 ahead

d77f0bea63 · CK-UA: collapse MHA/GQA variants -- one binary per (head_dim, kBlockM) · Updated 2026-05-12 12:15:55 +00:00

68 behind · 23 ahead

f543646f66 · Skip dropout to the tile if PComputeWindow is a null window · Updated 2026-05-12 10:31:23 +00:00

15 behind · 1 ahead

066d21ec32 · Fix · Updated 2026-05-12 08:00:54 +00:00

53 behind · 1 ahead

393ebc1a50 · WIP backup: snapshot all local notes, slides, tutorials, and kernel work · Updated 2026-05-11 20:34:52 +00:00

68 behind · 16 ahead

8a7529177d · Fix the calling context for type_context in scale_tile_in_scalar()/scale_tile_in_pack · Updated 2026-05-11 08:53:07 +00:00

597 behind · 271 ahead

cf11d1796b · Fix CK-UA int32 overflow: use saved original pointers and row strides for rebasing · Updated 2026-05-10 08:59:34 +00:00

68 behind · 18 ahead

b686143624 · Adding SWA decode dispatcher to support GPT-OSS shape + update smoke test · Updated 2026-05-08 14:38:16 +00:00

68 behind · 21 ahead

e9af75800d · introduce env ROCM_FLASH_ATTN_CU_NUM to control bwd group mode persistent kernel grid size · Updated 2026-05-07 20:48:17 +00:00

3266 behind · 4048 ahead

ab4ccfcdc1 · Fix gfx12 async tile-load fallback warnings · Updated 2026-05-06 11:02:45 +00:00

10 behind · 5 ahead

1eafdc8bd7 · [CK][CK_TILE] Fix FMHA codegen group mode dispatch (#6764) · Updated 2026-05-05 19:05:30 +00:00

11 behind · 1 ahead

b00e5449c8 · sparse_attn: split KStats kernel, add README + perf charts · Updated 2026-05-05 07:13:24 +00:00

45 behind · 8 ahead

6b1d184e66 · tmp · Updated 2026-04-30 12:33:00 +00:00

18 behind · 1 ahead

cdf24e0b85 · Add basic support for gfx1153 · Updated 2026-04-30 00:52:14 +00:00

1250 behind · 3 ahead

e09e6a81f3 · Added TE-specialized receipt · Updated 2026-04-28 16:50:01 +00:00

18 behind · 1 ahead

c1d2cf0869 · [CK][CK_TILE] Fix FMHA codegen group mode dispatch (#6764) · Updated 2026-04-27 18:14:42 +00:00

3266 behind · 4099 ahead

8a59f8afa5 · Merge branch 'develop' into users/yiding12/fmha-bwd-workspace · Updated 2026-04-27 07:07:41 +00:00

3266 behind · 4098 ahead

defd7ad297 · Add swiglustep_and_mul branches to gridwise_moe_gemm (4 paths, hardcoded 7.0f clamp) · Updated 2026-04-24 02:45:13 +00:00

18 behind · 1 ahead

977af0e511 · Add max len k to UA argument structure · Updated 2026-04-23 15:22:09 +00:00

68 behind · 17 ahead

c0d65e775a · Merge branch 'develop' into users/ArthurLiu/ck_fmha_codegen · Updated 2026-04-22 07:12:00 +00:00

26 behind · 2 ahead