Commit Graph

  • f122fc731f fix Ding, Yi 2026-04-23 03:46:52 -05:00
  • 3f076a6fc1 Add IsOutOfSinkBound alias in GenericAttentionMask (API compatibility) root 2026-04-23 08:17:34 +00:00
  • 5c9134f72b Merge remote-tracking branch 'origin/develop' into users/yiding12/fmha-bwd-workspace Ding, Yi 2026-04-23 01:50:54 -05:00
  • 276c6ae5b6 Fix format Ding, Yi 2026-04-23 01:50:26 -05:00
  • 1ca05a105a [rocm-libraries] ROCm/rocm-libraries#6434 (commit 87aae5c) Max Podkorytov 2026-04-22 18:06:19 +00:00
  • eaeba5266b Fix ck4inductor conv instance parsing for NumGroupsToMerge parameter (#6434) Max Podkorytov 2026-04-22 11:05:11 -07:00
  • 8c238fe875 [rocm-libraries] ROCm/rocm-libraries#6434 (commit 87aae5c) Max Podkorytov 2026-04-22 11:05:11 -07:00
  • 58e802dcf1 Fix ck4inductor conv instance parsing for NumGroupsToMerge parameter (#6434) Max Podkorytov 2026-04-22 11:05:11 -07:00
  • ab44b83566 refactor to combine two kernel Gino Lu 2026-04-22 13:13:37 -04:00
  • 2c1c7b64cb [CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550) Illia Silin 2026-04-22 08:47:47 -07:00
  • d16061f578 [rocm-libraries] ROCm/rocm-libraries#6550 (commit c396de9) Illia Silin 2026-04-22 08:47:47 -07:00
  • cfb09d76a5 [CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550) Illia Silin 2026-04-22 08:47:47 -07:00
  • cbfb3e242e [rocm-libraries] ROCm/rocm-libraries#6611 (commit 5375c0f) Sami Remes 2026-04-22 10:52:59 +00:00
  • f9a83bd96c [CK_TILE] Preserve input strides in EightWaves async-load descriptor (#6611) Sami Remes 2026-04-22 13:52:02 +03:00
  • de3fa71992 [rocm-libraries] ROCm/rocm-libraries#6611 (commit 5375c0f) Sami Remes 2026-04-22 13:52:02 +03:00
  • 1e4eebfba8 [CK_TILE] Preserve input strides in EightWaves async-load descriptor (#6611) Sami Remes 2026-04-22 13:52:02 +03:00
  • 30f2874063 Merge branch 'develop' into users/yiding12/fmha-bwd-workspace Yi DING 2026-04-22 15:17:57 +08:00
  • c0d65e775a Merge branch 'develop' into users/ArthurLiu/ck_fmha_codegen users/ArthurLiu/ck_fmha_codegen ArthurLiu 2026-04-22 15:12:00 +08:00
  • 3607588ca4 FMHA BWD workspace: 4K-align dq_acc base Ding, Yi 2026-04-22 02:08:01 -05:00
  • 4195052efa Time launcher construction and prepare_workspace in benchmark output Ding, Yi 2026-04-22 02:07:55 -05:00
  • 03b7f58896 Fix FMHA codegen group mode dispatch ArthurLiu 2026-04-22 01:50:50 -05:00
  • 9d34174ac2 [rocm-libraries] ROCm/rocm-libraries#5646 (commit 05680a4) Bartłomiej Kocot 2026-04-21 21:50:07 +00:00
  • 0824d93b22 [CK_TILE] Add conv bwd data tests (#5646) Bartłomiej Kocot 2026-04-21 23:49:19 +02:00
  • 9348b8eb82 [rocm-libraries] ROCm/rocm-libraries#5646 (commit 05680a4) Bartłomiej Kocot 2026-04-21 23:49:19 +02:00
  • 2fb3f2716e [CK_TILE] Add conv bwd data tests (#5646) Bartłomiej Kocot 2026-04-21 23:49:19 +02:00
  • 7bcaa73a3a [rocm-libraries] ROCm/rocm-libraries#6537 (commit 16be4f7) arai713 2026-04-21 20:50:57 +00:00
  • 2f0c38a8d7 [CK] Fix for hipblaslt error in PyTorch Dockerfile (#6537) arai713 2026-04-21 13:49:34 -07:00
  • 59d1cbef89 [rocm-libraries] ROCm/rocm-libraries#6537 (commit 16be4f7) arai713 2026-04-21 13:49:34 -07:00
  • 98b45de037 [CK] Fix for hipblaslt error in PyTorch Dockerfile (#6537) arai713 2026-04-21 13:49:34 -07:00
  • 0b6bbe45d6 Remove exposing kUseTrLoad as template parameter of pipeline problem Qianfeng Zhang 2026-04-21 15:35:03 +00:00
  • 9f4ae1a4d9 Added TE-specialized receipt zain/qola/te-receipt Meekail Zain 2026-04-21 15:24:25 +00:00
  • d22aafb48b [rocm-libraries] ROCm/rocm-libraries#6479 (commit 0705c2d) Linjun-AMD 2026-04-21 11:05:12 +00:00
  • 199716991a [CK][fmha] Add StreamLLM sink support to batch_prefill pipeline (#6479) Linjun-AMD 2026-04-21 19:03:55 +08:00
  • dfc1305685 [rocm-libraries] ROCm/rocm-libraries#6479 (commit 0705c2d) Linjun-AMD 2026-04-21 19:03:55 +08:00
  • 803874c73b [CK][fmha] Add StreamLLM sink support to batch_prefill pipeline (#6479) Linjun-AMD 2026-04-21 19:03:55 +08:00
  • b75afb4274 [rocm-libraries] ROCm/rocm-libraries#6118 (commit 2c7dcf7) 金黄色葡萄球君君 2026-04-21 07:25:52 +00:00
  • c9e8acc56a projects/composablekernel: add SwigluStep support for MoE blockscale (#6118) 金黄色葡萄球君君 2026-04-21 15:24:48 +08:00
  • 8be1bc3b1f [rocm-libraries] ROCm/rocm-libraries#6118 (commit 2c7dcf7) 金黄色葡萄球君君 2026-04-21 15:24:48 +08:00
  • b5b3ba728d projects/composablekernel: add SwigluStep support for MoE blockscale (#6118) 金黄色葡萄球君君 2026-04-21 15:24:48 +08:00
  • eaaed3e35e [rocm-libraries] ROCm/rocm-libraries#6563 (commit 6559ac9) Yi DING 2026-04-21 05:36:37 +00:00
  • b367e98358 [CK] Add render group to AITER and FA dockers (#6563) Yi DING 2026-04-21 13:35:46 +08:00
  • fb236ed464 [rocm-libraries] ROCm/rocm-libraries#6563 (commit 6559ac9) Yi DING 2026-04-21 13:35:46 +08:00
  • f1c6b7e355 [CK] Add render group to AITER and FA dockers (#6563) Yi DING 2026-04-21 13:35:46 +08:00
  • 863abad57d modify for fmha_fwd 192/128 ginolu/perf Gino Lu 2026-04-20 21:20:28 -05:00
  • fd1060f6fe [CK_TILE] Enable canonical-NaN BF16 conversion for FMHA on RDNA (#6253) Hosang Yoon 2026-04-20 14:52:24 -04:00
  • 720cc88a31 [rocm-libraries] ROCm/rocm-libraries#6253 (commit 61934c6) Hosang Yoon 2026-04-20 14:52:24 -04:00
  • 2574f37483 [CK_TILE] Enable canonical-NaN BF16 conversion for FMHA on RDNA (#6253) Hosang Yoon 2026-04-20 14:52:24 -04:00
  • 60ff5693c4 [rocm-libraries] ROCm/rocm-libraries#6168 (commit 2968835) Bartłomiej Kocot 2026-04-20 15:33:18 +00:00
  • 3dbc77a678 [CK][CK Tile] Clamp element space size to max int32 value (#6168) Bartłomiej Kocot 2026-04-20 17:32:24 +02:00
  • 6f9537fa0b [rocm-libraries] ROCm/rocm-libraries#6168 (commit 2968835) Bartłomiej Kocot 2026-04-20 17:32:24 +02:00
  • 8fd401803f [CK][CK Tile] Clamp element space size to max int32 value (#6168) Bartłomiej Kocot 2026-04-20 17:32:24 +02:00
  • b05adebf38 set lane_group_sz=1 for small token decode relbers/moe_sorting/lane_group_sz_to_1 Robin Elbers 2026-04-20 11:13:01 -04:00
  • 21acf3ba3a [CK TILE] Unification of Scale MFMA/WMMA Policy Structs (#5857) Yung-sheng Tu 2026-04-20 16:28:23 +02:00
  • 5d36cad34a [rocm-libraries] ROCm/rocm-libraries#5857 (commit d77cd41) Yung-sheng Tu 2026-04-20 16:28:23 +02:00
  • 91b7dae95a [CK TILE] Unification of Scale MFMA/WMMA Policy Structs (#5857) Yung-sheng Tu 2026-04-20 16:28:23 +02:00
  • d4236de1ba [rocm-libraries] ROCm/rocm-libraries#4961 (commit 6c3969a) Zoltán Lakatos 2026-04-20 12:25:45 +00:00
  • b65d734c87 [CK] Remove code duplications in grouped gemm fixed nk implementations (#4961) Zoltán Lakatos 2026-04-20 14:24:59 +02:00
  • 09bf63fa71 [rocm-libraries] ROCm/rocm-libraries#4961 (commit 6c3969a) Zoltán Lakatos 2026-04-20 14:24:59 +02:00
  • f73bfe1b7e [CK] Remove code duplications in grouped gemm fixed nk implementations (#4961) Zoltán Lakatos 2026-04-20 14:24:59 +02:00
  • 86ed46ca60 remove invalid instance for MI300 add_fmha_tuned_file Mohsen Saffari 2026-04-20 10:59:09 +00:00
  • 114bb4687f [CK_TILE] Skip padded k/n fragment work in qr_hpad FMHA fwd (#6450) Hosang Yoon 2026-04-18 02:44:46 -04:00
  • 7e4e291771 [rocm-libraries] ROCm/rocm-libraries#6450 (commit b75fed1) Hosang Yoon 2026-04-18 02:44:46 -04:00
  • f5e00ec904 [CK_TILE] Skip padded k/n fragment work in qr_hpad FMHA fwd (#6450) Hosang Yoon 2026-04-18 02:44:46 -04:00
  • c19aa36489 [CK][CK_TILE] Fix dispatcher cpp tests - registry key mismatch and string assertions (#6528) Yaswanth Raparti 2026-04-17 22:14:02 -07:00
  • eb7cb6302d [rocm-libraries] ROCm/rocm-libraries#6528 (commit aa81df5) Yaswanth Raparti 2026-04-17 22:14:02 -07:00
  • 907c6e94ae [CK][CK_TILE] Fix dispatcher cpp tests - registry key mismatch and string assertions (#6528) Yaswanth Raparti 2026-04-17 22:14:02 -07:00
  • 00514bed47 add more instances, manage splitkv verifications Mohsen Saffari 2026-04-17 13:20:45 +00:00
  • a30e0c5cce add run_all_kernels benchmarking mode with extended tuning tiles Mohsen Saffari 2026-04-17 12:03:58 +00:00
  • 8f0f7ca436 Simplification in the cross_attention testing/benchmarking scripts Qianfeng Zhang 2026-04-17 09:38:41 +00:00
  • 3f9f2fa736 Remove max_target 3200 cases from cross_attention testing and benchmarking Qianfeng Zhang 2026-04-17 09:17:38 +00:00
  • db3263469c Clarify the use of max_seqlen and max_seqlen_q Qianfeng Zhang 2026-04-17 09:13:45 +00:00
  • 3e3cb36c7a [fmha-bwd] Fix dK/dV left uninitialized for zero-length Q batches in group persistent mono-split users/yiding12/fmha-bwd-group-persistent Ding, Yi 2026-04-17 02:54:52 -05:00
  • e6d1781f20 [rocm-libraries] ROCm/rocm-libraries#6421 (commit 05b0753) Ville Pietilä 2026-04-17 06:17:30 +00:00
  • 7aab7c464a [MIOpen][CK] Fix bwd weight conv test failures by disabling one block-GEMM V5 instance for 3D convs (#6421) Ville Pietilä 2026-04-17 09:16:32 +03:00
  • c7fe8b72c6 [rocm-libraries] ROCm/rocm-libraries#6421 (commit 05b0753) Ville Pietilä 2026-04-17 09:16:32 +03:00
  • 7d6ef2396f [MIOpen][CK] Fix bwd weight conv test failures by disabling one block-GEMM V5 instance for 3D convs (#6421) Ville Pietilä 2026-04-17 09:16:32 +03:00
  • 92f2ed758e [fmha-bwd] Implement group-mode persistent scheduling with optimized state management Ding, Yi 2026-04-16 22:27:25 -05:00
  • 5c84f54fd9 Add scripts for testing/benchmarking cross_attention cases Qianfeng Zhang 2026-04-16 15:45:57 +00:00
  • cc2963c884 Add tuning-only extended FMHA tile instances (receipts 150/250) Mohsen Saffari 2026-04-16 14:47:36 +00:00
  • 7889844d6b Clarify the use of group_max_seqlens[] and group_input_max_uih_seqlens[] parameters for group attention example Qianfeng Zhang 2026-04-15 16:18:43 +00:00
  • 9279af33f1 Add implementation of fwd splitkv on no_softmax path Qianfeng Zhang 2026-04-15 07:14:40 +00:00
  • 470ff04817 [rocm-libraries] ROCm/rocm-libraries#6445 (commit 2225e10) Yaswanth Raparti 2026-04-16 01:07:37 +00:00
  • e1aabacb49 [CK][CK_TILE] Fix library caching bug in gemm dispatcher (#6445) Yaswanth Raparti 2026-04-15 18:06:30 -07:00
  • 8fc06f80f9 [rocm-libraries] ROCm/rocm-libraries#6445 (commit 2225e10) Yaswanth Raparti 2026-04-15 18:06:30 -07:00
  • 2934d9475d [CK][CK_TILE] Fix library caching bug in gemm dispatcher (#6445) Yaswanth Raparti 2026-04-15 18:06:30 -07:00
  • ac942a32b3 [rocm-libraries] ROCm/rocm-libraries#4657 (commit 47a0db5) Alex Brown 2026-04-15 14:43:23 +00:00
  • 23864ab760 Update build instructions in readme (#4657) Alex Brown 2026-04-15 08:42:37 -06:00
  • 497dc87c02 [rocm-libraries] ROCm/rocm-libraries#4657 (commit 47a0db5) Alex Brown 2026-04-15 08:42:37 -06:00
  • 30a1bfde7a Update build instructions in readme (#4657) Alex Brown 2026-04-15 08:42:37 -06:00
  • 2b83413b8d [rocm-libraries] ROCm/rocm-libraries#6305 (commit 19e10a0) Po Yen Chen 2026-04-15 07:38:36 +00:00
  • a15e7cb018 [CK] Remove obsolete benchmark_fwd_v3.sh script and README reference (#6305) Po Yen Chen 2026-04-15 15:37:37 +08:00
  • c0d9614af6 [rocm-libraries] ROCm/rocm-libraries#6305 (commit 19e10a0) Po Yen Chen 2026-04-15 15:37:37 +08:00
  • 0ddf22610c [CK] Remove obsolete benchmark_fwd_v3.sh script and README reference (#6305) Po Yen Chen 2026-04-15 15:37:37 +08:00
  • 7dcc606adc [rocm-libraries] ROCm/rocm-libraries#5383 (commit b660b8c) Max Podkorytov 2026-04-15 03:44:07 +00:00
  • d415188771 [CK_TILE] Add CShuffleLds microbenchmark suite (#5383) Max Podkorytov 2026-04-14 20:43:23 -07:00
  • 3aee45e115 [rocm-libraries] ROCm/rocm-libraries#5383 (commit b660b8c) Max Podkorytov 2026-04-14 20:43:23 -07:00
  • 027b95a21c [CK_TILE] Add CShuffleLds microbenchmark suite (#5383) Max Podkorytov 2026-04-14 20:43:23 -07:00
  • 5348b577ed [rocm-libraries] ROCm/rocm-libraries#5863 (commit 31d9247) msaffari-amd 2026-04-14 20:23:26 +00:00
  • 6072031cf4 [CK_TILE] Separate PermuteN epilogue from CShuffle epilogue into standalone file (#5863) msaffari-amd 2026-04-14 22:22:18 +02:00
  • cf517ec050 [rocm-libraries] ROCm/rocm-libraries#5863 (commit 31d9247) msaffari-amd 2026-04-14 22:22:18 +02:00