Commit Graph

  • a586a1f8bd [rocm-libraries] ROCm/rocm-libraries#6135 (commit 91f0518) copilot/research-fused-kernel-patterns copilot/investigate-fmha-kernel-execution Vidyasagar Ananthan 2026-04-03 22:08:33 +00:00
  • 356bcbb8cb [CK][CK_Tile] Ensure CK Tile engine benchmarking targets are excluded from default build. (#6135) Vidyasagar Ananthan 2026-04-03 15:07:58 -07:00
  • 89b6bc6489 [rocm-libraries] ROCm/rocm-libraries#6135 (commit 91f0518) Vidyasagar Ananthan 2026-04-03 15:07:58 -07:00
  • 9478c6c69f [CK][CK_Tile] Ensure CK Tile engine benchmarking targets are excluded from default build. (#6135) Vidyasagar Ananthan 2026-04-03 15:07:58 -07:00
  • 3fb26ec98c [rocm-libraries] ROCm/rocm-libraries#5141 (commit e790cc0) harkgill-amd 2026-04-03 19:45:41 +00:00
  • e3f31255f3 Add missing gfx1033 to gfx103 group definition in ck (#5141) harkgill-amd 2026-04-03 15:44:38 -04:00
  • 2b02deb36c [rocm-libraries] ROCm/rocm-libraries#5141 (commit e790cc0) harkgill-amd 2026-04-03 15:44:38 -04:00
  • 12fe2c3de4 Add missing gfx1033 to gfx103 group definition in ck (#5141) harkgill-amd 2026-04-03 15:44:38 -04:00
  • 6880e46a47 [rocm-libraries] ROCm/rocm-libraries#6147 (commit 8035856) Illia Silin 2026-04-03 17:05:23 +00:00
  • d171208642 [CK] Replace daily CI builds with mainline compiler with TheRock compiler. (#6147) Illia Silin 2026-04-03 10:04:12 -07:00
  • eb1508015d [rocm-libraries] ROCm/rocm-libraries#6147 (commit 8035856) Illia Silin 2026-04-03 10:04:12 -07:00
  • 148ae9f2ef [CK] Replace daily CI builds with mainline compiler with TheRock compiler. (#6147) Illia Silin 2026-04-03 10:04:12 -07:00
  • 28afc8fee3 [CK_TILE] Use Unified Workspace for FMHA BWD Ding, Yi 2026-04-03 02:10:43 -05:00
  • cf847f90ed [rocm-libraries] ROCm/rocm-libraries#6102 (commit 827fd10) Thrupti Raj Lakshmana Gowda 2026-04-03 02:55:45 +00:00
  • 88310f86b9 [CK Tile] Fix architecture-dependent EightWave assignment in cshuffle_epilogue (#6102) Thrupti Raj Lakshmana Gowda 2026-04-02 21:55:03 -05:00
  • 8e018a6080 [rocm-libraries] ROCm/rocm-libraries#6102 (commit 827fd10) Thrupti Raj Lakshmana Gowda 2026-04-02 21:55:03 -05:00
  • 8c42eaf3f8 [CK Tile] Fix architecture-dependent EightWave assignment in cshuffle_epilogue (#6102) Thrupti Raj Lakshmana Gowda 2026-04-02 21:55:03 -05:00
  • 1dc35ff4ae [rocm-libraries] ROCm/rocm-libraries#6038 (commit d7041a2) Hosang Yoon 2026-04-03 00:18:21 +00:00
  • b5b96bc62d [CK_TILE] Restrict FMHA codegen to the kernel subset used by FlashAttention (#6038) Hosang Yoon 2026-04-02 20:16:32 -04:00
  • e18f4d8e6d [rocm-libraries] ROCm/rocm-libraries#6038 (commit d7041a2) Hosang Yoon 2026-04-02 20:16:32 -04:00
  • 5370485459 [CK_TILE] Restrict FMHA codegen to the kernel subset used by FlashAttention (#6038) Hosang Yoon 2026-04-02 20:16:32 -04:00
  • 144854dba1 [rocm-libraries] ROCm/rocm-libraries#5938 (commit 73f3650) Christopher Millette 2026-04-02 21:25:56 +00:00
  • 8af6e10be4 [CK_TILE] Optimize static_ford and sequence compile-time infrastructure (#5938) Christopher Millette 2026-04-02 15:25:14 -06:00
  • c8cf71f179 [rocm-libraries] ROCm/rocm-libraries#5938 (commit 73f3650) Christopher Millette 2026-04-02 15:25:14 -06:00
  • 522902f29b [CK_TILE] Optimize static_ford and sequence compile-time infrastructure (#5938) Christopher Millette 2026-04-02 15:25:14 -06:00
  • 7cc9bae9d2 [rocm-libraries] ROCm/rocm-libraries#5722 (commit 55febd2) Emily Martins 2026-04-02 21:07:13 +00:00
  • 76ed4944e1 [CK Tile] Stream-K gtest Code Gen (#5722) Emily Martins 2026-04-02 15:05:44 -06:00
  • 48cd1e0f79 [rocm-libraries] ROCm/rocm-libraries#5722 (commit 55febd2) Emily Martins 2026-04-02 15:05:44 -06:00
  • d80fa8831f [CK Tile] Stream-K gtest Code Gen (#5722) Emily Martins 2026-04-02 15:05:44 -06:00
  • 6d77edc3bd [rocm-libraries] ROCm/rocm-libraries#5544 (commit 3be4095) arai713 2026-04-02 19:49:44 +00:00
  • 37250ceba8 [CK_TILE] Stream-K Tile Engine Fixes (#5544) arai713 2026-04-02 12:49:04 -07:00
  • 7376cb462b [rocm-libraries] ROCm/rocm-libraries#5544 (commit 3be4095) arai713 2026-04-02 12:49:04 -07:00
  • d55186f63a [CK_TILE] Stream-K Tile Engine Fixes (#5544) arai713 2026-04-02 12:49:04 -07:00
  • c73719a78f [rocm-libraries] ROCm/rocm-libraries#6103 (commit c74e44d) Illia Silin 2026-04-02 16:08:15 +00:00
  • 80aeb0fcd0 Use ck_pytorch docker from private repo. (#6103) Illia Silin 2026-04-02 09:06:44 -07:00
  • cd7e17c837 [rocm-libraries] ROCm/rocm-libraries#6103 (commit c74e44d) Illia Silin 2026-04-02 09:06:44 -07:00
  • 9ee89abffe Use ck_pytorch docker from private repo. (#6103) Illia Silin 2026-04-02 09:06:44 -07:00
  • 8506db8761 Fix int32 overflow in CK-UA pipeline via pointer rebasing root 2026-04-02 09:30:00 +00:00
  • 08792e0b31 [rocm-libraries] ROCm/rocm-libraries#5504 (commit 47f86c7) Linjun-AMD 2026-04-02 03:17:45 +00:00
  • d06e2bfa2f [CK Tile] Add sink token gradient support in FMHA backward pass (#5504) Linjun-AMD 2026-04-02 11:17:01 +08:00
  • 8d1fb9d33e [rocm-libraries] ROCm/rocm-libraries#5504 (commit 47f86c7) Linjun-AMD 2026-04-02 11:17:01 +08:00
  • ba0efe01af [CK Tile] Add sink token gradient support in FMHA backward pass (#5504) Linjun-AMD 2026-04-02 11:17:01 +08:00
  • c1127a36f5 [rocm-libraries] ROCm/rocm-libraries#5676 (commit 1d18339) Yaswanth Raparti 2026-04-02 02:26:32 +00:00
  • 91dbdfa476 [CK][CK TILE]Autotuning heuristics infra for universal GEMM kernel selection (#5676) Yaswanth Raparti 2026-04-01 19:25:55 -07:00
  • 644fc05a87 [rocm-libraries] ROCm/rocm-libraries#5676 (commit 1d18339) Yaswanth Raparti 2026-04-01 19:25:55 -07:00
  • 35bf9327b8 [CK][CK TILE]Autotuning heuristics infra for universal GEMM kernel selection (#5676) Yaswanth Raparti 2026-04-01 19:25:55 -07:00
  • e8587b86c2 Fix CK-UA pipeline: s_waitcnt_vmcnt<0> in fmha_post_process root 2026-04-01 23:04:07 +00:00
  • 404a5ce1a4 [rocm-libraries] ROCm/rocm-libraries#6107 (commit e69d1b2) Jobbins 2026-04-01 19:53:41 +00:00
  • fea6d1fadc [CK] poll every 6 hours as workaround (#6107) Jobbins 2026-04-01 13:52:45 -06:00
  • db382efaf7 [rocm-libraries] ROCm/rocm-libraries#6107 (commit e69d1b2) Jobbins 2026-04-01 13:52:45 -06:00
  • f3375e4b79 [CK] poll every 6 hours as workaround (#6107) Jobbins 2026-04-01 13:52:45 -06:00
  • 87d16738bf WIP: CK-UA KV-segment parallelism - kernel args and split range root 2026-04-01 19:09:59 +00:00
  • 63821af1ff Add split-KV decode tiles (b16x32, b32x32) + fix num_splits heuristic root 2026-04-01 18:49:16 +00:00
  • c5600bc8ae Add decode tiles (b16x32, b32x32) to pagedkv_prefill codegen with max_seqlen_q dispatch root 2026-04-01 18:30:06 +00:00
  • 65a3f88ad8 Fix CK-UA mixed batch: use max_seqlen_q for tier selection root 2026-04-01 18:09:48 +00:00
  • 07ba03bcbf Fix sliding window mask: use window_generic when left >= 0 root 2026-04-01 18:00:19 +00:00
  • e5272603c9 Wire FmhaFwdPagedKV: enable bf16 hdim=64 with bn0=32 for page_block_size=32 root 2026-04-01 17:18:41 +00:00
  • 10564b0c40 Enable FmhaFwdPagedKV bf16 hdim=64 instances (was commented out) root 2026-04-01 16:49:20 +00:00
  • cd7ba6e2e8 Add unified attention (42_unified_attention) root 2026-04-01 16:24:53 +00:00
  • ec2db01e4a Fix fmha_fwd early-exit bug: seqlen_q <= min_seqlen_q should be < root 2026-04-01 15:54:17 +00:00
  • cb6fb2802d Split-KV codegen: dual-tile dispatch and head-merge for hdim=64 root 2026-04-01 15:03:39 +00:00
  • 6729989b97 Fix FMHA split-KV for paged-KV with page_block_size < kN0 root 2026-04-01 16:24:19 +00:00
  • 4c5e290378 Add unified attention (42_unified_attention) and topk_softmax_decode root 2026-04-01 16:24:04 +00:00
  • 2bb69a24ea [rocm-libraries] ROCm/rocm-libraries#5776 (commit ee1bbcb) Chinmay Dattanand Kuchinad 2026-04-01 16:22:08 +00:00
  • d71f5d005a [CK] Fix async pivot mismatch in persistent GEMM kernel scheduler (#5776) Chinmay Dattanand Kuchinad 2026-04-01 11:21:20 -05:00
  • 9a49642035 [rocm-libraries] ROCm/rocm-libraries#5776 (commit ee1bbcb) Chinmay Dattanand Kuchinad 2026-04-01 11:21:20 -05:00
  • 820ed2dbb3 [CK] Fix async pivot mismatch in persistent GEMM kernel scheduler (#5776) Chinmay Dattanand Kuchinad 2026-04-01 11:21:20 -05:00
  • 9426f49b52 [rocm-libraries] ROCm/rocm-libraries#6064 (commit cce30ab) Jobbins 2026-04-01 14:35:42 +00:00
  • 359deb01fa [CK] poll develop every 15 minutes for changes (#6064) Jobbins 2026-04-01 08:34:48 -06:00
  • ef2a63047f [rocm-libraries] ROCm/rocm-libraries#6064 (commit cce30ab) Jobbins 2026-04-01 08:34:48 -06:00
  • 6ab0e34ed9 [CK] poll develop every 15 minutes for changes (#6064) Jobbins 2026-04-01 08:34:48 -06:00
  • a502e5a00b [rocm-libraries] ROCm/rocm-libraries#5798 (commit 7acd4e7) Fu-Cheng Tsai 2026-04-01 14:23:38 +00:00
  • e561543dc0 [CK_TILE] Update gfx12 FMHA forward kernel configs (#5798) Fu-Cheng Tsai 2026-04-01 22:22:37 +08:00
  • 37e2cbb735 [rocm-libraries] ROCm/rocm-libraries#5798 (commit 7acd4e7) Fu-Cheng Tsai 2026-04-01 22:22:37 +08:00
  • 47fb489d78 [CK_TILE] Update gfx12 FMHA forward kernel configs (#5798) Fu-Cheng Tsai 2026-04-01 22:22:37 +08:00
  • d9ecf29860 Add float sink_v to the operator() parameters for qs_ks_vs and whole_k_prefetch pipelines whole_k_prefetch_n0loop Qianfeng Zhang 2026-04-01 07:19:20 +00:00
  • 5cf2f4b351 Use in-place version of block_tile_reduce() in whole_k_prefetch pipeline Qianfeng Zhang 2026-03-29 15:35:53 +00:00
  • 42cae2a3fb Add including of iostream in rotating_buffers.hpp Qianfeng Zhang 2026-03-29 13:05:20 +00:00
  • 04c9d64c5f Fix leaked seqstart_v_scale_ptr parameter for grouped-mode MakeArgs() in fmha_fwd Qianfeng Zhang 2026-03-29 13:04:31 +00:00
  • 0bb1420fcf Remove some not very much required interfaces from pipeline problem Qianfeng Zhang 2026-02-24 08:15:39 +00:00
  • d20ed807e4 Fix in comments Qianfeng Zhang 2026-02-24 07:53:50 +00:00
  • a5c29b39b2 Fix sched_barrier mask value Qianfeng Zhang 2026-02-24 07:37:13 +00:00
  • 444a1876a0 Remove un-used index constant definition Qianfeng Zhang 2026-02-14 08:17:39 +00:00
  • 87f62be178 Update in whole_k_prefetch_trload pipeline to prefetch two k_tile for next iteration in the non-whole-k-prefetch path Qianfeng Zhang 2025-12-23 10:23:26 +00:00
  • c738b0f9ff Update to GetNumPrefetchV() for kM0=64 path Qianfeng Zhang 2025-12-23 08:20:08 +00:00
  • 9baf388c08 Move the loading of k_tile for next iteration into the Gemm1 loop (non whole_k_prefetch path in trload pipeline) Qianfeng Zhang 2025-12-23 07:07:15 +00:00
  • f5ba64b595 Update to GetNumPrefetchV() Qianfeng Zhang 2025-12-22 15:35:38 +00:00
  • 0e3b593853 Move the loading of k_tile for next iteration into the Gemm1 loop (non whole_k_prefetch path) Qianfeng Zhang 2025-12-22 15:34:10 +00:00
  • c6bad2e8e4 Update to only pre-load one v_tile during Gemm0 loop Qianfeng Zhang 2025-12-22 08:41:54 +00:00
  • f75cc7c415 Update to the non-whole-k-prefetch path in the whole_k_prefetch pipeline Qianfeng Zhang 2025-12-21 15:13:14 +00:00
  • 2e410c89a4 Fix the static_assert expression in the pipeline Qianfeng Zhang 2025-12-21 12:14:39 +00:00
  • 2ee32b427b Load Q directly from global memory to registers for BlockGemm Qianfeng Zhang 2025-12-20 12:50:18 +00:00
  • bbff386c74 Using is_using_trload_v to check the kUseTrLoad from pipeline Qianfeng Zhang 2025-12-20 10:23:30 +00:00
  • 1c21833837 Add qr_ks_vs_whole_k_prefetch_trload pipeline Qianfeng Zhang 2025-12-18 10:50:30 +00:00
  • da657dcddd Add support of loading QK tiles of hdim96 without padding to hdim128 Qianfeng Zhang 2025-12-17 16:39:15 +00:00
  • 30a9a1b5a0 Adjust in GetNumPrefetchV() Qianfeng Zhang 2025-12-15 15:02:15 +00:00
  • d7f0acd991 Remove replicated codes in the pipeline Qianfeng Zhang 2025-12-15 10:38:15 +00:00
  • eba78658c6 Fix move_tile_window(k_dram_window, ..) step in the pipeline Qianfeng Zhang 2025-12-15 09:54:49 +00:00
  • c570890847 Load Q through Lds Qianfeng Zhang 2025-12-14 15:46:37 +00:00
  • e2125629ce Separate kN0Sub from kK0 to be used for flexible tile tuning for whole_k_prefetch pipeline Qianfeng Zhang 2025-12-09 08:07:35 +00:00