Commit Graph

  • 903a169c5e Using explicit vgpr-saved partition_index with store_tile(lds_window, ...) Qianfeng Zhang 2025-12-08 04:54:22 +00:00
  • 02f3e5f6c4 Refine the interleaving in the loop of Gemm0 Qianfeng Zhang 2025-12-07 12:24:45 +00:00
  • 4779ba1497 [Performance] Change __builtin_amdgcn_sched_barrier() in block_gemm Qianfeng Zhang 2025-12-07 10:11:43 +00:00
  • 733acaf433 Simplify the block_gemm codes Qianfeng Zhang 2025-12-06 14:33:01 +00:00
  • 5f5cb442e0 Switch the codes based on the iteration index (first/intermediate/last) Qianfeng Zhang 2025-12-05 15:58:33 +00:00
  • b43847fc87 Change in GetKVBlockGemm to let gemm1 to use WarpTile-16x16x16/32x32x8 on mi350 Qianfeng Zhang 2025-12-05 02:04:27 +00:00
  • 5c2aa9367b Add prefetching whole next iteration K path in the pipeline Qianfeng Zhang 2025-12-04 10:25:16 +00:00
  • f69c1d0318 Initial re-implementation of pipeline qr_ks_vs_whole_k_prefetch in looping Gemm0 along n0 dimension Qianfeng Zhang 2025-12-04 09:09:47 +00:00
  • b1a4d7cb3e updated mxfp4 moe gemm2 config (#2330) Mingtao Gu 2025-10-02 03:32:55 +08:00
  • 12d4b1500c Rangify check_err() (#444) Po Yen Chen 2022-11-12 01:39:39 +08:00
  • 47b403e8f9 Update develop (#5) (#6) Chao Liu 2021-08-06 16:13:07 -05:00
  • 119712bd90 [rocm-libraries] ROCm/rocm-libraries#4469 (commit 0844cb0) aledudek 2026-04-01 07:32:36 +00:00
  • c672f410e6 [CK_TILE] Add pooling in tile_engine (#4469) aledudek 2026-04-01 09:31:46 +02:00
  • 83cda61eab [rocm-libraries] ROCm/rocm-libraries#4469 (commit 0844cb0) aledudek 2026-04-01 09:31:46 +02:00
  • 357a140e7b [CK_TILE] Add pooling in tile_engine (#4469) zqf_base_develop aledudek 2026-04-01 09:31:46 +02:00
  • 791afc6465 [rocm-libraries] ROCm/rocm-libraries#5991 (commit 8d85e8e) Yi DING 2026-04-01 05:45:19 +00:00
  • 31e1965997 [CK_TILE] Fix FMHA BWD IGLP incorrect results due to AGPR misallocation (#5991) Yi DING 2026-04-01 13:44:04 +08:00
  • 4076ee116f [rocm-libraries] ROCm/rocm-libraries#5991 (commit 8d85e8e) Yi DING 2026-04-01 13:44:04 +08:00
  • 9b8b2456b4 [CK_TILE] Fix FMHA BWD IGLP incorrect results due to AGPR misallocation (#5991) Yi DING 2026-04-01 13:44:04 +08:00
  • f6e7493055 Single wave WarpGemmMfmaF16F16F32M32N32K16 tutorial ecamartins/ck_tile_warp_gemm_play Emily Martins 2026-03-31 20:12:17 +00:00
  • a33b5be1b9 [rocm-libraries] ROCm/rocm-libraries#6022 (commit 54b284a) Estevan Vedovelli 2026-03-31 15:19:43 +00:00
  • d5950c045b [CK] contraction: extend GetTypeString() to include layout-differentiating params (#6022) Estevan Vedovelli 2026-03-31 11:18:11 -04:00
  • 2303d0aee7 [rocm-libraries] ROCm/rocm-libraries#6022 (commit 54b284a) Estevan Vedovelli 2026-03-31 11:18:11 -04:00
  • 2510e7b238 [CK] contraction: extend GetTypeString() to include layout-differentiating params (#6022) Estevan Vedovelli 2026-03-31 11:18:11 -04:00
  • ef4ff4667d [rocm-libraries] ROCm/rocm-libraries#5842 (commit 04c5690) Bartłomiej Kocot 2026-03-31 08:03:41 +00:00
  • 3f31a8931f [CK][CK Tile] Force padding for atomic_add bf16 C tensor (#5842) Bartłomiej Kocot 2026-03-31 10:02:24 +02:00
  • 1f30eb4a54 [rocm-libraries] ROCm/rocm-libraries#5842 (commit 04c5690) Bartłomiej Kocot 2026-03-31 10:02:24 +02:00
  • f14ee90152 [CK][CK Tile] Force padding for atomic_add bf16 C tensor (#5842) Bartłomiej Kocot 2026-03-31 10:02:24 +02:00
  • 66dc81d530 [rocm-libraries] ROCm/rocm-libraries#5729 (commit 516c974) jakpiase 2026-03-31 03:40:25 +00:00
  • be6d9bafa8 [CK_TILE] Changed cshuffle LDS descriptor to naive layout (#5729) jakpiase 2026-03-31 05:39:03 +02:00
  • e9b8edc28c [rocm-libraries] ROCm/rocm-libraries#5729 (commit 516c974) jakpiase 2026-03-31 05:39:03 +02:00
  • 0b98317983 [CK_TILE] Changed cshuffle LDS descriptor to naive layout (#5729) jakpiase 2026-03-31 05:39:03 +02:00
  • 4af7e472a3 Add unified attention kernel on top of CK develop aghamari/ua-on-develop Amir Ghamarian 2026-03-30 17:35:41 +00:00
  • 86f7ebcf27 Add bs32 narrow tier: 2-warp 16x16 MFMA kBlockM=32 for decode aghamari/unified-attention-decode-opt Amir Ghamarian 2026-03-30 17:04:38 +00:00
  • e21b915381 Fix bs32 tiny tier race condition: promote to small tier Amir Ghamarian 2026-03-30 15:10:00 +00:00
  • 6004d045ea docs(fmha): explain the difference between fp8, fp8bf16 and fp8fp32 precision modes copilot/update-fp8-support-in-readme copilot-swe-agent[bot] 2026-03-30 14:43:36 +00:00
  • a809502aaa Update README: FP8 is no longer experimental, document quantization parameters copilot-swe-agent[bot] 2026-03-30 14:38:12 +00:00
  • e6b8094f94 [rocm-libraries] ROCm/rocm-libraries#5921 (commit 032ac1b) copilot/move-pr-3723-to-rocm-libraries Illia Silin 2026-03-30 14:20:20 +00:00
  • 1dd32787a5 [CK] fix clang lifetimebound errors with staging compiler (#5921) Illia Silin 2026-03-30 07:19:32 -07:00
  • 70e4696f01 [rocm-libraries] ROCm/rocm-libraries#5921 (commit 032ac1b) Illia Silin 2026-03-30 07:19:32 -07:00
  • 3873bf3b91 [CK] fix clang lifetimebound errors with staging compiler (#5921) Illia Silin 2026-03-30 07:19:32 -07:00
  • 2dcae9d173 [rocm-libraries] ROCm/rocm-libraries#5977 (commit 794bea7) Hosang Yoon 2026-03-30 14:19:19 +00:00
  • 929041f262 [CK_TILE] Fix Windows build in FMHA head grouping (#5977) Hosang Yoon 2026-03-30 10:18:40 -04:00
  • 0452c865d5 [rocm-libraries] ROCm/rocm-libraries#5977 (commit 794bea7) Hosang Yoon 2026-03-30 10:18:40 -04:00
  • 5844015670 [CK_TILE] Fix Windows build in FMHA head grouping (#5977) Hosang Yoon 2026-03-30 10:18:40 -04:00
  • 5b591af58e Add block_size=32 support and fix int32 overflow in unified attention Amir Ghamarian 2026-03-30 14:16:17 +00:00
  • db98d0e5aa Use direct sweep_tile in decode pipeline. Amir Ghamarian 2026-03-30 10:24:09 +00:00
  • 7968368d92 [rocm-libraries] ROCm/rocm-libraries#5918 (commit a7e2c67) Jeff Huang 2026-03-30 10:21:33 +00:00
  • a52ed96010 [CK][CK_TILE] Add fp8bf16 hdim=256 tile for batch prefill (#5918) Jeff Huang 2026-03-30 18:20:27 +08:00
  • 1cd6d11281 [rocm-libraries] ROCm/rocm-libraries#5918 (commit a7e2c67) Jeff Huang 2026-03-30 18:20:27 +08:00
  • fa912ed457 [CK][CK_TILE] Add fp8bf16 hdim=256 tile for batch prefill (#5918) Jeff Huang 2026-03-30 18:20:27 +08:00
  • fb64a4453c [rocm-libraries] ROCm/rocm-libraries#5915 (commit a72cf7d) Yi DING 2026-03-30 01:45:16 +00:00
  • 42b3728f46 [CK_TILE] Fix FMHA BWD register pressure by wrapping num_total_loop with amd_wave_read_first_lane (#5915) Yi DING 2026-03-30 09:44:35 +08:00
  • 620f20cbf6 [rocm-libraries] ROCm/rocm-libraries#5915 (commit a72cf7d) Yi DING 2026-03-30 09:44:35 +08:00
  • 4a1abd0e31 [CK_TILE] Fix FMHA BWD register pressure by wrapping num_total_loop with amd_wave_read_first_lane (#5915) Yi DING 2026-03-30 09:44:35 +08:00
  • d93efe1b61 Add fused topk_softmax_decode kernel for M=1 MoE decode Amir Ghamarian 2026-03-29 18:06:03 +00:00
  • 423cc72bc4 Move the calling of mask.GetTileRangeAlongX() to the kernel Qianfeng Zhang 2026-03-28 14:19:22 +00:00
  • a7ded14537 Add optimization summary document for unified attention decode work Amir Ghamarian 2026-03-28 14:10:29 +00:00
  • ea157f6244 Route all prefill to 4-warp kBlockM=128 kernel Amir Ghamarian 2026-03-28 13:52:42 +00:00
  • 33b2015939 Add 16x16 MFMA tiny decode kernel (1 warp, kBlockM=16, kBlockQ=2) Amir Ghamarian 2026-03-28 12:19:34 +00:00
  • 5f9b03746d Expand medium tier dispatch threshold for multi-seq prefill Amir Ghamarian 2026-03-28 11:28:21 +00:00
  • b6bbada9f1 [rocm-libraries] ROCm/rocm-libraries#5639 (commit a65e645) Jan Patrick Lehr 2026-03-28 11:20:51 +00:00
  • 07aacceacd [CK] More lifetime-warning suppression (#5639) Jan Patrick Lehr 2026-03-28 12:19:46 +01:00
  • 2457ee6395 [rocm-libraries] ROCm/rocm-libraries#5639 (commit a65e645) Jan Patrick Lehr 2026-03-28 12:19:46 +01:00
  • d1327bedb7 [CK] More lifetime-warning suppression (#5639) Jan Patrick Lehr 2026-03-28 12:19:46 +01:00
  • b3877ba60a Phase 3: early exit, 3-tier dispatch, and 2D decode grid Amir Ghamarian 2026-03-28 11:16:30 +00:00
  • ae1d09f545 Add 2-warp decode kernel with kBlockM=64 for minimal tile waste Amir Ghamarian 2026-03-28 10:57:10 +00:00
  • 8d396d29f0 Add async prefetch overlap to single-warp-group pipeline Amir Ghamarian 2026-03-28 10:47:45 +00:00
  • 583b017321 Add decode-tuned unified attention kernel (4 warps, kBlockM=128) Amir Ghamarian 2026-03-28 10:41:27 +00:00
  • 3b55a05e71 [rocm-libraries] ROCm/rocm-libraries#5849 (commit d9b89b2) Linjun-AMD 2026-03-27 20:37:23 +00:00
  • d8c79ad9e8 [CK_TILE ]Revert "[CK_TILE] Enable MXFP6 for MX GEMM op (#5095)" (#5849) Linjun-AMD 2026-03-28 04:36:39 +08:00
  • 7469320248 [rocm-libraries] ROCm/rocm-libraries#5849 (commit d9b89b2) Linjun-AMD 2026-03-28 04:36:39 +08:00
  • e40a675f74 [CK_TILE ]Revert "[CK_TILE] Enable MXFP6 for MX GEMM op (#5095)" (#5849) Linjun-AMD 2026-03-28 04:36:39 +08:00
  • 7988f0a3ac feat(tutorial): add 01_naive_gemm tutorial example ck/aviralgoel/add-naive-gemm-tutorial Aviral Goel 2026-03-27 16:25:03 -04:00
  • c28d0033d7 [rocm-libraries] ROCm/rocm-libraries#5785 (commit d8ecfc1) Bartłomiej Kocot 2026-03-27 15:38:21 +00:00
  • 48e0d510f0 [CK] Fix min k_batch calculation in conv kernels (#5785) Bartłomiej Kocot 2026-03-27 16:37:37 +01:00
  • 4411c11a13 [rocm-libraries] ROCm/rocm-libraries#5785 (commit d8ecfc1) Bartłomiej Kocot 2026-03-27 16:37:37 +01:00
  • dad85b964c [CK] Fix min k_batch calculation in conv kernels (#5785) Bartłomiej Kocot 2026-03-27 16:37:37 +01:00
  • 4c926497ad [rocm-libraries] ROCm/rocm-libraries#5829 (commit 19b2813) Illia Silin 2026-03-27 15:37:21 +00:00
  • 0757877bc7 [CK] Fix error in dockerfile when building staging compiler. (#5829) Illia Silin 2026-03-27 08:36:22 -07:00
  • 39ad93cb8c [rocm-libraries] ROCm/rocm-libraries#5829 (commit 19b2813) Illia Silin 2026-03-27 08:36:22 -07:00
  • 65618ba39d [CK] Fix error in dockerfile when building staging compiler. (#5829) Illia Silin 2026-03-27 08:36:22 -07:00
  • bbc748defe Add unified attention d64/GQA-8 kernel instances and fix BLOCK_SIZE for small head dims tianxing/unified-attention Amir Ghamarian 2026-03-27 09:41:10 -05:00
      The unified attention kernel previously only supported head_size=128 with MHA (NumQPerKV=1). This change adds support for head_size=64 with GQA-8 (NumQPerKV=8), which is the configuration used by models like DeepSeek-V3/R1 (64 query heads, 8 KV heads, head_dim=64).
      Changes:
      - Add 4 new kernel instance files for d64/GQA-8: unified_attention_d64_{bf16,fp16}_{nmask,mask}_gqa8.cpp
      - Add d64/GQA-8 dispatch path in unified_attention.cpp
      - Fix BLOCK_SIZE (kPageBlockSize) in unified_attention_kernel_traits: compute from HEAD_SIZE instead of hardcoding 32. For HeadSize<=64, BLOCK_SIZE must be 64 to guarantee NumIssues>=1 on gfx950. With 128-bit vector loads (KVector=8), LaneGroups*NumWarps=128 exceeds kPageBlockSize=32 when HeadSize=64, causing a division-by-zero in the LDS tile descriptor constexpr evaluation.
  • 5cd4b441ab Fixes compilation issues for unified attention Amir Ghamarian 2026-03-27 08:03:06 -05:00
  • 58475d3f45 [rocm-libraries] ROCm/rocm-libraries#5393 (commit d51b649) Johannes Graner 2026-03-27 09:18:14 +00:00
  • 94bc6ab3ab [CK Tile] StreamK support for Bwd Weight grouped convolutions (#5393) Johannes Graner 2026-03-27 10:17:10 +01:00
  • d219484b46 [rocm-libraries] ROCm/rocm-libraries#5393 (commit d51b649) Johannes Graner 2026-03-27 10:17:10 +01:00
  • c60514f371 [CK Tile] StreamK support for Bwd Weight grouped convolutions (#5393) Johannes Graner 2026-03-27 10:17:10 +01:00
  • 36f2ec23f5 [rocm-libraries] ROCm/rocm-libraries#5445 (commit 2cdbf8b) arai713 2026-03-27 08:13:27 +00:00
  • cfd274a184 [CK_TILE] Support for CompV4 pipeline in Stream-K GEMM (#5445) arai713 2026-03-27 01:12:09 -07:00
  • 04eb0b199b [rocm-libraries] ROCm/rocm-libraries#5445 (commit 2cdbf8b) arai713 2026-03-27 01:12:09 -07:00
  • 54272c6fa6 [CK_TILE] Support for CompV4 pipeline in Stream-K GEMM (#5445) arai713 2026-03-27 01:12:09 -07:00
  • 47a04fda08 [rocm-libraries] ROCm/rocm-libraries#5790 (commit c132b5a) Yi DING 2026-03-27 07:54:53 +00:00
  • f15df65126 [CK_TILE] Fix NaN for FMHA BWD When seq_q=0 (#5790) Yi DING 2026-03-27 15:54:01 +08:00
  • dea23b31b4 [rocm-libraries] ROCm/rocm-libraries#5790 (commit c132b5a) Yi DING 2026-03-27 15:54:01 +08:00
  • 8554618d6a [CK_TILE] Fix NaN for FMHA BWD When seq_q=0 (#5790) Yi DING 2026-03-27 15:54:01 +08:00
  • e2470e837a [rocm-libraries] ROCm/rocm-libraries#5880 (commit a6b6c05) Yaswanth Raparti 2026-03-27 06:34:12 +00:00
  • c379065995 [CK][CK_TILE] Fix CTest parsing to handle all test number formats (#5880) Yaswanth Raparti 2026-03-26 23:32:51 -07:00
  • d00411352e [rocm-libraries] ROCm/rocm-libraries#5880 (commit a6b6c05) Yaswanth Raparti 2026-03-26 23:32:51 -07:00
  • f15c70365c [CK][CK_TILE] Fix CTest parsing to handle all test number formats (#5880) Yaswanth Raparti 2026-03-26 23:32:51 -07:00