Commit Graph

  • fafb375122 Use packed cast_tile for fp16 Qianfeng Zhang 2025-04-15 14:29:30 +00:00
  • 6686c7af44 Update to partially reduce the register spilling Qianfeng Zhang 2025-04-15 07:22:09 +00:00
  • 459c5565d4 Add IsFirstVLdsBufferOverlapLastKLdsBuffer() check to reduce call of s_barrier() Qianfeng Zhang 2025-04-13 10:58:32 +00:00
  • 8a6c2591b0 Update the in pipeline codes Qianfeng Zhang 2025-04-13 09:43:58 +00:00
  • d360c61200 Fix in calculation of total_flops and update benchmark scripts Qianfeng Zhang 2025-04-13 08:50:00 +00:00
  • 251136cca7 Add output of estimated TFLOPS Qianfeng Zhang 2025-04-09 14:50:18 +00:00
  • 644ea27e0e Update to the scripts and error thresholds Qianfeng Zhang 2025-04-09 10:34:37 +00:00
  • 2a71304bbb Tune the input initialization to avoid over-flow in silu Qianfeng Zhang 2025-04-09 10:03:32 +00:00
  • 9c2dbf8d64 Add benchmark_hstu_attention.sh Qianfeng Zhang 2025-04-09 08:28:05 +00:00
  • cdb0704377 Add several verification test cases Qianfeng Zhang 2025-04-08 16:38:35 +00:00
  • beb6fa8cc1 Fix in kernel and forward dispatch for jagged mode Qianfeng Zhang 2025-04-08 16:37:52 +00:00
  • 24822a4898 Fix in hstu-attention pipeline (which makes some testing cases passed) Qianfeng Zhang 2025-04-08 15:53:08 +00:00
  • 50b0af257c Fixes and updates Qianfeng Zhang 2025-04-07 15:29:23 +00:00
  • 72774b718b Change in HstBlockMasking and kernel/reference codes for using masking Qianfeng Zhang 2025-04-03 14:46:12 +00:00
  • 74a0ec4609 Fix and change in example Qianfeng Zhang 2025-04-03 14:44:36 +00:00
  • 450494945f Add hstu attention kernel implementation, instances and interfaces (building succeeded) Qianfeng Zhang 2025-04-03 08:20:54 +00:00
  • e6b6323b67 fix the jagged mode tensor access in reference_hstu_attention Qianfeng Zhang 2025-03-29 12:55:04 +00:00
  • a19f73c305 Initial reference implementation of hstu attention Qianfeng Zhang 2025-03-28 16:26:43 +00:00
  • db5cb5d19d Update to example_hstu_attention_fwd.cpp Qianfeng Zhang 2026-06-22 09:22:10 +00:00
  • e349a39f3d Rename generate_instances.py to generate_fwd_instances.py Qianfeng Zhang 2026-06-22 08:01:23 +00:00
  • 6670e53589 Fix instruction scheduler Enrico Degregori 2026-06-20 12:21:35 +00:00
  • 424a5f8183 Async parameter in Quant GEMM Enrico Degregori 2026-06-20 09:02:18 +00:00
  • 55e30feac6 [rocm-libraries] ROCm/rocm-libraries#8637 (commit a1a7f5f) Enrico Degregori 2026-06-20 02:08:58 +00:00
  • 01bad4c3d9 [rocm-libraries] ROCm/rocm-libraries#8205 (commit f58120c) Adel Johar 2026-06-19 15:08:04 +00:00
  • c2e187e997 Allow broadcasting of D column vectors in DeviceGemmMultiD_Xdl_CShuffle_V3 Anton Gorenko 2026-06-19 18:08:27 +05:00
  • 016da0d5f0 Support large tensors in quant gemm kernel Anton Gorenko 2026-06-19 16:27:02 +05:00
  • ed8c9dd4f2 Add workaround for inefficient buffer_load to lds on 7.2 Anton Gorenko 2026-06-19 16:20:21 +05:00
  • e11f3a3029 Use literal 0 as scales for unscaled 16x16x128 and 32x32x64 mfma Anton Gorenko 2026-06-19 16:01:56 +05:00
  • f97a34c02b Add operator() overload to GemmPipelineAgBgCrCompAsyncEightWaves to use in QuantGemmKernel Anton Gorenko 2026-06-17 15:46:44 +05:00
  • 01cde459aa Support multiple D in quant gemm kernel Anton Gorenko 2026-06-16 11:42:27 +05:00
  • 5ea73e4075 Impove precision of CShuffle with scales or multi D Anton Gorenko 2026-06-16 11:41:20 +05:00
  • 7c2b979de2 [rocm-libraries] ROCm/rocm-libraries#8573 (commit 04c9f1d) Bartłomiej Kocot 2026-06-19 09:38:44 +00:00
  • 2733e75900 [rocm-libraries] ROCm/rocm-libraries#6565 (commit d41715e) Enrico Degregori 2026-06-19 06:57:14 +00:00
  • 2694adbd55 Remove the using of kSubQKHeaddim Qianfeng Zhang 2026-06-19 05:16:40 +00:00
  • 081fe18c1c [rocm-libraries] ROCm/rocm-libraries#8558 (commit ccfa08b) Brock Hargreaves 2026-06-18 21:18:27 +00:00
  • 8864dcc3a4 [rocm-libraries] ROCm/rocm-libraries#8560 (commit f8362a1) Brock Hargreaves 2026-06-18 21:16:24 +00:00
  • bad7870830 [rocm-libraries] ROCm/rocm-libraries#8508 (commit 5cc3bef) Brock Hargreaves 2026-06-18 18:33:59 +00:00
  • a3a12b8945 [rocm-libraries] ROCm/rocm-libraries#5813 (commit 18b43cf) Sami Remes 2026-06-18 17:05:09 +00:00
  • e2deaaba64 [rocm-libraries] ROCm/rocm-libraries#8591 (commit 5210ae6) Illia Silin 2026-06-18 14:58:10 +00:00
  • 1762eaeaec [rocm-libraries] ROCm/rocm-libraries#8535 (commit a0f47eb) Enrico Degregori 2026-06-18 12:59:59 +00:00
  • cd782f613c Fix buffer load instruction 7.2 test_ck_bd_ticket Enrico Degregori 2026-06-18 10:49:01 +00:00
  • 252c288def Fix scheduling instructions Enrico Degregori 2026-06-18 08:50:56 +00:00
  • 2b68eb63f3 Fix mfma instruction Enrico Degregori 2026-06-18 08:08:11 +00:00
  • 60b276647b [rocm-libraries] ROCm/rocm-libraries#8157 (commit b0d9d39) Ville Pietilä 2026-06-18 01:22:50 +00:00
  • c43b550206 [rocm-libraries] ROCm/rocm-libraries#8202 (commit 0911fa0) Aviral Goel 2026-06-17 16:41:00 +00:00
  • a625506698 Add restriction on the relationship between HstuAttention<xxx>Problem and HstuAttention<xxx>TileSettingClass Qianfeng Zhang 2026-06-17 15:58:25 +00:00
  • 65bef78383 [rocm-libraries] ROCm/rocm-libraries#8518 (commit 1ad69c3) jakpiase 2026-06-17 15:51:36 +00:00
  • b5713be6cd [rocm-libraries] ROCm/rocm-libraries#8501 (commit 54eb5dc) Illia Silin 2026-06-17 14:03:00 +00:00
  • 0765b2631b CK-UA: aggressive pipeline cleanup — drop dead experiments + trim comments juuso-oskari 2026-06-17 11:39:10 +00:00
  • 39182b50eb [rocm-libraries] ROCm/rocm-libraries#8487 (commit 06a73ba) SamiAario-AMD 2026-06-17 11:07:22 +00:00
  • aa5ed1a749 Add operator() overload to GemmPipelineAgBgCrCompAsyncEightWaves to use in QuantGemmKernel Anton Gorenko 2026-06-17 15:46:44 +05:00
  • 382bb198eb CK-UA: freeze docs + comment cleanup (+ gated decode-ring scaffolding) juuso-oskari 2026-06-17 09:14:44 +00:00
  • 5bebfd460f [rocm-libraries] ROCm/rocm-libraries#8492 (commit 46b6a06) damien-lejeune 2026-06-17 06:22:26 +00:00
  • 40ee1ee0af [rocm-libraries] ROCm/rocm-libraries#8262 (commit d4ff8fc) zain/ck-graph-fix-cherry ltqin 2026-06-14 03:11:53 +00:00
  • be398c224f CK-UA: sliding-window page-table cache to lift the max-KV-length ceiling juuso-oskari 2026-06-16 15:19:45 +00:00
  • d9df801be2 CK-UA: bf16/fp16 paged ps128 fast path + single-page rebase for single-issue tiles juuso-oskari 2026-06-16 14:54:27 +00:00
  • b687578c5c Add experiment tile shape for fmha batch prefill fp8 dlejeune/qwen3.5_fmha_batch_prefill_opt2 Damien Lejeune 2026-06-16 10:41:16 +00:00
  • 2c0b7cbb0a [rocm-libraries] ROCm/rocm-libraries#8424 (commit debb669) damien-lejeune 2026-06-16 07:41:58 +00:00
  • 8428732dc2 Support multiple D in quant gemm kernel Anton Gorenko 2026-06-16 11:42:27 +05:00
  • 335f80033b Impove precision of CShuffle with scales or multi D Anton Gorenko 2026-06-16 11:41:20 +05:00
  • 1b649a8d4b [rocm-libraries] ROCm/rocm-libraries#8332 (commit 48c389c) Brock Hargreaves 2026-06-15 17:40:10 +00:00
  • b8440b3aeb [rocm-libraries] ROCm/rocm-libraries#8325 (commit 559eaf6) Andriy Roshchenko 2026-06-15 16:12:33 +00:00
  • 80009b4c82 CK-UA: paged ps128 fast path for fp8 prefill_d128 at contiguous parity juuso-oskari 2026-06-15 12:53:50 +00:00
  • c1f7104852 [rocm-libraries] ROCm/rocm-libraries#6663 (commit f19fc01) Sami Remes 2026-06-15 08:28:55 +00:00
  • 1864287f95 CK-UA: issue WG1's next-tile prefetch from its MATRIX slot; drop PREFETCH_EARLY knob juuso-oskari 2026-06-15 08:26:51 +00:00
  • a49a273a7f CK-UA: revert post-4694 softmax packs (measured -3% regression); restore ~1882 baseline juuso-oskari 2026-06-15 07:54:03 +00:00
  • aab1d219f5 [rocm-libraries] ROCm/rocm-libraries#8350 (commit f92ded1) damien-lejeune 2026-06-15 07:00:35 +00:00
  • 947dcc2606 [rocm-libraries] ROCm/rocm-libraries#5510 (commit 8415c8c) SamiAario-AMD 2026-06-15 06:42:28 +00:00
  • 3eb929334b Add static_assert() in HstuAttentionFwdTileSettingClass Qianfeng Zhang 2026-06-14 13:36:46 +00:00
  • 0954a8f3fa [rocm-libraries] ROCm/rocm-libraries#8262 (commit d4ff8fc) ltqin 2026-06-14 03:11:53 +00:00
  • d912139ca9 CK-UA: split fmha_alu0 into rowmax/shift lambdas (default-off pipelining hook) juuso-oskari 2026-06-13 11:59:21 +00:00
  • 29e0f75e19 CK-UA: packed softmax shift + alu1 rescale; default to fastest fp8 prefill config juuso-oskari 2026-06-13 11:41:19 +00:00
  • 01cca38c8e [rocm-libraries] ROCm/rocm-libraries#8220 (commit 4c04a3a) Johannes Graner 2026-06-13 00:10:50 +00:00
  • 329e589840 [rocm-libraries] ROCm/rocm-libraries#8260 (commit 1139236) John Afaganis 2026-06-12 21:11:59 +00:00
  • 96a7e44832 [rocm-libraries] ROCm/rocm-libraries#8378 (commit d68585d) Brock Hargreaves 2026-06-12 20:11:53 +00:00
  • d450749933 [rocm-libraries] ROCm/rocm-libraries#8357 (commit 800965c) Illia Silin 2026-06-12 19:19:44 +00:00
  • 789ef38093 [rocm-libraries] ROCm/rocm-libraries#8333 (commit 69b3fc1) Illia Silin 2026-06-12 18:19:31 +00:00
  • c0a011a275 Fix the comments in reference_hstu_attention_fwd.hpp Qianfeng Zhang 2026-06-12 14:50:22 +00:00
  • c2601f38b7 [rocm-libraries] ROCm/rocm-libraries#6569 (commit 393049e) Wojciech Laskowski 2026-06-12 12:48:29 +00:00
  • 9332ef0b56 fmha: fp8 per_token_head batch_prefill perf (v_descale fold + GEMM1 s_setprio) AITERKER-112 msaffari-amd 2026-06-12 12:00:43 +00:00
  • e75076c826 [rocm-libraries] ROCm/rocm-libraries#8310 (commit 003bc6b) Enrico Degregori 2026-06-12 11:42:38 +00:00
  • 73b5f5ee38 Remove un-used element-wise functions passed through pipelines' operator() interfaces Qianfeng Zhang 2026-06-12 08:40:18 +00:00
  • c61dc4267e Add tile shape for FMHA batch prefill on MI308X (on hdim=256) dlejeune/qwen3.5_opt_gfx942 Damien Lejeune 2026-06-11 15:11:48 +00:00
  • f947db93fe CK-UA: fix wide-MMA FP8 P relayout (was cvt-only, missing QK-C->PV-A transpose) juuso-oskari 2026-06-12 08:18:21 +00:00
  • bff80fc39e Rename GetKVBlockGemm to GetPVTBlockGemm Qianfeng Zhang 2026-06-12 07:35:53 +00:00
  • de4f776531 Remove the kHasBias==true instances to save building time Qianfeng Zhang 2026-06-12 05:31:09 +00:00
  • d7609923b6 [rocm-libraries] ROCm/rocm-libraries#7919 (commit 061001d) Thrupti Raj Lakshmana Gowda 2026-06-11 20:38:38 +00:00
  • 276863ca87 [rocm-libraries] ROCm/rocm-libraries#8259 (commit df03f10) jefyang1 2026-06-11 17:33:11 +00:00
  • 359f664b25 [rocm-libraries] ROCm/rocm-libraries#6086 (commit d25d8cc) music-dino 2026-06-11 16:22:37 +00:00
  • abd84e64ab CK-UA: correct UA_FA4_SHARED_SPCOMPUTE note (correct-but-insufficient) jukorhon/fa4-kv128-vgpr juuso-oskari 2026-06-11 15:51:08 +00:00
  • c88eeb8c34 CK-UA: mark UA_FA4_SHARED_SPCOMPUTE as incorrect (keep default OFF) juuso-oskari 2026-06-11 15:37:33 +00:00
  • 8abbd21a01 CK-UA: VGPR-pressure toggles for kv128 probing (all default OFF) juuso-oskari 2026-06-11 15:29:04 +00:00
  • 0fdbf8a91d [rocm-libraries] ROCm/rocm-libraries#8272 (commit 1c66ecb) Bartłomiej Kocot 2026-06-11 15:28:21 +00:00
  • 9aa380e6c2 CK-UA: wide 32x32x64 FP8 MMA with cvt-only P relayout + V-read in MATRIX juuso-oskari 2026-06-11 14:41:47 +00:00
  • d7bb3b10cc [fmha-bwd] Flat cu_id remap for arbitrary CTA_NUM + grid Y/Z override env yewang12/bwd_group_persistent_hack_cu Ye Wang 2026-06-11 13:19:51 +00:00
  • 9e0391e610 Exp1 grid-prune: sync-free per-batch SWA dead-K-tile prune (DQDKDV) yewang12/ck-varlen-bwd-det Ye Wang 2026-06-11 07:03:14 -05:00
  • 78b459127b Temporary 3way merge of ck graph fix zain/ck-graph-fix Meekail Zain 2026-06-10 20:32:09 +00:00
  • f0545b5c15 [rocm-libraries] ROCm/rocm-libraries#8132 (commit 57d21a1) BrianHarrisonAMD 2026-06-10 18:57:31 +00:00
  • a433424e08 [rocm-libraries] ROCm/rocm-libraries#8241 (commit cd183df) Illia Silin 2026-06-10 15:37:44 +00:00
  • c7cde5807d Renaming BUILD_HSTU_FOR_GFX95_ONLY to BUILD_HSTU_FOR_GFX95 Qianfeng Zhang 2026-06-10 15:30:08 +00:00