Commit Graph

  • e8842e3c1f Use git ls-files to select candidate files for clang format John Afaganis 2025-09-25 17:24:04 -06:00
  • 6b40ce4074 Fix in GetQKBlockGemm() Qianfeng Zhang 2025-09-27 14:31:24 +00:00
  • 0d053396c5 Merge commit '1edd250115bc3edd987b7d038f61290a0460d0a3' into develop assistant-librarian[bot] 2025-09-27 13:13:37 +00:00
  • bc9362af55 [CK_TILE] Support f32 in FMHA (fwd and bwd) (#2836) Anton Gorenko 2025-09-27 19:03:48 +06:00
  • 8118d84f77 [CK_TILE] Support f32 in FMHA (fwd and bwd) (#2836) Anton Gorenko 2025-09-27 19:03:48 +06:00
  • 1edd250115 [CK_TILE] Support f32 in FMHA (fwd and bwd) (#2836) Anton Gorenko 2025-09-27 19:03:48 +06:00
  • 477a605961 Merge commit 'c6bfd97c2d186fd03866c3f5d460bb680ce667a1' into develop assistant-librarian[bot] 2025-09-27 03:19:57 +00:00
  • 4158d33735 [CK_TILE] FMHA Fix synchronization issue in FWD splitkv combine pipeline (#2934) Anton Gorenko 2025-09-27 09:16:10 +06:00
  • d0142f8223 [CK_TILE] FMHA Fix synchronization issue in FWD splitkv combine pipeline (#2934) Anton Gorenko 2025-09-27 09:16:10 +06:00
  • c6bfd97c2d [CK_TILE] FMHA Fix synchronization issue in FWD splitkv combine pipeline (#2934) Anton Gorenko 2025-09-27 09:16:10 +06:00
  • 2c114d7ccc fix copy-paste bug in get_matrix_b; re-enable all tests in multi_abd (#2939) emezh 2025-09-26 22:55:18 -04:00
  • daabe29bff fix copy-paste bug in get_matrix_b; re-enable all tests in multi_abd (#2939) emezh 2025-09-26 22:55:18 -04:00
  • 2aa06fbd45 fix copy-paste bug in get_matrix_b; re-enable all tests in multi_abd (#2939) emezh 2025-09-26 22:55:18 -04:00
  • 91317bdfe9 Add atomic-free MOE GEMM implementation wave_buffer_resource_patch Ali Nouri 2025-09-26 22:50:37 +00:00
  • fa8baf3ce6 No atomic passes! Ali Nouri 2025-09-26 22:35:44 +00:00
  • 088b4670ae Merge commit 'ee9769616a51ed85edd8860fe5b976cec0cde037' into develop assistant-librarian[bot] 2025-09-26 21:11:12 +00:00
  • f17f16e8ec fix wp gemm bug when permuteN is false (#2935) lalala-sh 2025-09-27 04:28:54 +08:00
  • 857566c8aa fix wp gemm bug when permuteN is false (#2935) lalala-sh 2025-09-27 04:28:54 +08:00
  • ee9769616a fix wp gemm bug when permuteN is false (#2935) lalala-sh 2025-09-27 04:28:54 +08:00
  • bd52e1a07b output of ./scripts/clang-format-overwrite.sh on the files changes in this PR kylasa_kdim_pr Sudhir Kylasa 2025-09-26 18:56:15 +00:00
  • 75d0086d92 Merge branch 'develop' into kylasa_kdim_pr kylasa 2025-09-26 10:39:56 -07:00
  • dd38b01ac5 Merge commit 'a44bea45b205a84552e417a7b069d962d73c6cb1' into develop assistant-librarian[bot] 2025-09-26 17:11:27 +00:00
  • e98edd3322 Integrate Multi D GEMMs into Grouped GEMMs along with unit tests (#2923) Aviral Goel 2025-09-26 12:59:58 -04:00
  • 5ebdd30e58 Integrate Multi D GEMMs into Grouped GEMMs along with unit tests (#2923) Aviral Goel 2025-09-26 12:59:58 -04:00
  • a44bea45b2 Integrate Multi D GEMMs into Grouped GEMMs along with unit tests (#2923) Aviral Goel 2025-09-26 12:59:58 -04:00
  • 77dcfaa687 Merge commit 'e40c0acef25cab3e6b2ac046e76886764fed0239' into develop assistant-librarian[bot] 2025-09-26 16:13:26 +00:00
  • f7bd8c1634 [TheRock CI] Adding MIOpen at HEAD (#2929) Geo Min 2025-09-26 09:08:15 -07:00
  • 5f3f69dfc5 [TheRock CI] Adding MIOpen at HEAD (#2929) Geo Min 2025-09-26 09:08:15 -07:00
  • e40c0acef2 [TheRock CI] Adding MIOpen at HEAD (#2929) Geo Min 2025-09-26 09:08:15 -07:00
  • a8ff84ed1b Disable Rapid Json to be used by Default (#2936) rahjain-amd 2025-09-26 21:35:35 +05:30
  • 8ad7f1b2ca Disable Rapid Json to be used by Default (#2936) rahjain-amd 2025-09-26 21:35:35 +05:30
  • e92e69318e Disable Rapid Json to be used by Default (#2936) rahjain-amd 2025-09-26 21:35:35 +05:30
  • 3f55777e66 Update CODEOWNERS Christopher Millette 2025-09-26 09:32:34 -06:00
  • 659a331d36 Update CODEOWNERS Christopher Millette 2025-09-26 09:32:34 -06:00
  • f92b3c7a1e Update CODEOWNERS Christopher Millette 2025-09-26 09:32:34 -06:00
  • 558054eadb WIP: Simplify conv to gemm transformations and handle K > 1 and C > 1 cases. Ville Pietilä 2025-09-26 13:38:24 +00:00
  • e8b2ba4c92 Merge branch 'develop' into _bgp_v5_last_working_merging_todo try_merge_with_multiple_abd Aleksander Dudek 2025-09-26 12:11:20 +00:00
  • 8babf7195a Fix strides in 1D conv to gemm transformation. Ville Pietilä 2025-09-26 09:38:11 +00:00
  • f709601bbc Merge commit '32773fe5cb176efd2fcbb361f183164fc6525d8a' into develop assistant-librarian[bot] 2025-09-26 09:12:43 +00:00
  • a3296e00b8 [CK_TILE] FMHA BWD Pad HDim to a Multiple of 8 (#2918) Yi DING 2025-09-26 16:42:59 +08:00
  • 5d7bc8b578 [CK_TILE] FMHA BWD Pad HDim to a Multiple of 8 (#2918) Yi DING 2025-09-26 16:42:59 +08:00
  • 32773fe5cb [CK_TILE] FMHA BWD Pad HDim to a Multiple of 8 (#2918) Yi DING 2025-09-26 16:42:59 +08:00
  • 354dd5039c Add compile check for assumed row-mjor layout. Ville Pietilä 2025-09-26 08:39:39 +00:00
  • 2c4816fe32 code clean valarLip 2025-09-26 08:19:11 +00:00
  • e4391834ad fix wp gemm bug when permuteN is false valarLip 2025-09-26 08:11:59 +00:00
  • 1764c77fb2 Enable running multiple GEMM batches of merged conv groups. Ville Pietilä 2025-09-26 07:51:29 +00:00
  • 97a9b657df Moved the NumWaveGroups condition to the user level files from the include directory. Sudhir Kylasa 2025-09-26 07:37:57 +00:00
  • 11262543b7 Merge commit '518d24e6628eb0c91a56748d26ac8910813c8dcb' into develop assistant-librarian[bot] 2025-09-26 05:13:10 +00:00
  • 00b80bb4a1 Add sequence padding and variable length support in fmha (#2932) Jeff Huang 2025-09-26 12:36:27 +08:00
  • 0957b78f76 Add sequence padding and variable length support in fmha (#2932) Jeff Huang 2025-09-26 12:36:27 +08:00
  • 518d24e662 Add sequence padding and variable length support in fmha (#2932) Jeff Huang 2025-09-26 12:36:27 +08:00
  • 19f49ee63e Merge commit 'b0a2d99d100f2e4212ebbed080acb49a404035ab' into develop assistant-librarian[bot] 2025-09-26 01:40:00 +00:00
  • cf9fbe9f20 use inline function in hpp (#2922) kyle-256 2025-09-26 09:29:26 +08:00
  • 3e6c83e13a use inline function in hpp (#2922) kyle-256 2025-09-26 09:29:26 +08:00
  • b0a2d99d10 use inline function in hpp (#2922) kyle-256 2025-09-26 09:29:26 +08:00
  • f628be2ed1 Verify HostTensorDescriptor when it is created (#2829) emezh 2025-09-25 21:22:13 -04:00
  • 3c207a18b0 Verify HostTensorDescriptor when it is created (#2829) emezh 2025-09-25 21:22:13 -04:00
  • db2524be2d Verify HostTensorDescriptor when it is created (#2829) emezh 2025-09-25 21:22:13 -04:00
  • a5889aa9d6 save tmp tenpercent/async_copy_gemm_v3 Max Podkorytov 2025-09-25 16:47:16 -05:00
  • e575ac4332 Merge commit 'ec4d16b991d16379b785f61b0043ebcfa3fb0914' into develop assistant-librarian[bot] 2025-09-25 23:11:46 +00:00
  • e94b2f02ac Enable CI on gfx1100 (#2930) Illia Silin 2025-09-25 16:10:54 -07:00
  • 4567c988ca Enable CI on gfx1100 (#2930) Illia Silin 2025-09-25 16:10:54 -07:00
  • ec4d16b991 Enable CI on gfx1100 (#2930) Illia Silin 2025-09-25 16:10:54 -07:00
  • 396f701558 Fix compilation issues with other instances of CShuffle usage. Sudhir Kylasa 2025-09-25 22:22:29 +00:00
  • d8fbe774f4 save tmp tenpercent/cktile_rename_f8 Max Podkorytov 2025-09-19 15:55:14 +00:00
  • b8448ab68d Merge commit '8c1a95991330118930f23e6a2ba8e76068d8ca22' into develop assistant-librarian[bot] 2025-09-25 18:15:45 +00:00
  • 768e496178 use default docker for build/test on gfx950 (#2928) Illia Silin 2025-09-25 10:40:45 -07:00
  • a4f310c7b1 use default docker for build/test on gfx950 (#2928) Illia Silin 2025-09-25 10:40:45 -07:00
  • 8c1a959913 use default docker for build/test on gfx950 (#2928) Illia Silin 2025-09-25 10:40:45 -07:00
  • 30673dba81 Congma/ck tile/remove cpp 20 code (#2873) Cong Ma 2025-09-25 11:34:28 -06:00
  • 578566f809 Congma/ck tile/remove cpp 20 code (#2873) Cong Ma 2025-09-25 11:34:28 -06:00
  • a5d1e25ec7 Congma/ck tile/remove cpp 20 code (#2873) Cong Ma 2025-09-25 11:34:28 -06:00
  • 9ed178a93e Fix for Add the API to load SGPR (#2913) Khushbu Agarwal 2025-09-25 10:32:42 -07:00
  • bb5eeef2af Fix for Add the API to load SGPR (#2913) Khushbu Agarwal 2025-09-25 10:32:42 -07:00
  • b56e5d1d79 Fix for Add the API to load SGPR (#2913) Khushbu Agarwal 2025-09-25 10:32:42 -07:00
  • ff8105704b Add AITER test_mha_varlen (#2927) Illia Silin 2025-09-25 10:00:20 -07:00
  • 5a39b14c52 Add AITER test_mha_varlen (#2927) Illia Silin 2025-09-25 10:00:20 -07:00
  • 64e61b8647 Add AITER test_mha_varlen (#2927) Illia Silin 2025-09-25 10:00:20 -07:00
  • 9aa69136e7 fix clang format (#2926) Illia Silin 2025-09-25 09:35:35 -07:00
  • 80f0af1e91 fix clang format (#2926) Illia Silin 2025-09-25 09:35:35 -07:00
  • 9f6fc9fe09 fix clang format (#2926) Illia Silin 2025-09-25 09:35:35 -07:00
  • 27b96b15c4 Simplify the warp_gemm definitions in GetQKBlockGemm and GetKVBlockGemm Qianfeng Zhang 2025-09-25 15:38:55 +00:00
  • 9cc8100783 Add documentation to conv_signature.hpp. John Shumway 2025-09-25 15:37:47 +00:00
  • 0e513e86a4 Merge commit '929291741d44e05ab3b199f836d9be97c6e294f8' into develop assistant-librarian[bot] 2025-09-25 15:27:24 +00:00
  • b864c077ed Code clean-up for bwd tensor transformations. Ville Pietilä 2025-09-25 15:09:08 +00:00
  • 62d78c7bba [Jenkins] Remove 'Jenkins - ' prefix (#2920) Jobbins 2025-09-25 09:08:29 -06:00
  • b7a9ea456b [Jenkins] Remove 'Jenkins - ' prefix (#2920) Jobbins 2025-09-25 09:08:29 -06:00
  • 929291741d [Jenkins] Remove 'Jenkins - ' prefix (#2920) Jobbins 2025-09-25 09:08:29 -06:00
  • 9aa3c6db82 Cleanup and format Aleksander Dudek 2025-09-25 12:25:28 +00:00
  • 8019d4d87a Working BlockwiseGemmPipelineV5 as V6 Aleksander Dudek 2025-09-25 06:24:58 -05:00
  • 1cf47ee477 Shard grouped conv bwd data instances Bartlomiej Kocot 2025-09-25 10:42:46 +00:00
  • 0ea3268d5d Remove debug and other dead code. Ville Pietilä 2025-09-25 09:41:33 +00:00
  • cc7433efc6 Add more comments, disable debug code. Ville Pietilä 2025-09-25 09:37:15 +00:00
  • 97f842f2c6 Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding. Ville Pietilä 2025-09-25 09:30:50 +00:00
  • c5056b4f94 change codegen for split fp8 ck_tile/fmha_in_fp8_split ltqin 2025-09-25 08:34:40 +00:00
  • 9904d0f601 Merge branch 'develop' of https://github.com/ROCm/composable_kernel into fmha_bwd_trload_mfma32 fmha_bwd_trload_mfma32 aska-0096 2025-09-25 08:27:34 +00:00
  • e9f31e99b3 Merge branch 'develop' of https://github.com/ROCm/composable_kernel into fmha_fwd_test_all_hdim fmha_fwd_test_all_hdim aska-0096 2025-09-25 06:50:03 +00:00
  • 02375d4cf0 revert changes in smoke test. disable v col-major fmha fwd tests in gtest aska-0096 2025-09-25 06:49:04 +00:00
  • 51953569f2 remove hdim(192,128) codegen limitation, for support wider range of cases. aska-0096 2025-09-25 05:41:13 +00:00
  • 9d8734c878 Merge commit 'ab22f91a7c63a34af3198411d064a760b1edebbc' into develop assistant-librarian[bot] 2025-09-25 03:25:33 +00:00