Commit Graph

  • 700b2ec9c0 Update AMD buffer coherency (#3403) Bartłomiej Kocot 2025-12-18 10:16:22 +01:00
  • 792c3eb377 Merge commit '15e81397a45d82e2c3032ac7b4e8a7ac0f66590a' into develop assistant-librarian[bot] 2025-12-18 09:16:13 +00:00
  • 4985afb03c adap gemm_mx_kernel.hpp from flatmm, comment changes needed to mx pipeline from flatmm Sami Remes 2025-12-18 04:06:04 -05:00
  • 962966cf5e [CK_TILE] Epilogue chaining (Lwpck 3373) (#2773) Yashvardhan Agarwal 2025-12-18 11:02:02 +02:00
  • 1cc7d01ea8 [CK_TILE] Epilogue chaining (Lwpck 3373) (#2773) Yashvardhan Agarwal 2025-12-18 11:02:02 +02:00
  • 15e81397a4 [CK_TILE] Epilogue chaining (Lwpck 3373) (#2773) Yashvardhan Agarwal 2025-12-18 11:02:02 +02:00
  • a84d4d52bd Merge commit 'bfac64953fd4a91d1f37a473d5849e38a9ce6852' into develop assistant-librarian[bot] 2025-12-18 08:16:15 +00:00
  • 97556d24f2 [CK_TILE][FMHA] Add logits soft-capping support for FAv3 (WIP) (#3355) Po Yen Chen 2025-12-18 16:08:45 +08:00
  • 4dfee8500f [CK_TILE][FMHA] Add logits soft-capping support for FAv3 (WIP) (#3355) Po Yen Chen 2025-12-18 16:08:45 +08:00
  • bfac64953f [CK_TILE][FMHA] Add logits soft-capping support for FAv3 (WIP) (#3355) Po Yen Chen 2025-12-18 16:08:45 +08:00
  • ba29aebebd Merge commit 'bb8445dca8a43fe37b9dd35c04bda98d33115399' into develop assistant-librarian[bot] 2025-12-18 07:15:19 +00:00
  • 27279f00df [CK] Integrate GPU reference into ckProfiler for convolutions (#3379) Johannes Graner 2025-12-18 07:59:45 +01:00
  • f8cedacfb9 [CK] Integrate GPU reference into ckProfiler for convolutions (#3379) Johannes Graner 2025-12-18 07:59:45 +01:00
  • bb8445dca8 [CK] Integrate GPU reference into ckProfiler for convolutions (#3379) Johannes Graner 2025-12-18 07:59:45 +01:00
  • 989ba1aa01 fix clang format zanzhang 2025-12-18 10:25:39 +08:00
  • 2f3874b515 update ck_tile moe zanzhang 2025-12-18 09:39:49 +08:00
  • 9ae1e18628 Merge branch 'develop' into dev/a8w4_and_a8w8splitk felix 2025-12-18 08:56:35 +08:00
  • 4176fc6fe1 Merge branch 'develop' into sparse_attention_VSA jiangyon.ren 2025-12-18 08:41:14 +08:00
  • 334ae1c494 Merge commit '87dd073887933fc2c75c234871e3885cee970a98' into develop assistant-librarian[bot] 2025-12-18 00:34:53 +00:00
  • 86e0049300 Wmma support for grouped convolution bwd weight (#2947) Enrico Degregori 2025-12-18 00:58:58 +01:00
  • c5ef3cfa69 Wmma support for grouped convolution bwd weight (#2947) Enrico Degregori 2025-12-18 00:58:58 +01:00
  • 87dd073887 Wmma support for grouped convolution bwd weight (#2947) Enrico Degregori 2025-12-18 00:58:58 +01:00
  • c7a7629d2e Enabling gfx950 in streamk for Tile Engine ThruptiRajLakshmanaGowda 2025-12-17 21:22:24 +00:00
  • 3c59d702ca Merge commit 'f4729de3953f5233c716293eafdbcd17dc878ccf' into develop assistant-librarian[bot] 2025-12-17 20:14:02 +00:00
  • 7ad4a687fc details from org var (#3431) Geo Min 2025-12-17 11:54:13 -08:00
  • e43a252d19 details from org var (#3431) Geo Min 2025-12-17 11:54:13 -08:00
  • f4729de395 details from org var (#3431) Geo Min 2025-12-17 11:54:13 -08:00
  • 83dc6ad263 [ck_tile] refactor reduce kernel (#3257) Yashvardhan Agarwal 2025-12-17 21:46:08 +02:00
  • d73a2287f3 [ck_tile] refactor reduce kernel (#3257) Yashvardhan Agarwal 2025-12-17 21:46:08 +02:00
  • ea10a78203 [ck_tile] refactor reduce kernel (#3257) Yashvardhan Agarwal 2025-12-17 21:46:08 +02:00
  • c5faecd894 Merge commit '92653168c2b276d4467320f5bdff5ec6cbddf4e6' into develop assistant-librarian[bot] 2025-12-17 17:16:17 +00:00
  • c8397e8ef2 flashattention fwd add (80, 96) instance (#3415) ltqin 2025-12-18 01:16:11 +08:00
  • fca34268d1 flashattention fwd add (80, 96) instance (#3415) ltqin 2025-12-18 01:16:11 +08:00
  • 92653168c2 flashattention fwd add (80, 96) instance (#3415) ltqin 2025-12-18 01:16:11 +08:00
  • e404594325 Fix minor issues in cmake-ck-dev script (#3438) Matti Eskelinen 2025-12-17 18:57:21 +02:00
  • dec39c1165 Fix minor issues in cmake-ck-dev script (#3438) Matti Eskelinen 2025-12-17 18:57:21 +02:00
  • fe3d52d9b0 Fix minor issues in cmake-ck-dev script (#3438) Matti Eskelinen 2025-12-17 18:57:21 +02:00
  • 76d5fb93fe Add rocm to prefix path for codegen (#3404) music-dino 2025-12-17 17:51:13 +01:00
  • bc8ec7697e Add rocm to prefix path for codegen (#3404) music-dino 2025-12-17 17:51:13 +01:00
  • 55c2886b17 Add rocm to prefix path for codegen (#3404) music-dino 2025-12-17 17:51:13 +01:00
  • c92c3ac29d [CK] Evened out the wording in ed out the wording in the changelog (#3418) spolifroni-amd 2025-12-17 11:48:56 -05:00
  • f32b4dea6b [CK] Evened out the wording in ed out the wording in the changelog (#3418) spolifroni-amd 2025-12-17 11:48:56 -05:00
  • 871c2ece2d [CK] Evened out the wording in ed out the wording in the changelog (#3418) spolifroni-amd 2025-12-17 11:48:56 -05:00
  • 997da62343 Adding architecture support for Tile Engine ThruptiRajLakshmanaGowda 2025-12-17 16:32:38 +00:00
  • 97b2015929 Fix FMHA fp8 hdim=64 incorrect result in MI200 (#3423) rocking 2025-12-18 00:16:54 +08:00
  • 4016b27f5a Fix FMHA fp8 hdim=64 incorrect result in MI200 (#3423) rocking 2025-12-18 00:16:54 +08:00
  • 292f87aa03 Fix FMHA fp8 hdim=64 incorrect result in MI200 (#3423) rocking 2025-12-18 00:16:54 +08:00
  • 2de39368c2 Adding sscache stats monitoring (#3428) andrew clark 2025-12-17 09:15:27 -07:00
  • 23d11e9792 Adding sscache stats monitoring (#3428) andrew clark 2025-12-17 09:15:27 -07:00
  • e67cd7edeb Adding sscache stats monitoring (#3428) andrew clark 2025-12-17 09:15:27 -07:00
  • 28b493d331 Merge commit '0500fcc017cda3ffab01af1027189c9b7722645b' into develop assistant-librarian[bot] 2025-12-17 15:14:25 +00:00
  • 9b63a65886 Support A/B Quantization in Blockscale GEMM (#3343) kensclin 2025-12-17 23:13:47 +08:00
  • 509bcf1b2a Support A/B Quantization in Blockscale GEMM (#3343) kensclin 2025-12-17 23:13:47 +08:00
  • 0500fcc017 Support A/B Quantization in Blockscale GEMM (#3343) kensclin 2025-12-17 23:13:47 +08:00
  • 7c7e6cc670 fix build yadaish 2025-12-17 14:51:34 +00:00
  • 9289b0cac1 fix conflict zanzhang 2025-12-17 20:53:29 +08:00
  • 3dbed1d340 Revert "Adding tuned instace list for groupoed conv fwd (#3288)" [revert-3288-streamhpc/grouped-conv-fwd-wmma-tuned-instances] Wojciech Laskowski 2025-12-17 12:43:55 +01:00
  • 9e47664092 Move common codes to detail namespace from Problem class scope Qianfeng Zhang 2025-12-17 10:27:20 +00:00
  • 2c44b4e84d Merge branch 'develop' of github.com:ROCm/composable_kernel into dev/a8w4_and_a8w8splitk yadaish 2025-12-17 10:22:39 +00:00
  • 3850e1bf4d Merge branch 'develop' into ck_moe_bs_splitk_pr [ck_moe_bs_splitk_pr] felix 2025-12-17 18:09:07 +08:00
  • a69b386311 Limit the explicit cast added in threadwise_tensor_slice_transfer_v7r3 to only be used for f8, just in case it hurts performance. kiefer 2025-12-16 14:32:07 +00:00
  • 89daa890d1 Remove useless call of __builtin_amdgcn_s_waitcnt(0xc07f) Qianfeng Zhang 2025-12-17 07:35:19 +00:00
  • 15624fd6ea add hip test ZheWang 2025-12-17 07:01:42 +00:00
  • 17cf9aaec3 Merge commit '292df2719f28cd01464d5d059820684790c101da' into develop assistant-librarian[bot] 2025-12-17 04:21:55 +00:00
  • c3d078376b fix some minor error (#3409) KateJu 2025-12-17 11:50:49 +08:00
  • ea31e6c4b3 fix some minor error (#3409) KateJu 2025-12-17 11:50:49 +08:00
  • 292df2719f fix some minor error (#3409) KateJu 2025-12-17 11:50:49 +08:00
  • 39afffac85 fixing group32 for prefill shapes khuagarw 2025-12-17 03:37:40 +00:00
  • 88538ca6d2 save tmp Max Podkorytov 2025-12-16 21:28:53 -06:00
  • 12433f104b Merge commit '57e1e4a8485835004c36144ba1b39fc3051538a7' into develop assistant-librarian[bot] 2025-12-17 02:46:12 +00:00
  • af1927262c [CK_TILE] Add FP8xF4 Flatmm (#3401) Yi DING 2025-12-17 10:01:48 +08:00
  • fb72fea980 [CK_TILE] Add FP8xF4 Flatmm (#3401) Yi DING 2025-12-17 10:01:48 +08:00
  • 57e1e4a848 [CK_TILE] Add FP8xF4 Flatmm (#3401) Yi DING 2025-12-17 10:01:48 +08:00
  • f2a8d7b713 add fp6->fp32 convert root 2025-12-17 01:24:58 +00:00
  • 05ff943ef6 Merge branch 'develop' into lwpck-4181 khushbu 2025-12-16 18:23:35 -05:00
  • 373d89d381 working prefill shapes khushbu 2025-12-16 18:21:44 -05:00
  • 11f7c3f136 add cmakelists Max Podkorytov 2025-12-16 17:20:48 -06:00
  • df4f1ec464 Merge remote-tracking branch 'origin/develop' into enable_persistent_async Max Podkorytov 2025-12-16 16:39:29 -06:00
  • 095128f1a1 use /var/jenkins/ck as default path for Build/Test stage [aick-482] illsilin_amdeng 2025-12-16 11:48:18 -08:00
  • a7ed94f71c [CK_TILE] FMHA Reduce register spilling in fwd with dropout (workaround for CI failures with clang-22) (#3221) (#3372) [rocm-7.2.4, rocm-7.2.3, rocm-7.2.2, rocm-7.2.1, rocm-7.2.0, release/rocm-rel-7.2.0.1, release/rocm-rel-7.2] Illia Silin 2025-12-16 10:47:00 -08:00
  • 0b0aa06016 Adding remaining flavors for grouped conv fwd [streamhpc/grouped-conv-fwd-extra-flavors] Wojciech Laskowski 2025-11-25 11:37:22 +00:00
  • cd533b2f79 Merge commit '3dfa794fab62dca7c0499791d37298a49630d5ee' into develop assistant-librarian[bot] 2025-12-16 17:15:31 +00:00
  • 1cf868026b Add support of loading QK tiles of hdim96 without padding to hdim128 Qianfeng Zhang 2025-12-14 04:20:05 +00:00
  • f35e7b59cc Add build trace diagnostics to CI. (#3432) Illia Silin 2025-12-16 08:22:52 -08:00
  • de71120c7f Add build trace diagnostics to CI. (#3432) Illia Silin 2025-12-16 08:22:52 -08:00
  • 3dfa794fab Add build trace diagnostics to CI. (#3432) [amd-master] Illia Silin 2025-12-16 08:22:52 -08:00
  • 4df4747532 Adding tuned instace list for groupoed conv fwd (#3288) Wojciech Laskowski 2025-12-16 17:21:22 +01:00
  • 588f573ee1 Change to the Q/K DramTile encoding and renaming in V/VShuffled DramTile Qianfeng Zhang 2025-12-16 14:31:11 +00:00
  • 294e14b6f8 Limit the explicit cast added in threadwise_tensor_slice_transfer_v7r3 to only be used for f8, just in case it hurts performance. kiefer 2025-12-16 14:32:07 +00:00
  • b8a0598a94 Use correct stride in elementwise kernel [jograner/hotfix-grouped-gemm-two-stage-2] Graner, Johannes 2025-12-16 08:38:25 -05:00
  • 7bd9f2cdd2 Zan/moe a8w4 (#3441) Zzz9990 2025-12-16 21:08:31 +08:00
  • d9d064d513 Ck moe bs splitk pr (#3440) yadaish 2025-12-16 21:05:45 +08:00
  • 39ad7898fa [CK_TILE] Add pooling tests for ckTileEngine Aleksander Dudek 2025-12-16 11:20:47 +00:00
  • ef33d16a23 [CK_TILE] fix formatting of pooling in ckTileEngine with clang-format part3 Aleksander Dudek 2025-12-16 10:51:57 +00:00
  • 4c8dbd3847 Merge branch 'develop' into ckTileEnginePooling Aleksander Dudek 2025-12-16 10:42:16 +00:00
  • 1bccd37e06 [CK_TILE] fix formatting of pooling in ckTileEngine with clang-format Aleksander Dudek 2025-12-16 10:40:41 +00:00
  • dcd0b5ecc8 Merge branch 'develop' into sparse_attention_VSA Jiangyon 2025-12-16 10:24:15 +00:00
  • 295e899576 Remove commented code snippet in gridwise Kiefer van Teutem 2025-12-16 11:08:51 +01:00
  • f29d9732a6 add fp6 data-type & fp6-weight preshuffle root 2025-12-16 09:58:33 +00:00
  • 213f10f309 fix testcase error yadaish 2025-12-16 09:37:43 +00:00