Commit Graph

  • 8ac8d391f3 remove test code for now kyle-256 2025-12-16 09:11:15 +00:00
  • 6e821f198f update examples kyle-256 2025-12-16 08:39:01 +00:00
  • 1da833cb93 update kernel kyle-256 2025-12-16 07:20:58 +00:00
  • 39ebfdcfe9 update config kyle-256 2025-12-12 04:56:45 +00:00
  • 614f478a65 update grouped_gemm blockwise kernel kyle-256 2025-12-11 07:53:47 +00:00
  • 7a55f53fcf Merge commit '44f1b5c5de8c85cbae1520fa054405d96df67304' into develop assistant-librarian[bot] 2025-12-22 01:42:28 +00:00
  • 035f6acf3f Fix jenkinsfile for large tensor conv test (#3478) Bartłomiej Kocot 2025-12-22 02:39:30 +01:00
  • 2228960cc4 Fix jenkinsfile for large tensor conv test (#3478) Bartłomiej Kocot 2025-12-22 02:39:30 +01:00
  • 44f1b5c5de Fix jenkinsfile for large tensor conv test (#3478) Bartłomiej Kocot 2025-12-22 02:39:30 +01:00
  • 02cae85af5 Load Q directly from global memory to registers for BlockGemm Qianfeng Zhang 2025-12-20 13:35:45 +00:00
  • 08f7c18f54 remove condition on CK_TILE_HOST_DEVICE_EXTERN atom_test illsilin_amdeng 2025-12-19 18:24:31 -08:00
  • 2f06ac1562 Revert "get LLVM_MAIN_REVISION macro from compiler header (#3469)" (#3476) Illia Silin 2025-12-19 18:12:46 -08:00
  • 5be6381bcb Merge commit '9bd67c2cf2fe8e4479a433bcd6d467e2ea9aedb4' into develop assistant-librarian[bot] 2025-12-20 01:40:48 +00:00
  • 522dda2614 [CK-TILE] Guard against compiler lexer diagnostic (#3444) Jan Patrick Lehr 2025-12-20 02:32:20 +01:00
  • 500d143fa8 [CK-TILE] Guard against compiler lexer diagnostic (#3444) Jan Patrick Lehr 2025-12-20 02:32:20 +01:00
  • 9bd67c2cf2 [CK-TILE] Guard against compiler lexer diagnostic (#3444) Jan Patrick Lehr 2025-12-20 02:32:20 +01:00
  • 09019c1024 Merge commit 'cbc83359649b1b56cd745c4102e9556112f942c2' into develop assistant-librarian[bot] 2025-12-19 23:13:41 +00:00
  • fb7ec3a7aa Improve XDL to WMMA porting for grouped conv fwd (#3456) Bartłomiej Kocot 2025-12-19 23:58:51 +01:00
  • 38ff45abf7 Improve XDL to WMMA porting for grouped conv fwd (#3456) Bartłomiej Kocot 2025-12-19 23:58:51 +01:00
  • cbc8335964 Improve XDL to WMMA porting for grouped conv fwd (#3456) (branch: fix/guard-against-compiler-warning) Bartłomiej Kocot 2025-12-19 23:58:51 +01:00
  • f6dc1a596b get LLVM_MAIN_REVISION macro from compiler header (#3469) Illia Silin 2025-12-19 14:57:12 -08:00
  • 34d26c63a0 get LLVM_MAIN_REVISION macro from compiler header (#3469) Illia Silin 2025-12-19 14:57:12 -08:00
  • 2d9c962e2c get LLVM_MAIN_REVISION macro from compiler header (#3469) Illia Silin 2025-12-19 14:57:12 -08:00
  • 01390d25a4 Revert "details from org var (#3431)" (#3473) Geo Min 2025-12-19 14:10:58 -08:00
  • f9b62a0e99 Revert "details from org var (#3431)" (#3473) Geo Min 2025-12-19 14:10:58 -08:00
  • f67a20b0be Revert "details from org var (#3431)" (#3473) Geo Min 2025-12-19 14:10:58 -08:00
  • fd5f018a05 Use std::boolalpha for boolean rendering instead of ternary operator (branch: copilot/sub-pr-3466) copilot-swe-agent[bot] 2025-12-19 20:51:24 +00:00
  • a7b3913cff Initial plan copilot-swe-agent[bot] 2025-12-19 20:48:14 +00:00
  • 10fb184812 WIP: fixing loading logic Sami Remes 2025-12-19 12:38:32 -05:00
  • 86cc59e754 fix settings for example, fix some things in pipeline Sami Remes 2025-12-19 12:35:03 -05:00
  • ac5610980f Merge commit 'e22622f0ec185bf9e717523c8734acfb13dad0a5' into develop assistant-librarian[bot] 2025-12-19 16:14:44 +00:00
  • a571bf9e3a [TILE ENGINE] Restructure to Base class of GEMM (#3434) Thrupti Raj Lakshmana Gowda 2025-12-19 09:53:56 -06:00
  • 2dacac9561 [TILE ENGINE] Restructure to Base class of GEMM (#3434) Thrupti Raj Lakshmana Gowda 2025-12-19 09:53:56 -06:00
  • e22622f0ec [TILE ENGINE] Restructure to Base class of GEMM (#3434) Thrupti Raj Lakshmana Gowda 2025-12-19 09:53:56 -06:00
  • 1df8077528 Add missing pieces to bwd weight factory. Ville Pietilä 2025-12-19 10:38:27 -05:00
  • 44900da55a Merge commit '0fd2b2f0459b10570788b74bf1a794095a18fc96' into develop assistant-librarian[bot] 2025-12-19 15:13:27 +00:00
  • a6821f428f Adding support for scale and bilinear ops for WMMA grouped conv fwd (#3450) Wojciech Laskowski 2025-12-19 15:15:02 +01:00
  • b65bd9f353 Adding support for scale and bilinear ops for WMMA grouped conv fwd (#3450) Wojciech Laskowski 2025-12-19 15:15:02 +01:00
  • 0fd2b2f045 Adding support for scale and bilinear ops for WMMA grouped conv fwd (#3450) Wojciech Laskowski 2025-12-19 15:15:02 +01:00
  • 5a1c9c9a22 Conv builder test refactoring. Ville Pietilä 2025-12-19 09:14:44 -05:00
  • 2460cf4579 Initial conv bwd weight factory. Ville Pietilä 2025-12-19 07:59:37 -05:00
  • a00ef8184b Only disable on HasMainKBlockLoop mismatch Graner, Johannes 2025-12-19 07:22:57 -05:00
  • adbfcad03b Re-enable two stage kernel Graner, Johannes 2025-12-18 05:07:59 -05:00
  • ee9ba8cb56 [WIP] Partial attempt at implementing RunGemm using RunGemmDesc (branch: meskelin/add_desc_tensorviews_universal) Matti Eskelinen 2025-12-18 13:31:29 +00:00
  • 3d90b5f90e Remove un-used including from default policy file Qianfeng Zhang 2025-12-19 09:58:37 +00:00
  • b828d35d5b Merge remote-tracking branch 'origin/develop' into vpietila/ckb-bwd-weight-factories Ville Pietilä 2025-12-19 04:31:03 -05:00
  • f92aeb9c8a Merge commit '323e01479940237ea24a078b8616fcf93a6b112e' into develop assistant-librarian[bot] 2025-12-19 09:14:51 +00:00
  • c1ba0e08d7 [CK Grouped Gemm] Fix workspace stride in two stage kernel (#3412) Johannes Graner 2025-12-19 10:04:48 +01:00
  • 80117e7ecc [CK Grouped Gemm] Fix workspace stride in two stage kernel (#3412) Johannes Graner 2025-12-19 10:04:48 +01:00
  • 323e014799 [CK Grouped Gemm] Fix workspace stride in two stage kernel (#3412) Johannes Graner 2025-12-19 10:04:48 +01:00
  • 3bada52484 Merge branch 'develop' into sparse_attention_VSA jiangyon.ren 2025-12-19 16:27:16 +08:00
  • 59902860ea remove lse & dropout & add fmt Jiangyon 2025-12-19 08:26:17 +00:00
  • 494dd1813f Merge commit 'b188a2a89682f124d5adbe4469226f3a680eec1c' into develop assistant-librarian[bot] 2025-12-19 06:16:38 +00:00
  • 77179c529e Minor CHANGELOG.md correction (#3451) John Afaganis 2025-12-18 23:02:42 -07:00
  • 6c801269fb Minor CHANGELOG.md correction (#3451) John Afaganis 2025-12-18 23:02:42 -07:00
  • b188a2a896 Minor CHANGELOG.md correction (#3451) John Afaganis 2025-12-18 23:02:42 -07:00
  • 0779609ddc Merge commit '7795e73b47a34a25b48a14f3e4e0e6d681fcbde5' into develop assistant-librarian[bot] 2025-12-19 05:14:28 +00:00
  • 12e0f0b1ba Added large tensor support for grouped conv fwd wmma (#3437) Wojciech Laskowski 2025-12-19 05:55:50 +01:00
  • 896f918828 Added large tensor support for grouped conv fwd wmma (#3437) Wojciech Laskowski 2025-12-19 05:55:50 +01:00
  • 7795e73b47 Added large tensor support for grouped conv fwd wmma (#3437) Wojciech Laskowski 2025-12-19 05:55:50 +01:00
  • 5ba033d484 Merge branch 'develop' into tlakshma_tileengine_enable_arch Thrupti Raj Lakshmana Gowda 2025-12-18 21:48:24 -06:00
  • 9073821d40 Merge commit '9a6e61de9787be2e7ed4a9566cb59a420c5d3f78' into develop assistant-librarian[bot] 2025-12-19 03:41:23 +00:00
  • eb1ad08f53 [CK_BUILDER] Add noreturn to consteval void functions (#3461) John Shumway 2025-12-18 19:07:30 -08:00
  • 588792f553 [CK_BUILDER] Add noreturn to consteval void functions (#3461) John Shumway 2025-12-18 19:07:30 -08:00
  • 9a6e61de97 [CK_BUILDER] Add noreturn to consteval void functions (#3461) John Shumway 2025-12-18 19:07:30 -08:00
  • c5b29a6cbe Merge commit '2220cbaba75892de5780f8f556554ee92ba19e29' into develop assistant-librarian[bot] 2025-12-19 02:47:04 +00:00
  • 1fc7bdd402 [CK_TILE] MX Flatmm Use Byte Pointer Arithmetic for A Tensor (#3446) Yi DING 2025-12-19 10:28:13 +08:00
  • fc71fcd9ad [CK_TILE] MX Flatmm Use Byte Pointer Arithmetic for A Tensor (#3446) Yi DING 2025-12-19 10:28:13 +08:00
  • 2220cbaba7 [CK_TILE] MX Flatmm Use Byte Pointer Arithmetic for A Tensor (#3446) Yi DING 2025-12-19 10:28:13 +08:00
  • 82fcd3c0bd disabling 128 on non preshufflequant khuagarw 2025-12-19 01:49:22 +00:00
  • 75ed7d9d77 Merge commit 'c0ee71d73527cd8206038b86b6eeb4fcf955154e' into develop assistant-librarian[bot] 2025-12-19 01:41:58 +00:00
  • f1b3ca26b3 Dev/a8w4 and a8w8splitk (#3447) yadaish 2025-12-19 09:26:52 +08:00
  • e76ee195df Dev/a8w4 and a8w8splitk (#3447) yadaish 2025-12-19 09:26:52 +08:00
  • c0ee71d735 Dev/a8w4 and a8w8splitk (#3447) yadaish 2025-12-19 09:26:52 +08:00
  • 9c8de3ca24 ck:tf32:complement CK_ENABLE_TF32 controls (#3426) yinglu 2025-12-19 09:17:29 +08:00
  • 4693c2c2f1 ck:tf32:complement CK_ENABLE_TF32 controls (#3426) yinglu 2025-12-19 09:17:29 +08:00
  • ba897f8435 ck:tf32:complement CK_ENABLE_TF32 controls (#3426) yinglu 2025-12-19 09:17:29 +08:00
  • 5b4a67ec6d fix group 128 for both decode and prefill shapes khuagarw 2025-12-19 01:05:01 +00:00
  • 8d6ae40199 gfx950 support for Tile Engine [rcr Layout only] root 2025-12-18 21:37:11 +00:00
  • 87b8b502e6 Merge commit 'e77a7ca2bc65651b5e87a0127e0335733aca2f35' into develop assistant-librarian[bot] 2025-12-18 21:13:07 +00:00
  • 4b5c3e24ef Supporting Custom Build Trace File Names (#3443) andrew clark 2025-12-18 13:15:33 -07:00
  • cc8e250c35 Supporting Custom Build Trace File Names (#3443) andrew clark 2025-12-18 13:15:33 -07:00
  • e77a7ca2bc Supporting Custom Build Trace File Names (#3443) andrew clark 2025-12-18 13:15:33 -07:00
  • 7215827002 Grouped convolution forward device implementation and base flavors for RDNA3/4 (#2964) Kiefer van Teutem 2025-12-18 21:12:15 +01:00
  • bfe6f37d71 Grouped convolution forward device implementation and base flavors for RDNA3/4 (#2964) Kiefer van Teutem 2025-12-18 21:12:15 +01:00
  • 2ea710e88b Grouped convolution forward device implementation and base flavors for RDNA3/4 (#2964) Kiefer van Teutem 2025-12-18 21:12:15 +01:00
  • 5f3a8e45a0 fix dev/a8w4_and_a8w8splitk_yadai yadaish 2025-12-18 19:21:31 +00:00
  • 6a4951cf8c add mx gemm example Sami Remes 2025-12-18 12:34:38 -05:00
  • 0faed29885 refactor the mx pipeline, backup the modified flatmm pipeline Sami Remes 2025-12-18 12:34:08 -05:00
  • 2d518fbb15 fix accruacy issue yadaish 2025-12-18 17:32:42 +00:00
  • 26cdb3e65f Remove custom RunGemm implementation Matti Eskelinen 2025-12-17 13:38:48 +00:00
  • ccf4558f9a Use RunGemmDesc instead of custom RunGemm in BatchedContractionKernel Matti Eskelinen 2025-12-17 13:36:35 +00:00
  • 96820bf5a8 Implement RunGemmDesc that allows directly passing descriptors Matti Eskelinen 2025-12-16 12:12:52 +00:00
  • 171f38365b Merge branch 'dev/a8w4_and_a8w8splitk' of github.com:ROCm/composable_kernel into dev/a8w4_and_a8w8splitk yadaish 2025-12-18 12:35:58 +00:00
  • 92a47c21f8 fix the problem yadaish 2025-12-18 12:35:23 +00:00
  • b6540cb96a List bwd instances. (branch: vpietila/ckb-bwd-instances) Ville Pietilä 2025-12-18 06:31:22 -05:00
  • cef729b554 Merge commit '700b2ec9c02da8d367ebe8a223a6dbf16622db09' into develop assistant-librarian[bot] 2025-12-18 10:15:48 +00:00
  • f7955d9402 Add placeholder test. Ville Pietilä 2025-12-18 04:36:02 -05:00
  • 80eaeacea5 Update AMD buffer coherency (#3403) Bartłomiej Kocot 2025-12-18 10:16:22 +01:00
  • 407bdf7eb0 Update AMD buffer coherency (#3403) Bartłomiej Kocot 2025-12-18 10:16:22 +01:00