Commit Graph

  • b0645c93bf Debugging layout fix amd-khushbu 2025-12-03 00:35:53 +00:00
  • 96143e6974 Merge commit '6cb0bc2d11a97a928dd156533d97f59f52f41d5f' into develop assistant-librarian[bot] 2025-12-02 23:12:06 +00:00
  • 5cb0da15ef feat(block_scale_gemm): Support RRR-R, CRR-R and CCR-C layout for aquant quant mode (#3193) Aviral Goel 2025-12-03 02:59:07 +04:00
  • c93bb1714d feat(block_scale_gemm): Support RRR-R, CRR-R and CCR-C layout for aquant quant mode (#3193) Aviral Goel 2025-12-03 02:59:07 +04:00
  • 6cb0bc2d11 feat(block_scale_gemm): Support RRR-R, CRR-R and CCR-C layout for aquant quant mode (#3193) Aviral Goel 2025-12-03 02:59:07 +04:00
  • 43ff4cfd3f Merge commit '2c284a1780acb790f7c52fb94c99694fa4e3f1fe' into develop assistant-librarian[bot] 2025-12-02 20:14:38 +00:00
  • 97fe55be4b Disable gemm_blockscale_f8 on gfx90a by default. (#3338) Illia Silin 2025-12-02 11:33:33 -08:00
  • 2929833ec7 Disable gemm_blockscale_f8 on gfx90a by default. (#3338) Illia Silin 2025-12-02 11:33:33 -08:00
  • 2c284a1780 Disable gemm_blockscale_f8 on gfx90a by default. (#3338) Illia Silin 2025-12-02 11:33:33 -08:00
  • ebfafa78fe Merge commit '280bc4219151c3f79fe8ca076a2d10df4ff88b34' into develop assistant-librarian[bot] 2025-12-02 16:14:43 +00:00
  • c4199307ec Change from NCHW to MHWC based on old-ck and manage verifying for c > 1 Mohsen Saffari 2025-12-02 15:47:46 +00:00
  • fac57abc38 [CK_BUILDER] Refactor builder factory code. (#3276) John Shumway 2025-12-02 07:40:14 -08:00
  • 750a87e7fc [CK_BUILDER] Refactor builder factory code. (#3276) John Shumway 2025-12-02 07:40:14 -08:00
  • 280bc42191 [CK_BUILDER] Refactor builder factory code. (#3276) John Shumway 2025-12-02 07:40:14 -08:00
  • 5b559e7409 disable the gfx90a (#3336) Thomas Ning 2025-12-02 07:27:37 -08:00
  • cdffa8cc83 disable the gfx90a (#3336) Thomas Ning 2025-12-02 07:27:37 -08:00
  • 8459d389ad disable the gfx90a (#3336) Thomas Ning 2025-12-02 07:27:37 -08:00
  • 33f4ab4e59 Added tuned instances for scaleadd_ab Wojciech Laskowski 2025-12-02 15:17:03 +00:00
  • 4fd168ea67 Re-factor of BF16 instance tuples Wojciech Laskowski 2025-12-02 14:57:22 +00:00
  • aef67fef38 Merge commit '66832861ad78cc63584c32e5d231fd29a99c57b3' into develop assistant-librarian[bot] 2025-12-02 14:14:02 +00:00
  • f1c40b8fba [CK_TILE] Merge multiple fwd convolution groups into a single GEMM batch. (#3136) Ville Pietilä 2025-12-02 15:23:32 +02:00
  • 8bf6f9cac8 [CK_TILE] Merge multiple fwd convolution groups into a single GEMM batch. (#3136) Ville Pietilä 2025-12-02 15:23:32 +02:00
  • 66832861ad [CK_TILE] Merge multiple fwd convolution groups into a single GEMM batch. (#3136) Ville Pietilä 2025-12-02 15:23:32 +02:00
  • 9c9a022007 Merge commit '2d3020e5b03109a56fc2498a721134e5c34ab10f' into develop assistant-librarian[bot] 2025-12-02 13:23:30 +00:00
  • e33a88a624 [CK Tile] batched contraction kernel generalizing (#3126) msaffari-amd 2025-12-02 13:30:27 +01:00
  • b06b6e684c [CK Tile] batched contraction kernel generalizing (#3126) msaffari-amd 2025-12-02 13:30:27 +01:00
  • 2d3020e5b0 [CK Tile] batched contraction kernel generalizing (#3126) msaffari-amd 2025-12-02 13:30:27 +01:00
  • ffcda6b5a3 [CK_BUILDER] Install CK builder headers, added missing include (#3334) DarylHawkinsAMD 2025-12-02 05:28:46 -07:00
  • 3ba598e05d [CK_BUILDER] Install CK builder headers, added missing include (#3334) DarylHawkinsAMD 2025-12-02 05:28:46 -07:00
  • d3f37ebf6c [CK_BUILDER] Install CK builder headers, added missing include (#3334) DarylHawkinsAMD 2025-12-02 05:28:46 -07:00
  • be7c12a132 Merge commit '5d67d82a0bb6dbf5f82f3b4ba2e9188eb838b927' into develop assistant-librarian[bot] 2025-12-02 11:12:35 +00:00
  • 07a0dcd688 Merge branch 'develop' into tianxing/unified-attention Tianxing Wu 2025-12-02 10:58:31 +00:00
  • 6785f99c18 [CK_TILE] Fix for comp pipeline v4 (#3307) jakpiase 2025-12-02 11:38:06 +01:00
  • 9632da4f80 [CK_TILE] Fix for comp pipeline v4 (#3307) jakpiase 2025-12-02 11:38:06 +01:00
  • 5d67d82a0b [CK_TILE] Fix for comp pipeline v4 (#3307) jakpiase 2025-12-02 11:38:06 +01:00
  • 23ce6028ab [CK_TILE] Add indexing optimizations for conv bwd data (#3309) jakpiase 2025-12-02 11:37:26 +01:00
  • a587054099 [CK_TILE] Add indexing optimizations for conv bwd data (#3309) jakpiase 2025-12-02 11:37:26 +01:00
  • 59265d5eb2 [CK_TILE] Add indexing optimizations for conv bwd data (#3309) jakpiase 2025-12-02 11:37:26 +01:00
  • ae8f3a3b19 Merge commit 'f211156ce6e9a8411c9ab8c3647147b6a9cf78d8' into develop assistant-librarian[bot] 2025-12-02 07:14:17 +00:00
  • cfb8ae528f [CK_Tile] Flatmm MX Cleanup & Explicite Offset Calculation (#3286) Yi DING 2025-12-02 14:21:12 +08:00
  • 07158d16ad [CK_Tile] Flatmm MX Cleanup & Explicite Offset Calculation (#3286) Yi DING 2025-12-02 14:21:12 +08:00
  • f211156ce6 [CK_Tile] Flatmm MX Cleanup & Explicite Offset Calculation (#3286) Yi DING 2025-12-02 14:21:12 +08:00
  • c3e510c1e1 draft so/f4moe solin 2025-12-02 02:05:01 +00:00
  • 94dda8df22 Merge commit '46f1d740f03d11bc2a78fce60a95cd0933b9dd4d' into develop assistant-librarian[bot] 2025-12-02 00:36:50 +00:00
  • 90bebdb065 Add grouped gemm instances for RDNA4 (#3237) Erwin Terpstra 2025-12-02 00:32:10 +01:00
  • 328a733e0e Add grouped gemm instances for RDNA4 (#3237) Erwin Terpstra 2025-12-02 00:32:10 +01:00
  • 46f1d740f0 Add grouped gemm instances for RDNA4 (#3237) Erwin Terpstra 2025-12-02 00:32:10 +01:00
  • 1b8a648333 Merge commit '23fb253c4e5ed6ef1a9b69feda5e037d08325bc6' into develop assistant-librarian[bot] 2025-12-01 23:13:36 +00:00
  • fef4a437af Make CK TILE GEMM Aquant support block tile 128x128x128 (#3325) Cong Ma 2025-12-01 16:04:37 -07:00
  • a6ec08a1d2 Make CK TILE GEMM Aquant support block tile 128x128x128 (#3325) Cong Ma 2025-12-01 16:04:37 -07:00
  • 23fb253c4e Make CK TILE GEMM Aquant support block tile 128x128x128 (#3325) Cong Ma 2025-12-01 16:04:37 -07:00
  • 45c3d34009 Merge commit '7873f8fa13ce42d7ef570f7ae99f76f68f463109' into develop assistant-librarian[bot] 2025-12-01 21:12:49 +00:00
  • fa7b8600fe [CK_BUILDER] Update the testing documentation (#3312) John Shumway 2025-12-01 13:05:32 -08:00
  • fc586d2de6 [CK_BUILDER] Update the testing documentation (#3312) John Shumway 2025-12-01 13:05:32 -08:00
  • 7873f8fa13 [CK_BUILDER] Update the testing documentation (#3312) John Shumway 2025-12-01 13:05:32 -08:00
  • 8c96970cda [CK_BUILDER] Fix cosmetic problem with conv_description (#3333) John Shumway 2025-12-01 12:45:04 -08:00
  • 6f2f67b0b6 [CK_BUILDER] Fix cosmetic problem with conv_description (#3333) John Shumway 2025-12-01 12:45:04 -08:00
  • d17994f3df [CK_BUILDER] Fix cosmetic problem with conv_description (#3333) John Shumway 2025-12-01 12:45:04 -08:00
  • 08bd4decf3 Address reviewer comments. John Shumway 2025-12-01 12:12:21 -05:00
  • d4849708a7 Update README.md and comments for dispatch refactor John Shumway 2025-11-30 20:21:37 -05:00
  • 8b76da104c Change paramters in test_conv_discription to fix gfx950 John Shumway 2025-11-24 05:32:43 +00:00
  • 2ca5be5399 Add README.md file for the factory subdirectory John Shumway 2025-11-23 23:21:02 +00:00
  • cac7a8022a Add unit tests for factory helpers John Shumway 2025-11-23 19:52:45 +00:00
  • 96eb0ef193 Clean up convolution dispatcher John Shumway 2025-11-22 17:18:49 +00:00
  • 858322f216 Update builder factory namespaces John Shumway 2025-11-22 16:24:57 +00:00
  • 67004105d4 Convert to dispatching through a function John Shumway 2025-11-22 00:27:27 +00:00
  • 304856c233 Split conv_factory.hpp into separate files John Shumway 2025-11-21 23:02:26 +00:00
  • b705b73c00 Update experimental/builder/README.md John Shumway 2025-12-01 09:07:55 -08:00
  • 649974949f Merge branch 'develop' into moe_xcd_remap Illia Silin 2025-12-01 07:33:30 -08:00
  • 7234b2fc1a Simplifying the codes with regard to k_lds_wite_windows and k_lds_read_windows in the pipelines Qianfeng Zhang 2025-12-01 14:34:53 +00:00
  • bee8b4766e [CKBuilder] Update the testing documentation John Shumway 2025-11-27 04:41:20 +00:00
  • bf6447cd54 support NRepeat=1 in A16W4_MoE_gemm2 to improve performance in the small tokens case opt-a16w4_moe_gemm2 Feng Shijie 2025-12-01 12:24:18 +00:00
  • 6ecbd7c831 Merge branch 'develop' into tianxing/unified-attention Tianxing Wu 2025-12-01 11:03:33 +00:00
  • f38ca54019 Merge commit 'abd6a4b3fc535772ecff047b02f2af666987f859' into develop assistant-librarian[bot] 2025-12-01 09:16:45 +00:00
  • 4f2900a966 Cleanup convolution description (#3329) John Shumway 2025-12-01 01:03:58 -08:00
  • f645d827c8 Cleanup convolution description (#3329) John Shumway 2025-12-01 01:03:58 -08:00
  • abd6a4b3fc Cleanup convolution description (#3329) John Shumway 2025-12-01 01:03:58 -08:00
  • 572df7d4d1 Merge commit '9ed9539ddfcdd8de4180fb992b718b57e1cadfae' into develop assistant-librarian[bot] 2025-12-01 07:15:08 +00:00
  • b36c7d76a0 update yadaish 2025-12-01 06:59:48 +00:00
  • 43da4ac445 [CK_TILE] Disable cast_tile_pk_fp16bf16_fp32 as It Causes Extra spills on Recent Compilers (#3327) Yi DING 2025-12-01 14:48:22 +08:00
  • 2688602697 [CK_TILE] Disable cast_tile_pk_fp16bf16_fp32 as It Causes Extra spills on Recent Compilers (#3327) Yi DING 2025-12-01 14:48:22 +08:00
  • 9ed9539ddf [CK_TILE] Disable cast_tile_pk_fp16bf16_fp32 as It Causes Extra spills on Recent Compilers (#3327) Yi DING 2025-12-01 14:48:22 +08:00
  • 0dff04aa27 Merge commit 'ba6af9fe7c6689075b46052cc40b7f94d96f647f' into develop assistant-librarian[bot] 2025-12-01 06:17:27 +00:00
  • 4fb6b9c561 [CK_TILE] Add unit test for fp4 warp gemm (#2817) Gino Lu 2025-12-01 13:56:48 +08:00
  • 0551d4412e [CK_TILE] Add unit test for fp4 warp gemm (#2817) Gino Lu 2025-12-01 13:56:48 +08:00
  • ba6af9fe7c [CK_TILE] Add unit test for fp4 warp gemm (#2817) Gino Lu 2025-12-01 13:56:48 +08:00
  • b2b6fa1aa9 update yadaish 2025-12-01 05:44:21 +00:00
  • 2182364ebb scale bf16 yadaish 2025-12-01 05:30:02 +00:00
  • c1817464be Tiny fix in GetQKBlockGemm Qianfeng Zhang 2025-11-30 14:04:48 +00:00
  • f01e0ef37d Enable the using of WarpTile-32x32x16 and add scripts to verify Qianfeng Zhang 2025-11-29 16:18:31 +00:00
  • 2d7a35de3e update yandai/moe_flatmm_async yadaish 2025-11-29 16:32:33 +00:00
  • 6e9f0e9673 Merge pull request #3323 from ROCm/swdev-569320 Jessey Harrymanoharan 2025-11-28 18:20:59 -05:00
  • fc03ef0142 Merge pull request #3291 from ROCm/lwpck-4163 Illia Silin 2025-11-28 15:20:33 -08:00
  • 4f8c179bfd Merge commit '004784ef98beffb24a03d106b143ee9f8e03e826' into develop assistant-librarian[bot] 2025-11-28 22:12:10 +00:00
  • bb41ea37e1 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313) Aviral Goel 2025-11-29 01:49:54 +04:00
  • 0861395425 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313) Aviral Goel 2025-11-29 01:49:54 +04:00
  • 004784ef98 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313) Aviral Goel 2025-11-29 01:49:54 +04:00
  • 3194b653f7 add tiling, pipeline, follow layernorm2d Mohsen Saffari 2025-11-28 18:29:10 +00:00
  • 26626a6839 update yadaish 2025-11-28 16:28:15 +00:00
  • d99493606e Add static_assert and comments in the with_softmax pipelines Qianfeng Zhang 2025-11-28 14:49:33 +00:00