Commit Graph

  • 616ad45cef Print number of valid instances in profiler and tests. kiefer 2025-12-05 13:37:07 +00:00
  • d201572ae4 Actually print the reason when a device implementation is not supported. kiefer 2025-12-04 09:57:55 +00:00
  • 1a822947eb Fix bug in various bwd wei device implementations / profiler where the occupancy based split_k value could not be found because the Argument did not derive from ArgumentSplitK, leading to incorrect error tolerances. kiefer 2025-12-03 15:54:22 +00:00
  • 4cf3e61954 Grab device and gridwise files from bkp branch, this should enable splitK support for convolution and also we no longer ForceThreadTileTransfer for explicit gemm. Also grab some updates from 7e7243783008b11e904f127ecf1df55ef95e9af2 to fix building on clang20. kiefer 2025-10-24 14:26:31 +00:00
  • 3e27e627bb Always ForceThreadTileTransfer for now, WaveTileTransfer does not work for convolution yet. kiefer 2025-10-23 12:18:59 +00:00
  • 29265aa82f Fix add_test_executable Enrico Degregori 2025-10-09 13:25:49 +00:00
  • 4c09ae57bc Disable splitk for 2stage xdl on rdna (bug to be fixed) Enrico Degregori 2025-10-06 12:32:01 +00:00
  • 9eece6c0c4 Revert "Adapt all grouped conv bwd weight vanilla Xdl instances to 16x16. MRepeat doubled for all but 12 of them (some static assert failure). Also added custom reduced profiler target for building grouped conv bwd weight vanilla only profiler. Verified with gtest test." kiefer 2025-12-09 09:08:53 +00:00
  • 98ddeebdc0 update zanzhang 2025-12-09 16:02:16 +08:00
  • f7a75d6414 update zanzhang 2025-12-09 11:29:13 +08:00
  • 260bbb49fb Merge commit 'c1c2e41a0387e8e76970ad86959e28963f569d54' into develop assistant-librarian[bot] 2025-12-09 03:37:16 +00:00
  • b726f9606c [CK_TILE] Generate random tensor values with multiple threads (#3324) Yi DING 2025-12-09 11:02:33 +08:00
  • 9c7b0388a9 [CK_TILE] Generate random tensor values with multiple threads (#3324) Yi DING 2025-12-09 11:02:33 +08:00
  • c1c2e41a03 [CK_TILE] Generate random tensor values with multiple threads (#3324) Yi DING 2025-12-09 11:02:33 +08:00
  • aee38fdbf3 (yandai/moe_flatmm_async_scale_b16) update yadaish 2025-12-09 00:30:52 +00:00
  • c28cf0e96d fine-grained working khuagarw 2025-12-09 00:16:10 +00:00
  • 375e499d10 Merge commit 'c363a98d4154c647c1a2d5331ad0d76879b84dfa' into develop assistant-librarian[bot] 2025-12-08 21:13:22 +00:00
  • b85cf9d37c [CK_TILE] Support more layouts for BQuant GEMM (#3349) Sami Remes 2025-12-08 21:05:56 +00:00
  • 64f4467064 [CK_TILE] Support more layouts for BQuant GEMM (#3349) Sami Remes 2025-12-08 21:05:56 +00:00
  • c363a98d41 (tianwyan/streamk) [CK_TILE] Support more layouts for BQuant GEMM (#3349) Sami Remes 2025-12-08 21:05:56 +00:00
  • 7e54399be4 [CK Tile] Grouped GEMM aquant mode and non-persistent kernel (#3337) Erwin Terpstra 2025-12-08 21:19:22 +01:00
  • 142ec27ea0 [CK Tile] Grouped GEMM aquant mode and non-persistent kernel (#3337) Erwin Terpstra 2025-12-08 21:19:22 +01:00
  • fe07b5a1bf [CK Tile] Grouped GEMM aquant mode and non-persistent kernel (#3337) Erwin Terpstra 2025-12-08 21:19:22 +01:00
  • 564276eff9 Merge commit 'ca6143f0b2237a1af80ef5550f1b774fd463676d' into develop assistant-librarian[bot] 2025-12-08 17:14:48 +00:00
  • 97b0ae4a51 update zanzhang 2025-12-08 18:17:19 +08:00
  • 0688e667df (upstream/streamhpc/grouped-conv-fwd-wmma-tuned-instances, streamhpc/grouped-conv-fwd-wmma-tuned-instances) refactotred int8 tuples Wojciech Laskowski 2025-12-08 16:50:58 +00:00
  • 8640ffe8eb Further correction with regard to using n0_loops and k1_loops Qianfeng Zhang 2025-12-08 15:00:21 +00:00
  • 576956298c [CK_TILE] Add basic tutorials to separate directory ck-tile-basic-tutorials Amir Ghamarian 2025-12-08 15:47:01 +00:00
  • 9cb42b092a Add a workaround for a compiler issue for bwd on gfx90a and ROCm 7.1.1 (#3369) Anton Gorenko 2025-12-08 20:44:17 +05:00
  • 84e56d1120 Add a workaround for a compiler issue for bwd on gfx90a and ROCm 7.1.1 (#3369) Anton Gorenko 2025-12-08 20:44:17 +05:00
  • ca6143f0b2 Add a workaround for a compiler issue for bwd on gfx90a and ROCm 7.1.1 (#3369) Anton Gorenko 2025-12-08 20:44:17 +05:00
  • 5f9c363ba8 packed fp8 convert Tianxing Wu 2025-12-08 15:20:05 +00:00
  • 641dae10e8 Add kN0Sub to separate the n0_loop and k1_loop tile size for more flexible tuning Qianfeng Zhang 2025-12-08 10:47:04 +00:00
  • f1f46d5f75 Merge commit '878b4e7f46d7e47618f4d860d71b438cb6d992fd' into develop assistant-librarian[bot] 2025-12-08 12:18:59 +00:00
  • e63ba15ae2 [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287) Yi DING 2025-12-08 19:20:44 +08:00
  • 8b98fe0353 [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287) Yi DING 2025-12-08 19:20:44 +08:00
  • 878b4e7f46 [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287) Yi DING 2025-12-08 19:20:44 +08:00
  • e5a3277261 Merge commit '04612c30ceab818cd6c03a3e833a6c6d1a21dafa' into develop assistant-librarian[bot] 2025-12-08 11:12:53 +00:00
  • 24b1a08444 Finish fp16 refactor Wojciech Laskowski 2025-12-08 11:02:24 +00:00
  • c3d2457327 (vpietila/ckb-block-tiling) Python script to check the block tilings. Ville Pietilä 2025-12-08 05:52:04 -05:00
  • 13c9c8580f [CK_BUILDER] Ck Tile Grouped convolution factory (#3352) Bartłomiej Kocot 2025-12-08 10:32:56 +01:00
  • 75156c492e [CK_BUILDER] Ck Tile Grouped convolution factory (#3352) Bartłomiej Kocot 2025-12-08 10:32:56 +01:00
  • 04612c30ce [CK_BUILDER] Ck Tile Grouped convolution factory (#3352) Bartłomiej Kocot 2025-12-08 10:32:56 +01:00
  • 00b1d9794f Debug Tianxing Wu 2025-12-08 09:11:52 +00:00
  • 65b6d8efcd Ck benchmark Bartlomiej Kocot 2025-12-08 04:07:42 -05:00
  • fc7547a552 ck: add tf32 in DTYPES to control instances build(#3317) yinglu 2025-12-08 16:24:20 +08:00
  • cec66a4b18 ck: add tf32 in DTYPES to control instances build(#3317) yinglu 2025-12-08 16:24:20 +08:00
  • 8fec8054b2 ck: add tf32 in DTYPES to control instances build(#3317) yinglu 2025-12-08 16:24:20 +08:00
  • 6c941ccbbc (yandai/a16w4_old_layout) update yadaish 2025-12-08 06:13:40 +00:00
  • 98d6a7c551 update yadaish 2025-12-08 06:10:20 +00:00
  • 1e2b1f1584 (ck_moe_bs_splitk) fix scale offset calc. oscar 2025-12-07 17:46:36 +08:00
  • 8c0152d1d6 update ck moe a8w4 zanzhang 2025-12-08 10:37:12 +08:00
  • 9a56ea579b update yadaish 2025-12-07 15:29:14 +00:00
  • 3a89eb8857 Simplify the codes in block_gemm Qianfeng Zhang 2025-12-06 14:59:49 +00:00
  • 3ea3ca7b36 debugging khuagarw 2025-12-06 08:57:22 +00:00
  • 66f05c1fbf Merge commit '86a84ae61122b8ed2d2e40e45f108a8fa23d3210' into develop assistant-librarian[bot] 2025-12-05 23:13:30 +00:00
  • 771f37e4aa Add the gfx1011 support on CK Tile with the SGPR builtin reading protection (#3350) Thomas Ning 2025-12-05 14:18:30 -08:00
  • 10e48d2f3c Add the gfx1011 support on CK Tile with the SGPR builtin reading protection (#3350) Thomas Ning 2025-12-05 14:18:30 -08:00
  • 86a84ae611 Add the gfx1011 support on CK Tile with the SGPR builtin reading protection (#3350) Thomas Ning 2025-12-05 14:18:30 -08:00
  • b2019db495 Merge commit '6b1bceca7baea62941793e562d6ff58c571d9191' into develop assistant-librarian[bot] 2025-12-05 18:14:37 +00:00
  • 5ddc132a7a Merge pull request #3361 from spolifroni-amd/users/spolifroni-amd/composable-kernel-add-cktile-doc-to-711 spolifroni-amd 2025-12-05 13:04:23 -05:00
  • 5ab9a6cfe4 [CK_Tile] Enable PreshuffleB for 2d block scale Gemm (#3298) Khushbu Agarwal 2025-12-05 09:57:52 -08:00
  • bc49b0e57b [CK_Tile] Enable PreshuffleB for 2d block scale Gemm (#3298) Khushbu Agarwal 2025-12-05 09:57:52 -08:00
  • 6b1bceca7b [CK_Tile] Enable PreshuffleB for 2d block scale Gemm (#3298) Khushbu Agarwal 2025-12-05 09:57:52 -08:00
  • e4b2f98d0d Merge commit '608232ce82636e7c9ab8dec55dc7507c6792fb65' into develop assistant-librarian[bot] 2025-12-05 17:31:42 +00:00
  • 8aa45533d4 Merge branch 'docs/7.1.1' into users/spolifroni-amd/composable-kernel-add-cktile-doc-to-711 spolifroni-amd 2025-12-05 11:41:41 -05:00
  • 12738d2e45 do not build hipblaslt for gfx90a to save time and disc space (#3362) Illia Silin 2025-12-05 08:39:18 -08:00
  • 67d6c4514a do not build hipblaslt for gfx90a to save time and disc space (#3362) Illia Silin 2025-12-05 08:39:18 -08:00
  • 608232ce82 do not build hipblaslt for gfx90a to save time and disc space (#3362) Illia Silin 2025-12-05 08:39:18 -08:00
  • 70a8425dfb Congma/ck tile/aquant mem pipeline (#3346) Cong Ma 2025-12-05 09:35:27 -07:00
  • 8bdc28e607 Congma/ck tile/aquant mem pipeline (#3346) Cong Ma 2025-12-05 09:35:27 -07:00
  • ed080f5a56 Congma/ck tile/aquant mem pipeline (#3346) Cong Ma 2025-12-05 09:35:27 -07:00
  • ddb6e53d2f updated contributing guide spolifroni-amd 2025-09-09 15:24:44 -04:00
  • 6e1455e879 first commit of the glossary (#2702) spolifroni-amd 2025-09-08 13:55:32 -04:00
  • 99a748498a Ignore .cmake-format.yaml (#3356) John Shumway 2025-12-05 08:26:00 -08:00
  • 1cffd4042e Ignore .cmake-format.yaml (#3356) John Shumway 2025-12-05 08:26:00 -08:00
  • 7541d9b5b0 Ignore .cmake-format.yaml (#3356) John Shumway 2025-12-05 08:26:00 -08:00
  • 17e2c816c3 Profile resnet layout fixes (#3360) Bartłomiej Kocot 2025-12-05 17:20:46 +01:00
  • b411358e21 Profile resnet layout fixes (#3360) Bartłomiej Kocot 2025-12-05 17:20:46 +01:00
  • 82f796a1f0 Profile resnet layout fixes (#3360) Bartłomiej Kocot 2025-12-05 17:20:46 +01:00
  • e4f7f176c8 Merge commit 'f5b0af22722b130f03cac590ca9b8729b1b84991' into develop assistant-librarian[bot] 2025-12-05 16:14:41 +00:00
  • ae29dd5d7a [composable_kernel] initial draft of the ck tile conceptual doc (#3242) spolifroni-amd 2025-12-04 14:09:21 -05:00
  • a157e33311 Simplify includes for CK builder reflection (#3357) John Shumway 2025-12-05 07:44:10 -08:00
  • a94db7fc98 Simplify includes for CK builder reflection (#3357) John Shumway 2025-12-05 07:44:10 -08:00
  • f5b0af2272 Simplify includes for CK builder reflection (#3357) John Shumway 2025-12-05 07:44:10 -08:00
  • 157d2c87db Add new section to changelog (#3295) Bartłomiej Kocot 2025-12-05 16:14:52 +01:00
  • 4beaf7709d Add new section to changelog (#3295) Bartłomiej Kocot 2025-12-05 16:14:52 +01:00
  • 35fc7c9e4f Add new section to changelog (#3295) Bartłomiej Kocot 2025-12-05 16:14:52 +01:00
  • 4bb53408cb Improve testing transfer parameters. Ville Pietilä 2025-12-05 14:55:29 +00:00
  • 6ed727a5a0 Removed int8 merged groups Wojciech Laskowski 2025-12-05 12:09:04 +00:00
  • 787e25685e Remove fp16 merged groups Wojciech Laskowski 2025-12-05 11:47:43 +00:00
  • 4cec09547f fix Juuso Korhonen 2025-12-05 11:16:08 +00:00
  • 2b32dd75ee Add Saved statistics and running statistics to example to verifykernel calculations Mohsen Saffari 2025-12-05 10:54:54 +00:00
  • 0501d37efe fix Juuso Korhonen 2025-12-05 10:52:15 +00:00
  • 971ed7da51 update yadaish 2025-12-05 10:12:13 +00:00
  • c3d40a4a7c Removed merged groups Wojciech Laskowski 2025-12-05 09:29:03 +00:00
  • 5d81343b18 added back generic instance list Wojciech Laskowski 2025-12-05 09:13:45 +00:00
  • 86c35117b5 Merge commit 'f7650ee82b306a05d9c3c44d3feefdd570a4bd58' into develop assistant-librarian[bot] 2025-12-05 09:13:29 +00:00
  • f01e964e22 change the gemm name Juuso Korhonen 2025-12-05 09:09:30 +00:00
  • 774f8fde01 removed unnecessary source file Wojciech Laskowski 2025-12-05 09:08:02 +00:00