Commit Graph

  • 8ba592d65c fix the copyright and pipeline Jiangyon 2025-12-16 09:30:12 +00:00
  • 776664aa7b fix the copyright & rename block to pipeline Jiangyon 2025-12-16 09:18:27 +00:00
  • 7d897cea19 remove test code for now kyle-256 2025-12-16 09:11:15 +00:00
  • 12420cd6f7 fix the copyright Jiangyon 2025-12-16 08:59:26 +00:00
  • e7170df9c3 Merge branch 'develop' into streamhpc/grouped-conv-fwd-wmma Kiefer van Teutem 2025-12-16 09:47:31 +01:00
  • e82236292b update examples kyle-256 2025-12-16 08:39:01 +00:00
  • 8fafd6db2f update kernel kyle-256 2025-12-16 07:20:58 +00:00
  • 2198f8b583 update config kyle-256 2025-12-12 04:56:45 +00:00
  • 09af58d18d update grouped_gemm blockwise kernel kyle-256 2025-12-11 07:53:47 +00:00
  • a86fc8076c fix the copyrights Jiangyon 2025-12-16 08:38:34 +00:00
  • 55d9a8e576 fix the pre-commit Jiangyon 2025-12-16 08:28:07 +00:00
  • faff9abaac fix the pre-commit Jiangyon 2025-12-16 08:22:20 +00:00
  • d2278ab252 split kernel code to block & kernel Jiangyon 2025-12-16 08:09:06 +00:00
  • 5e8a010fc6 remove lse arg Jiangyon 2025-12-16 07:40:45 +00:00
  • 29d96a90f0 add jenga support bf16 Jiangyon 2025-12-16 06:57:42 +00:00
  • 19c9ecf732 Merge branch 'develop' into jograner/hotfix-grouped-gemm-two-stage Johannes Graner 2025-12-16 07:42:42 +01:00
  • 997ec8f89c add bf16 for vsa Jiangyon 2025-12-16 06:18:13 +00:00
  • d37c19e367 Merge branch 'develop' of github.com:ROCm/composable_kernel into ck_moe_bs_splitk_pr yadaish 2025-12-16 04:40:20 +00:00
  • 3ed4d5a6dc rebase file KenSCLin 2025-12-16 03:00:08 +00:00
  • 6b6f8b612d fix enumeration values not handled in switch KenSCLin 2025-12-16 00:55:00 +00:00
  • ad434f0976 Merge commit '1e6bbed1fb77d790f2b5ec4ef8a6617e99c8f145' into develop assistant-librarian[bot] 2025-12-16 00:38:54 +00:00
  • 29ed00bbd1 [CK_BUILDER] CK Tile header installation for builder, algorithm concept improvements (#3419) DarylHawkinsAMD 2025-12-15 16:24:36 -07:00
  • 035c9afa8d [CK_BUILDER] CK Tile header installation for builder, algorithm concept improvements (#3419) DarylHawkinsAMD 2025-12-15 16:24:36 -07:00
  • 1e6bbed1fb [CK_BUILDER] CK Tile header installation for builder, algorithm concept improvements (#3419) DarylHawkinsAMD 2025-12-15 16:24:36 -07:00
  • 5a3e7de060 Merge branch 'develop' into lwpck-4181 khushbu 2025-12-15 15:28:16 -05:00
  • 3c8ce1482b Merge commit '2544e394cff83d5992e265f9a29b640a7c74e90d' into develop assistant-librarian[bot] 2025-12-15 20:13:52 +00:00
  • ec9afcfe8d Add missing enums to data_type_sizeof (#3430) John Shumway 2025-12-15 11:49:36 -08:00
  • 9f00d51b14 Add missing enums to data_type_sizeof (#3430) John Shumway 2025-12-15 11:49:36 -08:00
  • 2544e394cf Add missing enums to data_type_sizeof (#3430) John Shumway 2025-12-15 11:49:36 -08:00
  • f7e0b65883 fix clang-format KenSCLin 2025-12-15 16:40:03 +00:00
  • e47c648602 fix clang-format KenSCLin 2025-12-15 16:40:03 +00:00
  • c7f1c37551 Merge branch 'develop' into ck_tile/gemm_blockscale_abquant kensclin 2025-12-16 00:29:19 +08:00
  • 6dd37ab1bb Fix copyright headers kiefer 2025-12-15 16:28:40 +00:00
  • 291c6fef56 Small post-merge fixes kiefer 2025-12-15 16:27:17 +00:00
  • 9e037c82a8 Merge commit '5e2d25e20f40eb7a6ba2e788f82f677649fb37d6' into develop assistant-librarian[bot] 2025-12-15 16:15:30 +00:00
  • dd2d640e6d Add new INT8 WMMA instances. (vpietila/int8-perf-on-navi4x) Ville Pietilä 2025-12-15 11:12:51 -05:00
  • 389e797a9b build: reduce build time for bquant tests by splitting into multiple cpp & support on other gfx10 case (#3395) Aviral Goel 2025-12-15 19:19:29 +04:00
  • fdd6c3e0ee build: reduce build time for bquant tests by splitting into multiple cpp & support on other gfx10 case (#3395) Aviral Goel 2025-12-15 19:19:29 +04:00
  • 5e2d25e20f build: reduce build time for bquant tests by splitting into multiple cpp & support on other gfx10 case (#3395) Aviral Goel 2025-12-15 19:19:29 +04:00
  • 4a29a8f84d [CK_TILE] Fix some inconsistencies with OverrideBDatatype in BQuant GEMM (#3394) Sami Remes 2025-12-15 15:18:38 +00:00
  • ab598aa499 [CK_TILE] Fix some inconsistencies with OverrideBDatatype in BQuant GEMM (#3394) Sami Remes 2025-12-15 15:18:38 +00:00
  • a0cdb0b493 [CK_TILE] Fix some inconsistencies with OverrideBDatatype in BQuant GEMM (#3394) Sami Remes 2025-12-15 15:18:38 +00:00
  • 6b08653fb7 Merge commit '7e93eed8787afd175d3a045303096a4a98638f4b' into develop assistant-librarian[bot] 2025-12-15 15:18:13 +00:00
  • 7cdba74e97 [ck][gfx12] support contraction on gfx12 (#3421) linqunAMD 2025-12-15 23:16:01 +08:00
  • fe6b0fb707 [ck][gfx12] support contraction on gfx12 (#3421) linqunAMD 2025-12-15 23:16:01 +08:00
  • 7e93eed878 [ck][gfx12] support contraction on gfx12 (#3421) linqunAMD 2025-12-15 23:16:01 +08:00
  • 8811c57d44 [ck_tile] remove duplicate functions in ck_tile (#3311) linqunAMD 2025-12-15 23:13:00 +08:00
  • 3d079f66cf [ck_tile] remove duplicate functions in ck_tile (#3311) linqunAMD 2025-12-15 23:13:00 +08:00
  • 6d7299ff78 [ck_tile] remove duplicate functions in ck_tile (#3311) linqunAMD 2025-12-15 23:13:00 +08:00
  • ff8d296d75 Merge branch 'develop' into jograner/hotfix-grouped-gemm-two-stage Johannes Graner 2025-12-15 15:50:13 +01:00
  • dbb2e39386 Merge remote-tracking branch 'origin/develop' into 65-grouped-conv-fwd-wmma-upstreamable kiefer 2025-12-15 13:59:36 +00:00
  • 742acf2707 Merge commit 'fe35ba5dac168619462669192423ff40548d532d' into develop assistant-librarian[bot] 2025-12-15 13:25:53 +00:00
  • 0f4c7d7a12 Improve perf results plotting. Ville Pietilä 2025-12-15 08:17:19 -05:00
  • e20e646f8c adressed review comments (kabraham/prng_tests_integration) Kevin Abraham 2025-12-15 12:51:15 +00:00
  • ef629bcd02 Fix profiling output. Ville Pietilä 2025-12-15 07:46:27 -05:00
  • cbec566928 Improve script. Ville Pietilä 2025-12-15 07:41:32 -05:00
  • 2fe4c8acec Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380) Johannes Graner 2025-12-15 13:38:25 +01:00
  • 9d6790dc2e Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380) Johannes Graner 2025-12-15 13:38:25 +01:00
  • fe35ba5dac Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380) Johannes Graner 2025-12-15 13:38:25 +01:00
  • 4f79e0d308 Merge commit '3b773109e5b98a7b11d2976e465ecb7c57f2bea6' into develop assistant-librarian[bot] 2025-12-15 12:19:45 +00:00
  • ee6b3f34b5 clang-format Kevin Abraham 2025-12-11 08:46:52 +00:00
  • 1192d2c3c5 run clang format Kevin Abraham 2025-12-11 08:42:32 +00:00
  • 43efeb67e6 add interface to device_mem for tests Kevin Abraham 2025-12-11 07:15:27 +00:00
  • 60a476420d integrate tensor initialization Kevin Abraham 2025-12-10 13:52:55 +00:00
  • ee08e55e48 add tensor initialization to builder Kevin Abraham 2025-12-10 08:57:39 +00:00
  • 8fa013b782 first implementation of device prng integration Kevin Abraham 2025-12-08 21:02:53 +00:00
  • a45c051ac9 [CK TILE][AICK-439] Fix cshuffle epilogue wave per shuffle (#3364) Bartłomiej Kocot 2025-12-15 12:59:48 +01:00
  • 5941f4fb83 [CK TILE][AICK-439] Fix cshuffle epilogue wave per shuffle (#3364) Bartłomiej Kocot 2025-12-15 12:59:48 +01:00
  • 3b773109e5 [CK TILE][AICK-439] Fix cshuffle epilogue wave per shuffle (#3364) Bartłomiej Kocot 2025-12-15 12:59:48 +01:00
  • dd897f8799 Revert "WIP: Grouped convolution bwd weight wmma v3 instance selection" (revert-3378-streamhpc/conv_bwd_weight_wmma_instance_selection) Kiefer van Teutem 2025-12-15 10:37:44 +01:00
  • 2027fca5b6 Merge pull request #3378 from ROCm/streamhpc/conv_bwd_weight_wmma_instance_selection Kiefer van Teutem 2025-12-15 10:27:54 +01:00
  • 94f6c66f8c Remove unwanted instances. This includes all instances which are not NHWGCxGKYXC and F16 or BF16 (no mixed in-out types). kiefer 2025-12-10 16:22:42 +00:00
  • 5d9c3705d5 Fix clang format kiefer 2025-12-10 11:08:14 +00:00
  • 2154ee899f Remove [[maybe_unused]] kiefer 2025-12-09 13:38:55 +00:00
  • aba5959e11 Remove straggler comments kiefer 2025-12-09 13:24:17 +00:00
  • 9021fd68b6 Re-enable all xdl instances (un-16x16-adapted) and dl instances. Remove custom ckProfiler target. kiefer 2025-12-09 13:13:49 +00:00
  • c2faf36db2 Remove unused instance lists and related add_x_instance() functions, fwd declarations, cmakelists entries. Also merge the "wmma" and "wmma v3" instance list files, which are both v3. kiefer 2025-12-09 13:02:48 +00:00
  • 31f08a0169 Disable all non-generic two-stage instances in the instance lists for NHWGC. They are never faster and support is already carried by CShuffleV3 and Explicit. kiefer 2025-12-08 15:07:15 +00:00
  • 5470b80774 Remove more instances which fail verification, for bf16_f32_bf16 and for f16 scale / bilinear. kiefer 2025-12-08 14:38:49 +00:00
  • 6a5ea18a65 Disable two stage f16 instances which produce incorrect results. kiefer 2025-12-05 15:34:14 +00:00
  • 6380f9ab81 Add instances for scale and bilinear based on the bf16 NHWGC GKYXC tuning. Keep generic instances for support. kiefer 2025-12-05 13:37:51 +00:00
  • c2fab9d677 Add back some generic instances to make sure we have the same shape / layout / datatype support as before the instance selection process. kiefer 2025-12-05 13:35:22 +00:00
  • 93ab2c4a8c Add bf16 f32 bf16 instances based on tuned b16 NHWGC GKYXC instances. kiefer 2025-12-04 09:55:23 +00:00
  • c9c05bfc7d Remove some instances that give incorrect results (f16 NHWGC) kiefer 2025-12-04 08:27:08 +00:00
  • 9d5810942a Replace cshuffle non-v3 lists with v3 lists, making sure to not have duplications. Also removing stride1pad0 support for NHWGC since we can use explicit for those cases. kiefer 2025-12-02 16:02:40 +00:00
  • 4fe6c7ddcb Add two stage instances based on the parameters from the tuned cshuffle V3 instances. CShuffleBlockTranserScalarPerVector adapted to 4, and mergegroups fixed to 1 for now. No more special instance lists. kiefer 2025-12-02 15:50:59 +00:00
  • 8e88660834 Add explicit oddMN support with custom tuned instances kiefer 2025-12-02 14:14:04 +00:00
  • 133f5538e3 Reduce instances to only the tuned wmma V3 ones for implicit v1 intra and explicit v1 intra pad/nopad. kiefer 2025-12-01 11:25:03 +00:00
  • 7a516b8e99 Adapt all grouped conv bwd weight vanilla Xdl instances to 16x16. MRepeat doubled for all but 12 of them (some static assert failure). Also added custom reduced profiler target for building grouped conv bwd weight vanilla only profiler. Verified with gtest test. kiefer 2025-10-20 13:34:56 +00:00
  • b5ccc070a8 Fix splitk ab scale Enrico Degregori 2025-12-15 08:19:21 +00:00
  • 6164d076de Merge commit '3143a5a480e4fcf216670012fe491b44324f03b6' into develop assistant-librarian[bot] 2025-12-15 07:16:25 +00:00
  • b369be22d9 Merge branch 'develop' into jograner/hotfix-grouped-gemm-two-stage Johannes Graner 2025-12-15 08:04:27 +01:00
  • 6238fe6d0d [CK Grouped Gemm] Disable split-k kernel for split-k > 1 with non-contiguous strides (#3405) Johannes Graner 2025-12-15 08:03:00 +01:00
  • 39bb48ed2e [CK Grouped Gemm] Disable split-k kernel for split-k > 1 with non-contiguous strides (#3405) Johannes Graner 2025-12-15 08:03:00 +01:00
  • 3143a5a480 [CK Grouped Gemm] Disable split-k kernel for split-k > 1 with non-contiguous strides (#3405) Johannes Graner 2025-12-15 08:03:00 +01:00
  • f3162c067e Merge branch 'develop' into ck_moe_bs_splitk_pr huaiguxu 2025-12-15 14:06:20 +08:00
  • 8537a356a3 [CK Tile] Fix FMHA LSE calculation and potential division by zero (yewang12/te_bias_all_minf) Jeff Huang 2025-11-28 11:30:10 +08:00
  • 669906c786 Merge commit 'f5573f56d9d4981def16f575ddb14535b93bb9bb' into develop assistant-librarian[bot] 2025-12-15 04:28:43 +00:00
  • 51886bf22b Add attention sink support for FMHA FWD (#3368) Linjun-AMD 2025-12-15 12:21:59 +08:00
  • e7584178fd Add attention sink support for FMHA FWD (#3368) Linjun-AMD 2025-12-15 12:21:59 +08:00