Commit Graph

  • b60af5bde9 [CK_TILE]enhance elementwise test (#2683) joyeamd 2025-09-30 23:29:37 +08:00
  • 20333fd850 [CK] Add command option instance_index and param_mask to run partial ck test (#2889) linqunAMD 2025-09-30 23:24:40 +08:00
  • 6c4ff0b062 [CK] Add command option instance_index and param_mask to run partial ck test (#2889) linqunAMD 2025-09-30 23:24:40 +08:00
  • e78a897ec0 [CK] Add command option instance_index and param_mask to run partial ck test (#2889) linqunAMD 2025-09-30 23:24:40 +08:00
  • 0af3803a3c Enable gmock in gtest.cmake. John Shumway 2025-09-30 14:07:38 +00:00
  • 0ba63c314f Merge branch 'develop' into tests_for_batched_grouped_gemm Aleksander Dudek 2025-09-30 06:00:32 -05:00
  • c3f0c1a866 Add additional check for non-supported c > 1 case. (branch: vpietila/merge-multiple-depthwise-conv-groups-into-single-gemm-batch) Ville Pietilä 2025-09-30 07:46:24 +00:00
  • db835e065c Make MPerGroup and NPerGroup template parameters. Ville Pietilä 2025-09-30 07:14:28 +00:00
  • 1a6f602c65 Remove debug code. Ville Pietilä 2025-09-30 05:53:28 +00:00
  • e37ff54723 Merge branch 'wjx/preshuffle_format' of https://github.com/ROCm/composable_kernel into wjx/preshuffle_format lalala-sh 2025-09-30 11:07:32 +08:00
  • adfdc69f7a code clean valarLip 2025-09-30 03:06:09 +00:00
  • 001d6227ee Merge branch 'develop' into wjx/preshuffle_format lalala-sh 2025-09-30 10:07:14 +08:00
  • 67ded700d1 preshuffle reformat lalala-sh 2025-09-29 02:17:20 +00:00
  • e44e37654c code clean lalala-sh 2025-09-26 08:19:11 +00:00
  • d0ca011c1a fix wp gemm bug when permuteN is false lalala-sh 2025-09-26 08:11:59 +00:00
  • 4028ae4993 Adding CK Tile documentation Vidyasagar 2025-09-29 17:50:47 -07:00
  • 631a25de61 Merge commit '28ad8ae5d8558e147f29aba29db569fe25210947' into develop assistant-librarian[bot] 2025-09-29 23:11:42 +00:00
  • dbc623f455 tmp (branch: barkocot/lwpck-3853) Bartlomiej Kocot 2025-09-29 22:54:04 +00:00
  • f5273249d8 Merge branch 'wjx/flatmm_merge' into felix/flatmm_fix_splitk (branch: felix/flatmm_fix_splitk) Illia Silin 2025-09-29 15:35:13 -07:00
  • f0f6686b6f Fix timing issue in CK_TILE GEMM example (#2940) Hosang 2025-09-29 18:34:04 -04:00
  • 780456f1ce Fix timing issue in CK_TILE GEMM example (#2940) Hosang 2025-09-29 18:34:04 -04:00
  • 28ad8ae5d8 Fix timing issue in CK_TILE GEMM example (#2940) Hosang 2025-09-29 18:34:04 -04:00
  • 78f2779870 Merge commit 'bebf0e9d158c13d34c9f263a9551f60fa463bc66' into develop assistant-librarian[bot] 2025-09-29 22:11:28 +00:00
  • a69e4ed8b7 Extend Grouped GEMM with MultiD (Single & Double Shared Memory) feature to use persistent kernel option (#2933) Aviral Goel 2025-09-29 18:03:56 -04:00
  • 7775768c88 Extend Grouped GEMM with MultiD (Single & Double Shared Memory) feature to use persistent kernel option (#2933) Aviral Goel 2025-09-29 18:03:56 -04:00
  • bebf0e9d15 Extend Grouped GEMM with MultiD (Single & Double Shared Memory) feature to use persistent kernel option (#2933) Aviral Goel 2025-09-29 18:03:56 -04:00
  • 1d3938e703 Merge branch 'develop' into new_time_based_ckprofiler_emin (branch: new_time_based_ckprofiler_emin) Muhammed Emin Ozturk 2025-09-29 14:48:00 -07:00
  • ae91f79c96 Code style clean-up and documentation Emily Martins 2025-09-24 15:32:25 +00:00
  • 7a9bff148f Code style clean-up and documentation Emily Martins 2025-09-24 15:32:25 +00:00
  • 243118c275 Code style clean-up and documentation Emily Martins 2025-09-24 15:32:25 +00:00
  • 38afb46dcd Add CK Tile Stream-K bf16 and fp16 examples Emily Martins 2025-09-16 22:40:40 +00:00
  • e81e1b3221 Add CK Tile Stream-K bf16 and fp16 examples Emily Martins 2025-09-16 22:40:40 +00:00
  • a3499e38b2 Add CK Tile Stream-K bf16 and fp16 examples Emily Martins 2025-09-16 22:40:40 +00:00
  • ad28433ab6 Merge branch 'develop' into new_time_based_ckprofiler_emin Muhammed Emin Ozturk 2025-09-29 14:43:48 -07:00
  • 639b61786a timing based configuration was added ozturkosu 2025-09-29 21:40:44 +00:00
  • 5891af25fa update log ozturkosu 2025-09-29 20:38:41 +00:00
  • 1ff47b0020 Merge commit '35e116f5c088dc7673856e8a78539243e61044dc' into develop assistant-librarian[bot] 2025-09-29 20:24:04 +00:00
  • 390a427be6 increase time limit for AITER tests (#2948) Illia Silin 2025-09-29 13:11:42 -07:00
  • c2be5afb7a increase time limit for AITER tests (#2948) Illia Silin 2025-09-29 13:11:42 -07:00
  • 35e116f5c0 increase time limit for AITER tests (#2948) Illia Silin 2025-09-29 13:11:42 -07:00
  • 678c9bb01f Weight Preshuffle Block Scale gemm support (#2877) Khushbu Agarwal 2025-09-29 12:46:37 -07:00
  • 7c20b1f690 Weight Preshuffle Block Scale gemm support (#2877) Khushbu Agarwal 2025-09-29 12:46:37 -07:00
  • 81458a6681 Weight Preshuffle Block Scale gemm support (#2877) Khushbu Agarwal 2025-09-29 12:46:37 -07:00
  • 3cf7343e08 Merge commit '2e9428eb63be091b109537e082aa7f0fc05a634d' into develop assistant-librarian[bot] 2025-09-29 17:12:15 +00:00
  • 3c553f66b2 hot fix check eid range (#2924) carlushuang 2025-09-30 00:38:38 +08:00
  • 47b8632296 hot fix check eid range (#2924) carlushuang 2025-09-30 00:38:38 +08:00
  • 2e9428eb63 hot fix check eid range (#2924) carlushuang 2025-09-30 00:38:38 +08:00
  • 2e7d600076 Merge commit '2b684f0a7d2317b1b1f001716acb62f566cc71ee' into develop assistant-librarian[bot] 2025-09-29 16:12:12 +00:00
  • afe012e7fb [CK][Examples] Extending support for rdna3/4 in following examples: (#2884) Michał Kulikowski 2025-09-29 18:05:04 +02:00
  • ac4ecdacc5 [CK][Examples] Extending support for rdna3/4 in following examples: (#2884) Michał Kulikowski 2025-09-29 18:05:04 +02:00
  • 2b684f0a7d [CK][Examples] Extending support for rdna3/4 in following examples: (#2884) Michał Kulikowski 2025-09-29 18:05:04 +02:00
  • 193907fd85 Fix case k > 1 and c=1. Ville Pietilä 2025-09-29 16:02:00 +00:00
  • 091c5200ce Merge commit '0f04f020d979875de01274901b8f3cc15e600a8f' into develop assistant-librarian[bot] 2025-09-29 15:12:26 +00:00
  • 75888c151d fix:tf32:fix build fail for all supported targets (#2942) yinglu 2025-09-29 23:04:11 +08:00
  • f9daaa9724 fix:tf32:fix build fail for all supported targets (#2942) yinglu 2025-09-29 23:04:11 +08:00
  • 0f04f020d9 fix:tf32:fix build fail for all supported targets (#2942) yinglu 2025-09-29 23:04:11 +08:00
  • dde91b60fb [CK] Fix example_grouped_conv_bwd_data_xdl_fp16 with ksplit = 2 (#2943) linqunAMD 2025-09-29 22:56:33 +08:00
  • b6cb76a555 [CK] Fix example_grouped_conv_bwd_data_xdl_fp16 with ksplit = 2 (#2943) linqunAMD 2025-09-29 22:56:33 +08:00
  • 769c58f133 [CK] Fix example_grouped_conv_bwd_data_xdl_fp16 with ksplit = 2 (#2943) linqunAMD 2025-09-29 22:56:33 +08:00
  • a522bf38fe Packed f32 -> fp16 fix, disable LDS check as temp fix (branch: samremes/fmha_fwd_v3_for_gfx942) Sami Remes 2025-09-29 14:26:00 +00:00
  • 1d9ec09cf2 Grouped Conv Bwd Data out index calculation optimizations (#2917) Bartłomiej Kocot 2025-09-29 15:59:11 +02:00
  • ef933ee241 Grouped Conv Bwd Data out index calculation optimizations (#2917) Bartłomiej Kocot 2025-09-29 15:59:11 +02:00
  • 5477811670 Grouped Conv Bwd Data out index calculation optimizations (#2917) Bartłomiej Kocot 2025-09-29 15:59:11 +02:00
  • 79bee7c549 Fix cmake file for tests Enrico Degregori 2025-09-29 12:31:40 +00:00
  • f9767142cf Merge commit '0f10e6d9218ce9d00a34a66572c0686dce1e45ea' into develop assistant-librarian[bot] 2025-09-29 11:12:04 +00:00
  • 0da205766b [CK_TILE] Fixing Type Conversions in PassThroughPack8 (#2769) SamiAario-AMD 2025-09-29 13:34:47 +03:00
  • 4bc708f401 [CK_TILE] Fixing Type Conversions in PassThroughPack8 (#2769) SamiAario-AMD 2025-09-29 13:34:47 +03:00
  • 0f10e6d921 [CK_TILE] Fixing Type Conversions in PassThroughPack8 (#2769) SamiAario-AMD 2025-09-29 13:34:47 +03:00
  • 1221921679 Merge branch 'explicit_bwd_weight' into 'feature/conv_bwd_weight_wmma' Enrico Degregori 2025-09-29 07:56:59 +00:00
  • 85570f98a0 Review fixes Enrico Degregori 2025-09-03 11:05:08 +00:00
  • 80f72391c5 Fix ckProfiler dependencies Enrico Degregori 2025-08-18 11:22:05 +00:00
  • b56e9f6bc4 Add support for occupancy-based splitk Enrico Degregori 2025-08-14 09:19:58 +00:00
  • 45b3d26e3c Add instances for pipeline v1 and v3 Enrico Degregori 2025-08-12 10:18:26 +00:00
  • 70238cab87 Device implementation of explicit gemm for grouped conv bwd weight Enrico Degregori 2025-08-08 16:23:04 +00:00
  • 207cc39ee4 Merge branch 'grouped_conv_bwd_weight_instances_examples' into 'feature/conv_bwd_weight_wmma' Enrico Degregori 2025-08-18 07:34:51 +00:00
  • 7c1c070471 Compute tolerances instead of using default ones in bilinear and scale tests Enrico Degregori 2025-08-13 09:04:36 +00:00
  • 671fb7f383 Compute tolerances in examples instead of using default ones Enrico Degregori 2025-08-13 09:03:24 +00:00
  • 0dc8f8e769 Fix instances Enrico Degregori 2025-08-13 07:06:11 +00:00
  • 23ccaeef7d Fix compilation error Enrico Degregori 2025-08-13 07:05:11 +00:00
  • a783028023 Add atomic add float4 Enrico Degregori 2025-08-12 19:32:46 +00:00
  • 202cc22c19 Fix examples compilation Enrico Degregori 2025-08-12 18:49:45 +00:00
  • 8ec5908e0e Fix copyright Enrico Degregori 2025-08-12 10:23:15 +00:00
  • ca078f8fc1 Uncomment scale instances Enrico Degregori 2025-08-06 16:50:06 +00:00
  • 23c9189103 Add examples Enrico Degregori 2025-08-06 13:03:01 +00:00
  • c71f2f25eb Add multiple Ds instances Enrico Degregori 2025-08-06 13:00:33 +00:00
  • e6b7d5ed65 Add two stage instances (xdl parity) Enrico Degregori 2025-08-05 09:59:33 +00:00
  • 0b7f0cbbeb Add instances for xdl parity (for pipeline v1) Enrico Degregori 2025-08-01 14:12:22 +00:00
  • b1c6973ad1 Remove workaround for 1x1Stride1Pad0 conv specialization Enrico Degregori 2025-08-01 14:08:42 +00:00
  • 305cbbc3ac Add padding in conv to gemm transformers for 1x1Stride1Pad0 specialization Enrico Degregori 2025-08-01 14:06:14 +00:00
  • 82133901a4 Fix bugs in device implementation: Enrico Degregori 2025-07-31 11:29:48 +00:00
  • 2ba2c5d1b4 check gridwise level validity in device impl for 1 stage D0 Enrico Degregori 2025-07-29 10:44:22 +00:00
  • 6514b15fb5 Add generic instances for bf16 f32 bf16 Enrico Degregori 2025-07-29 10:43:39 +00:00
  • 9dbbb07953 Fix bug and disable splitK=-1 tests for wmma Enrico Degregori 2025-08-07 07:27:11 +00:00
  • 37b6d28dc0 Merge branch 'grouped_conv_bwd_weight_device_impl_wmma' into 'feature/conv_bwd_weight_wmma' Enrico Degregori 2025-08-05 12:51:20 +00:00
  • 9d7a01f82f Convolution bwd weight device implementation Enrico Degregori 2025-08-05 12:51:20 +00:00
  • f2a89da6c9 preshuffle reformat valarLip 2025-09-29 02:17:20 +00:00
  • 858058213d change gen config (branch: felix/tunx_norm) felix 2025-09-28 07:32:55 +00:00
  • 2593ecf5b5 Merge commit 'e8842e3c1fe75f4967105914032aced63e233225' into develop assistant-librarian[bot] 2025-09-27 22:11:27 +00:00
  • 867351e019 Use git ls-files to select candidate files for clang format John Afaganis 2025-09-25 17:24:04 -06:00
  • 3e6bc62993 Use git ls-files to select candidate files for clang format John Afaganis 2025-09-25 17:24:04 -06:00