Commit Graph

  • 31939e7b2b [CK][Examples] Fixing stride issues in ck examples by workaround - Bypassing hostTensor validation. Michal Kulikowski 2025-10-16 13:01:24 +02:00
  • b9789a0742 [CK][Examples] Fixing stride issues in ck examples by workaround - Bypassing hostTensor validation. Michal Kulikowski 2025-10-16 13:01:24 +02:00
  • 2b8e53b623 Revert "Add padding to 1x1Stride1Pad0 conv specialization (grouped conv bwd weight) (#2610)" (#2637) (ref: CK/rel_stg_3076) Enrico Degregori 2025-08-07 12:30:08 +02:00
  • 32b15133e0 Merge commit '0d3860dfdb3299dea139953c3ce62da5325019c6' into develop assistant-librarian[bot] 2025-10-23 01:39:45 +00:00
  • 895983c816 [CKTILE] FMHA fwd trload lse fix (#3046) Haocong WANG 2025-10-23 09:33:33 +08:00
  • 2a01918313 [CKTILE] FMHA fwd trload lse fix (#3046) Haocong WANG 2025-10-23 09:33:33 +08:00
  • 0d3860dfdb [CKTILE] FMHA fwd trload lse fix (#3046) Haocong WANG 2025-10-23 09:33:33 +08:00
  • 6490427c89 updated the changelog for 7.1 and beyond spolifroni-amd 2025-10-17 14:06:04 -04:00
  • c4c504b867 Merge commit '1b95803431d50361d22c3b76c4caf6608e83069d' into develop assistant-librarian[bot] 2025-10-22 20:13:26 +00:00
  • 7c14d97d0e updated the changelog with 7.1 and beyond info spolifroni-amd 2025-10-17 14:06:04 -04:00
  • 93115d8e56 updated the changelog with 7.1 and beyond info spolifroni-amd 2025-10-17 14:06:04 -04:00
  • 1b95803431 updated the changelog with 7.1 and beyond info spolifroni-amd 2025-10-17 14:06:04 -04:00
  • 8f0182b6b4 Revert "Add padding to 1x1Stride1Pad0 conv specialization (grouped conv bwd weight) (#2610)" (#2637) Enrico Degregori 2025-08-07 12:30:08 +02:00
  • 898ae9c620 Merge commit '211d64e18a1bf2ecb1d13c5eb87983bdcabb3b5e' into develop assistant-librarian[bot] 2025-10-22 15:12:27 +00:00
  • 0329d71fb9 [CK_TILE] Update flatmm related kernels (#3022) lalala-sh 2025-10-22 22:36:11 +08:00
  • 63e0a73bd3 [CK_TILE] Update flatmm related kernels (#3022) lalala-sh 2025-10-22 22:36:11 +08:00
  • 211d64e18a [CK_TILE] Update flatmm related kernels (#3022) lalala-sh 2025-10-22 22:36:11 +08:00
  • d8559798d5 Print out aggregated statistics. Ville Pietilä 2025-10-22 14:09:31 +00:00
  • e2284597c5 Merge remote-tracking branch 'origin/develop' into conv_bwd_weight_wmma kiefer 2025-10-22 13:59:32 +00:00
  • 5065ddd409 Plot a large set of benchmark results. Ville Pietilä 2025-10-22 13:41:14 +00:00
  • df5d804f47 Merge branch 'vpietila/ck-vs-ck-tile-conv-benchmarking' of github.com:ROCm/composable_kernel into vpietila/ck-vs-ck-tile-conv-benchmarking Ville Pietilä 2025-10-22 13:16:58 +00:00
  • 05b27bbf95 More script improvements. Ville Pietilä 2025-10-22 13:16:54 +00:00
  • fbbadd4315 Add more gfx942 instances. Ville Pietilä 2025-10-22 13:16:00 +00:00
  • 60db94037e Fix compilation issues on MI300. Ville Pietilä 2025-10-22 12:51:02 +00:00
  • 2934bb0489 Merge commit 'cbd1279ae68d8b463b9b20106e968f8ccf2a11e6' into develop assistant-librarian[bot] 2025-10-22 12:17:24 +00:00
  • a6c3252766 [CK_TILE] Conv bwd splitN support (#3047) Johannes Graner 2025-10-22 13:34:06 +02:00
  • b8882aae95 [CK_TILE] Conv bwd splitN support (#3047) Johannes Graner 2025-10-22 13:34:06 +02:00
  • cbd1279ae6 [CK_TILE] Conv bwd splitN support (#3047) Johannes Graner 2025-10-22 13:34:06 +02:00
  • d100ab690a fix formatting Sami Remes 2025-10-22 11:12:39 +00:00
  • f179a8a97b remove commented code and enable all tests again Sami Remes 2025-10-22 11:09:21 +00:00
  • 2708c12866 Merge commit '5a27a97391d08652c3da0a5347209c19d3ebb03d' into develop assistant-librarian[bot] 2025-10-22 07:14:09 +00:00
  • f23b8cde7b Introduce tree reduction for BlockReduce2dCrossWarpSync (#2588) MHYangAMD 2025-10-22 14:41:35 +08:00
  • 6d802e7ba4 Introduce tree reduction for BlockReduce2dCrossWarpSync (#2588) MHYangAMD 2025-10-22 14:41:35 +08:00
  • 5a27a97391 Introduce tree reduction for BlockReduce2dCrossWarpSync (#2588) MHYangAMD 2025-10-22 14:41:35 +08:00
  • 5fbbd3eed7 Merge commit '37dff024c1d2c6420a91d9a4b0801b350db3eede' into develop assistant-librarian[bot] 2025-10-22 04:13:42 +00:00
  • a488126d3e [CK_BUILDER] Add compile-time reflection for a convolution instance (#3065) John Shumway 2025-10-21 21:10:19 -07:00
  • 8f48205046 [CK_BUILDER] Add compile-time reflection for a convolution instance (#3065) John Shumway 2025-10-21 21:10:19 -07:00
  • 37dff024c1 [CK_BUILDER] Add compile-time reflection for a convolution instance (#3065) John Shumway 2025-10-21 21:10:19 -07:00
  • a12261aced Merge remote-tracking branch 'origin/barkocot/explicit-string-out' into cderb/prefetch_tuning_251021 (ref: cderb/prefetch_tuning_251021) Christopher Erb 2025-10-21 10:57:43 -05:00
  • bb52cd9889 Fix handling of n dim blocks in tile windows etc Sami Remes 2025-10-21 15:51:23 +00:00
  • 784c84c54b Benchmarking script improvements. Ville Pietilä 2025-10-21 15:09:57 +00:00
  • 6144f5c490 Enable vectorization in descriptor-based batched contraction. Add pad_tensor_view to local RunGemm Mohsen Saffari 2025-10-21 14:29:49 +00:00
  • 6ecded14e2 Merge commit '3a28632b203f9219ed4906d46457872ef1084054' into develop assistant-librarian[bot] 2025-10-21 14:13:05 +00:00
  • ebd8495721 Gridwise gemm conv v3 force padded layout on gfx950 (#2961) Bartłomiej Kocot 2025-10-21 15:41:02 +02:00
  • 4f83a3d745 Gridwise gemm conv v3 force padded layout on gfx950 (#2961) Bartłomiej Kocot 2025-10-21 15:41:02 +02:00
  • 3a28632b20 Gridwise gemm conv v3 force padded layout on gfx950 (#2961) Bartłomiej Kocot 2025-10-21 15:41:02 +02:00
  • c8e373c4ab Merge commit '35754d2ec817087a2a7de53729f2a97c7c9f05fa' into develop assistant-librarian[bot] 2025-10-21 13:22:20 +00:00
  • 12e9bcd7e2 fix identity value of AbsMax (#3058) Yashvardhan Agarwal 2025-10-21 15:42:08 +03:00
  • 9072046e55 fix identity value of AbsMax (#3058) Yashvardhan Agarwal 2025-10-21 15:42:08 +03:00
  • 35754d2ec8 fix identity value of AbsMax (#3058) Yashvardhan Agarwal 2025-10-21 15:42:08 +03:00
  • a5afa4c07e Script to convert MIOpenDriver commands to CK profiler input. Ville Pietilä 2025-10-21 08:16:14 +00:00
  • e237b82762 Merge commit '4043401db186ee006f14fb00842af29c194ba209' into develop assistant-librarian[bot] 2025-10-21 08:15:24 +00:00
  • c3fe4ef002 fixed builder_utils.hpp (took from wrong branch) (ref: builder_inlineDiff) Kevin Abraham 2025-10-21 07:54:22 +00:00
  • 0be14218d4 Fix race conditions in ck_tile remod (#3061) Johannes Graner 2025-10-21 09:35:04 +02:00
  • 671f2686c0 Fix race conditions in ck_tile remod (#3061) Johannes Graner 2025-10-21 09:35:04 +02:00
  • 4043401db1 Fix race conditions in ck_tile remod (#3061) Johannes Graner 2025-10-21 09:35:04 +02:00
  • 159bcf6750 Small script improvements. Ville Pietilä 2025-10-21 06:38:49 +00:00
  • 23b05f3c99 update yandai/wip_mi355 yadai 2025-10-21 04:16:46 +00:00
  • d3658e9aa2 Merge commit 'ff6efa2fb17db0266b0ff2fa531ffc9fad31b0cc' into develop assistant-librarian[bot] 2025-10-21 03:28:40 +00:00
  • eecc99e83d refine Max Podkorytov 2025-10-18 04:38:41 +00:00
  • 7d1d0565d9 refine Max Podkorytov 2025-10-18 04:38:41 +00:00
  • ff6efa2fb1 refine Max Podkorytov 2025-10-18 04:38:41 +00:00
  • 983a221831 update build instructions Max Podkorytov 2025-10-18 04:25:22 +00:00
  • 6200ea9dfc update build instructions Max Podkorytov 2025-10-18 04:25:22 +00:00
  • b9e966e574 update build instructions Max Podkorytov 2025-10-18 04:25:22 +00:00
  • 15be57e8d1 Merge branch 'develop' into ginolu/ut_async (ref: ginolu/ut_async) Gino Lu 2025-10-20 21:58:30 -05:00
  • 600620c284 refine code Gino Lu 2025-10-20 21:56:05 -05:00
  • 4da620cc9d fix grid bug Gino Lu 2025-10-20 21:53:13 -05:00
  • 9d8f10c9f3 Merge commit 'e20923f384492dab3dafdbace6f2bd2b45186cc2' into develop assistant-librarian[bot] 2025-10-21 02:41:02 +00:00
  • 0c61d0da8d [CK_TILE] Add fmt: skip to FMHA codegen scripts for readability (#3057) Yi DING 2025-10-21 10:15:04 +08:00
  • 698810c92f [CK_TILE] Add fmt: skip to FMHA codegen scripts for readability (#3057) Yi DING 2025-10-21 10:15:04 +08:00
  • e20923f384 [CK_TILE] Add fmt: skip to FMHA codegen scripts for readability (#3057) Yi DING 2025-10-21 10:15:04 +08:00
  • 93cc3fd985 Merge commit '2570462ecf46b51267548d41eb749c67a52d6085' into develop assistant-librarian[bot] 2025-10-20 21:11:26 +00:00
  • df3f347a27 [CK_TILE] Fix transpose_vectors for 2x2 8-bit tiles (#3042) Max Podkorytov 2025-10-20 13:40:44 -07:00
  • 1d7e4157c5 [CK_TILE] Fix transpose_vectors for 2x2 8-bit tiles (#3042) Max Podkorytov 2025-10-20 13:40:44 -07:00
  • 2570462ecf [CK_TILE] Fix transpose_vectors for 2x2 8-bit tiles (#3042) Max Podkorytov 2025-10-20 13:40:44 -07:00
  • 156cfffbc6 Merge commit '9f770610948b2666cc021e8ae6955821caad7791' into develop assistant-librarian[bot] 2025-10-20 16:13:25 +00:00
  • 09acf06d06 [CK TILE ENGINE] Code changes to finding GPU id from TARGET (#3055) Thrupti Raj Lakshmana Gowda 2025-10-20 11:02:18 -05:00
  • 61dbfdb27b [CK TILE ENGINE] Code changes to finding GPU id from TARGET (#3055) Thrupti Raj Lakshmana Gowda 2025-10-20 11:02:18 -05:00
  • 9f77061094 [CK TILE ENGINE] Code changes to finding GPU id from TARGET (#3055) Thrupti Raj Lakshmana Gowda 2025-10-20 11:02:18 -05:00
  • 3f2ae16775 added builder_utils Kevin Abraham 2025-10-20 15:55:05 +00:00
  • f72b994b00 More compilation fixes Tianxing Wu 2025-10-20 15:53:35 +00:00
  • 3d16b2fa27 run clang-format Kevin Abraham 2025-10-20 15:49:11 +00:00
  • 4dae6b6f13 added inlineDiff and some more comprehensive tests Kevin Abraham 2025-10-20 15:46:57 +00:00
  • 36b88c665c WIP Sami Remes 2025-10-20 15:42:39 +00:00
  • 50d984a353 Add printouts to illustrate differences that can be seen on difference images (ref: emimarti/ck_tile/incorrect_validation) Emily Martins 2025-10-20 15:33:01 +00:00
  • d0b980ba30 Merge commit 'f18b79f328df35e2305416b890dbb9eb561fa9e2' into develop assistant-librarian[bot] 2025-10-20 15:12:34 +00:00
  • d68a541c19 fixing compile errors... Juuso Korhonen 2025-10-20 15:04:47 +00:00
  • f57d4937c6 [CK_BUILDER] Add experimental builder directory and configuration for composable_kernel (#3043) John Shumway 2025-10-20 07:54:09 -07:00
  • 5891e2ae79 [CK_BUILDER] Add experimental builder directory and configuration for composable_kernel (#3043) John Shumway 2025-10-20 07:54:09 -07:00
  • f18b79f328 [CK_BUILDER] Add experimental builder directory and configuration for composable_kernel (#3043) John Shumway 2025-10-20 07:54:09 -07:00
  • bbfe4501fa Add complete multi-dimensional stride support via descriptors Mohsen Saffari 2025-10-20 14:43:32 +00:00
  • d1505786f8 Add support of softmax in hstu attention Qianfeng Zhang 2025-10-16 16:02:45 +00:00
  • a874839dc2 Add template parameter to gemm_0 MakeCBlockTile() for the need of defining PcompBlockTileType Qianfeng Zhang 2025-10-16 15:41:26 +00:00
  • 1a8f2f21fb Move scaling by attn_scale to inside the main-loop Qianfeng Zhang 2025-10-15 09:24:44 +00:00
  • bbda3f6f1c Let IsTokenPairInsideMask() return bool type Qianfeng Zhang 2025-10-15 08:50:48 +00:00
  • fdb89d3e2f Add instances to consider for adding softmax support Qianfeng Zhang 2025-10-14 09:40:23 +00:00
  • 97e7527eb1 fixing compile errors... Juuso Korhonen 2025-10-20 14:03:15 +00:00
  • d20c869d3d Adapt all grouped conv bwd weight vanilla Xdl instances to 16x16. MRepeat doubled for all but 12 of them (some static assert failure). Also added custom reduced profiler target for building grouped conv bwd weight vanilla only profiler. Verified with gtest test. kiefer 2025-10-20 13:34:56 +00:00
  • 9fda954253 Compiling fix Tianxing Wu 2025-10-20 13:16:19 +00:00