Commit Graph

  • 03d3c63948 [CK-Tile] functional support for transposed inputs in compute-bound double-lds-buffer pipeline with async loads from global memory to LDS (#2984) Max Podkorytov 2025-10-10 12:57:50 -07:00
  • 9d060d3e3c [CK-Tile] functional support for transposed inputs in compute-bound double-lds-buffer pipeline with async loads from global memory to LDS (#2984) Max Podkorytov 2025-10-10 12:57:50 -07:00
  • 1f4648dab5 refactor. and fixed q transformation Tianxing Wu 2025-10-10 15:27:36 +00:00
  • 930e1c9f39 [CK_TILE] FMHA Fix synchronization issue in FWD splitkv combine pipeline (#2934) afagaj/rocm-rel-7.1-fix Anton Gorenko 2025-09-27 09:16:10 +06:00
  • 12c812e993 Add signature types to source control. Ville Pietilä 2025-10-10 13:39:24 +00:00
  • 92a02510bd Rename conv algorithm. Ville Pietilä 2025-10-10 13:37:05 +00:00
  • fe7ed96c2a Separate types from concepts. Ville Pietilä 2025-10-10 13:34:57 +00:00
  • df60493219 refactor Tianxing Wu 2025-10-10 13:25:19 +00:00
  • 436eb3a4f8 transform q tensor view Juuso Korhonen 2025-10-10 12:08:16 +00:00
  • 81ac06d29a Rename conv algorithm assets. Ville Pietilä 2025-10-10 11:42:22 +00:00
  • 7d14a73740 Disable missing headers error from GTest. Ville Pietilä 2025-10-10 11:28:13 +00:00
  • 5d4b386e04 Merge remote-tracking branch 'origin/jshumway/convolution-builder' into features/convolution-builder Ville Pietilä 2025-10-10 10:48:42 +00:00
  • cc1c3705e2 Remove debug code. vpietila/merge-multiple-fwd-conv-groups-into-single-gemm-batch Ville Pietilä 2025-10-10 08:57:59 +00:00
  • 7b3f4507ae Working baseline for merging fwd conv groups. Ville Pietilä 2025-10-10 08:52:12 +00:00
  • 96578b8d43 Merge commit 'fada1a3cae190aa6c1568b44eac7d6b2d4e33740' into develop assistant-librarian[bot] 2025-10-10 08:15:20 +00:00
  • c1780cfebe Conv:TF32: add more instances - 2 (#2879) yinglu 2025-10-10 15:28:17 +08:00
  • 7fd5de4ec4 Conv:TF32: add more instances - 2 (#2879) yinglu 2025-10-10 15:28:17 +08:00
  • fada1a3cae Conv:TF32: add more instances - 2 (#2879) yinglu 2025-10-10 15:28:17 +08:00
  • c7f3bcc81e Fix splitK for grouped conv bwd data (#2991) Bartłomiej Kocot 2025-10-10 09:24:21 +02:00
  • feace7e1d0 Fix splitK for grouped conv bwd data (#2991) Bartłomiej Kocot 2025-10-10 09:24:21 +02:00
  • ad7a215aba Fix splitK for grouped conv bwd data (#2991) Bartłomiej Kocot 2025-10-10 09:24:21 +02:00
  • a46088e8fb fix:tf32:fix build fail for all supported targets (#2942) yinglu 2025-09-29 23:04:11 +08:00
  • f6db0f34b6 Merge commit 'b6036bc76a5ce55ef85b7f8578ae81c990f5932d' into develop assistant-librarian[bot] 2025-10-10 04:13:14 +00:00
  • 249105f297 [CK_TILE] FMHA Tests Enhancement (#2945) Yi DING 2025-10-10 11:34:47 +08:00
  • ce9f9ddef6 [CK_TILE] FMHA Tests Enhancement (#2945) Yi DING 2025-10-10 11:34:47 +08:00
  • b6036bc76a [CK_TILE] FMHA Tests Enhancement (#2945) Yi DING 2025-10-10 11:34:47 +08:00
  • 86cf28d164 Incorrect validation for 32x32x8 bf16 GEMM Emily Martins 2025-10-09 22:58:45 +00:00
  • 4a6bad668b Merge branch 'develop' into shuffle_tile_enhance shuffle_tile_enhance Illia Silin 2025-10-09 10:40:37 -07:00
  • c1c15f6645 Merge commit 'fb66b4f5e4b5b178e3eee04189224e139e939c0c' into develop assistant-librarian[bot] 2025-10-09 15:31:27 +00:00
  • c81483b230 [CK_TILE] fix pk_fp4 compilation for non-gfx950 GPUs (#2983) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-09 17:43:41 +03:00
  • 8134f1d6f4 [CK_TILE] fix pk_fp4 compilation for non-gfx950 GPUs (#2983) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-09 17:43:41 +03:00
  • fb66b4f5e4 [CK_TILE] fix pk_fp4 compilation for non-gfx950 GPUs (#2983) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-09 17:43:41 +03:00
  • 8ae2be1027 [CK_TILE] Pooling FWD (Lwpck 3683) (#2956) Yashvardhan Agarwal 2025-10-09 17:13:26 +03:00
  • bc9e9df38f [CK_TILE] Pooling FWD (Lwpck 3683) (#2956) Yashvardhan Agarwal 2025-10-09 17:13:26 +03:00
  • 7b6451b68e [CK_TILE] Pooling FWD (Lwpck 3683) (#2956) Yashvardhan Agarwal 2025-10-09 17:13:26 +03:00
  • 3cdc751c32 Merge branch 'develop' into tests_for_batched_grouped_gemm Aleksander Dudek 2025-10-09 05:15:36 -05:00
  • 31e302e51b Add KBatch support for gemm_ab_scale (#2740) Sami Remes 2025-10-09 07:33:16 +01:00
  • f4662b03ec Add Memory pipeline for AQuant Block Scale GEMM (#2987) Aviral Goel 2025-10-08 20:22:30 -04:00
  • 6de911757b [CI] Enable ccache w/ namespace for external use (#2988) JC 2025-10-08 16:03:22 -07:00
  • 6c4df918b6 CI Skip and Status Checks Fix (#2952) andrew clark 2025-10-08 15:48:08 -06:00
  • db89fb51a3 [CK][Examples] Extending support for rdna3/4 part 3: -example_gemm_xdl_int8 -example_gemm_xdl_fp8 -example_gemm_xdl_fp8_bf8 -example_gemm_xdl_fp16_fp8 -example_gemm_add_add_fastgelu_xdl_int8 -example_grouped_gemm_xdl_int8 -example_grouped_conv_bwd_weight_xdl_bf16 -example_cgemm_xdl_fp32 -example_cgemm_xdl_int8 Michal Kulikowski 2025-10-03 15:30:40 +02:00
  • d4aab28089 [CK][Examples] Extending support for rdna3/4 part 2: -example_batched_gemm_xdl_int8 -example_batched_gemm_xdl_fp8_rowwise_v3 -example_batched_gemm_xdl_fp32 -example_batched_gemm_xdl_bf16 -example_batched_gemm_xdl_bf16_v3 -example_batched_gemm_xdl_fp16 -example_splitk_gemm_bias_e_permute_xdl_fp32 *fixing return value to return 0 as success in above examples. Michal Kulikowski 2025-10-01 16:04:25 +02:00
  • f866af9ff2 [CK TILE GEMM] Refactor the code of transposeC and quantpreshuffle of AQuant Gemm (#2965) Cong Ma 2025-10-08 01:05:38 -06:00
  • 191f179038 unified attention rename Tianxing Wu 2025-10-09 08:47:19 +00:00
  • 64603db299 Merge commit '9d4bfe393276317b5c1f9dda990eb0bd6c1ec3e7' into develop assistant-librarian[bot] 2025-10-09 07:12:56 +00:00
  • e7ef841a68 Add KBatch support for gemm_ab_scale (#2740) Sami Remes 2025-10-09 07:33:16 +01:00
  • 4839132e99 Add KBatch support for gemm_ab_scale (#2740) Sami Remes 2025-10-09 07:33:16 +01:00
  • 9d4bfe3932 Add KBatch support for gemm_ab_scale (#2740) Sami Remes 2025-10-09 07:33:16 +01:00
  • 8620b69b7a Merge commit 'e99356dabce7c391423567297b934fae683e2c66' into develop assistant-librarian[bot] 2025-10-09 00:33:38 +00:00
  • e9ade69185 Add Memory pipeline for AQuant Block Scale GEMM (#2987) Aviral Goel 2025-10-08 20:22:30 -04:00
  • 7146674c11 Add Memory pipeline for AQuant Block Scale GEMM (#2987) Aviral Goel 2025-10-08 20:22:30 -04:00
  • e99356dabc Add Memory pipeline for AQuant Block Scale GEMM (#2987) Aviral Goel 2025-10-08 20:22:30 -04:00
  • 0737a185c2 Merge commit 'e29151b53321697fec4fc028cc91bc976086655c' into develop assistant-librarian[bot] 2025-10-08 23:11:38 +00:00
  • 5683f584a6 [CI] Enable ccache w/ namespace for external use (#2988) JC 2025-10-08 16:03:22 -07:00
  • 99ea905237 [CI] Enable ccache w/ namespace for external use (#2988) JC 2025-10-08 16:03:22 -07:00
  • e29151b533 [CI] Enable ccache w/ namespace for external use (#2988) JC 2025-10-08 16:03:22 -07:00
  • a00a3bb2b5 Merge commit '0a4c45b4d3d4dff423c0777f5883d3067c65da20' into develop assistant-librarian[bot] 2025-10-08 22:11:29 +00:00
  • 433b969e7d CI Skip and Status Checks Fix (#2952) andrew clark 2025-10-08 15:48:08 -06:00
  • f4b9c0c604 CI Skip and Status Checks Fix (#2952) andrew clark 2025-10-08 15:48:08 -06:00
  • 0a4c45b4d3 CI Skip and Status Checks Fix (#2952) andrew clark 2025-10-08 15:48:08 -06:00
  • e5008e5aef Merge commit '2444c448955897918d0f724b2bb49fb630a540e6' into develop assistant-librarian[bot] 2025-10-08 16:14:41 +00:00
  • 9da8a056df [CK][Examples] Extending support for rdna3/4 part 3: -example_gemm_xdl_int8 -example_gemm_xdl_fp8 -example_gemm_xdl_fp8_bf8 -example_gemm_xdl_fp16_fp8 -example_gemm_add_add_fastgelu_xdl_int8 -example_grouped_gemm_xdl_int8 -example_grouped_conv_bwd_weight_xdl_bf16 -example_cgemm_xdl_fp32 -example_cgemm_xdl_int8 Michal Kulikowski 2025-10-03 15:30:40 +02:00
  • 573df0d546 [CK][Examples] Extending support for rdna3/4 part 3: -example_gemm_xdl_int8 -example_gemm_xdl_fp8 -example_gemm_xdl_fp8_bf8 -example_gemm_xdl_fp16_fp8 -example_gemm_add_add_fastgelu_xdl_int8 -example_grouped_gemm_xdl_int8 -example_grouped_conv_bwd_weight_xdl_bf16 -example_cgemm_xdl_fp32 -example_cgemm_xdl_int8 Michal Kulikowski 2025-10-03 15:30:40 +02:00
  • 2444c44895 [CK][Examples] Extending support for rdna3/4 part 3: -example_gemm_xdl_int8 -example_gemm_xdl_fp8 -example_gemm_xdl_fp8_bf8 -example_gemm_xdl_fp16_fp8 -example_gemm_add_add_fastgelu_xdl_int8 -example_grouped_gemm_xdl_int8 -example_grouped_conv_bwd_weight_xdl_bf16 -example_cgemm_xdl_fp32 -example_cgemm_xdl_int8 Michal Kulikowski 2025-10-03 15:30:40 +02:00
  • f85778eab4 [CK][Examples] Extending support for rdna3/4 part 2: -example_batched_gemm_xdl_int8 -example_batched_gemm_xdl_fp8_rowwise_v3 -example_batched_gemm_xdl_fp32 -example_batched_gemm_xdl_bf16 -example_batched_gemm_xdl_bf16_v3 -example_batched_gemm_xdl_fp16 -example_splitk_gemm_bias_e_permute_xdl_fp32 *fixing return value to return 0 as success in above examples. Michal Kulikowski 2025-10-01 16:04:25 +02:00
  • 57d49c435e [CK][Examples] Extending support for rdna3/4 part 2: -example_batched_gemm_xdl_int8 -example_batched_gemm_xdl_fp8_rowwise_v3 -example_batched_gemm_xdl_fp32 -example_batched_gemm_xdl_bf16 -example_batched_gemm_xdl_bf16_v3 -example_batched_gemm_xdl_fp16 -example_splitk_gemm_bias_e_permute_xdl_fp32 *fixing return value to return 0 as success in above examples. Michal Kulikowski 2025-10-01 16:04:25 +02:00
  • 7259b9c4db [CK][Examples] Extending support for rdna3/4 part 2: -example_batched_gemm_xdl_int8 -example_batched_gemm_xdl_fp8_rowwise_v3 -example_batched_gemm_xdl_fp32 -example_batched_gemm_xdl_bf16 -example_batched_gemm_xdl_bf16_v3 -example_batched_gemm_xdl_fp16 -example_splitk_gemm_bias_e_permute_xdl_fp32 *fixing return value to return 0 as success in above examples. Michal Kulikowski 2025-10-01 16:04:25 +02:00
  • d0e5159fea Merge pull request #2990 from spolifroni-amd/spolifroni-amd/cherry-pick-702 docs/7.0.2 spolifroni-amd 2025-10-08 11:07:44 -04:00
  • 86d5709ec3 Improving the contribution page (#2804) spolifroni-amd 2025-09-09 15:24:44 -04:00
  • 11e491ac77 first commit of the glossary (#2702) spolifroni-amd 2025-09-08 13:55:32 -04:00
  • 6951b9cc61 removed the blog posts as as these are broken links (#2732) spolifroni-amd 2025-08-25 14:16:57 -04:00
  • d8235b53d8 Initial pipeline to push number of merged conv groups to convolution fwd kernel. Ville Pietilä 2025-10-08 13:00:51 +00:00
  • 595a9f01c1 Fix passing the number of merged groups. Ville Pietilä 2025-10-08 11:23:27 +00:00
  • 36f75d061a Use tile and vector load template parameters. Ville Pietilä 2025-10-08 11:15:18 +00:00
  • 91465770b0 Merge remote-tracking branch 'origin/develop' into vpietila/merge-multiple-conv-groups-into-single-wg-in-ck-tile Ville Pietilä 2025-10-08 09:13:34 +00:00
  • d2b4802c15 Remove unnecessary includes. Ville Pietilä 2025-10-08 07:59:50 +00:00
  • 71c266900a Fix new unit projects. Ville Pietilä 2025-10-08 07:47:17 +00:00
  • b1d2abf8a4 Merge commit '1d4db30af9be83ca9af3fedb7e98ca24daba4c8d' into develop assistant-librarian[bot] 2025-10-08 07:12:55 +00:00
  • 5912982bda [CK TILE GEMM] Refactor the code of transposeC and quantpreshuffle of AQuant Gemm (#2965) Cong Ma 2025-10-08 01:05:38 -06:00
  • 6650feee3a [CK TILE GEMM] Refactor the code of transposeC and quantpreshuffle of AQuant Gemm (#2965) Cong Ma 2025-10-08 01:05:38 -06:00
  • 1d4db30af9 [CK TILE GEMM] Refactor the code of transposeC and quantpreshuffle of AQuant Gemm (#2965) Cong Ma 2025-10-08 01:05:38 -06:00
  • 61969fe9f5 Merge branch 'develop' into jakpiase/gemm_pipeline_mem_skip_lds jakpiase/gemm_pipeline_mem_skip_lds Damien Lejeune 2025-10-08 06:12:59 +00:00
  • 4247f7f4d4 Clean up description and tree formatter. John Shumway 2025-10-08 01:16:15 +00:00
  • 0f67b3bde8 fix barkocot/bwd-data-instances-opt Bartlomiej Kocot 2025-10-07 22:20:35 +00:00
  • 8fcbc0dcd6 remove added instances Bartlomiej Kocot 2025-10-07 22:14:08 +00:00
  • b10281c034 Merge Bartlomiej Kocot 2025-10-07 22:12:13 +00:00
  • 753c61aa0f Add conv_traits. John Shumway 2025-10-07 21:54:53 +00:00
  • 98ed59f1ca Merge branch 'develop' into tests_for_batched_grouped_gemm Aleksander Dudek 2025-10-07 16:10:07 -05:00
  • d12bd51265 Merge commit 'ae9f29b7d514b0829256a0a3ca9ab4511e7a1e04' into develop assistant-librarian[bot] 2025-10-07 19:11:30 +00:00
  • 192536597e add the sync barrier for persistent kernel (#2977) Thomas Ning 2025-10-07 11:54:04 -07:00
  • 8ca0987705 add the sync barrier for persistent kernel (#2977) Thomas Ning 2025-10-07 11:54:04 -07:00
  • ae9f29b7d5 add the sync barrier for persistent kernel (#2977) Thomas Ning 2025-10-07 11:54:04 -07:00
  • 364c0c3983 Add skip lds as standalone example config + fix parametrized skip lds tests semantic Damien Lejeune 2025-10-07 17:32:32 +00:00
  • fe5fbcbc64 Extract TreeFormatter from Description class. John Shumway 2025-10-07 15:43:13 +00:00
  • 0c00794e14 Remove obsolete enumeration. Ville Pietilä 2025-10-07 15:28:49 +00:00
  • c62835e91b Fix build after removing obsolete functionality. Ville Pietilä 2025-10-07 15:23:28 +00:00
  • fdfbd1e770 Check that number of convolution groups is multiple of merged groups. Ville Pietilä 2025-10-07 15:05:09 +00:00
  • 438787dbb6 Remove debug prints and obsolete tests. Ville Pietilä 2025-10-07 13:49:18 +00:00
  • d9e9f19ca4 Run clang-formatting. Ville Pietilä 2025-10-07 13:41:54 +00:00
  • 3c50e984c9 Remove unused code. Ville Pietilä 2025-10-07 13:28:32 +00:00