Commit Graph

  • a56307b07e Add good instance Graner, Johannes 2026-01-30 10:45:31 -05:00
  • 227cb33a93 Testing mh/testing MHYang 2026-01-30 23:35:39 +08:00
  • 486eac508f Merge branch 'develop' into vpietila/add-fwd-conv-v3-instances-for-unit-group-size vpietila/add-fwd-conv-v3-instances-for-unit-group-size Ville Pietilä 2026-01-30 17:32:26 +02:00
  • fbd1fa14bd Fix clang-format. Ville Pietilä 2026-01-30 10:27:30 -05:00
  • 4b7ec1bacb Filter 3x3, pad1, stride1, dilation 1 - specialization. Ville Pietilä 2026-01-30 10:26:48 -05:00
  • 88f8417f22 Move gfx950 specific instances under other double rate instances. Ville Pietilä 2026-01-30 10:17:21 -05:00
  • e4cf7ed2e1 Merge branch 'develop' into aviralgoel/test_labels Aviral Goel 2026-01-30 19:04:37 +04:00
  • 40f2b51a4a Merge remote-tracking branch 'origin/develop' into vpietila/improved-fwd-merged-conv-group-instances Ville Pietilä 2026-01-30 09:45:57 -05:00
  • 7a62f4cdc3 Revert "Fix redundant cast in model sensitive rmsnorm (#3681)" mh/revert-rmsnorm-align-HF MHYang 2026-01-30 22:26:49 +08:00
  • abe790cab8 Merge branch 'develop' into jograner/bwd-weight-group-merge-type-string jograner/bwd-weight-group-merge-type-string Johannes Graner 2026-01-30 14:36:33 +01:00
  • c14fa9aab3 feat: add test for abquant preshuffle b + bquant Erwin Terpstra 2026-01-30 13:34:03 +00:00
  • 87dda06e77 fix: disable PermuteN for ABQuant PreshuffleB in example Erwin Terpstra 2026-01-30 13:33:34 +00:00
  • b6213e6943 feat: bquant preshuffle for preshuffleb abquant pipeline Erwin Terpstra 2026-01-30 12:26:08 +00:00
  • 3a18b2f29e Merge commit '6a6177a246d6c81932fbb1061ad6a62e90b788a1' into develop assistant-librarian[bot] 2026-01-30 12:22:40 +00:00
  • 09d443a7ad [CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603) Erwin Terpstra 2026-01-30 12:40:50 +01:00
  • a5824466fb [CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603) Erwin Terpstra 2026-01-30 12:40:50 +01:00
  • 6a6177a246 [CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603) Erwin Terpstra 2026-01-30 12:40:50 +01:00
  • 8f404a9985 Add missing applicability check to v3 fwd convs. Ville Pietilä 2026-01-30 05:05:38 -05:00
  • cb60ec9d2a Add new grouped conv instance to the gfx950 branch. Ville Pietilä 2026-01-30 04:13:36 -05:00
  • 7ffa682bd3 Add missing applicability check to v3 fwd convs. Ville Pietilä 2026-01-30 05:05:38 -05:00
  • c360e0cbc4 Add scripts for benchmark sparsity 0.9 cases with mattn256 & full256 Qianfeng Zhang 2026-01-30 09:58:12 +00:00
  • 966706bb21 Add new grouped conv instance to the gfx950 branch. Ville Pietilä 2026-01-30 04:13:36 -05:00
  • 2cc0e3d019 override base policys vector size with static_assert 4/12/16 bytes Sami Remes 2026-01-30 03:55:56 -05:00
  • f5732e875b WIP Matti Eskelinen 2026-01-30 03:47:34 -05:00
  • 409a7d8edb Merge remote-tracking branch 'origin/develop' into samremes/ck_tile_mx_gemm Sami Remes 2026-01-30 03:30:11 -05:00
  • d4a93660fb Merge commit '565fea26455b8e4f78ac57ed64d6bd12e701a9c9' into develop assistant-librarian[bot] 2026-01-30 08:21:17 +00:00
  • e7483043e6 fix undefined behaviour in softmax kernel (#3683) Zoltán Lakatos 2026-01-30 08:22:54 +01:00
  • 1b1dd65b83 fix undefined behaviour in softmax kernel (#3683) Zoltán Lakatos 2026-01-30 08:22:54 +01:00
  • 565fea2645 fix undefined behaviour in softmax kernel (#3683) Zoltán Lakatos 2026-01-30 08:22:54 +01:00
  • d5bbd4c3f1 [WIP] initial implementation jograner/bwd-data-group-merge Graner, Johannes 2026-01-30 01:58:18 -05:00
  • f1fcb64b37 WIP: demonstrate_single_stage Andriy Roshchenko 2026-01-30 06:53:10 +00:00
  • 0cda2801c4 Merge commit 'f3d8b7210fb99827bcb1d1bdaf9672b3ae8fb209' into develop assistant-librarian[bot] 2026-01-30 04:42:40 +00:00
  • e38029e946 Extend CK fmha_batch_prefill kernel coverage to head_dim=256 (#3328) vivienfanghuagood 2026-01-30 11:18:20 +08:00
  • dbd9809fa0 Extend CK fmha_batch_prefill kernel coverage to head_dim=256 (#3328) vivienfanghuagood 2026-01-30 11:18:20 +08:00
  • f3d8b7210f Extend CK fmha_batch_prefill kernel coverage to head_dim=256 (#3328) vivienfanghuagood 2026-01-30 11:18:20 +08:00
  • 0014d8d9df Grouped Convolution Backward Data Direct Load Bartlomiej Kocot 2026-01-30 00:05:26 +00:00
  • a59a3944e8 Merge commit '6ff073784321a55ee276f38af195532d8d812670' into develop assistant-librarian[bot] 2026-01-30 03:12:55 +00:00
  • a3914b9099 Adapt parser to monorepo users/randyspauldingamd/dep_parser_monorepo Randy J. Spaulding 2026-01-30 02:58:41 +00:00
  • 24cf4cf9a8 Fix redundant cast in model sensitive rmsnorm (#3681) MHYangAMD 2026-01-30 10:52:19 +08:00
  • c4998f09bd Fix redundant cast in model sensitive rmsnorm (#3681) MHYangAMD 2026-01-30 10:52:19 +08:00
  • 6ff0737843 Fix redundant cast in model sensitive rmsnorm (#3681) MHYangAMD 2026-01-30 10:52:19 +08:00
  • 05846dfc99 Merge commit '83b61553548019eb9aa77a5efc72258a48dee42a' into develop assistant-librarian[bot] 2026-01-30 02:36:23 +00:00
  • e01e32ee52 Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627) Max Podkorytov 2026-01-29 17:20:22 -08:00
  • aedfc7249e Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627) Max Podkorytov 2026-01-29 17:20:22 -08:00
  • 83b6155354 Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627) Max Podkorytov 2026-01-29 17:20:22 -08:00
  • 33acc874f2 Merge branch 'develop' into ck_tile/gemm_blockscale_eightwarps Yi DING 2026-01-30 09:12:38 +08:00
  • 7945ef7d78 Merge commit '05ef93a69d8ccaf63f84b43b3dcb9b585f428051' into develop assistant-librarian[bot] 2026-01-30 00:44:38 +00:00
  • 4883171ee4 Add a flag to build CK libs required for HipTensor. (#3684) Illia Silin 2026-01-29 16:12:49 -08:00
  • ddf8e8ee58 Add a flag to build CK libs required for HipTensor. (#3684) Illia Silin 2026-01-29 16:12:49 -08:00
  • 05ef93a69d Add a flag to build CK libs required for HipTensor. (#3684) Illia Silin 2026-01-29 16:12:49 -08:00
  • 6fb9e0e7e7 Merge branch 'develop' into cshuffle-fix Thomas Ning 2026-01-29 15:06:01 -08:00
  • bce6ec11cd Optimize tensor descriptor functor template instantiation Max Podkorytov 2026-01-29 14:26:43 -07:00
  • 0c166da64e [CK TILE] Rename trivial_array to static_array congma13/ck_tile/inclusive_scan_sequence Cong Ma 2026-01-29 15:48:21 -05:00
  • add410083f [CK] refactoring according to the review feedback congma13/ck_tile/inclusive_scan_sequence_ck Cong Ma 2026-01-23 19:36:50 -05:00
  • 6f27bf1d7f added draft of conv bwd data direct loads jakpiase/grouped_conv_dl Jakub Piasecki 2026-01-29 20:11:26 +00:00
  • 1460f04e77 tmp save jakpiase/tmp_branch Jakub Piasecki 2026-01-29 18:43:20 +00:00
  • 8a70a0d08a Merge commit 'f16d9100e42a978261f76319c66a7995e5f6d555' into develop assistant-librarian[bot] 2026-01-29 18:34:46 +00:00
  • a07d76a460 Multi AB support for wave transfer (#3578) Enrico Degregori 2026-01-29 19:29:40 +01:00
  • 968e54f90f Multi AB support for wave transfer (#3578) Enrico Degregori 2026-01-29 19:29:40 +01:00
  • f16d9100e4 Multi AB support for wave transfer (#3578) Enrico Degregori 2026-01-29 19:29:40 +01:00
  • 1998be34bf [Conv] Enable bwd weight splitk autodeduction with cap (#3656) Johannes Graner 2026-01-29 18:40:28 +01:00
  • e8bf2e1418 [Conv] Enable bwd weight splitk autodeduction with cap (#3656) Johannes Graner 2026-01-29 18:40:28 +01:00
  • fabac7e2c3 [Conv] Enable bwd weight splitk autodeduction with cap (#3656) Johannes Graner 2026-01-29 18:40:28 +01:00
  • fc7964462e Conv specializations. Ville Pietilä 2026-01-29 11:50:50 -05:00
  • 538ec4d896 Merge branch 'develop' into aviralgoel/test_labels Aviral Goel 2026-01-29 20:41:52 +04:00
  • 0369750978 Fix clang-format. Ville Pietilä 2026-01-29 10:23:17 -05:00
  • 84daa4d305 Merge commit 'e33f15709f8c1e05f5056edc7295276e121dc253' into develop assistant-librarian[bot] 2026-01-29 15:20:56 +00:00
  • 8a23393aa2 Fix clang-format pragmas. Ville Pietilä 2026-01-29 10:19:27 -05:00
  • 23453093e0 ck-builder: fix test related to changed xdl bwd cshuf v3 interface (#3677) Robin Voetter 2026-01-29 16:15:56 +01:00
  • 4008976a26 ck-builder: fix test related to changed xdl bwd cshuf v3 interface (#3677) Robin Voetter 2026-01-29 16:15:56 +01:00
  • e33f15709f ck-builder: fix test related to changed xdl bwd cshuf v3 interface (#3677) Robin Voetter 2026-01-29 16:15:56 +01:00
  • f8aec67c14 Conditionally compile new instances only for gfx950. Ville Pietilä 2026-01-29 10:04:29 -05:00
  • 165805cee7 tmp save subhajitdchow 2026-01-29 14:59:18 +00:00
  • 5301efc8e4 Add NumGroupsToMerge to BwdWeight type string Graner, Johannes 2026-01-29 09:02:40 -05:00
  • bacba218d7 [CK_BUILDER] Fix missing template arguments in ConvBwdWeightV3 test poyenc/fix-missing-template-argument PoYen, Chen 2026-01-29 08:05:36 -06:00
  • 61f7dff009 Add NumGroupsToMerge to BwdWeight type string Graner, Johannes 2026-01-29 09:02:40 -05:00
  • 0fba67a7e7 Add fwd conv group merging to the v3 conv instances. Ville Pietilä 2026-01-29 07:29:24 -05:00
  • da895cdd88 Tile on the C dimensions to support large C Damien Lejeune 2026-01-29 08:00:34 -05:00
  • adb8f67b4f feat: add new optimized tutorial kernels AviralGoelAMD 2026-01-29 12:45:18 +00:00
  • ab12d435a5 Add fwd conv group merging to the v3 conv instances. Ville Pietilä 2026-01-29 07:29:24 -05:00
  • 10b597f11a Merge branch 'develop' into aviralgoel/test_labels Aviral Goel 2026-01-29 14:56:57 +04:00
  • d7c4775455 Improve logging. Ville Pietilä 2026-01-29 05:50:09 -05:00
  • c83b1c482b Remove hard coded lds size Damien Lejeune 2026-01-29 05:24:19 -05:00
  • d92e8010f1 Fix async acc Ding, Yi 2026-01-29 10:17:36 +00:00
  • 393355d4fe Merge develop: Resolve unit_validation.cpp formatting conflict JH-Leon-KIM-AMD 2026-01-29 09:51:06 +00:00
  • 4c3ed25d90 correcting conflict JH-Leon-KIM-AMD 2026-01-29 09:48:48 +00:00
  • fbae2aba20 fix issue letaoqin/batch_prefile_block_scale_ptkvb ltqin 2026-01-29 09:46:19 +00:00
  • 5122eef3f2 Merge branch 'develop' into jeonghyun/ckb-almiopen-522-descriptor-init JH-Leon-KIM-AMD 2026-01-29 09:31:41 +00:00
  • bfd9d2382a Add PreshuffleB Support for 8wave Pipeline Ding, Yi 2026-01-29 09:29:50 +00:00
  • 2aaeac29b1 Merge branch 'develop' into vpietila/add-fwd-conv-v3-instances-for-unit-group-size Ville Pietilä 2026-01-29 10:38:35 +02:00
  • a9d85dfe16 add qscaleenum and shift value ltqin 2026-01-29 08:17:31 +00:00
  • 9e338c5b47 add q_pertensor_kv_blockscale to host code ltqin 2026-01-29 07:58:15 +00:00
  • f62478bd98 fix compile error KenSCLin 2026-01-29 07:20:36 +00:00
  • 9a657edaaf kernel and pipeline add q per-tensor kv block scale ltqin 2026-01-29 07:16:47 +00:00
  • e95b111c3a Merge branch 'develop' into ck_tile/gemm_blockscale_eightwarps origin/ck_tile/gemm_blockscale_eightwarps kensclin 2026-01-29 14:55:29 +08:00
  • 24baa5245f add up the padding algorithm ThomasNing 2026-01-28 23:15:21 -06:00
  • 961da131ed Merge commit '9b168082b7aa19bcf50fd9991baf10a0c77d105b' into develop assistant-librarian[bot] 2026-01-29 04:42:46 +00:00
  • 68b475ad92 [CK_Tile] Adding support for preshuffleQuant in AB quant Block Scale Gemm (#3629) Khushbu Agarwal 2026-01-28 19:45:09 -08:00
  • 9fc9cc598f [CK_Tile] Adding support for preshuffleQuant in AB quant Block Scale Gemm (#3629) Khushbu Agarwal 2026-01-28 19:45:09 -08:00
  • 9b168082b7 [CK_Tile] Adding support for preshuffleQuant in AB quant Block Scale Gemm (#3629) Khushbu Agarwal 2026-01-28 19:45:09 -08:00