Commit Graph

  • c3c8c20144 Merge commit 'b0ea67e37725c26860a3520dc31c1f7a01164db9' into develop assistant-librarian[bot] 2025-12-29 01:43:07 +00:00
  • 13134864cc [CK_TILE] MX FLATMM Fix M Padding (#3489) Yi DING 2025-12-29 09:09:12 +08:00
  • 9045cafc8c [CK_TILE] MX FLATMM Fix M Padding (#3489) Yi DING 2025-12-29 09:09:12 +08:00
  • b0ea67e377 [CK_TILE] MX FLATMM Fix M Padding (#3489) Yi DING 2025-12-29 09:09:12 +08:00
  • dd314aaa48 Merge commit 'a3916a8d16d6e8d676b890ea3f242a180aeef61b' into develop assistant-librarian[bot] 2025-12-27 09:13:12 +00:00
  • 77499511af enable f8 tests (#3488) joyeamd 2025-12-27 16:21:56 +08:00
  • 38a547df56 enable f8 tests (#3488) joyeamd 2025-12-27 16:21:56 +08:00
  • a3916a8d16 enable f8 tests (#3488) joyeamd 2025-12-27 16:21:56 +08:00
  • 1d4d925ba3 Fix in K-LdsBuffer and V-LdsBuffer over-lap checking Qianfeng Zhang 2025-12-27 05:43:11 +00:00
  • d2dadc22a7 Remove un-needed constexpr checking for loading v_tiles in Gemm0 loop Qianfeng Zhang 2025-12-26 15:13:28 +00:00
  • df902c6a06 Tiny fix in using v_tiles[] index Qianfeng Zhang 2025-12-25 15:37:22 +00:00
  • 2d53d67b6d Update the NumPrefetchK and NumPrefetchV in the softmax pipeline on mi350 to achieve better interleaving Qianfeng Zhang 2025-12-25 14:32:44 +00:00
  • ddf0f1c8ed Update the NumPrefetchK and NumPrefetchV in the softmax pipeline on mi300 to achieve better interleaving Qianfeng Zhang 2025-12-25 14:30:57 +00:00
  • c9bf3fde79 Merge commit '7ce532eac7faab5041d472b7dabebf57e09fbaf6' into develop assistant-librarian[bot] 2025-12-25 08:16:26 +00:00
  • 2fe41a5635 [CK_TILE] Align FMHA BWD Reference with Kernel Implementation (#3486) Yi DING 2025-12-25 16:12:36 +08:00
  • d80a3f9c70 [CK_TILE] Align FMHA BWD Reference with Kernel Implementation (#3486) Yi DING 2025-12-25 16:12:36 +08:00
  • 7ce532eac7 [CK_TILE] Align FMHA BWD Reference with Kernel Implementation (#3486) Yi DING 2025-12-25 16:12:36 +08:00
  • 4f1df06484 Merge commit 'e08efa551ff260f0e55c839cfc0e2b64c929eb57' into develop assistant-librarian[bot] 2025-12-25 07:15:36 +00:00
  • eb6a9170bc [CK_TILE] Grouped gemm quant tensor layouts (#3414) Erwin Terpstra 2025-12-25 08:01:23 +01:00
  • bd73699148 [CK_TILE] Grouped gemm quant tensor layouts (#3414) Erwin Terpstra 2025-12-25 08:01:23 +01:00
  • e08efa551f [CK_TILE] Grouped gemm quant tensor layouts (#3414) Erwin Terpstra 2025-12-25 08:01:23 +01:00
  • 199991cf05 Merge commit '14668a56e376550cd68d116aa64302a1df05b56f' into develop assistant-librarian[bot] 2025-12-25 01:42:14 +00:00
  • c553a74747 remove the LLVM_MAIN_REVISION usage (#3487) Illia Silin 2025-12-24 16:49:35 -08:00
  • 21d679acab remove the LLVM_MAIN_REVISION usage (#3487) Illia Silin 2025-12-24 16:49:35 -08:00
  • 14668a56e3 remove the LLVM_MAIN_REVISION usage (#3487) Illia Silin 2025-12-24 16:49:35 -08:00
  • 446db13a0f Merge commit '62a8ec155facd901232977b688d5225d72969709' into develop assistant-librarian[bot] 2025-12-24 19:11:47 +00:00
  • d65cd6d0fa [CK TILE ENGINE] CI configuration with basic cases (#3475) Thrupti Raj Lakshmana Gowda 2025-12-24 12:45:56 -06:00
  • b17fa5656f [CK TILE ENGINE] CI configuration with basic cases (#3475) Thrupti Raj Lakshmana Gowda 2025-12-24 12:45:56 -06:00
  • 62a8ec155f [CK TILE ENGINE] CI configuration with basic cases (#3475) Thrupti Raj Lakshmana Gowda 2025-12-24 12:45:56 -06:00
  • a169d59e06 Merge commit '7f68f3c4fa5bf478313c2147610317b199f9e65b' into develop assistant-librarian[bot] 2025-12-24 17:14:26 +00:00
  • 0eb5d4a93f Enable padding blockscale for abquant (#3453) kensclin 2025-12-25 01:12:40 +08:00
  • b29e16aa67 Enable padding blockscale for abquant (#3453) kensclin 2025-12-25 01:12:40 +08:00
  • 7f68f3c4fa Enable padding blockscale for abquant (#3453) kensclin 2025-12-25 01:12:40 +08:00
  • 48a371dee2 add hack fix_moe_a16w4 zanzhang 2025-12-24 16:23:37 +08:00
  • b8431a023d fix clang gpu marco zanzhang 2025-12-24 15:51:16 +08:00
  • d4f1263072 fix clang format zanzhang 2025-12-24 15:08:24 +08:00
  • 925b245697 forward compatible zanzhang 2025-12-24 14:51:24 +08:00
  • 0b42f814d3 update on rocm7.2 kyle/grouped_gemm_blockwise kyle-256 2025-12-24 03:38:51 +00:00
  • e1039a7eeb Merge commit '1c3151963bd5abd30a5ced62f6859994a45f710e' into develop assistant-librarian[bot] 2025-12-24 02:47:07 +00:00
  • a2402950de [CK_TILE][FMHA] Add FP8 support for batch_prefill kernel (#3425) Po Yen Chen 2025-12-24 10:34:06 +08:00
  • 51b7e7d2d6 [CK_TILE][FMHA] Add FP8 support for batch_prefill kernel (#3425) Po Yen Chen 2025-12-24 10:34:06 +08:00
  • 1c3151963b [CK_TILE][FMHA] Add FP8 support for batch_prefill kernel (#3425) Po Yen Chen 2025-12-24 10:34:06 +08:00
  • 14c3738a43 test on gemm bf16 kyle/c_column_layout_test kyle-256 2025-12-24 01:29:19 +00:00
  • 27c1ae2774 Merge commit 'c0797c167143aa750936c108caa0945640eeefd1' into develop assistant-librarian[bot] 2025-12-23 23:13:10 +00:00
  • 07b16d48e4 [CK_TILE] Minor splitk bugfix for gemms and conv (#3387) jakpiase 2025-12-24 00:10:13 +01:00
  • 0d94859dca [CK_TILE] Minor splitk bugfix for gemms and conv (#3387) jakpiase 2025-12-24 00:10:13 +01:00
  • c0797c1671 [CK_TILE] Minor splitk bugfix for gemms and conv (#3387) jakpiase 2025-12-24 00:10:13 +01:00
  • 77e10c7b08 Concept improvements. Ville Pietilä 2025-12-23 10:27:38 -05:00
  • a1740c614b Refactor handing of GEMM-K batch template parameter in conv bwd weight factory. Ville Pietilä 2025-12-23 10:08:56 -05:00
  • 166fe9db60 Merge commit 'e1381d6a712ce5703cd9bc9e3ec351fa91b1d87d' into develop assistant-librarian[bot] 2025-12-23 11:12:47 +00:00
  • 023a3e658f [CK grouped gemm] Fix grouped gemm two stage HasMainK0BlockLoop (#3466) Johannes Graner 2025-12-23 11:33:09 +01:00
  • 8f9d91fe6c [CK grouped gemm] Fix grouped gemm two stage HasMainK0BlockLoop (#3466) Johannes Graner 2025-12-23 11:33:09 +01:00
  • e1381d6a71 [CK grouped gemm] Fix grouped gemm two stage HasMainK0BlockLoop (#3466) Johannes Graner 2025-12-23 11:33:09 +01:00
  • 3e31171d74 Merge commit '4ce7d4c511c7e98a9ac01580ed1e9112e59061a0' into develop assistant-librarian[bot] 2025-12-23 10:13:44 +00:00
  • 55e92c15c6 async_load pass mxfp6-flatmm ZheWang 2025-12-23 09:45:40 +00:00
  • c618d8bba3 [ck_builder] add utility functions to convolution (#3459) kabrahamAMD 2025-12-23 10:39:49 +01:00
  • 34edd1d99d [ck_builder] add utility functions to convolution (#3459) kabrahamAMD 2025-12-23 10:39:49 +01:00
  • 4ce7d4c511 [ck_builder] add utility functions to convolution (#3459) kabrahamAMD 2025-12-23 10:39:49 +01:00
  • b8269a8c17 Merge commit 'ead81d1b0bba57b86ac28f3e2994dc97279f8eb3' into develop assistant-librarian[bot] 2025-12-23 09:20:57 +00:00
  • 536315849b [CK_TILE] Add splitk support to ck tile conv bwd data (#3353) jakpiase 2025-12-23 10:03:42 +01:00
  • b4626d7093 [CK_TILE] Add splitk support to ck tile conv bwd data (#3353) jakpiase 2025-12-23 10:03:42 +01:00
  • ead81d1b0b [CK_TILE] Add splitk support to ck tile conv bwd data (#3353) jakpiase 2025-12-23 10:03:42 +01:00
  • e64347d747 Merge commit '8b73633e651822d90b66ffd7d174a21891a99611' into develop assistant-librarian[bot] 2025-12-23 07:15:46 +00:00
  • 0c7e829224 fix: handle void return type in TailHandler error path with ROCm6 compiler (clang++) (#3477) Lyu, Xudong 2025-12-23 15:03:18 +08:00
  • 677e3cd174 fix: handle void return type in TailHandler error path with ROCm6 compiler (clang++) (#3477) Lyu, Xudong 2025-12-23 15:03:18 +08:00
  • 8b73633e65 fix: handle void return type in TailHandler error path with ROCm6 compiler (clang++) (#3477) Lyu, Xudong 2025-12-23 15:03:18 +08:00
  • d0bc7ccc31 Merge commit '6864a618f47e5ba8d28ada30e2a59da7d051085d' into develop assistant-librarian[bot] 2025-12-23 06:16:51 +00:00
  • 436322bef4 [CK_TILE] FMHA Ignore BWD Failed Cases in Smoke Test (#3480) Yi DING 2025-12-23 13:28:15 +08:00
  • b0959a72b9 [CK_TILE] FMHA Ignore BWD Failed Cases in Smoke Test (#3480) Yi DING 2025-12-23 13:28:15 +08:00
  • 6864a618f4 [CK_TILE] FMHA Ignore BWD Failed Cases in Smoke Test (#3480) Yi DING 2025-12-23 13:28:15 +08:00
  • 49929c3b7d tmp save kyle-256 2025-12-23 02:20:50 +00:00
  • 9569b291ed Merge commit '2955d77f3cfb3515c6d36d54879ed65b854dafa6' into develop assistant-librarian[bot] 2025-12-22 21:12:09 +00:00
  • f4b22287cd Fix grouped conv fwd wmma porting (#3479) Bartłomiej Kocot 2025-12-22 21:32:48 +01:00
  • 83d15b7bb4 Fix grouped conv fwd wmma porting (#3479) Bartłomiej Kocot 2025-12-22 21:32:48 +01:00
  • 2955d77f3c Fix grouped conv fwd wmma porting (#3479) Bartłomiej Kocot 2025-12-22 21:32:48 +01:00
  • 608266a4ef First functional version of bwd weight conv factory. Ville Pietilä 2025-12-22 11:50:00 -05:00
  • 9d93dd9352 Merge commit 'a8aebb7a8efbd9860487a4bc563706cf7a71f988' into develop assistant-librarian[bot] 2025-12-22 16:14:04 +00:00
  • 96a4a5de37 Factory bug fixes. Ville Pietilä 2025-12-22 11:05:00 -05:00
  • a8e7edd814 Update algorithm signature diagnostics. Ville Pietilä 2025-12-22 10:56:47 -05:00
  • b4aa8dbd18 Post-merge cleanup for WMMA grouped conv fwd (#3468) Wojciech Laskowski 2025-12-22 15:57:45 +01:00
  • d8164b2632 Post-merge cleanup for WMMA grouped conv fwd (#3468) Wojciech Laskowski 2025-12-22 15:57:45 +01:00
  • a8aebb7a8e Post-merge cleanup for WMMA grouped conv fwd (#3468) Wojciech Laskowski 2025-12-22 15:57:45 +01:00
  • 8eb62241fb Remove debug assert. Ville Pietilä 2025-12-22 09:30:43 -05:00
  • dacf82d652 Concept bug fixes. Ville Pietilä 2025-12-22 09:23:47 -05:00
  • 5ee99d83d5 Improve compile time diagnostics. Ville Pietilä 2025-12-22 08:59:46 -05:00
  • 9679d9b141 Improve missing member/wrong type compile-time errors. Ville Pietilä 2025-12-22 08:39:47 -05:00
  • 8d40e6d9fe Small improvements. Ville Pietilä 2025-12-22 08:39:01 -05:00
  • c6798d3673 Improve compile time diagnostics. Ville Pietilä 2025-12-22 08:06:41 -05:00
  • 4d20cc6b4d Use amcro to ensure automatic macthing between concepts are their string representations. Ville Pietilä 2025-12-22 07:36:13 -05:00
  • 4d5b5b7ef3 Improve compile time erros message when no matching factory is found. Ville Pietilä 2025-12-22 07:12:46 -05:00
  • e9baabdfc5 todo: fix asyn_load ZheWang 2025-12-22 11:30:18 +00:00
  • 5d34251e99 355 update kyle-256 2025-12-22 08:49:06 +00:00
  • 7723175746 global load succ ZheWang 2025-12-22 08:32:47 +00:00
  • 56aa5385c6 bug fix kyle-256 2025-12-22 01:51:50 +00:00
  • c08987af05 update kernel kyle-256 2025-12-18 07:49:20 +00:00
  • 3053fb50ef update RCC layout kyle-256 2025-12-18 06:50:22 +00:00
  • 518d02b925 fix code-lint kyle-256 2025-12-18 03:53:41 +00:00
  • 79e5e9b887 fix code lint kyle-256 2025-12-18 02:59:40 +00:00
  • 6fe4ce46fc update example kyle-256 2025-12-18 02:57:01 +00:00
  • f59440916b sync test files with origin/develop kyle-256 2025-12-18 02:29:43 +00:00