Commit Graph

  • d05825d823 Merge commit 'a575acb245847d96d54c1e6d198748bda3e57952' into develop assistant-librarian[bot] 2026-01-13 02:50:03 +00:00
  • 0a2c5c6262 fix mxfp8-gemm example failure (#3531) ZheWang 2026-01-13 10:26:45 +08:00
  • 91c829504a fix mxfp8-gemm example failure (#3531) ZheWang 2026-01-13 10:26:45 +08:00
  • a575acb245 fix mxfp8-gemm example failure (#3531) ZheWang 2026-01-13 10:26:45 +08:00
  • ee8b4cf26f Test build time improvements. build-time-investigation-tile-gemm Vidyasagar Ananthan 2026-01-12 18:40:06 +00:00
  • acd77f9c2c Merge commit '5aaa0313503305ad697f6614836be87f8e0b281a' into develop assistant-librarian[bot] 2026-01-12 18:17:03 +00:00
  • d4718f5f31 WIP: extract MakeALdsDescriptor() from child to parent class for code readability (#3392) Aviral Goel 2026-01-12 23:21:58 +05:30
  • 8dceee271e WIP: extract MakeALdsDescriptor() from child to parent class for code readability (#3392) Aviral Goel 2026-01-12 23:21:58 +05:30
  • 5aaa031350 WIP: extract MakeALdsDescriptor() from child to parent class for code readability (#3392) Aviral Goel 2026-01-12 23:21:58 +05:30
  • 3096269434 refactor: remove Default scheduler implementation as it not used anymore (#3542) Aviral Goel 2026-01-12 23:21:06 +05:30
  • 23a1768487 refactor: remove Default scheduler implementation as it not used anymore (#3542) Aviral Goel 2026-01-12 23:21:06 +05:30
  • e809861d49 refactor: remove Default scheduler implementation as it not used anymore (#3542) Aviral Goel 2026-01-12 23:21:06 +05:30
  • 4f721aca8a Fix remaining fwd/bwd instances tests. Ville Pietilä 2026-01-12 11:33:46 -05:00
  • 0f2c9134d8 refactor: unify interwave and intrawave pipeline implementation at work-group level AviralGoelAMD 2026-01-12 16:07:28 +00:00
  • ec1a069a60 Use simpler layout for scales. Sami Remes 2026-01-12 11:03:27 -05:00
  • c8fda65534 added tests for wmma conv_tratis Kevin Abraham 2026-01-12 15:39:26 +00:00
  • 46afc66543 Fix fwd/bwd conv factory tests after tile transfer XDL/WMMA concepts refactoring. Ville Pietilä 2026-01-12 09:26:04 -05:00
  • 7e02790293 Remove the C++26 extensions. Ville Pietilä 2026-01-12 08:47:03 -05:00
  • bd27b4f097 Merge commit '18c2ff6019309d991c7f8d4d9c6f643191c28040' into develop assistant-librarian[bot] 2026-01-12 11:13:21 +00:00
  • c89e55681e [CK profiler] Perform verification on GPU when using GPU reference (#3482) Johannes Graner 2026-01-12 12:12:41 +01:00
  • 32e0beb399 [CK profiler] Perform verification on GPU when using GPU reference (#3482) Johannes Graner 2026-01-12 12:12:41 +01:00
  • 18c2ff6019 [CK profiler] Perform verification on GPU when using GPU reference (#3482) Johannes Graner 2026-01-12 12:12:41 +01:00
  • d196ee4a2e Merge commit '20f66c1e6b314a39533cac95b81e08f89645af2a' into develop assistant-librarian[bot] 2026-01-12 09:19:02 +00:00
  • 706a75f6d9 adressed review comments from PR3459 (#3526) kabrahamAMD 2026-01-12 09:47:00 +01:00
  • 529fbdc771 adressed review comments from PR3459 (#3526) kabrahamAMD 2026-01-12 09:47:00 +01:00
  • 20f66c1e6b adressed review comments from PR3459 (#3526) kabrahamAMD 2026-01-12 09:47:00 +01:00
  • feabf9c026 ck-builder: tensor input/output reflection (#3536) Robin Voetter 2026-01-12 09:45:53 +01:00
  • 61e6e155b0 ck-builder: tensor input/output reflection (#3536) Robin Voetter 2026-01-12 09:45:53 +01:00
  • b352a68606 ck-builder: tensor input/output reflection (#3536) Robin Voetter 2026-01-12 09:45:53 +01:00
  • 2fe054eb61 Merge branch 'develop' into vpietila/ckb-bwd-weight-factories Ville Pietilä 2026-01-11 23:29:55 -08:00
  • c3228aaf0d [ck] support remap 32x32 warp tile to 16x16 qlin/remap_xdl Qun Lin 2026-01-12 14:30:09 +08:00
  • 5de7a34a91 Merge commit '32408c8bc05b759ba62c2f97c9b7c3e808e2a6bc' into develop assistant-librarian[bot] 2026-01-12 02:56:46 +00:00
  • 684ebd42da moe fp8 blockscale use nt (#3524) yadaish 2026-01-12 10:48:10 +08:00
  • 981c891757 moe fp8 blockscale use nt (#3524) yadaish 2026-01-12 10:48:10 +08:00
  • 32408c8bc0 moe fp8 blockscale use nt (#3524) yadaish 2026-01-12 10:48:10 +08:00
  • 6db269892c [ck] add gridwise base class for in all xdl kernel (#186) Qun Lin 2026-01-12 10:01:53 +08:00
  • 2240aa51c2 fix: label existing memory pipeline for aquant as intrawave AviralGoelAMD 2026-01-09 15:53:36 +00:00
  • 958c99e538 chore: add descriptive comments about amd intrinsic hardware sync instructions AviralGoelAMD 2026-01-10 21:47:54 +00:00
  • 09aa53610c refactor: remove dead code from gemm universal kernel AviralGoelAMD 2026-01-10 19:20:48 +00:00
  • b06f6f582c refactor: remove Default scheduler implementation as it not used anymore AviralGoelAMD 2026-01-10 18:09:26 +00:00
  • e69370d29d debugging permuteN khuagarw 2026-01-09 19:46:57 +00:00
  • 6bcdc10593 Fix fwd factories after refactoring. Ville Pietilä 2026-01-09 10:48:18 -05:00
  • 63fc27b0b1 Refactor algorithm specialization and GEMM pipeline definitions. Ville Pietilä 2026-01-09 10:05:54 -05:00
  • fcaa812f8c add more parameters to multiple d wmma instance traits Kevin Abraham 2026-01-09 14:26:46 +00:00
  • f74e034ae9 Adapt factories to warp GEMM and transfer parameters refactoring. Ville Pietilä 2026-01-09 09:17:45 -05:00
  • 5d53fd2380 Add isolated test for FMHA dropout: used for debugging numerical errors dlejeune/fmha_fwd_test_all_hdim_dropout_test Damien Lejeune 2026-01-09 06:31:52 -05:00
  • 96c8f16f1e Merge commit '4216d43da86e08efad810671605cdb72a19dc026' into develop assistant-librarian[bot] 2026-01-09 11:13:18 +00:00
  • b2731df5ba added conv traits for conv_fwd_multiple_d_wmma_cshuffle Kevin Abraham 2026-01-09 10:37:37 +00:00
  • 693548d8b2 Dlejeune/ck tile 2d multiple reductions (#3147) damien-lejeune 2026-01-09 11:16:37 +01:00
  • 58d8d793b1 Dlejeune/ck tile 2d multiple reductions (#3147) damien-lejeune 2026-01-09 11:16:37 +01:00
  • 4216d43da8 Dlejeune/ck tile 2d multiple reductions (#3147) damien-lejeune 2026-01-09 11:16:37 +01:00
  • 3f0bac4e7b Fix conv algorithm types after refactoring. Ville Pietilä 2026-01-08 09:18:37 -05:00
  • 0336ac573e clang-format Ville Pietilä 2026-01-08 06:57:34 -05:00
  • 1abe9ab6c9 Merge branch 'vpietila/ckb-bwd-weight-factories' into vpietila/ckb-refactor-warp-gemm-descriptors Ville Pietilä 2026-01-08 06:55:01 -05:00
  • 18c2631c79 Fix test after merge. Ville Pietilä 2026-01-08 06:51:57 -05:00
  • fd8edf9d3f Remove obsolete test file. Ville Pietilä 2026-01-08 05:25:38 -05:00
  • 16b86803ea Clarify builder Readme. Ville Pietilä 2026-01-08 05:22:17 -05:00
  • 6c41727997 Merge remote-tracking branch 'origin/develop' into vpietila/ckb-bwd-weight-factories Ville Pietilä 2026-01-08 05:12:35 -05:00
  • f46a18694e Merge commit 'e3884bbf0512f539a2ce0e1493e41fc19369911d' into develop assistant-librarian[bot] 2026-01-08 09:17:23 +00:00
  • a77b9e56fd [CK_BUILDER] Debug utilities (#3528) Robin Voetter 2026-01-08 10:14:13 +01:00
  • 1a4deaded3 [CK_BUILDER] Debug utilities (#3528) Robin Voetter 2026-01-08 10:14:13 +01:00
  • e3884bbf05 [CK_BUILDER] Debug utilities (#3528) Robin Voetter 2026-01-08 10:14:13 +01:00
  • 3fab0cc57f fixup! Add DstDataType as a template parameter to load_tile_with_elementwise, and use it for type conversion LWPCK-3549-two-stage Sami Aario 2026-01-08 09:13:09 +00:00
  • 96eecd01e2 Merge commit '770a14494e944c803661c89575bf7be70fdbbfdf' into develop assistant-librarian[bot] 2026-01-08 08:16:23 +00:00
  • 82b5464c67 fixup! Add DstDataType as a template parameter to load_tile_with_elementwise, and use it for type conversion Sami Aario 2026-01-05 13:55:41 +00:00
  • d75d38bf05 Add DstDataType as a template parameter to load_tile_with_elementwise, and use it for type conversion Sami Aario 2025-12-15 13:41:17 +00:00
  • 2e798d15e1 Add functionality and tests for fp16 x fp8 and fp8 x fp16 Sami Aario 2025-11-12 15:09:01 +00:00
  • 7fdf8222c2 Add functionality and tests for bf16 x fp8 and fp8 x bf16 Sami Aario 2025-10-09 09:04:13 +00:00
  • 0b0ddf1a38 Add MFMA warp gemm for float, float, float, 32, 32, 16 Sami Aario 2025-11-12 12:38:04 +00:00
  • 7bb452d9b8 Refactor type conversions out of MakeBLdsBlockDescriptor, WIP! Sami Aario 2025-12-18 09:14:11 +00:00
  • fc82ebc174 Introduce DetermineWarpPrecType for determining warp GEMM precision types Sami Aario 2025-10-09 08:07:04 +00:00
  • e62c96f1dd Merge branch 'develop' into LWPCK-3549-cleanups SamiAario-AMD 2026-01-08 09:48:15 +02:00
  • 011d705947 Removing memop from chshuffle (#3530) Thrupti Raj Lakshmana Gowda 2026-01-08 01:34:43 -06:00
  • f8d1442908 Removing memop from chshuffle (#3530) Thrupti Raj Lakshmana Gowda 2026-01-08 01:34:43 -06:00
  • 770a14494e Removing memop from chshuffle (#3530) Thrupti Raj Lakshmana Gowda 2026-01-08 01:34:43 -06:00
  • cba48a5cab Merge commit 'ee2c35b92db5ef4c4703935d203e9612e6b5f573' into develop assistant-librarian[bot] 2026-01-08 07:16:26 +00:00
  • 0a4388d4cc Merge branch 'develop' into LWPCK-3549-cleanups SamiAario-AMD 2026-01-08 09:08:21 +02:00
  • c427b9ba2a [CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169) Johannes Graner 2026-01-08 08:02:02 +01:00
  • 9d6add54e5 [CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169) Johannes Graner 2026-01-08 08:02:02 +01:00
  • ee2c35b92d [CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169) Johannes Graner 2026-01-08 08:02:02 +01:00
  • 5b70f71374 [CK TILE] Fix grouped conv kernels splitk and double lds (#3527) Bartłomiej Kocot 2026-01-08 07:59:38 +01:00
  • 5e0d3e77b9 [CK TILE] Fix grouped conv kernels splitk and double lds (#3527) Bartłomiej Kocot 2026-01-08 07:59:38 +01:00
  • bc497beffb [CK TILE] Fix grouped conv kernels splitk and double lds (#3527) Bartłomiej Kocot 2026-01-08 07:59:38 +01:00
  • f320f2e280 Initial plan copilot/sub-pr-3341 copilot-swe-agent[bot] 2026-01-08 05:05:19 +00:00
  • 0c659dc743 Merge commit 'f449a5faaaf52a2194e82989bdb46b23392e97a3' into develop assistant-librarian[bot] 2026-01-08 01:41:45 +00:00
  • abecfaf3a2 Disable fp32 atomic adds on gfx11 (#3510) Bartłomiej Kocot 2026-01-08 00:32:04 +01:00
  • dcc6ce0e22 Disable fp32 atomic adds on gfx11 (#3510) Bartłomiej Kocot 2026-01-08 00:32:04 +01:00
  • f449a5faaa Disable fp32 atomic adds on gfx11 (#3510) Bartłomiej Kocot 2026-01-08 00:32:04 +01:00
  • b91efe5b07 Merge branch 'develop' into LWPCK-3549-cleanups SamiAario-AMD 2026-01-07 21:44:58 +02:00
  • 2edd077b50 Adjust whitespace with clang-format Sami Aario 2026-01-07 16:19:35 +00:00
  • ca17ac3358 When possible, use the overload of load_tile_transpose that does not require assignment Sami Aario 2026-01-02 15:43:35 +00:00
  • 321611081f Remove an unused overload of load_tile_transpose_with_offset Sami Aario 2026-01-02 15:41:54 +00:00
  • 8fc4030a57 Add an instance of load_tile_transpose that takes a reference to the output tensor as an input Sami Aario 2026-01-02 14:47:32 +00:00
  • 63a455952a No need to specify DstDataType in load_and_convert_tile as WarpTile knows its DataType Sami Aario 2025-12-16 11:32:30 +00:00
  • 3d55a1e682 No need to specify SrcDataType in load_and_convert_tile as WarpWindow knows its DataType Sami Aario 2025-12-16 10:24:23 +00:00
  • 514035e6cf In BQuantGemmPipelineAgBgCrCompV3, always convert BDatatype pk_int4_t to ADataType regardless of BLayout Sami Aario 2026-01-07 14:33:24 +00:00
  • a73a06fb1d Merge commit 'aad4cf098511b3f58c5bd3c32e4534d438f7539c' into develop assistant-librarian[bot] 2026-01-07 19:21:57 +00:00
  • 6eab5bea54 Wmma support for gemm_bias_add_reduce (#3316) Enrico Degregori 2026-01-07 19:27:16 +01:00
  • 5a3fc30228 Wmma support for gemm_bias_add_reduce (#3316) Enrico Degregori 2026-01-07 19:27:16 +01:00
  • aad4cf0985 Wmma support for gemm_bias_add_reduce (#3316) amd-develop Enrico Degregori 2026-01-07 19:27:16 +01:00