Commit Graph

  • 4c98535456 fix compilation errors on RHEL8 and SLES15 (#2967) Illia Silin 2025-10-03 07:08:49 -07:00
  • 8e705b2cde Add instructions for building the builder example. Ville Pietilä 2025-10-03 13:44:14 +00:00
  • cde8844ee0 Create invoker through the CK builder. Ville Pietilä 2025-10-03 13:02:21 +00:00
  • 95f86d723f Add missing concepts and PODs for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3 kernel. Ville Pietilä 2025-10-03 12:29:41 +00:00
  • 22fb5c5d75 Remove spurious ck_tile changes that were presumably introduced somewhere in the repeated merging from develop. kiefer 2025-10-03 11:07:21 +00:00
  • 8c8063134a Merge remote-tracking branch 'origin/develop' into 65-grouped-conv-fwd-wmma kiefer 2025-10-03 12:03:30 +00:00
  • e94d76de53 Create re-usable assets. Ville Pietilä 2025-10-03 10:43:28 +00:00
  • 0570ea48c0 Use CK builder to create kernel instance. Ville Pietilä 2025-10-03 10:30:41 +00:00
  • 1d14d83e59 Optionally allow num_k_loop <= PrefetchStages in gridwise CheckValidity. Use this for grouped conv fwd but not in general. kiefer 2025-10-03 09:59:17 +00:00
  • 9f39a171aa Skeleton for builder example. Ville Pietilä 2025-10-03 09:02:24 +00:00
  • c90d0cd84e Merge branch 'develop' into jakpiase/gemm_pipeline_mem_skip_lds Damien Lejeune 2025-10-03 08:46:31 +00:00
  • 36a321f12e Fix some copy-pasta typos. Ville Pietilä 2025-10-03 07:10:48 +00:00
  • fd3fb6af92 Rename bwd test file. Ville Pietilä 2025-10-03 07:07:31 +00:00
  • ff1ad9acaf Add build instructions to builder Readme. Ville Pietilä 2025-10-03 07:03:32 +00:00
  • e495f51791 Avoid the un-needed calls of v_page_block_navigator.move_tile_window fmha_pipeline_nwarp_sshuffle_improve Qianfeng Zhang 2025-10-03 06:09:22 +00:00
  • 7f601541a1 Modify the ck_tile gemm config ThomasNing 2025-10-03 05:05:18 +00:00
  • ca9ed65e57 Remove k_dram_block_window since k_dram_window and seqlen_k_curr_offset is enough Qianfeng Zhang 2025-10-03 03:41:52 +00:00
  • 3c5adc9938 Tiny codes movement in qr_ks_vs_nwarp_sshuffle pipeline Qianfeng Zhang 2025-10-03 03:21:07 +00:00
  • e12dc895d0 Content Modification of CK Tile Example ThomasNing 2025-10-03 00:58:03 +00:00
  • 592b4b19ae Updates based on PR feedback 8 Vidyasagar 2025-10-02 12:19:51 -07:00
  • f9c9dfa364 Merge commit '0a30c3063068dcefea2291309fbe269812d06956' into develop assistant-librarian[bot] 2025-10-02 19:11:49 +00:00
  • e8205916c9 Updates based on PR feedback 7 Vidyasagar 2025-10-02 12:10:46 -07:00
  • e7ab5057c9 Updates based on PR feedback 6 Vidyasagar 2025-10-02 12:03:51 -07:00
  • ae4fa7bfe8 Merge branch 'develop' into jakpiase/gemm_pipeline_mem_skip_lds Damien Lejeune 2025-10-02 19:00:36 +00:00
  • 20dc0c4037 fix build on legacy systems without cpp20 compiler (#2958) Max Podkorytov 2025-10-02 11:54:45 -07:00
  • fbf8619599 fix build on legacy systems without cpp20 compiler (#2958) Max Podkorytov 2025-10-02 11:54:45 -07:00
  • 0a30c30630 fix build on legacy systems without cpp20 compiler (#2958) Max Podkorytov 2025-10-02 11:54:45 -07:00
  • 4899d0fad7 Updates based on PR feedback 5 Vidyasagar 2025-10-02 11:50:18 -07:00
  • 7785774fcb Updates based on PR feedback 4 Vidyasagar 2025-10-02 11:33:29 -07:00
  • 384dddddfe Updates based on PR feedback 3 Vidyasagar 2025-10-02 11:28:02 -07:00
  • 79d37b4d0b Updates based on PR feedback 2 Vidyasagar 2025-10-02 11:16:25 -07:00
  • 1f65936567 add the check of granularity for atomic add (#2959) Thomas Ning 2025-10-02 11:15:24 -07:00
  • 0959c6582a add the check of granularity for atomic add (#2959) Thomas Ning 2025-10-02 11:15:24 -07:00
  • cadafde722 add the check of granularity for atomic add (#2959) Thomas Ning 2025-10-02 11:15:24 -07:00
  • bdbab2394b Merge commit '6fc28ab4934d3668bf4ec96db1e082cf26b11384' into develop assistant-librarian[bot] 2025-10-02 18:14:39 +00:00
  • 24c0e0d01e [CK TILE GEMM] Support Aquant GEMM with transposeC and preshuffle (#2897) Cong Ma 2025-10-02 12:13:51 -06:00
  • 1aa5b318cb [CK TILE GEMM] Support Aquant GEMM with transposeC and preshuffle (#2897) Cong Ma 2025-10-02 12:13:51 -06:00
  • 6fc28ab493 [CK TILE GEMM] Support Aquant GEMM with transposeC and preshuffle (#2897) Cong Ma 2025-10-02 12:13:51 -06:00
  • 6fc878cbeb Updates based on PR feedback 1 Vidyasagar 2025-10-02 11:02:36 -07:00
  • 88b0bdbca2 GH-2368 Adding a basic glossary AviralGoelAMD 2025-07-16 12:53:41 +00:00
  • 15d7637f89 GH-2368 Adding a basic glossary Vidyasagar Ananthan 2025-06-18 16:29:08 -04:00
  • 0c174fd196 Use seqlen_k_curr_offset to replace k_origin Qianfeng Zhang 2025-10-02 15:53:01 +00:00
  • 9510171377 WIP: Put back the generic tensor descriptors for convolutions. Ville Pietilä 2025-10-02 15:06:30 +00:00
  • a67ea9db58 Merge commit 'a4ab33f539ac9d7209c6274958dc0285eacf3e78' into develop assistant-librarian[bot] 2025-10-02 14:11:58 +00:00
  • 13d666f707 Fix building test_fmha_bwd_fp32 on SLES15 (#2962) Anton Gorenko 2025-10-02 20:09:49 +06:00
  • d1b8e66374 Fix building test_fmha_bwd_fp32 on SLES15 (#2962) Anton Gorenko 2025-10-02 20:09:49 +06:00
  • a4ab33f539 Fix building test_fmha_bwd_fp32 on SLES15 (#2962) Anton Gorenko 2025-10-02 20:09:49 +06:00
  • 89cd7b482b Fix file formatting vec_stores_c_col_v3 Aleksander Dudek 2025-10-02 05:28:18 -05:00
  • c3d5da4457 Post merge fix to vanilla test kiefer 2025-10-02 08:57:18 +00:00
  • 3b0979a4d4 Merge remote-tracking branch 'origin/develop' into 65-grouped-conv-fwd-wmma kiefer 2025-10-02 08:25:52 +00:00
  • c0fdd5f7b9 Merge branch 'develop' into tests_for_batched_grouped_gemm Aleksander Dudek 2025-10-02 01:08:58 -05:00
  • 87f8319753 Merge branch 'develop' into vec_stores_c_col_v3 Aleksander Dudek 2025-10-02 01:03:13 -05:00
  • c9ba502141 [CK_TILE] Vector stores for C Column Layout Aleksander Dudek 2025-10-02 01:02:05 -05:00
  • afec40a56e Addition of streamk fp8 example for CK Tile Astha Rai 2025-09-19 21:07:50 +00:00
  • 59de7d5848 Update README.md to align with the Algorithm concept. John Shumway 2025-10-02 00:51:13 +00:00
  • ba74f76cbc Merge commit 'a7da3c68b979bd46c315da09208271d26f5e2900' into develop assistant-librarian[bot] 2025-10-01 23:11:22 +00:00
  • 0f7644177c Add a new gemm pipeline based on ComputeV4 which utilizes async copy API (#2949) Max Podkorytov 2025-10-01 15:38:07 -07:00
  • de41af5d2e Add a new gemm pipeline based on ComputeV4 which utilizes async copy API (#2949) Max Podkorytov 2025-10-01 15:38:07 -07:00
  • a7da3c68b9 Add a new gemm pipeline based on ComputeV4 which utilizes async copy API (#2949) Max Podkorytov 2025-10-01 15:38:07 -07:00
  • 3712abe256 tests: add unit tests for grouped_gemm_multi_d persistent kernels (#2941) Aviral Goel 2025-10-01 18:22:46 -04:00
  • db83ff21e8 tests: add unit tests for grouped_gemm_multi_d persistent kernels (#2941) Aviral Goel 2025-10-01 18:22:46 -04:00
  • f2d367262f tests: add unit tests for grouped_gemm_multi_d persistent kernels (#2941) Aviral Goel 2025-10-01 18:22:46 -04:00
  • 8f5016eee4 Merge branch 'develop' into wjx/preshuffle_format Illia Silin 2025-10-01 15:12:27 -07:00
  • 24ac4febf4 Merge commit 'a76c7b10281cf46486e6563ffeb3ee9cb4a20348' into develop assistant-librarian[bot] 2025-10-01 22:11:21 +00:00
  • 4de3217b4f fix clang format illsilin_amdeng 2025-10-01 15:10:33 -07:00
  • d6dc70c711 tweak version (#2954) Max Podkorytov 2025-10-01 15:00:41 -07:00
  • 1eafaa321f tweak version (#2954) Max Podkorytov 2025-10-01 15:00:41 -07:00
  • a76c7b1028 tweak version (#2954) Max Podkorytov 2025-10-01 15:00:41 -07:00
  • 8948ac317c updated mxfp4 moe gemm2 config (#2330) Mingtao Gu 2025-10-02 03:32:55 +08:00
  • 190ad2ccee updated mxfp4 moe gemm2 config (#2330) Mingtao Gu 2025-10-02 03:32:55 +08:00
  • 667db96ce1 Add placeholder README.md file John Shumway 2025-10-01 14:24:39 +00:00
  • 6020f81481 Use [kM0, kQKHeaddim] as q_tile size Qianfeng Zhang 2025-10-01 14:18:34 +00:00
  • 58353e999c Merge commit '7cb1f30cfb6045bccbbd484c5e8e4715e2ebc2f3' into develop assistant-librarian[bot] 2025-10-01 14:12:07 +00:00
  • fecad5a998 Remove default constructor to fix c++17 build issue (#2953) Rostyslav Geyyer 2025-10-01 09:02:21 -05:00
  • 69d1edb5c9 Remove default constructor to fix c++17 build issue (#2953) Rostyslav Geyyer 2025-10-01 09:02:21 -05:00
  • 7cb1f30cfb Remove default constructor to fix c++17 build issue (#2953) Rostyslav Geyyer 2025-10-01 09:02:21 -05:00
  • bd6ebe0c62 Tiny re-arrangement in nwarp_sshuffle pipeline codes Qianfeng Zhang 2025-10-01 14:02:11 +00:00
  • c07cec8809 Clean up + add TODO comment about A-column-wise/B-row-wise layout Damien Lejeune 2025-10-01 12:44:32 +00:00
  • bd00884f39 Re-enable sharding for wmma cshufflev3 instances kiefer 2025-10-01 08:45:32 +00:00
  • d0f59a5ebe Make sure all strides in ComputePtrOffset are at least value initialized to avoid undefined strides. Not convinced this struct is properly initialized in other code / future code. kiefer 2025-10-01 07:59:42 +00:00
  • 28706e6173 Properly use the splitN offsets for D tensors in the gridwise Run() function. Was necessary to pass the bias_clamp_large_cases test. kiefer 2025-09-30 15:08:37 +00:00
  • 1b9bf99e3b Fixup comments and ignored kernel arg name kiefer 2025-09-30 09:17:12 +00:00
  • 8cd5e3fe74 Extend regular instance lists. kiefer 2025-09-29 07:59:04 +00:00
  • 2bb627f02b Extend "mem" instance lists. kiefer 2025-09-26 13:48:46 +00:00
  • 5cc80ca90f Extend "comp" instance lists, including "2x" and "part2" instances. 2x instances disabled for now since they do not compile. kiefer 2025-09-26 12:33:18 +00:00
  • ee5225fb44 Extend merged groups instance lists, including adaptations of xdl "2x" instances. kiefer 2025-09-25 14:04:01 +00:00
  • f26e00e676 Extend scaleadd_ab instance lists kiefer 2025-09-25 09:27:55 +00:00
  • 238218b356 Do not build or run Xdl operations with Wmma backend for now. Will be reverted before upstreaming. kiefer 2025-09-24 10:58:46 +00:00
  • a126f5c39b Small post-merge fixup, everything seems to work. kiefer 2025-09-24 10:57:48 +00:00
  • f10bae28c9 Merge commit 'ef43078788a91b21284e697ce7707cc7d1797000' into develop assistant-librarian[bot] 2025-09-30 22:12:47 +00:00
  • 18f4a0728b Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe (#2893) Sami Remes 2025-10-01 01:12:30 +03:00
  • 93ba707be4 Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe (#2893) Sami Remes 2025-10-01 01:12:30 +03:00
  • ef43078788 Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe (#2893) Sami Remes 2025-10-01 01:12:30 +03:00
  • 840a0008f9 Merge remote-tracking branch 'origin/develop' into cderb/prefetch_tuning_250930 cderb/prefetch_tuning_250930 Christopher Erb 2025-09-30 15:28:54 -05:00
  • d9febb413f Fix and document the inlineDiff function. John Shumway 2025-09-30 17:11:39 +00:00
  • cc86608690 Add StringEqWithDiff matcher. John Shumway 2025-09-30 16:30:15 +00:00
  • 3afa42529c initial work on generator for single instance Philip Maybank 2025-09-30 17:20:08 +01:00
  • ee9718a427 Merge commit 'b60af5bde965a2bb007bb582f7836b43ca647b81' into develop assistant-librarian[bot] 2025-09-30 16:14:10 +00:00
  • e17b20625e [CK_TILE]enhance elementwise test (#2683) joyeamd 2025-09-30 23:29:37 +08:00
  • 7dcf623fcb [CK_TILE]enhance elementwise test (#2683) joyeamd 2025-09-30 23:29:37 +08:00