Commit Graph

  • 712cbfb304 fix fmha fwd kernel name (#2880) ltqin 2025-09-25 11:00:10 +08:00
  • 24a8daf662 fix fmha fwd kernel name (#2880) ltqin 2025-09-25 11:00:10 +08:00
  • ab22f91a7c fix fmha fwd kernel name (#2880) ltqin 2025-09-25 11:00:10 +08:00
  • 58b3560182 Merge commit 'df97a286d5486de76bcd2bd7c634b11287cd12ca' into develop assistant-librarian[bot] 2025-09-25 01:39:57 +00:00
  • 9cb95d4bc2 Conv:TF32: add more instances - 1 (#2867) yinglu 2025-09-25 09:27:18 +08:00
  • c5fdba5a96 Conv:TF32: add more instances - 1 (#2867) yinglu 2025-09-25 09:27:18 +08:00
  • df97a286d5 Conv:TF32: add more instances - 1 (#2867) yinglu 2025-09-25 09:27:18 +08:00
  • 9df3f6f886 N dimension parallelism code drop kylasa_mdim_functional_working Sudhir Kylasa 2025-09-24 20:59:10 +00:00
  • 3ef0545001 Merge commit 'f076f207ceb3d8199ddc8219a2859b38a63d3c5e' into develop assistant-librarian[bot] 2025-09-24 20:12:53 +00:00
  • 91401464fd Merge branch 'develop' into kylasa_mdim_pingpong kylasa_mdim_pingpong kylasa 2025-09-24 12:41:22 -07:00
  • e338ee5004 [CK] Fix misc issues in CK examples (#2890) linqunAMD 2025-09-25 02:28:20 +08:00
  • 0c45597a4e [CK] Fix misc issues in CK examples (#2890) linqunAMD 2025-09-25 02:28:20 +08:00
  • f076f207ce [CK] Fix misc issues in CK examples (#2890) linqunAMD 2025-09-25 02:28:20 +08:00
  • 3554a19610 F16 tests working great Aleksander Dudek 2025-09-24 13:10:05 -05:00
  • bef885dc89 Merge commit '8fe3838c65ab4c290423ff0e952e882c19e2c60d' into develop assistant-librarian[bot] 2025-09-24 17:12:28 +00:00
  • c143f0305c Upgrade to ROCm7.0.1 compiler. (#2909) Illia Silin 2025-09-24 10:00:53 -07:00
  • 7e537fd72f Upgrade to ROCm7.0.1 compiler. (#2909) Illia Silin 2025-09-24 10:00:53 -07:00
  • 8fe3838c65 Upgrade to ROCm7.0.1 compiler. (#2909) therock-7.9.0 release/therock-7.9 afagaj/rocm-rel-7.1 Illia Silin 2025-09-24 10:00:53 -07:00
  • a76baedd7d [Jenkins] Remove 'Jenkins - ' prefix dup-status-checks John Robbins 2025-09-24 10:19:31 -06:00
  • 625a78b17b WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding. Ville Pietilä 2025-09-24 15:08:01 +00:00
  • 7280df1bc3 Add one more unit test for tensor view. Ville Pietilä 2025-09-24 12:10:26 +00:00
  • 95324c306e Merge commit 'fe0a47a011c2adcb54dfc94a3029feb7b9980deb' into develop assistant-librarian[bot] 2025-09-24 09:13:05 +00:00
  • 8596ce24ca fixes Bartlomiej Kocot 2025-09-24 09:07:42 +00:00
  • 443f2e41fd [CK_TILE] FMHA BWD Add D96 Instances (#2916) Yi DING 2025-09-24 17:04:23 +08:00
  • 02db6094b9 [CK_TILE] FMHA BWD Add D96 Instances (#2916) Yi DING 2025-09-24 17:04:23 +08:00
  • fe0a47a011 [CK_TILE] FMHA BWD Add D96 Instances (#2916) Yi DING 2025-09-24 17:04:23 +08:00
  • 73fb5a026a Initial unit tests for tensor descriptor. Ville Pietilä 2025-09-24 08:31:42 +00:00
  • beb87960ad [CK Tile] Implement Invoker pattern for remaining grouped convolution examples (#2894) Johannes Graner 2025-09-24 10:22:38 +02:00
  • 408b3945c3 [CK Tile] Implement Invoker pattern for remaining grouped convolution examples (#2894) Johannes Graner 2025-09-24 10:22:38 +02:00
  • 15fff74503 [CK Tile] Implement Invoker pattern for remaining grouped convolution examples (#2894) Johannes Graner 2025-09-24 10:22:38 +02:00
  • dff23bcae1 Merge commit '68056847887d7479a6055db6579739f555348c69' into develop assistant-librarian[bot] 2025-09-24 08:14:46 +00:00
  • d5b5e4ef95 add fmha dtype fp32 (#2914) Jingwei Liao 2025-09-24 15:28:39 +08:00
  • e868ffa390 add fmha dtype fp32 (#2914) Jingwei Liao 2025-09-24 15:28:39 +08:00
  • 6805684788 add fmha dtype fp32 (#2914) Jingwei Liao 2025-09-24 15:28:39 +08:00
  • 167e5ab3b5 Merge commit 'dcd33a6ecc30e18cc8491ed03926ab5ac8b6f1c3' into develop assistant-librarian[bot] 2025-09-24 06:15:34 +00:00
  • c5a3d4c765 [CK_TILE] Fix cshuffle epilogue issue with IsLoadableTile (#2903) Sami Remes 2025-09-24 09:08:18 +03:00
  • aac547782b [CK_TILE] Fix cshuffle epilogue issue with IsLoadableTile (#2903) Sami Remes 2025-09-24 09:08:18 +03:00
  • dcd33a6ecc [CK_TILE] Fix cshuffle epilogue issue with IsLoadableTile (#2903) Sami Remes 2025-09-24 09:08:18 +03:00
  • bdea637a15 Fix the gfx950 numerical errors (#2911) Thomas Ning 2025-09-23 22:54:52 -07:00
  • 8a563fc79d Fix the gfx950 numerical errors (#2911) Thomas Ning 2025-09-23 22:54:52 -07:00
  • b159841a06 Fix the gfx950 numerical errors (#2911) Thomas Ning 2025-09-23 22:54:52 -07:00
  • 734f022f60 limit the test case number for uncommon hdim aska-0096 2025-09-24 03:00:44 +00:00
  • e37a555fa8 enable more tests aska-0096 2025-09-24 02:47:08 +00:00
  • de89ea144c enable testing for all hdim supported in fmha_fwd aska-0096 2025-09-24 02:30:30 +00:00
  • a55a7e37ec Merge commit 'f161b5b738781c71bd5f2c191561b81f679ba9ed' into develop assistant-librarian[bot] 2025-09-23 23:11:18 +00:00
  • 0bdc7670eb Grouped Conv Bwd Data index calculation optimizations Bartlomiej Kocot 2025-09-23 21:50:42 +00:00
  • 651a5dd0b9 Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904) asleepzzz 2025-09-24 05:33:51 +08:00
  • 5cc40c160f Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904) asleepzzz 2025-09-24 05:33:51 +08:00
  • f161b5b738 Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904) asleepzzz 2025-09-24 05:33:51 +08:00
  • c39d5ca2c5 Merge commit '959df2a15563155329f1d77b2151c3744ff2d749' into develop assistant-librarian[bot] 2025-09-23 17:11:10 +00:00
  • 173427d877 [FMHA FWD] gfx950 Accuracy enhancement & bug fix (#2900) Haocong WANG 2025-09-24 00:59:41 +08:00
  • add2107be0 [FMHA FWD] gfx950 Accuracy enhancement & bug fix (#2900) Haocong WANG 2025-09-24 00:59:41 +08:00
  • 959df2a155 [FMHA FWD] gfx950 Accuracy enhancement & bug fix (#2900) Haocong WANG 2025-09-24 00:59:41 +08:00
  • bacffa5b90 Add color to inlineDiff test util. John Shumway 2025-09-23 15:38:27 +00:00
  • 20ed980e30 tmp barkocot/tmp Bartlomiej Kocot 2025-09-23 14:46:08 +00:00
  • 7eedf242f1 Merge commit '7b16782d7cbf05be6d03d5c001081fad8df97919' into develop assistant-librarian[bot] 2025-09-23 13:18:39 +00:00
  • e28e95529f [CK_TILE] Fix fmha bwd (#2865) Haocong WANG 2025-09-23 19:59:27 +08:00
  • 0eede5af24 [CK_TILE] Fix fmha bwd (#2865) Haocong WANG 2025-09-23 19:59:27 +08:00
  • 7b16782d7c [CK_TILE] Fix fmha bwd (#2865) Haocong WANG 2025-09-23 19:59:27 +08:00
  • 8048d6ff73 Fix build. Ville Pietilä 2025-09-23 11:17:08 +00:00
  • e6f6c4a6a3 Working baseline for depthwise covolution with merged conv groups. Ville Pietilä 2025-09-23 11:14:10 +00:00
  • c6f8e22ebb Fix mozga-amd/fix_transpose_matrix Mateusz Ozga 2025-09-23 11:12:01 +00:00
  • 91f1e79cb2 Fix bug mozga-amd/fix_bug_transpose Mateusz Ozga 2025-09-23 10:41:57 +00:00
  • 67a6757638 Merge remote-tracking branch 'origin/develop' into 65-grouped-conv-fwd-wmma kiefer 2025-09-23 10:18:33 +00:00
  • 9c79de1b1e add qr_nwarp_sshuffle to generate code ck_tile/fmha_nwarp_example ltqin 2025-09-23 10:13:15 +00:00
  • a1c9274f98 Merge commit '2cbbf5dcb3bf315b9486a2c677ffcd6aa72b5298' into develop assistant-librarian[bot] 2025-09-23 09:13:08 +00:00
  • e2a0df59ea find the root cause error in did not enable the transpose in gfx950 correctly ThomasNing 2025-09-23 03:36:04 -05:00
  • e3702467d5 [CK-Tile] Add the API to load SGPR (#2878) Thomas Ning 2025-09-23 01:23:56 -07:00
  • fb5e953a05 [CK-Tile] Add the API to load SGPR (#2878) Thomas Ning 2025-09-23 01:23:56 -07:00
  • 2cbbf5dcb3 [CK-Tile] Add the API to load SGPR (#2878) Thomas Ning 2025-09-23 01:23:56 -07:00
  • 01a5cf1111 Merge commit 'b6e899438631118ff962a6be12cabc9930366267' into develop assistant-librarian[bot] 2025-09-23 07:13:16 +00:00
  • 65e0a25885 [CK_TILE] FMHA FWD bug fix (#2888) Haocong WANG 2025-09-23 15:00:46 +08:00
  • d85ca87d97 [CK_TILE] FMHA FWD bug fix (#2888) Haocong WANG 2025-09-23 15:00:46 +08:00
  • b6e8994386 [CK_TILE] FMHA FWD bug fix (#2888) Haocong WANG 2025-09-23 15:00:46 +08:00
  • 8d978054ff gfx942 return wjx/fix_moe_gfx942 root 2025-09-23 06:55:57 +00:00
  • 43fa6ccaf7 generate async pipelin code for fp8 ltqin 2025-09-23 06:48:42 +00:00
  • 01c6567d4c FMHA BWD Avoid SetZero (#2799) Yi DING 2025-09-23 14:37:48 +08:00
  • bfa145c418 FMHA BWD Avoid SetZero (#2799) Yi DING 2025-09-23 14:37:48 +08:00
  • ad259eeae2 FMHA BWD Avoid SetZero (#2799) Yi DING 2025-09-23 14:37:48 +08:00
  • 2cba567e5b Changing grid size to (1,1,1) still passes Ali Nouri 2025-09-23 03:19:38 +00:00
  • 73039b1d9a Merge commit '3d29bff2f01edb0c6c4ab628e41d8930081dcba6' into develop assistant-librarian[bot] 2025-09-23 02:37:29 +00:00
  • 0b149c8695 Wmma support for multiple ABD GEMM (#2803) Enrico Degregori 2025-09-23 03:49:06 +02:00
  • 12225ce645 Wmma support for multiple ABD GEMM (#2803) Enrico Degregori 2025-09-23 03:49:06 +02:00
  • 3d29bff2f0 Wmma support for multiple ABD GEMM (#2803) Enrico Degregori 2025-09-23 03:49:06 +02:00
  • cbffc459fd Merge branch 'develop' into aviralgoel/grouped_gemm_bug_fix ThomasNing 2025-09-22 19:41:13 -05:00
  • 468f812250 Update grouped_gemm example and pipeline AviralGoelAMD 2025-09-22 19:32:07 -05:00
  • 2f1baf2074 changes for parallel reduction debug streamk_revert Astha Rai 2025-09-22 19:12:16 +00:00
  • e8d68bb29c working version with atomics, reduction still failing Astha Rai 2025-09-22 18:57:35 +00:00
  • 3fa6b9e1a0 Merge branch 'develop' into kylasa_kdim_pr kylasa 2025-09-22 10:20:25 -07:00
  • 29e3112b9b Epilogue fixes. Ville Pietilä 2025-09-22 15:38:02 +00:00
  • d7da3d5089 Offset fixes. Ville Pietilä 2025-09-22 15:37:46 +00:00
  • fc61c5db9e Add support for V3 pipeline (tested). To be able to support num_loop < 3 we need the fixes from the batched gemm gemm MR which was already merged upstream, so just need to rebase or merge. kiefer 2025-09-22 15:36:23 +00:00
  • 158abfa988 Merge commit 'de47ae2fdf00e2bb31567ae6e3b9729262d93832' into develop assistant-librarian[bot] 2025-09-22 15:12:21 +00:00
  • 7122e27fd8 fixup build for #2871 when multiple device targets are used (#2885) Max Podkorytov 2025-09-22 08:02:41 -07:00
  • 799dc99e55 fixup build for #2871 when multiple device targets are used (#2885) Max Podkorytov 2025-09-22 08:02:41 -07:00
  • de47ae2fdf fixup build for #2871 when multiple device targets are used (#2885) Max Podkorytov 2025-09-22 08:02:41 -07:00
  • 0d9e39874f Merge commit '624c46866eb37c9196c0243b6ddfd15273f2253e' into develop assistant-librarian[bot] 2025-09-22 14:12:58 +00:00
  • ccd54f7c92 [CK_TILE] Add conv bwd weight two stage support (#2855) jakpiase 2025-09-22 15:31:25 +02:00
  • 30403d077b [CK_TILE] Add conv bwd weight two stage support (#2855) jakpiase 2025-09-22 15:31:25 +02:00
  • 624c46866e [CK_TILE] Add conv bwd weight two stage support (#2855) jakpiase 2025-09-22 15:31:25 +02:00