Commit Graph

  • eb72f85509 [CK tests] Extend conv GPU reference (#3539) Johannes Graner 2026-01-27 09:49:42 +01:00
  • 8049ce9be4 [CK tests] Extend conv GPU reference (#3539) Johannes Graner 2026-01-27 09:49:42 +01:00
  • c190d8d61f [CK tests] Extend conv GPU reference (#3539) Johannes Graner 2026-01-27 09:49:42 +01:00
  • 90534e7cad Update moe_flatmm_kernel.hpp moe_xcd_remap Tianxing Wu 2026-01-27 10:49:11 +02:00
  • 1c73533107 Merge branch 'develop' into moe_xcd_remap Tianxing Wu 2026-01-27 10:41:34 +02:00
  • 6361810fb5 Fix merge conflict. Ville Pietilä 2026-01-27 03:08:22 -05:00
  • ed0eadb8c8 [CK TILE] disable tests on gfx950 Cong Ma 2026-01-26 23:11:14 -05:00
  • 6ba8427812 [CK TILE] set proper K_Warp_Tile for quant gemm tests Cong Ma 2026-01-26 22:51:02 -05:00
  • 7a427d0041 Replace O(N) recursive sequence_map_inverse with O(1) pack expansion Max Podkorytov 2026-01-22 19:50:01 +00:00
  • e4378d5857 fix compile error KenSCLin 2026-01-27 03:48:45 +00:00
  • aa3b7866b0 Merge commit 'cc75948d1c7f732d102c8e31dc007a2ccd07761f' into develop assistant-librarian[bot] 2026-01-27 01:42:04 +00:00
  • bca1c78412 test: add labels to ck_tile tests to build and run in single command AviralGoelAMD 2026-01-26 23:22:18 +00:00
  • a7b7eae2a1 [CK_BUILDER] conv bwd weight testing (#3618) Robin Voetter 2026-01-26 23:50:15 +01:00
  • aa35585fc7 [CK_BUILDER] conv bwd weight testing (#3618) Robin Voetter 2026-01-26 23:50:15 +01:00
  • cc75948d1c [CK_BUILDER] conv bwd weight testing (#3618) Robin Voetter 2026-01-26 23:50:15 +01:00
  • 65be39bfd1 Merge commit '8654c0628f83261d3dd64cfb4ec80e9dd2b29fa5' into develop assistant-librarian[bot] 2026-01-26 22:14:16 +00:00
  • daa6cae5f5 Finished testing failure types. Removed testing code. Andrew Clark 2026-01-23 14:29:13 -05:00
  • 2b90408685 Finished testing failure types. Removed testing code. Andrew Clark 2026-01-23 14:29:13 -05:00
  • 8654c0628f Finished testing failure types. Removed testing code. Andrew Clark 2026-01-23 14:29:13 -05:00
  • 858387034e Removed working tests. Validating remaining tests. Andrew Clark 2026-01-23 14:27:18 -05:00
  • c2cfd318da Removed working tests. Validating remaining tests. Andrew Clark 2026-01-23 14:27:18 -05:00
  • 402f21d0a6 Removed working tests. Validating remaining tests. Andrew Clark 2026-01-23 14:27:18 -05:00
  • b555c06b8d Removed working tests. Validating remaining tests. Andrew Clark 2026-01-23 14:25:21 -05:00
  • ec4a6be1ed Removed working tests. Validating remaining tests. Andrew Clark 2026-01-23 14:25:21 -05:00
  • 1397924c21 Removed working tests. Validating remaining tests. Andrew Clark 2026-01-23 14:25:21 -05:00
  • 8136cf5c72 Testing a pattern to support all text variations Andrew Clark 2026-01-23 14:21:06 -05:00
  • 22abf1b0d9 Testing a pattern to support all text variations Andrew Clark 2026-01-23 14:21:06 -05:00
  • 6c596b9553 Testing a pattern to support all text variations Andrew Clark 2026-01-23 14:21:06 -05:00
  • b87853431e Removing working cases to test other failure examples Andrew Clark 2026-01-23 13:56:47 -05:00
  • c3c318c340 Removing working cases to test other failure examples Andrew Clark 2026-01-23 13:56:47 -05:00
  • 58e1d03244 Removing working cases to test other failure examples Andrew Clark 2026-01-23 13:56:47 -05:00
  • 8c91ce81bf Adding forcing failure to test notifications Andrew Clark 2026-01-23 13:02:25 -05:00
  • c490f137b3 Adding forcing failure to test notifications Andrew Clark 2026-01-23 13:02:25 -05:00
  • 95768d1b22 Adding forcing failure to test notifications Andrew Clark 2026-01-23 13:02:25 -05:00
  • 18e95f26aa Fixing Jenkinsfile too large error Andrew Clark 2026-01-23 12:47:27 -05:00
  • 9e7b7fe59a Fixing Jenkinsfile too large error Andrew Clark 2026-01-23 12:47:27 -05:00
  • 786965b95e Fixing Jenkinsfile too large error Andrew Clark 2026-01-23 12:47:27 -05:00
  • e2f587ad01 Updating failure patterns to be more reliable and adding tests to verify they are caught in the logs Andrew Clark 2026-01-23 12:28:59 -05:00
  • 76b261ef00 Updating failure patterns to be more reliable and adding tests to verify they are caught in the logs Andrew Clark 2026-01-23 12:28:59 -05:00
  • 42a731b791 Updating failure patterns to be more reliable and adding tests to verify they are caught in the logs Andrew Clark 2026-01-23 12:28:59 -05:00
  • 89d4d517b5 [CK TIEL] Fix type error Cong Ma 2026-01-26 16:55:49 -05:00
  • 7565ca5310 Add python analysis scripts for Clang's time trace (#3644) John Shumway 2026-01-26 13:44:36 -08:00
  • 1ea438b909 Add python analysis scripts for Clang's time trace (#3644) John Shumway 2026-01-26 13:44:36 -08:00
  • a213ce676b Add python analysis scripts for Clang's time trace (#3644) John Shumway 2026-01-26 13:44:36 -08:00
  • 70bd8f8143 [CK TIEL] Fix a const type qualifier error Cong Ma 2026-01-26 16:26:49 -05:00
  • 63dde06485 Merge commit '2e49b6b2f79d5ab0fe2fca79812affd44de94db7' into develop assistant-librarian[bot] 2026-01-26 21:13:59 +00:00
  • f2c7d07666 Padding support for wave transfer (#3537) Enrico Degregori 2026-01-26 21:57:09 +01:00
  • 6e95bf8179 Padding support for wave transfer (#3537) Enrico Degregori 2026-01-26 21:57:09 +01:00
  • 2e49b6b2f7 Padding support for wave transfer (#3537) Enrico Degregori 2026-01-26 21:57:09 +01:00
  • 1298575103 Merge commit 'bd5fec81afdb6df7f4637128a3ba86dbfd6bcca1' into develop assistant-librarian[bot] 2026-01-26 20:15:40 +00:00
  • ab65977dae Removing [4,64,16] warp tile from Tile Engine (#3643) Thrupti Raj Lakshmana Gowda 2026-01-26 13:56:06 -06:00
  • 7636e64d55 Removing [4,64,16] warp tile from Tile Engine (#3643) Thrupti Raj Lakshmana Gowda 2026-01-26 13:56:06 -06:00
  • bd5fec81af Removing [4,64,16] warp tile from Tile Engine (#3643) Thrupti Raj Lakshmana Gowda 2026-01-26 13:56:06 -06:00
  • 181c075794 fix instance Jakub Piasecki 2026-01-26 19:40:59 +00:00
  • b980f0febe ck: add CK_USE_GFX950 macro (#3636) yinglu 2026-01-27 03:38:45 +08:00
  • 1b369a210f ck: add CK_USE_GFX950 macro (#3636) yinglu 2026-01-27 03:38:45 +08:00
  • 8942a19d5e ck: add CK_USE_GFX950 macro (#3636) yinglu 2026-01-27 03:38:45 +08:00
  • a26adffadf feat: Add Interwave scheduler for aquant memory pipeline (#3540) Aviral Goel 2026-01-27 00:57:42 +05:30
  • 2a17f6e537 feat: Add Interwave scheduler for aquant memory pipeline (#3540) Aviral Goel 2026-01-27 00:57:42 +05:30
  • b8751e505d feat: Add Interwave scheduler for aquant memory pipeline (#3540) Aviral Goel 2026-01-27 00:57:42 +05:30
  • 39405747ab Merge commit '3900e1e7ceacfa32cb8d1522260ed30befd4dae3' into develop assistant-librarian[bot] 2026-01-26 19:16:22 +00:00
  • 0983dea2be Solve the CTAD regression & add up the Shell file for the docker management in testing (#3634) Thomas Ning 2026-01-26 10:29:28 -08:00
  • 8f972ba2d2 Solve the CTAD regression & add up the Shell file for the docker management in testing (#3634) Thomas Ning 2026-01-26 10:29:28 -08:00
  • 3900e1e7ce Solve the CTAD regression & add up the Shell file for the docker management in testing (#3634) Thomas Ning 2026-01-26 10:29:28 -08:00
  • e01c295551 Re enable f8 x bf8 tests on compv3 and compv4 (#3605) SamiAario-AMD 2026-01-26 20:23:26 +02:00
  • b07fbbc33a Re enable f8 x bf8 tests on compv3 and compv4 (#3605) SamiAario-AMD 2026-01-26 20:23:26 +02:00
  • 834642202c Re enable f8 x bf8 tests on compv3 and compv4 (#3605) SamiAario-AMD 2026-01-26 20:23:26 +02:00
  • 4de19a1601 Remove code duplications in batched gemm (multi D) gemm (multi D) wmma (#3617) chris-tsiaousis-hpc 2026-01-26 19:20:30 +01:00
  • ea30b43692 Remove code duplications in batched gemm (multi D) gemm (multi D) wmma (#3617) chris-tsiaousis-hpc 2026-01-26 19:20:30 +01:00
  • 917f35553a Remove code duplications in batched gemm (multi D) gemm (multi D) wmma (#3617) chris-tsiaousis-hpc 2026-01-26 19:20:30 +01:00
  • 06fb853279 Merge commit 'de59c0716c631edfa4742e4309ee11d4379ef6e8' into develop assistant-librarian[bot] 2026-01-26 18:17:51 +00:00
  • bebf8c3720 Optimize sequence metaprogramming utilities to reduce template instantiation depth (#3585) Max Podkorytov 2026-01-26 10:08:55 -08:00
  • 8ae166963e Optimize sequence metaprogramming utilities to reduce template instantiation depth (#3585) Max Podkorytov 2026-01-26 10:08:55 -08:00
  • de59c0716c Optimize sequence metaprogramming utilities to reduce template instantiation depth (#3585) Max Podkorytov 2026-01-26 10:08:55 -08:00
  • 70ffbc577b add dockerfile for manylinux (#3651) Illia Silin 2026-01-26 09:23:19 -08:00
  • 9c3cc098c4 add dockerfile for manylinux (#3651) Illia Silin 2026-01-26 09:23:19 -08:00
  • 054c437dec add dockerfile for manylinux (#3651) Illia Silin 2026-01-26 09:23:19 -08:00
  • 6db9cf9f68 Fix Ding, Yi 2026-01-26 17:12:11 +00:00
  • 70c7fcda43 WIP: debugging... Sami Remes 2026-01-26 11:33:45 -05:00
  • f93e3ac6c9 fix precommit KenSCLin 2026-01-26 16:31:43 +00:00
  • 99b38bd260 Add 5D tensor layout for LDS to global mem copy. Ville Pietilä 2026-01-26 10:09:46 -05:00
  • 6ba16f81b6 Increase the number of reported errors. Ville Pietilä 2026-01-26 10:08:24 -05:00
  • 7f9961b564 Add odd specialization. Ville Pietilä 2026-01-26 10:08:10 -05:00
  • a1a2f05b3c Merge remote-tracking branch 'origin/barkocot/direct-load-conv-wrw' into features/grouped-conv-perf-uplift Ville Pietilä 2026-01-26 09:06:07 -05:00
  • b51d7aec7e Link direct load instances Graner, Johannes 2026-01-26 08:38:47 -05:00
  • a9db3100b8 Implement group merging for bwd_weight and add instances Graner, Johannes 2026-01-22 09:17:14 -05:00
  • 749e83f2fd Update to use BottomRight-Diagonal masking when seqlen_kv is bigger than seqlen_q Qianfeng Zhang 2026-01-25 14:51:53 +00:00
  • 378d4d8430 more directloads fixes Jakub Piasecki 2026-01-26 12:57:54 +00:00
  • fefc7d716a add 8 warp KenSCLin 2026-01-26 12:24:08 +00:00
  • 25cb0283ed Update gridwise_gemm_xdl_cshuffle_conv_v3.hpp Bartłomiej Kocot 2026-01-26 13:16:15 +01:00
  • 6fda7ab9bb fix for directloads on non last dim Jakub Piasecki 2026-01-26 11:38:23 +00:00
  • 20b056ded0 Grouped Conv Bwd Weight Direct Load Bartlomiej Kocot 2026-01-26 10:21:39 +00:00
  • a42af89384 Addition of code for XCD remapping arai/ck_tile/streamk_xcd_remap Astha 2026-01-20 04:01:48 -05:00
  • 0a43ee37b4 Simplify and fix template parameters Matti Eskelinen 2026-01-26 09:25:07 +00:00
  • e9485e0ecb Move reference to host/reference Matti Eskelinen 2026-01-26 09:18:52 +00:00
  • 04f8f3ed5d Add CPU reference computation Matti Eskelinen 2026-01-26 09:14:16 +00:00
  • 391e06e070 tmp save between remotes Jakub Piasecki 2026-01-25 19:53:52 +00:00
  • 8968bceee4 Merge commit '7ac379428408337a231a86f8a8b7353b5b45aa2d' into develop assistant-librarian[bot] 2026-01-25 13:22:29 +00:00
  • e587756695 Add new instances for merging multiple fwd conv groups into a single GEMM batch. Allow group merging for C > 1 when vector load/store size is 1 for the output tensor. (#3639) Ville Pietilä 2026-01-25 14:42:23 +02:00
  • a622665d78 Add new instances for merging multiple fwd conv groups into a single GEMM batch. Allow group merging for C > 1 when vector load/store size is 1 for the output tensor. (#3639) Ville Pietilä 2026-01-25 14:42:23 +02:00