Commit Graph

  • 7ac3794284 Add new instances for merging multiple fwd conv groups into a single GEMM batch. Allow group merging for C > 1 when vector load/store size is 1 for the output tensor. (#3639) kabraham/builder_add_constraints Ville Pietilä 2026-01-25 14:42:23 +02:00
  • 6b7e5d324f [CK_TILE] Fix LDS bank conflicts in CShuffleEpilogue with XOR swizzle cshuffle-epilogue-bank-conflict-tests Max Podkorytov 2026-01-23 22:16:10 -06:00
  • 6a21c125a0 Merge commit 'f5c2f09036cdc22dc8944719215dd47003c50a24' into develop assistant-librarian[bot] 2026-01-24 00:38:47 +00:00
  • bc91bb7dd7 [CK TILE] Apply get_k_warp_tile_for_preshuffle_b in examples and tests Cong Ma 2026-01-23 18:41:35 -05:00
  • b6f1e99074 [CK_TILE] Fix alignment in Stream-K workspace buffer (#3625) Emily Martins 2026-01-23 16:14:22 -07:00
  • d0b16dc545 [CK_TILE] Fix alignment in Stream-K workspace buffer (#3625) Emily Martins 2026-01-23 16:14:22 -07:00
  • f5c2f09036 [CK_TILE] Fix alignment in Stream-K workspace buffer (#3625) Emily Martins 2026-01-23 16:14:22 -07:00
  • e2e058bcbc Merge commit 'e1c46ff548cf7bc8b0e1b41a3d559f05317ec2da' into develop assistant-librarian[bot] 2026-01-23 21:13:05 +00:00
  • 3c247733af Remove code duplications in batched gemm wmma (#3580) chris-tsiaousis-hpc 2026-01-23 21:39:03 +01:00
  • e0af034e82 Remove code duplications in batched gemm wmma (#3580) chris-tsiaousis-hpc 2026-01-23 21:39:03 +01:00
  • e1c46ff548 Remove code duplications in batched gemm wmma (#3580) chris-tsiaousis-hpc 2026-01-23 21:39:03 +01:00
  • f728c90546 Merge branch 'develop' into aviralgoel/memory_pipeline_refactor_2 aviralgoel/memory_pipeline_refactor_2 Thomas Ning 2026-01-23 12:38:05 -08:00
  • c244e1454a Merge commit '67f0b74ec6687192fac14c359c57aca237d3cf2a' into develop assistant-librarian[bot] 2026-01-23 17:17:05 +00:00
  • 90b3476006 Revert "Revert " Fp8 block scale quantization for fmha fwd (#3330)" (#3633)" (#3635) ltqin 2026-01-24 01:03:22 +08:00
  • 2bf0f9a3fc Revert "Revert " Fp8 block scale quantization for fmha fwd (#3330)" (#3633)" (#3635) ltqin 2026-01-24 01:03:22 +08:00
  • 67f0b74ec6 Revert "Revert " Fp8 block scale quantization for fmha fwd (#3330)" (#3633)" (#3635) ltqin 2026-01-24 01:03:22 +08:00
  • 1ea1adcc38 WIP: start kernel implementation + test structure Damien Lejeune 2026-01-23 11:48:52 -05:00
  • d2a7c2f041 compiles again using get_y_sliced_thread_data in warpgemm loop Sami Remes 2026-01-23 11:01:43 -05:00
  • 109bfa1558 [CK TILE] Update get_k_warp_tile_for_preshuffle_b for MI350 Cong MA 2026-01-23 11:00:07 -05:00
  • 078593e052 Merge commit '2e08a7e5ab51b020c90008b45c75dc35c2ba426c' into develop assistant-librarian[bot] 2026-01-23 12:19:23 +00:00
  • 0a10cb3582 Implement group merging for bwd_weight and add instances Graner, Johannes 2026-01-22 09:17:14 -05:00
  • f88efddea4 Add new instances for merging multiple fwd conv groups into a single GEMM batch. Allow group merging for C > 1 when vector load/store size is 1 for the output tensor. Ville Pietilä 2026-01-23 06:07:02 -05:00
  • a0445aff5f WMMA grouped conv fwd large tensor bias bnorm clamp (#3595) Wojciech Laskowski 2026-01-23 12:20:00 +01:00
  • a340b8f8fd WMMA grouped conv fwd large tensor bias bnorm clamp (#3595) Wojciech Laskowski 2026-01-23 12:20:00 +01:00
  • 2e08a7e5ab WMMA grouped conv fwd large tensor bias bnorm clamp (#3595) Wojciech Laskowski 2026-01-23 12:20:00 +01:00
  • ee595ee58a WMMA grouped conv fwd large tensor extra flavors (#3582) Wojciech Laskowski 2026-01-23 12:19:51 +01:00
  • 21ab5fbe49 WMMA grouped conv fwd large tensor extra flavors (#3582) Wojciech Laskowski 2026-01-23 12:19:51 +01:00
  • 81ee19bd2c WMMA grouped conv fwd large tensor extra flavors (#3582) Wojciech Laskowski 2026-01-23 12:19:51 +01:00
  • ebc79763fb Merge commit '7b3db1a878181004fc5db7cdb82840623beaadb5' into develop assistant-librarian[bot] 2026-01-23 10:14:37 +00:00
  • 5c6c0f5ad1 Grouped conv fwd direct load vector=2 (#3632) Bartłomiej Kocot 2026-01-23 10:29:59 +01:00
  • 237d22d6ca Grouped conv fwd direct load vector=2 (#3632) Bartłomiej Kocot 2026-01-23 10:29:59 +01:00
  • 7b3db1a878 Grouped conv fwd direct load vector=2 (#3632) Bartłomiej Kocot 2026-01-23 10:29:59 +01:00
  • 797aff8d49 update test codes kyle/gemm_test kyle-256 2026-01-23 08:59:50 +00:00
  • 36253637d1 Merge commit 'de5a1d730dc77d1471ad53ca18dfd7c1474e9873' into develop assistant-librarian[bot] 2026-01-23 07:17:14 +00:00
  • 4ded7e5984 Revert " Fp8 block scale quantization for fmha fwd (#3330)" (#3633) Po Yen Chen 2026-01-23 13:21:19 +08:00
  • c495edb11c Revert " Fp8 block scale quantization for fmha fwd (#3330)" (#3633) Po Yen Chen 2026-01-23 13:21:19 +08:00
  • de5a1d730d Revert " Fp8 block scale quantization for fmha fwd (#3330)" (#3633) Po Yen Chen 2026-01-23 13:21:19 +08:00
  • e0b24cbecc Merge commit 'f30d04654e6bb9b064cf96c6bb4e3fff960efbd8' into develop assistant-librarian[bot] 2026-01-23 00:39:51 +00:00
  • 10e975e7af Add missing check target in reduce tile engine op (#3631) damien-lejeune 2026-01-23 01:06:02 +01:00
  • b0623aebc2 Add missing check target in reduce tile engine op (#3631) damien-lejeune 2026-01-23 01:06:02 +01:00
  • f30d04654e Add missing check target in reduce tile engine op (#3631) damien-lejeune 2026-01-23 01:06:02 +01:00
  • 88d27d2141 Merge commit 'eb2dc8f466cd2978490ccc3ff794d898cad9535a' into develop assistant-librarian[bot] 2026-01-22 23:13:55 +00:00
  • dc83e285e1 [CK TILE] simplify function GetKBPerLoad Cong Ma 2026-01-22 17:49:39 -05:00
  • 74e270abab Speed up glob recurse. (#3626) Vidyasagar Ananthan 2026-01-22 14:44:47 -08:00
  • 4895336494 Speed up glob recurse. (#3626) Vidyasagar Ananthan 2026-01-22 14:44:47 -08:00
  • eb2dc8f466 Speed up glob recurse. (#3626) Vidyasagar Ananthan 2026-01-22 14:44:47 -08:00
  • 1654f02845 Merge commit 'b9bb1db5d932c4c0445994cfc1d37f66a3744659' into develop assistant-librarian[bot] 2026-01-22 21:15:42 +00:00
  • bfa37887fb Addition of Stream-K tests using Tile Engine (#3514) arai713 2026-01-22 12:53:52 -08:00
  • 2bef359e0e Addition of Stream-K tests using Tile Engine (#3514) arai713 2026-01-22 12:53:52 -08:00
  • b9bb1db5d9 Addition of Stream-K tests using Tile Engine (#3514) arai713 2026-01-22 12:53:52 -08:00
  • 84e961765d Merge commit '31a35ecab4e403f63ec4b76f4a709c21172c39de' into develop assistant-librarian[bot] 2026-01-22 18:17:10 +00:00
  • 16e6a2c696 GEMM Blockscale ABQuant Optimization (#3620) kensclin 2026-01-23 01:39:38 +08:00
  • 81771f8b1e GEMM Blockscale ABQuant Optimization (#3620) kensclin 2026-01-23 01:39:38 +08:00
  • 31a35ecab4 GEMM Blockscale ABQuant Optimization (#3620) kensclin 2026-01-23 01:39:38 +08:00
  • 080fa14140 [CK TILE] Add new function get_k_warp_tile_for_preshuffle_b Cong Ma 2026-01-22 12:36:40 -05:00
  • 7ce0127e8f Adding dispatcher architecture (#3300) Vidyasagar Ananthan 2026-01-22 09:34:33 -08:00
  • 8763bbf6cf Adding dispatcher architecture (#3300) Vidyasagar Ananthan 2026-01-22 09:34:33 -08:00
  • 9e049a32a1 Adding dispatcher architecture (#3300) Vidyasagar Ananthan 2026-01-22 09:34:33 -08:00
  • 25786e718f Add merged groups instances. Ville Pietilä 2026-01-22 11:44:15 -05:00
  • 927d121cb8 WIP: project setup Damien Lejeune 2026-01-22 11:36:29 -05:00
  • 5289967b9b Merge commit '44f481a45ca75b234ba60fdc3dc68974b1b86164' into develop assistant-librarian[bot] 2026-01-22 14:20:23 +00:00
  • 9c3ab51d9b [CK TILE] Fix basic gemm pipelines (#3611) Bartłomiej Kocot 2026-01-22 15:11:18 +01:00
  • 6afa598838 [CK TILE] Fix basic gemm pipelines (#3611) Bartłomiej Kocot 2026-01-22 15:11:18 +01:00
  • 44f481a45c [CK TILE] Fix basic gemm pipelines (#3611) Bartłomiej Kocot 2026-01-22 15:11:18 +01:00
  • d693957f6d Revert "Remove irrelevant instances." Ville Pietilä 2026-01-22 08:43:39 -05:00
  • 2827440507 Revert "Disable irrelevant instances." Ville Pietilä 2026-01-22 08:43:11 -05:00
  • ca9a4a41f0 Add merged groups instance. Ville Pietilä 2026-01-22 08:40:15 -05:00
  • 1ab9c33218 tmp save Jakub Piasecki 2026-01-22 11:59:12 +00:00
  • 655d133f58 Change grouped conv fwd example to run group merging instance. Ville Pietilä 2026-01-22 04:59:53 -05:00
  • 872c034358 Merge commit '8daf6ea3026aebe3481792c03026692631059725' into develop assistant-librarian[bot] 2026-01-22 09:19:13 +00:00
  • 37a3f0fb28 initial draft of split out the permute_n_epilogue epilogue_refactor ThomasNing 2026-01-22 03:08:44 -06:00
  • ec0f5c82ca Grouped conv_fwd_bias_bnorm_clamp instances and tests (#3525) ApoorvaKalyani 2026-01-22 09:53:59 +01:00
  • 513b14c5f2 Grouped conv_fwd_bias_bnorm_clamp instances and tests (#3525) ApoorvaKalyani 2026-01-22 09:53:59 +01:00
  • 8daf6ea302 Grouped conv_fwd_bias_bnorm_clamp instances and tests (#3525) ApoorvaKalyani 2026-01-22 09:53:59 +01:00
  • ef44c88506 initial draft gemm_async_load_opt ThomasNing 2026-01-22 02:50:40 -06:00
  • 8f10da355a Merge commit '0b13697a88e77a733d36b14353df1c0a7ae756df' into develop assistant-librarian[bot] 2026-01-22 08:17:11 +00:00
  • f6fac4cea6 [CK_TILE][FMHA]Add new tile size for async (#3623) Linjun-AMD 2026-01-22 16:07:14 +08:00
  • bdea62e96c [CK_TILE][FMHA]Add new tile size for async (#3623) Linjun-AMD 2026-01-22 16:07:14 +08:00
  • 0b13697a88 [CK_TILE][FMHA]Add new tile size for async (#3623) Linjun-AMD 2026-01-22 16:07:14 +08:00
  • 6e98435943 Fix build errors Andriy Roshchenko 2026-01-22 07:22:41 +00:00
  • ece69df994 Improve execution time of batch prefill kernel with vectorized KV cache layout jeff/batch-prefill-vectorized-performance-fix Jeff Huang 2026-01-19 20:56:23 +08:00
  • 4383004042 [FMHA] Enable page size 16 for batch prefill kernel (#3568) Jeff Huang 2026-01-15 22:11:44 +08:00
  • 45e2275fc4 Merge commit 'dd0b4294afcf188f4a9154b7eea19f8e786c9539' into develop assistant-librarian[bot] 2026-01-22 05:20:00 +00:00
  • 14254656f0 Fp8 block scale quantization for fmha fwd (#3330) ltqin 2026-01-22 12:58:26 +08:00
  • 71e8734c32 Fp8 block scale quantization for fmha fwd (#3330) ltqin 2026-01-22 12:58:26 +08:00
  • dd0b4294af Fp8 block scale quantization for fmha fwd (#3330) ltqin 2026-01-22 12:58:26 +08:00
  • 19a156aa0a Apply clang-format with -style=file mpodkory/find-transform-optimization Max Podkorytov 2026-01-22 03:17:44 +00:00
  • eed4270cb7 Apply clang-format with -style=file mpodkory/generate-tuple-optimizations Max Podkorytov 2026-01-22 03:14:00 +00:00
  • 8d32b38fbd fix format wjx/fix_splitk_moe lalala-sh 2026-01-22 03:06:40 +00:00
  • 4b84f2c38b Merge branch 'develop' into wjx/fix_splitk_moe lalala-sh 2026-01-22 11:03:39 +08:00
  • 5f2f85f374 add memsetasync lalala-sh 2026-01-22 03:02:05 +00:00
  • 8d5da5cd59 Add inline documentation for search helper optimizations Max Podkorytov 2026-01-22 02:56:10 +00:00
  • 4d2856612c Merge commit '4c2c18ef486641d1493f3dc272a1e0e079676308' into develop assistant-librarian[bot] 2026-01-22 02:55:52 +00:00
  • f0655784fb Add inline documentation for container and tuple helper optimizations Max Podkorytov 2026-01-22 02:52:27 +00:00
  • 25f759fc22 mxfp4 non reduce version dev/gemm_reduce_seperate_moe yadaish 2026-01-22 02:29:37 +00:00
  • 04f7e1fce4 [CK][Examples] Extending support for rdna3/4 part 4: (#3264) Michał Kulikowski 2026-01-22 03:10:16 +01:00
  • 4077353b6a [CK][Examples] Extending support for rdna3/4 part 4: (#3264) Michał Kulikowski 2026-01-22 03:10:16 +01:00
  • 4c2c18ef48 [CK][Examples] Extending support for rdna3/4 part 4: (#3264) Michał Kulikowski 2026-01-22 03:10:16 +01:00
  • 99872ec8b3 Add container and tuple optimization helpers Max Podkorytov 2026-01-22 01:47:46 +00:00
  • 483c7696c0 Setup build environment. Format source code. Andriy Roshchenko 2026-01-22 01:22:51 +00:00