Commit Graph

  • c0b68c8a85 Add more instances. Ville Pietilä 2025-10-16 14:18:40 +00:00
  • 6c5531a4ae Disqualify benchmarking results from kernels that do not pass validation. Ville Pietilä 2025-10-16 12:22:51 +00:00
  • 76ffa1bf0a Add more instances. Ville Pietilä 2025-10-16 11:33:06 +00:00
  • 9940bd07f6 fix order in mask caller Juuso Korhonen 2025-10-16 11:23:46 +00:00
  • 044bcfcb1e Take universal GEMM pipeline into use for grouped convolutions. Ville Pietilä 2025-10-16 11:03:14 +00:00
  • 75be611080 Merge commit 'e980d4351c43396398a5171e943771624a5a51eb' into develop assistant-librarian[bot] 2025-10-16 10:13:27 +00:00
  • c6f40b6147 re-enable batched transpose test on gfx942 (#3035) Max Podkorytov 2025-10-16 03:12:15 -07:00
  • 79f0324752 re-enable batched transpose test on gfx942 (#3035) Max Podkorytov 2025-10-16 03:12:15 -07:00
  • e980d4351c re-enable batched transpose test on gfx942 (#3035) Max Podkorytov 2025-10-16 03:12:15 -07:00
  • 8654f7be8d [DOCS] Documentation Addition (Readme updates) (#2495) Vidyasagar Ananthan 2025-10-16 03:10:57 -07:00
  • b26a7f9b84 [DOCS] Documentation Addition (Readme updates) (#2495) Vidyasagar Ananthan 2025-10-16 03:10:57 -07:00
  • 92c67a824f [DOCS] Documentation Addition (Readme updates) (#2495) Vidyasagar Ananthan 2025-10-16 03:10:57 -07:00
  • 553c05e9bf Merge branch 'develop' into ck_tile_batched_contraction_kernel_generelizing Mohsen Saffari 2025-10-16 09:26:17 +00:00
  • b161cd94cc Refactore the compute reference batched contraction to manage stride-aware calculation and some code cleanings Mohsen Saffari 2025-10-16 09:24:39 +00:00
  • 072de3842f comment Juuso Korhonen 2025-10-16 09:23:39 +00:00
  • aa4908ac14 fix mask Juuso Korhonen 2025-10-16 09:18:38 +00:00
  • 62932576c4 use correct mask in kernel Juuso Korhonen 2025-10-16 09:02:08 +00:00
  • 498a97aa1d merge Juuso Korhonen 2025-10-16 08:57:14 +00:00
  • 63c17b7236 correct masking by transforming y_idx = y_idx / num_queries_per_kv Juuso Korhonen 2025-10-16 08:54:07 +00:00
  • e99b5a8c28 Merge remote-tracking branch 'origin/develop' into vpietila/ck-vs-ck-tile-conv-benchmarking Ville Pietilä 2025-10-16 07:33:08 +00:00
  • 9b3c61cac2 Add more instances. Ville Pietilä 2025-10-16 07:32:52 +00:00
  • 19fac39880 Enable vector loads in grouped conv bwd weight kernels. Ville Pietilä 2025-10-16 07:17:12 +00:00
  • 6cc6e09774 Merge commit '013ba3c7372d8e6befeacc2551f9cb34180bf72f' into develop assistant-librarian[bot] 2025-10-16 06:15:58 +00:00
  • be21506985 Merge branch 'develop' into wjx/wp_gemm_fix [wjx/wp_gemm_fix] Yi DING 2025-10-16 14:05:31 +08:00
  • cbae1418f7 Enable storelse for fmha_fwd_trload kernel (#3023) Haocong WANG 2025-10-16 13:51:23 +08:00
  • 8925d9b1a3 Enable storelse for fmha_fwd_trload kernel (#3023) Haocong WANG 2025-10-16 13:51:23 +08:00
  • 013ba3c737 Enable storelse for fmha_fwd_trload kernel (#3023) Haocong WANG 2025-10-16 13:51:23 +08:00
  • 62ebfb8e66 Merge commit '0dbd17350095daaef9923439c20736f5934f161b' into develop assistant-librarian[bot] 2025-10-16 03:27:30 +00:00
  • 80527458b0 Fix compiler noreturn error for ck tile permute test (#3036) Emily Martins 2025-10-15 20:42:02 -06:00
  • 8ac10a4aca Fix compiler noreturn error for ck tile permute test (#3036) Emily Martins 2025-10-15 20:42:02 -06:00
  • 0dbd173500 Fix compiler noreturn error for ck tile permute test (#3036) Emily Martins 2025-10-15 20:42:02 -06:00
  • a17e7e734b Merge commit '232523d9fa70a9ce85573bc714078181f2ffc11e' into develop assistant-librarian[bot] 2025-10-16 01:40:26 +00:00
  • 43ccad2bcf docs: add quant mode comparison to readme (#3032) Aviral Goel 2025-10-15 21:35:06 -04:00
  • 18dd814ab3 docs: add quant mode comparison to readme (#3032) Aviral Goel 2025-10-15 21:35:06 -04:00
  • 232523d9fa docs: add quant mode comparison to readme (#3032) Aviral Goel 2025-10-15 21:35:06 -04:00
  • ec55fdf2b3 Adding in bf8 streamk example in CK Tile [sk_ex_fp8_bf8_cktile] Astha Rai 2025-09-22 18:49:21 +00:00
  • d10f7d00a4 Merge commit '87d0a3ac17286eefc1cf8291dccbc19495d87236' into develop assistant-librarian[bot] 2025-10-15 23:11:25 +00:00
  • 92fb36e73d Merge branch 'develop' into ck-tile-docs Aviral Goel 2025-10-15 18:46:51 -04:00
  • 57a340cc4c use branch develop to test hipTensor (#3034) Illia Silin 2025-10-15 15:40:34 -07:00
  • 2daa8b642a use branch develop to test hipTensor (#3034) Illia Silin 2025-10-15 15:40:34 -07:00
  • 87d0a3ac17 use branch develop to test hipTensor (#3034) Illia Silin 2025-10-15 15:40:34 -07:00
  • bcc168f1f9 Merge commit '3348f01e6fc65a7afcea3ea4167cc70e902e854a' into develop assistant-librarian[bot] 2025-10-15 15:12:32 +00:00
  • fd12e33f27 re-enable clang-format by default (#3030) Illia Silin 2025-10-15 07:43:11 -07:00
  • 0b8e15dc2e re-enable clang-format by default (#3030) Illia Silin 2025-10-15 07:43:11 -07:00
  • 3348f01e6f re-enable clang-format by default (#3030) Illia Silin 2025-10-15 07:43:11 -07:00
  • a5b60ed2f2 Add more instances. Ville Pietilä 2025-10-15 14:33:01 +00:00
  • d19ba3362d Merge commit 'bde5f26db35a0295efb1a90ad9ea2aeb27ba7ab8' into develop assistant-librarian[bot] 2025-10-15 14:12:40 +00:00
  • 356b50fadc Add help for example Mohsen Saffari 2025-10-15 14:09:32 +00:00
  • fe50f6e177 Disable streamk extended regression tests for now (#3016) Christopher Millette 2025-10-15 09:05:47 -05:00
  • d540b5244c Disable streamk extended regression tests for now (#3016) Christopher Millette 2025-10-15 09:05:47 -05:00
  • bde5f26db3 Disable streamk extended regression tests for now (#3016) Christopher Millette 2025-10-15 09:05:47 -05:00
  • 96a7c26a0b Better split-K handling in the template instantiation. Ville Pietilä 2025-10-15 13:47:04 +00:00
  • bbe13f4635 Add more instances. Ville Pietilä 2025-10-15 13:23:55 +00:00
  • 23aa650172 Add min blocks per CU to invoker name. Ville Pietilä 2025-10-15 13:21:29 +00:00
  • 57dbd2f4a4 Remove unnecessary compilations. Ville Pietilä 2025-10-15 13:20:58 +00:00
  • 853fa21566 Example boostrap Tianxing Wu 2025-10-15 11:58:44 +00:00
  • 3c08ce1e64 Improve the grouped conv kernel name generation in CK Tile. Ville Pietilä 2025-10-15 11:02:21 +00:00
  • 709395c1ab Update fmha_fwd.py [lj/whole_k_pipeline] Linjun-AMD 2025-10-15 17:10:13 +08:00
  • 3a50a9b871 Update fmha_fwd.py Linjun-AMD 2025-10-15 16:13:19 +08:00
  • eefbe3c8d8 Merge branch 'develop' into lj/whole_k_pipeline Linjun-AMD 2025-10-15 16:01:37 +08:00
  • 28221cf01f Value constraint concept example. [vpietila/convolution-builder] Ville Pietilä 2025-10-15 07:46:47 +00:00
  • 35096ae2ee fix gfx11 lalala-sh 2025-10-15 05:50:33 +00:00
  • 69e95239f4 Fix crash on small M [mtgu/cktile_mxfp4_flatmm_dev] Ding, Yi 2025-10-15 05:11:05 +00:00
  • b94f095958 add documentation about flush_cache and rotating_buffer functionality in ck_tile AviralGoelAMD 2025-10-15 02:44:22 +00:00
  • d098b68285 fix valarLip 2025-10-15 02:18:20 +00:00
  • 261f499709 Merge commit '4c826abfff5a348e48e650e39766171346a442c8' into develop assistant-librarian[bot] 2025-10-15 01:40:41 +00:00
  • b6f6b7cd2a Felix/opt sorting (#2902) felix 2025-10-15 09:24:03 +08:00
  • 7b584fd2d2 Felix/opt sorting (#2902) felix 2025-10-15 09:24:03 +08:00
  • 4c826abfff Felix/opt sorting (#2902) felix 2025-10-15 09:24:03 +08:00
  • 93699b81f5 Gate FMHA padding features behind build flag gate-fmha-padding Jeff Huang 2025-10-14 12:52:05 +08:00
  • b03940ab5f Merge commit 'ca1ab083a7da42a76a40f8a6802b72b61963efc1' into develop assistant-librarian[bot] 2025-10-14 22:12:52 +00:00
  • f0b0b1e838 test(grouped_gemm_multi_d): add unit test for bf16 support AviralGoelAMD 2025-10-09 17:26:55 +00:00
  • c1670a80db test(grouped_gemm_multi_d): add unit test for bf16 support AviralGoelAMD 2025-10-09 17:26:55 +00:00
  • ca1ab083a7 test(grouped_gemm_multi_d): add unit test for bf16 support AviralGoelAMD 2025-10-09 17:26:55 +00:00
  • 49286aab3f feat(grouped_gemm_multi_d): add support for bf16 AviralGoelAMD 2025-10-09 17:10:20 +00:00
  • e8653f314d feat(grouped_gemm_multi_d): add support for bf16 AviralGoelAMD 2025-10-09 17:10:20 +00:00
  • 8d8b49dec2 feat(grouped_gemm_multi_d): add support for bf16 AviralGoelAMD 2025-10-09 17:10:20 +00:00
  • e787672715 Merge commit '706c2b281caa201d2c9064e8940e0eb6c9e6710b' into develop assistant-librarian[bot] 2025-10-14 16:13:22 +00:00
  • 266ab45f7a fixing group id (#3002) Geo Min 2025-10-14 08:51:52 -07:00
  • 50aee57dea fixing group id (#3002) Geo Min 2025-10-14 08:51:52 -07:00
  • 706c2b281c fixing group id (#3002) Geo Min 2025-10-14 08:51:52 -07:00
  • ed83bcb9a2 update s_barrier's logic in gfx12 architecture (#3003) joyeamd 2025-10-14 23:49:34 +08:00
  • 2592957760 update s_barrier's logic in gfx12 architecture (#3003) joyeamd 2025-10-14 23:49:34 +08:00
  • b9d74e7746 update s_barrier's logic in gfx12 architecture (#3003) joyeamd 2025-10-14 23:49:34 +08:00
  • 3a9bd7c1ff Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017) Illia Silin 2025-10-14 08:43:14 -07:00
  • fc0f6be56d Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017) Illia Silin 2025-10-14 08:43:14 -07:00
  • e4298e55c7 Revert "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017) Illia Silin 2025-10-14 08:43:14 -07:00
  • 3d0db2ca63 Fix transferring data back to host for validation. Ville Pietilä 2025-10-14 15:02:51 +00:00
  • b02dcd474b Merge remote-tracking branch 'origin/barkocot/explicit-string-out' into cderb/prefetch_tuning_251014 [cderb/prefetch_tuning_251014] Christopher Erb 2025-10-14 09:40:50 -05:00
  • cbd97934ce Merge remote-tracking branch 'origin/barkocot/conv-instances-removal' into cderb/prefetch_tuning_251014 Christopher Erb 2025-10-14 09:40:30 -05:00
  • db6db740c4 Merge commit '6deaaa92cc561f5bc29d956d6f6de903db19a079' into develop assistant-librarian[bot] 2025-10-14 14:13:13 +00:00
  • c72bd792c8 Fix remaining images Vidyasagar 2025-10-14 07:09:40 -07:00
  • 72a1a1ca59 [CK_TILE] Switch into universal gemms for conv bwds (#2981) jakpiase 2025-10-14 16:09:16 +02:00
  • 4643cdd962 [CK_TILE] Switch into universal gemms for conv bwds (#2981) jakpiase 2025-10-14 16:09:16 +02:00
  • 6deaaa92cc [CK_TILE] Switch into universal gemms for conv bwds (#2981) jakpiase 2025-10-14 16:09:16 +02:00
  • bbed3a62dc Fully functional CK Tile profiler. Ville Pietilä 2025-10-14 13:35:37 +00:00
  • 72fe8b311c merge Juuso Korhonen 2025-10-14 12:35:33 +00:00
  • 4d232d59cc fix seq_len -> cur_batch_query_len Juuso Korhonen 2025-10-14 12:34:33 +00:00
  • b940a75328 Comments Tianxing Wu 2025-10-14 12:19:20 +00:00
  • 4c0b5201eb Merge commit '589e242eda730958b36c4f78bfad1991c499b0d2' into develop assistant-librarian[bot] 2025-10-14 12:17:41 +00:00