Commit Graph

  • b8b56d5cc6 Add multi-dimensional non-contiguous stride support to batched contraction, num_d = 0 Mohsen Saffari 2025-10-20 13:15:39 +00:00
  • 2efd174b33 Add new kernel instances. Ville Pietilä 2025-10-20 12:01:57 +00:00
  • 307ca52156 Improve benchmarking and analysis script. Ville Pietilä 2025-10-20 10:50:34 +00:00
  • 2ecb0bfb3e Add descriptor-based architecture for batched contraction multi-dimensional stride support Mohsen Saffari 2025-10-20 10:30:23 +00:00
  • ae31308051 Improve build infrastructure for generating doc - Add CMake documentation infrastructure with auto Python venv management - Enable streamlined docs build: cmake --build . --target docs [branch: origin/philipm/documentation-cleanup-5] Philip Maybank 2025-10-20 11:08:02 +01:00
  • 3c2cb9812d update yadai 2025-10-20 09:35:23 +00:00
  • 1247133245 Parallel compilation of the CK Tile instances. Ville Pietilä 2025-10-20 09:31:53 +00:00
  • cfc8fed858 update yadai 2025-10-20 09:27:53 +00:00
  • 5e1d1fd45d update yadai 2025-10-20 09:21:13 +00:00
  • 96b89dc4ae fix mixed_prec_int4 for flatmm zanzhang 2025-10-20 16:52:40 +08:00
  • 3aec8b5493 Improve profiler output. Ville Pietilä 2025-10-20 08:04:59 +00:00
  • 9712939beb update yadai 2025-10-20 07:37:41 +00:00
  • 5e8dab6eaf Merge commit 'fb1d090f3c475907fbcbdaf9dcfd2829f92d3c26' into develop assistant-librarian[bot] 2025-10-20 07:13:15 +00:00
  • 182c4404b5 [CK_TILE] Patch for pk_fp4 ref check and buffer load. (#3044) Gino Lu 2025-10-20 14:47:04 +08:00
  • b7e5da5e83 [CK_TILE] Patch for pk_fp4 ref check and buffer load. (#3044) Gino Lu 2025-10-20 14:47:04 +08:00
  • fb1d090f3c [CK_TILE] Patch for pk_fp4 ref check and buffer load. (#3044) Gino Lu 2025-10-20 14:47:04 +08:00
  • 3f963d4074 modified cmake files at unified attention example. Now cmake works, but getting compile errors (expected atm) Juuso Korhonen 2025-10-20 06:15:13 +00:00
  • 5962722a1a update yadai 2025-10-20 04:11:37 +00:00
  • 69ed924c36 test async load tile for fp4 Gino Lu 2025-10-13 04:30:49 -05:00
  • 129002270b Rebase to fp4_patch branch Gino Lu 2025-10-08 01:40:27 -05:00
  • 4dedce8149 Merge branch 'develop' into ginolu/fp4_patch Haocong WANG 2025-10-20 10:49:00 +08:00
  • 8be9318b16 Merge commit 'af3786fe0814a75646ff3194f86eab0e24b047e6' into develop assistant-librarian[bot] 2025-10-19 23:11:19 +00:00
  • fcd2b00c83 Add dvc pull step (#3056) BrianHarrisonAMD 2025-10-19 17:09:21 -06:00
  • 0fb13588bb Add dvc pull step (#3056) BrianHarrisonAMD 2025-10-19 17:09:21 -06:00
  • af3786fe08 Add dvc pull step (#3056) BrianHarrisonAMD 2025-10-19 17:09:21 -06:00
  • 4293a36db6 Add compile-time reflection system for CK device kernel instances John Shumway 2025-10-19 20:50:31 +00:00
  • 1b113c73a3 Fix CMake build infrastructure for experimental builder John Shumway 2025-10-17 21:21:30 +00:00
  • 367247fbae Fix clang formatting. John Shumway 2025-10-17 15:21:28 +00:00
  • 23c7773128 Add experimental builder infrastructure for composable_kernel John Shumway 2025-10-17 03:40:54 +00:00
  • 921ea9f6eb Merge commit 'd88ea05c844cd159a14213b73a5818a43c5b79e6' into develop assistant-librarian[bot] 2025-10-18 03:20:18 +00:00
  • 17261b3fb8 disable aiter test gemm_a8w8_blockscale (#3049) Illia Silin 2025-10-17 19:52:22 -07:00
  • 525ea9dd3f disable aiter test gemm_a8w8_blockscale (#3049) Illia Silin 2025-10-17 19:52:22 -07:00
  • d88ea05c84 disable aiter test gemm_a8w8_blockscale (#3049) Illia Silin 2025-10-17 19:52:22 -07:00
  • f2f7a548cb Merge commit 'b03764ca5a917752845ddbb5da8886051a16d9be' into develop assistant-librarian[bot] 2025-10-17 17:11:18 +00:00
  • 21e65bbb22 docs: add inline comments about flush_cache and rotating buffer AviralGoelAMD 2025-10-15 02:39:04 +00:00
  • 48b0e60e14 docs: add inline comments about flush_cache and rotating buffer AviralGoelAMD 2025-10-15 02:39:04 +00:00
  • b03764ca5a docs: add inline comments about flush_cache and rotating buffer AviralGoelAMD 2025-10-15 02:39:04 +00:00
  • 89fb435ce2 fix identity values in Max and AbsMax (#3048) Yashvardhan Agarwal 2025-10-17 19:49:21 +03:00
  • c5eda13381 fix identity values in Max and AbsMax (#3048) Yashvardhan Agarwal 2025-10-17 19:49:21 +03:00
  • 889ffc0b1d fix identity values in Max and AbsMax (#3048) Yashvardhan Agarwal 2025-10-17 19:49:21 +03:00
  • cdb6bd372b Fix CK Tile Stream-K BF16 Validation Errors (#3039) Emily Martins 2025-10-17 10:33:38 -06:00
  • 6157673c39 Fix CK Tile Stream-K BF16 Validation Errors (#3039) Emily Martins 2025-10-17 10:33:38 -06:00
  • 352dee5225 Fix CK Tile Stream-K BF16 Validation Errors (#3039) Emily Martins 2025-10-17 10:33:38 -06:00
  • 7fec9695d2 Pre-commit in CI (#3029) Johannes Graner 2025-10-17 18:28:38 +02:00
  • f7ffb12123 Pre-commit in CI (#3029) Johannes Graner 2025-10-17 18:28:38 +02:00
  • 8a4cd32d86 Pre-commit in CI (#3029) Johannes Graner 2025-10-17 18:28:38 +02:00
  • dc65dc98e1 Optimize calculation of the CPU reference. Ville Pietilä 2025-10-17 14:48:12 +00:00
  • 949bf1149f Add back BF16 instances. Ville Pietilä 2025-10-17 14:47:39 +00:00
  • 697dd2e6f1 Create runner script to runs CK and CK Tile profilers. Ville Pietilä 2025-10-17 14:27:52 +00:00
  • fec833263c Add stride vector arguments in example code for testing non-contiguous batched contraction inputs Mohsen Saffari 2025-10-17 13:29:10 +00:00
  • 99ccb97fad Merge commit '7e44b845b5dd4bcc28d55b4b2764e2be6418a35a' into develop assistant-librarian[bot] 2025-10-17 13:18:45 +00:00
  • 28055fdd9a Improve profiler output. Ville Pietilä 2025-10-17 13:17:29 +00:00
  • 7722f901df Fix validation. Ville Pietilä 2025-10-17 13:07:06 +00:00
  • 6789c219c1 Add missing header. Ville Pietilä 2025-10-17 12:43:49 +00:00
  • bc3a91d23f Fixed handling of split-K autodeduce argument for grouped convolution (#3024) Ville Pietilä 2025-10-17 15:36:39 +03:00
  • 71ecd257a4 Fixed handling of split-K autodeduce argument for grouped convolution (#3024) Ville Pietilä 2025-10-17 15:36:39 +03:00
  • 7e44b845b5 Fixed handling of split-K autodeduce argument for grouped convolution (#3024) Ville Pietilä 2025-10-17 15:36:39 +03:00
  • 52579ad98c Merge branch 'develop' of github.com:ROCm/composable_kernel into barkocot/conv-instances-removal [branch: barkocot/conv-instances-removal] Bartlomiej Kocot 2025-10-17 11:26:41 +00:00
  • 2195cfaa52 Merge commit '0a30c3063068dcefea2291309fbe269812d06956' into conv_bwd_weight_wmma kiefer 2025-10-17 11:10:47 +00:00
  • a92c965667 Fix fwd layouts. Ville Pietilä 2025-10-17 11:07:39 +00:00
  • f4e8f791fd fixing args Tianxing Wu 2025-10-17 11:03:39 +00:00
  • ef3e871e6e Add grouped conv fwd direction profiling into CK Tile profiler. Ville Pietilä 2025-10-17 10:47:23 +00:00
  • 9fc1a8c365 Add -num_d argument for runtime D tensor count selection in batched contraction Mohsen Saffari 2025-10-17 09:18:12 +00:00
  • 995c6701d3 Merge branch 'tianxing/unified-attention' of https://github.com/ROCm/composable_kernel into tianxing/unified-attention Tianxing Wu 2025-10-17 09:05:12 +00:00
  • af9167abad example Tianxing Wu 2025-10-17 09:05:10 +00:00
  • 4027a92579 Add stride-aware reference for batched contraction with independent D tensor layouts Mohsen Saffari 2025-10-17 08:53:03 +00:00
  • 0e0fb54b9f Rename conv factory. Ville Pietilä 2025-10-17 06:26:41 +00:00
  • a708b177fc Add double smem buffer instances. Ville Pietilä 2025-10-17 06:24:11 +00:00
  • bb42ff9dee fix clang-format Gino Lu 2025-10-17 00:47:07 -05:00
  • 3f76780884 Patch for pk_fp4_raw_t buffer load and ref check Gino Lu 2025-10-17 00:39:48 -05:00
  • b2c7bef128 Merge commit 'd40b50b9d5b5b60c56b5e6b3837882442c882074' into develop assistant-librarian[bot] 2025-10-16 23:11:41 +00:00
  • 580a54b400 Update pre-commit to fixed versions, run remod for ck_tile (#2895) Johannes Graner 2025-10-17 00:29:17 +02:00
  • 8af66c65d0 Update pre-commit to fixed versions, run remod for ck_tile (#2895) Johannes Graner 2025-10-17 00:29:17 +02:00
  • d40b50b9d5 Update pre-commit to fixed versions, run remod for ck_tile (#2895) Johannes Graner 2025-10-17 00:29:17 +02:00
  • 8750af5dd9 Merge commit '440358c16851de74575798c539feca1b0be0799f' into develop assistant-librarian[bot] 2025-10-16 19:13:20 +00:00
  • 6066662785 Wave Tile Transfer supporting global load with transpose (#3027) Enrico Degregori 2025-10-16 20:33:56 +02:00
  • 1d9320c8f3 Wave Tile Transfer supporting global load with transpose (#3027) Enrico Degregori 2025-10-16 20:33:56 +02:00
  • 440358c168 Wave Tile Transfer supporting global load with transpose (#3027) Enrico Degregori 2025-10-16 20:33:56 +02:00
  • 980f5036da Merge commit 'c4b2da9cbd979eb9e32b4f20878d220b4f435a69' into develop assistant-librarian[bot] 2025-10-16 18:14:40 +00:00
  • b085a51b44 implement device batched gemm b scale for wmma (#2825) kabrahamAMD 2025-10-16 20:00:42 +02:00
  • 06d76b160e implement device batched gemm b scale for wmma (#2825) kabrahamAMD 2025-10-16 20:00:42 +02:00
  • c4b2da9cbd implement device batched gemm b scale for wmma (#2825) kabrahamAMD 2025-10-16 20:00:42 +02:00
  • 3f629b1a41 Merge branch 'develop' into jzhou/pre-load-ds Illia Silin 2025-10-16 10:49:17 -07:00
  • 1b7c5502e2 Merge commit 'd7278cc664c20613e0b7c45f249f6e7613550ca2' into develop assistant-librarian[bot] 2025-10-16 16:13:55 +00:00
  • 9988a46af2 WIP: trying to figure out tile dstr and/or indexing for scale matrix Sami Remes 2025-10-16 15:53:32 +00:00
  • 2b0c25dc28 [TheRock CI] Updating SHA for build image and TheRock SHA (#3033) Geo Min 2025-10-16 08:13:10 -07:00
  • 62afd9eb14 [TheRock CI] Updating SHA for build image and TheRock SHA (#3033) Geo Min 2025-10-16 08:13:10 -07:00
  • d7278cc664 [TheRock CI] Updating SHA for build image and TheRock SHA (#3033) Geo Min 2025-10-16 08:13:10 -07:00
  • 36020b389c Style updates and cleanup Emily Martins 2025-10-15 18:35:55 +00:00
  • 95805e674a Style updates and cleanup Emily Martins 2025-10-15 18:35:55 +00:00
  • cb83d52301 Style updates and cleanup Emily Martins 2025-10-15 18:35:55 +00:00
  • 1d1f8af58b Addition of the derived structs for the new Stream-K TilePartitioner Astha 2025-10-06 15:01:10 -04:00
  • 47e002fc27 Addition of the derived structs for the new Stream-K TilePartitioner Astha 2025-10-06 15:01:10 -04:00
  • 8f75d7cea6 Addition of the derived structs for the new Stream-K TilePartitioner Astha 2025-10-06 15:01:10 -04:00
  • 64e6fef4ba Stream-K Tile Partitioner Base Class with Tests Emily Martins 2025-10-08 15:53:19 +00:00
  • 48ebcbe898 Stream-K Tile Partitioner Base Class with Tests Emily Martins 2025-10-08 15:53:19 +00:00
  • f87f768d16 Stream-K Tile Partitioner Base Class with Tests Emily Martins 2025-10-08 15:53:19 +00:00
  • d4f55309a2 Revert "Enable storelse for fmha_fwd_trload kernel (#3023)" (#3037) Illia Silin 2025-10-16 07:19:34 -07:00
  • b4edf184a7 Revert "Enable storelse for fmha_fwd_trload kernel (#3023)" (#3037) Illia Silin 2025-10-16 07:19:34 -07:00
  • 2d1c9e28e2 Revert "Enable storelse for fmha_fwd_trload kernel (#3023)" (#3037) Illia Silin 2025-10-16 07:19:34 -07:00