Commit Graph

  • 83a76d74de Add unit tests for sequence_find_value and find_in_tuple_of_sequences Max Podkorytov 2026-01-21 23:50:02 +00:00
  • bc802ffe3a Apply same optimization pattern to TensorAdaptor Max Podkorytov 2026-01-16 16:37:56 -06:00
  • 1159278d12 Replace generate_tuple lambda with pack expansion in InitializeElementSize Max Podkorytov 2026-01-16 16:06:37 -06:00
  • f65e9e4c99 Replace nested static_for lambdas with compile-time search helper Max Podkorytov 2026-01-16 15:49:59 -06:00
  • aadd581b8d Merge commit '1040d9b1f53945867d78d0bbcf03de65ee01aea3' into develop assistant-librarian[bot] 2026-01-21 18:24:44 +00:00
  • 2b54a86c04 [CK_BUILDER] Replace reference conv with old ck implementation (#3604) Robin Voetter 2026-01-21 19:18:47 +01:00
  • 2cff6c74fc [CK_BUILDER] Replace reference conv with old ck implementation (#3604) Robin Voetter 2026-01-21 19:18:47 +01:00
  • 1040d9b1f5 [CK_BUILDER] Replace reference conv with old ck implementation (#3604) Robin Voetter 2026-01-21 19:18:47 +01:00
  • 5a27de45e5 Sanitizing URL-encoded characters from the image file name (#3622) andrew clark 2026-01-21 11:00:53 -07:00
  • b66bbac9ea Sanitizing URL-encoded characters from the image file name (#3622) andrew clark 2026-01-21 11:00:53 -07:00
  • 0fbb3bb8c4 Sanitizing URL-encoded characters from the image file name (#3622) andrew clark 2026-01-21 11:00:53 -07:00
  • 579d2eb5fb Merge commit 'f41f37da969d8f0dbcf590b72e5ac8e74e8846b6' into develop assistant-librarian[bot] 2026-01-21 16:34:17 +00:00
  • 5f1d79b42d Add basic SinkhornKnoppKernelDummyNonStochastic implementation Damien Lejeune 2026-01-21 11:04:40 -05:00
  • 0bb1c90674 Add CMakePresets.json (#3284) Yi DING 2026-01-22 00:04:24 +08:00
  • f7d5c3a34c Add CMakePresets.json (#3284) Yi DING 2026-01-22 00:04:24 +08:00
  • f41f37da96 Add CMakePresets.json (#3284) Yi DING 2026-01-22 00:04:24 +08:00
  • a82db41c88 Opt instances for group merging. Ville Pietilä 2026-01-21 10:47:51 -05:00
  • 9a743139af Fix kBlockSize Damien Lejeune 2026-01-21 07:42:59 -05:00
  • e548a3f280 reproduce tolerance gfx11 (jeongkim/grouped-conv-bias-bnorm-clamp-tolerance-gfx11) jeongkim 2026-01-21 11:34:41 +00:00
  • 557a8d3f21 Get the test to compile (dlejeune/sinkhorn) Damien Lejeune 2026-01-21 04:57:01 -05:00
  • 4b26eac92f Merge branch 'develop' into LWPCK-3549-cleanups SamiAario-AMD 2026-01-21 10:32:45 +02:00
  • 69ae939950 Fix instance generation script. Ville Pietilä 2026-01-21 03:12:51 -05:00
  • 8fbde9114b Merge commit 'fcc9372c009c8e0a23fece77b582da83b04a654f' into develop assistant-librarian[bot] 2026-01-21 02:52:11 +00:00
  • a0935f7669 [CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD (#3615) Yi DING 2026-01-21 09:54:46 +08:00
  • f3b962ecc4 [CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD (#3615) Yi DING 2026-01-21 09:54:46 +08:00
  • fcc9372c00 [CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD (#3615) Yi DING 2026-01-21 09:54:46 +08:00
  • b2c76ff10f Merge commit 'd5ae81b2922773f7cdf4a02a2e1fd57d0e4df851' into develop assistant-librarian[bot] 2026-01-20 22:14:29 +00:00
  • b079841b10 Implement batched gemm add relu gemm add for rdna4 (#3391) Erwin Terpstra 2026-01-20 22:06:59 +01:00
  • fa56471c91 Implement batched gemm add relu gemm add for rdna4 (#3391) Erwin Terpstra 2026-01-20 22:06:59 +01:00
  • d5ae81b292 Implement batched gemm add relu gemm add for rdna4 (#3391) Erwin Terpstra 2026-01-20 22:06:59 +01:00
  • d121abc9f5 Fix grouped_conv_fwd_bias_bnorm_clamp tolerance for RDNA3 (gfx11) jeongkim 2026-01-20 19:43:48 +00:00
  • 5f61470a1f Merge commit '91b4102a59c6013d3faeb54f250cf577b2f129ce' into develop assistant-librarian[bot] 2026-01-20 19:35:23 +00:00
  • 8b842250da Add persistent async input scheduler for GEMM kernels (#3520) Max Podkorytov 2026-01-20 10:37:09 -08:00
  • b8595c5684 Add persistent async input scheduler for GEMM kernels (#3520) Max Podkorytov 2026-01-20 10:37:09 -08:00
  • 91b4102a59 Add persistent async input scheduler for GEMM kernels (#3520) Max Podkorytov 2026-01-20 10:37:09 -08:00
  • 58bb88f499 Merge commit '8f75869408210cb85e9eb7ff639c4c9dad1331cb' into develop assistant-librarian[bot] 2026-01-20 18:17:53 +00:00
  • 44a43f68b6 [CK] Adding CK Tile to the doc (#3621) spolifroni-amd 2026-01-20 12:44:47 -05:00
  • e227e837be Revert "[CK_TILE][FMHA] Add new tile size for async (#3586)" (#3613) Linjun-AMD 2026-01-21 01:40:54 +08:00
  • 30ac278911 Revert "[CK_TILE][FMHA] Add new tile size for async (#3586)" (#3613) Linjun-AMD 2026-01-21 01:40:54 +08:00
  • 8f75869408 Revert "[CK_TILE][FMHA] Add new tile size for async (#3586)" (#3613) Linjun-AMD 2026-01-21 01:40:54 +08:00
  • 8e5475654b Add support to fp16 + compute fp16 and bf16 + compute bf16 contractions (#3598) Estevan Vedovelli 2026-01-20 12:39:57 -05:00
  • 4db6fcdf65 Add support to fp16 + compute fp16 and bf16 + compute bf16 contractions (#3598) Estevan Vedovelli 2026-01-20 12:39:57 -05:00
  • 7d8bca7ddc Add support to fp16 + compute fp16 and bf16 + compute bf16 contractions (#3598) Estevan Vedovelli 2026-01-20 12:39:57 -05:00
  • 6c28037024 [CK_TILE] Fix Int32 Overflow in Deterministic FMHA BWD (yewang12/rocm_flash_attn_cherrypick_PR3615) Ding, Yi 2026-01-20 05:18:11 +00:00
  • aa2a5778e6 Apply clang-format to sequence helper tests (mpodkory/template-optimization-tests) Max Podkorytov 2026-01-20 11:20:03 -06:00
  • 6a0cbcb01d Merge commit '4d58c70e6cf76ce6cb40aa6035ebccbb28493f71' into develop assistant-librarian[bot] 2026-01-20 17:18:34 +00:00
  • af1f05efad Add 7D broadcast test with non-adjacent dimensions Max Podkorytov 2026-01-20 10:52:54 -06:00
  • 364ad3d521 [CK TILE GEMM] Add bf8 support to tile engine streamk generator (#3543) Cong Ma 2026-01-20 10:01:33 -07:00
  • 2df8d912eb [CK TILE GEMM] Add bf8 support to tile engine streamk generator (#3543) Cong Ma 2026-01-20 10:01:33 -07:00
  • 4d58c70e6c [CK TILE GEMM] Add bf8 support to tile engine streamk generator (#3543) Cong Ma 2026-01-20 10:01:33 -07:00
  • e1f2d45ef4 Add 8D tensor descriptor tests Max Podkorytov 2026-01-20 10:43:49 -06:00
  • e3efa236ec Add the structure for testing the Sinkhorn-Knopp kernel Damien Lejeune 2026-01-20 11:42:10 -05:00
  • 0705a917c6 Update rocm-docs-core to 1.31.3 ROCm Docs Automation 2026-01-20 11:38:37 -05:00
  • 1a5d956be7 Use unique prime values in container helper tests Max Podkorytov 2026-01-20 10:37:36 -06:00
  • 1be9e524ce Use unique input values in sequence helper tests Max Podkorytov 2026-01-20 10:36:10 -06:00
  • a7320b9717 Merge commit '6300ad3c62298dc6fdddfcf19ecd074f7f08fa96' into develop assistant-librarian[bot] 2026-01-20 16:18:17 +00:00
  • 4b96a1952e Add script to test if potential instance can be generated. Ville Pietilä 2026-01-20 11:04:26 -05:00
  • 750bd72b3d Batched gemm softmax gemm descriptor fix (#3564) music-dino 2026-01-20 16:25:30 +01:00
  • 5827d0d892 Batched gemm softmax gemm descriptor fix (#3564) music-dino 2026-01-20 16:25:30 +01:00
  • 6300ad3c62 Batched gemm softmax gemm descriptor fix (#3564) music-dino 2026-01-20 16:25:30 +01:00
  • 9c2547aa65 Remove irrelevant instances. Ville Pietilä 2026-01-20 09:27:50 -05:00
  • 4010341092 Add max error metric (barkocot/basic-v1-interwave) Bartlomiej Kocot 2026-01-20 08:47:47 -05:00
  • 51214187a1 Merge branch 'develop' of github.com:ROCm/composable_kernel into barkocot/basic-v1-interwave Bartlomiej Kocot 2026-01-20 08:45:08 -05:00
  • e3e36e51f4 [CK TILE] Add gemm basic v1 interwave pipeline Bartlomiej Kocot 2026-01-20 08:44:52 -05:00
  • 053442a95a fixes Bartlomiej Kocot 2026-01-20 11:24:57 +00:00
  • cd0036813f [CK TILE] Fix basic pipelines Bartlomiej Kocot 2026-01-19 23:27:21 +00:00
  • 1851c05cd9 Disable irrelevant instances. Ville Pietilä 2026-01-20 06:00:53 -05:00
  • e77c272156 Add static assert. Ville Pietilä 2026-01-20 06:00:31 -05:00
  • 40b7772083 Add more fwd instances. Ville Pietilä 2026-01-20 06:00:13 -05:00
  • 43058803dc Merge commit 'b09121f86066381f3662fdbdee6a810849a8a1a7' into develop assistant-librarian[bot] 2026-01-20 10:16:09 +00:00
  • 6ad65bc855 WMMA support for batched_gemm_reduce (#3332) Wojciech Laskowski 2026-01-20 10:50:46 +01:00
  • f9a06ea114 WMMA support for batched_gemm_reduce (#3332) Wojciech Laskowski 2026-01-20 10:50:46 +01:00
  • b09121f860 WMMA support for batched_gemm_reduce (#3332) Wojciech Laskowski 2026-01-20 10:50:46 +01:00
  • 4dfbc9de68 update gg configs for mi355 kyle-256 2026-01-20 09:09:02 +00:00
  • 38c7251ed1 Merge commit '0727e85e523aac7a1e82af00f44081cc67f5cde0' into develop assistant-librarian[bot] 2026-01-20 06:20:32 +00:00
  • 97873bc0d5 Expand tensor descriptor test coverage Max Podkorytov 2026-01-19 23:41:39 -06:00
  • b60d14ba89 Address review feedback on tensor descriptor helper tests Max Podkorytov 2026-01-19 23:31:13 -06:00
  • 85c5741492 [CK_BUILDER] Add grouped conv fwd ck tile profiler (#3518) Bartłomiej Kocot 2026-01-20 06:29:01 +01:00
  • d15cc593ea [CK_BUILDER] Add grouped conv fwd ck tile profiler (#3518) Bartłomiej Kocot 2026-01-20 06:29:01 +01:00
  • 0727e85e52 [CK_BUILDER] Add grouped conv fwd ck tile profiler (#3518) Bartłomiej Kocot 2026-01-20 06:29:01 +01:00
  • 895404d62b Merge commit '0517d43d312356c62cc33bea4f0ecc5613e87079' into develop assistant-librarian[bot] 2026-01-20 00:37:44 +00:00
  • 0c8188374a Add unit tests for template optimization helpers Max Podkorytov 2026-01-19 16:35:39 -06:00
  • eb3011c525 Merge branch 'develop' into tianxing/unified-attention Illia Silin 2026-01-19 15:33:37 -08:00
  • c42cd28370 [CK TILE] remove dependency on std chrono (#3599) Cong Ma 2026-01-19 16:31:02 -07:00
  • 1a5d3590ef [CK TILE] remove dependency on std chrono (#3599) Cong Ma 2026-01-19 16:31:02 -07:00
  • 0517d43d31 [CK TILE] remove dependency on std chrono (#3599) Cong Ma 2026-01-19 16:31:02 -07:00
  • ecda0fe2e9 [CK_TILE][FMHA] Add new tile size for async (#3586) Linjun-AMD 2026-01-20 07:22:33 +08:00
  • a0e77e4329 [CK_TILE][FMHA] Add new tile size for async (#3586) Linjun-AMD 2026-01-20 07:22:33 +08:00
  • f3aafb9555 [CK_TILE][FMHA] Add new tile size for async (#3586) Linjun-AMD 2026-01-20 07:22:33 +08:00
  • 05d9befe90 Document sequence_map_inverse and element_space_size optimizations Max Podkorytov 2026-01-19 15:45:52 -06:00
  • 52fa8f6c2c Add build time optimization documentation Max Podkorytov 2026-01-19 13:17:42 -06:00
  • b5bde883eb Merge commit '98abfa4ade0f7b5204adf4da00e95be9453dce74' into develop assistant-librarian[bot] 2026-01-19 21:13:18 +00:00
  • 8bd33c4a35 Optimize clang-format check in Jenkins CI (#3597) Max Podkorytov 2026-01-19 12:23:06 -08:00
  • 44434d33d5 Optimize clang-format check in Jenkins CI (#3597) Max Podkorytov 2026-01-19 12:23:06 -08:00
  • 98abfa4ade Optimize clang-format check in Jenkins CI (#3597) Max Podkorytov 2026-01-19 12:23:06 -08:00
  • 17b4f104b2 Merge commit '66d6a1cfa6807866487becc87cba95a0965f51f9' into develop assistant-librarian[bot] 2026-01-19 16:15:25 +00:00
  • ae64f66966 Bump rocm-docs-core[api_reference] from 1.31.2 to 1.31.3 in /docs/sphinx (#3602) dependabot[bot] 2026-01-19 07:41:59 -08:00
  • 56b7aca81d Bump rocm-docs-core[api_reference] from 1.31.2 to 1.31.3 in /docs/sphinx (#3602) dependabot[bot] 2026-01-19 07:41:59 -08:00
  • 66d6a1cfa6 Bump rocm-docs-core[api_reference] from 1.31.2 to 1.31.3 in /docs/sphinx (#3602) dependabot[bot] 2026-01-19 07:41:59 -08:00
  • 6e7f79d3dc add irregular tail vectorloads (japiasec/ck_tile/irregular_tail_vectorloads) Jakub Piasecki 2026-01-19 15:25:40 +00:00