Commit Graph

  • d71fe5b98c Merge branch 'develop' into LWPCK-3549-cleanups SamiAario-AMD 2026-01-19 17:16:37 +02:00
  • 9e79b07298 WIP with checks passing Matti Eskelinen 2026-01-19 09:40:32 -05:00
  • 5eeb83abb0 Improve runner script. Ville Pietilä 2026-01-19 09:39:09 -05:00
  • 5a0fea7f5a WIP Matti Eskelinen 2026-01-19 08:55:00 -05:00
  • 1c35e916f0 remove n, unnecessary Matti Eskelinen 2026-01-19 08:36:50 -05:00
  • 2c7fb73c2c Split into dummy and reduce impls Matti Eskelinen 2026-01-19 08:35:29 -05:00
  • 86c47d3a9d Add example script on implementing the algorithm using matmul Matti Eskelinen 2026-01-19 08:06:54 -05:00
  • c934043433 Align conv bwd data profiler with bwd weight profiler. Ville Pietilä 2026-01-19 07:36:19 -05:00
  • a5c81472d0 Fix profiler command conversion and execution scripts. Ville Pietilä 2026-01-19 07:35:41 -05:00
  • 95ca0c373a Merge branch 'develop' into meskelin/refactor-makegemmtensorviews meskelin/refactor-makegemmtensorviews Matti Eskelinen 2026-01-19 14:33:08 +02:00
  • 5f2f7b5ee2 Disable more profilers. Ville Pietilä 2026-01-19 06:21:55 -05:00
  • 1db3473ff7 Convert PyTorch output to CK profiler commands. Ville Pietilä 2026-01-19 06:10:00 -05:00
  • 2625758522 Build only CK conv profilers. Ville Pietilä 2026-01-19 06:09:29 -05:00
  • 3d4bb495f8 Merge commit '1a6d1b59ef7358e4f07afcc0a163af7aa4b985a9' into develop assistant-librarian[bot] 2026-01-19 10:16:14 +00:00
  • a9ff38bc89 [CK_BUILDER] Convolution forward transfer concepts. (#3535) Adam Osewski 2026-01-19 10:54:10 +01:00
  • c03fa25f6f [CK_BUILDER] Convolution forward transfer concepts. (#3535) Adam Osewski 2026-01-19 10:54:10 +01:00
  • 1a6d1b59ef [CK_BUILDER] Convolution forward transfer concepts. (#3535) Adam Osewski 2026-01-19 10:54:10 +01:00
  • 63051dc6d9 initialize gemm reduce separate gemm yadaish 2026-01-19 09:00:41 +00:00
  • e60d79a9a1 Merge commit 'fe40a5d13941b64162cffce9496d1d94a90f80a5' into develop assistant-librarian[bot] 2026-01-17 08:14:43 +00:00
  • 9c660bfbe3 Implement batched gemm bias permute for RDNA4 (#3534) Erwin Terpstra 2026-01-17 08:30:27 +01:00
  • beffadc5a0 Implement batched gemm bias permute for RDNA4 (#3534) Erwin Terpstra 2026-01-17 08:30:27 +01:00
  • fe40a5d139 Implement batched gemm bias permute for RDNA4 (#3534) Erwin Terpstra 2026-01-17 08:30:27 +01:00
  • a565d87e08 Apply same optimization pattern to TensorAdaptor Max Podkorytov 2026-01-16 16:37:56 -06:00
  • bbf5c5e926 Replace generate_tuple lambda with pack expansion in InitializeElementSize Max Podkorytov 2026-01-16 16:06:37 -06:00
  • 1d7c221c95 Replace nested static_for lambdas with compile-time search helper Max Podkorytov 2026-01-16 15:49:59 -06:00
  • 9942fd6ab9 Replace sequence_merge O(log N) recursion with O(1) fold expression Max Podkorytov 2026-01-16 14:01:19 -06:00
  • e74b611c14 Replace O(N) recursive element space size with O(1) fold expression Max Podkorytov 2026-01-16 11:26:46 -06:00
  • a8c9be9378 Rewrite sequence_map_inverse using O(1) depth pack expansion Max Podkorytov 2026-01-16 11:19:35 -06:00
  • 02e42dcaa1 Replace lambdas with named functors in container_concat Max Podkorytov 2026-01-16 01:23:21 -06:00
  • 0a1e1cc66f Add container_product helper for O(1) depth fold expression Max Podkorytov 2026-01-16 01:05:44 -06:00
  • 22a409be00 Add make_uniform_tuple helper for repeated value patterns Max Podkorytov 2026-01-16 00:42:19 -06:00
  • 00849ac2e2 Replace lambdas with named functors in transform_tensor_descriptor mpodkory/transform-tensor-descriptor-optimization Max Podkorytov 2026-01-16 00:10:23 -06:00
  • d7e7fbdcff Add generate_identity_sequences helper for common pattern tenpercent/generate-identity-sequences Max Podkorytov 2026-01-15 23:22:16 -06:00
  • 3d46680be0 Optimize sequence_merge using direct concatenation for small cases Max Podkorytov 2026-01-15 21:32:18 -06:00
  • 94b9e4b635 Optimize sequence_gen and uniform_sequence_gen using __make_integer_seq Max Podkorytov 2026-01-15 21:15:57 -06:00
  • 9c4010cd17 Merge commit 'f9104ef9b3b794f8e02757cbf2935818f5389dac' into develop assistant-librarian[bot] 2026-01-17 00:38:39 +00:00
  • 487f1beee9 [CK TILE QUANT GEMM] use OverrideADataType in aquant pipeline (#3584) Cong Ma 2026-01-16 16:27:39 -07:00
  • 80bc8aaf76 [CK TILE QUANT GEMM] use OverrideADataType in aquant pipeline (#3584) Cong Ma 2026-01-16 16:27:39 -07:00
  • f9104ef9b3 [CK TILE QUANT GEMM] use OverrideADataType in aquant pipeline (#3584) Cong Ma 2026-01-16 16:27:39 -07:00
  • 0568a7a03c Add static_for_indexed for reduced template instantiation mpodkory/static-for-indexed Max Podkorytov 2026-01-16 15:08:41 -06:00
  • ee426bea45 still debugging: speculating something with cshuffle epilogue lwpck-4181 khuagarw 2026-01-16 20:53:14 +00:00
  • aef254ca0d Rewrite StaticallyIndexedArray to use C-array instead of Tuple tenpercent/statically-indexed-array-rewrite Max Podkorytov 2026-01-15 21:54:58 -06:00
  • f5ada17eed Replace sequence_merge O(log N) recursion with O(1) fold expression Max Podkorytov 2026-01-16 14:01:19 -06:00
  • 003d9952fe Replace O(N) recursive element space size with O(1) fold expression Max Podkorytov 2026-01-16 11:26:46 -06:00
  • 84b0d6b99b Rewrite sequence_map_inverse using O(1) depth pack expansion Max Podkorytov 2026-01-16 11:19:35 -06:00
  • 887bdf21fb Replace lambdas with named functors in container_concat Max Podkorytov 2026-01-16 01:23:21 -06:00
  • 60770012ed Add container_product helper for O(1) depth fold expression Max Podkorytov 2026-01-16 01:05:44 -06:00
  • 0f14af8ad1 Add make_uniform_tuple helper for repeated value patterns Max Podkorytov 2026-01-16 00:42:19 -06:00
  • 0791bade3c Replace lambdas with named functors in transform_tensor_descriptor Max Podkorytov 2026-01-16 00:10:23 -06:00
  • 7c37209c94 Add generate_identity_sequences helper for common pattern Max Podkorytov 2026-01-15 23:22:16 -06:00
  • 73b0cfde4e Merge commit '3f735c127b8e78b702a31e19cb6e0e35eda3588a' into develop assistant-librarian[bot] 2026-01-16 19:13:41 +00:00
  • b12d70ae04 [CK Profiler] Restore CPU tensor initialization when verification is not done on GPU (#3594) Johannes Graner 2026-01-16 19:56:53 +01:00
  • 74c4b5df53 [CK Profiler] Restore CPU tensor initialization when verification is not done on GPU (#3594) Johannes Graner 2026-01-16 19:56:53 +01:00
  • 3f735c127b [CK Profiler] Restore CPU tensor initialization when verification is not done on GPU (#3594) Johannes Graner 2026-01-16 19:56:53 +01:00
  • fb918acff9 Remove unnecessary hip_fp16 include from stream_config (#3549) logicat 2026-01-17 02:40:05 +08:00
  • 2f59f74334 Remove unnecessary hip_fp16 include from stream_config (#3549) logicat 2026-01-17 02:40:05 +08:00
  • fec81109f1 Remove unnecessary hip_fp16 include from stream_config (#3549) logicat 2026-01-17 02:40:05 +08:00
  • 0b3ee64c89 Disable CK Builder for SLES15 in Jenkins CI (#3581) John Shumway 2026-01-16 10:36:23 -08:00
  • c4dce7cb69 Disable CK Builder for SLES15 in Jenkins CI (#3581) John Shumway 2026-01-16 10:36:23 -08:00
  • 2d233c838a Disable CK Builder for SLES15 in Jenkins CI (#3581) John Shumway 2026-01-16 10:36:23 -08:00
  • f7614e006b CK Tile: fix some issues (#3557) spolifroni-amd 2026-01-16 13:34:44 -05:00
  • b56d46606d CK Tile: fix some issues (#3557) spolifroni-amd 2026-01-16 13:34:44 -05:00
  • 427d4fb9e9 CK Tile: fix some issues (#3557) spolifroni-amd 2026-01-16 13:34:44 -05:00
  • 43428f2fd5 ROCm 7.2 (Clang 22) compilation hotfix ipanfilo/clang22_build_hotfix Ilya Panfilov 2026-01-16 13:17:34 -05:00
  • f9ff023328 Fixing GEMM Multi D on Tile Engine (#3583) Thrupti Raj Lakshmana Gowda 2026-01-16 12:17:21 -06:00
  • 01adec72bf Fixing GEMM Multi D on Tile Engine (#3583) Thrupti Raj Lakshmana Gowda 2026-01-16 12:17:21 -06:00
  • de8ee379ad Fixing GEMM Multi D on Tile Engine (#3583) Thrupti Raj Lakshmana Gowda 2026-01-16 12:17:21 -06:00
  • f09e10936d fixed vector load size for fp4 Sami Remes 2026-01-16 12:04:34 -05:00
  • 003c0b3cc7 Merge branch 'vpietila/ckb-refactor-warp-gemm-descriptors' of github.com:ROCm/composable_kernel into vpietila/ckb-refactor-warp-gemm-descriptors vpietila/ckb-refactor-warp-gemm-descriptors Ville Pietilä 2026-01-16 11:48:59 -05:00
  • c4d9d16dea sketch algorithm Sami Remes 2026-01-16 10:14:40 -05:00
  • 727af14aad WIP Damien Lejeune 2026-01-16 09:33:47 -05:00
  • 244048fc52 WIP Eskelinen 2026-01-16 09:08:45 -05:00
  • a89e7522e3 WIP Sami Remes 2026-01-16 08:44:17 -05:00
  • 16ca5cb532 WIP Sami Remes 2026-01-16 08:22:11 -05:00
  • 5937ab0d00 Sinkhorn-Knopp: WIP Damien Lejeune 2026-01-16 08:21:39 -05:00
  • e5be038a5e Fix GEMM pipeline concept usage. Ville Pietilä 2026-01-16 05:32:29 -05:00
  • 84c0f2c880 Merge branch 'develop' into jeonghyun/ckb-almiopen-522-descriptor-init JH-Leon-KIM-AMD 2026-01-16 10:42:46 +02:00
  • 2ab79ebe39 Merge branch 'develop' into LWPCK-3549-cleanups SamiAario-AMD 2026-01-16 09:47:35 +02:00
  • f6330af670 updating test kyle-256 2026-01-16 07:05:05 +00:00
  • e2311b8dc7 update test kernels kyle-256 2026-01-12 08:06:52 +00:00
  • 2e00471b10 updating tests on mi300 kyle-256 2026-01-09 10:37:27 +00:00
  • 726ddd64ad update test config kyle-256 2026-01-09 08:24:15 +00:00
  • a12d808505 add async test; result not good kyle-256 2026-01-08 02:29:39 +00:00
  • 84f4255e9e update config Your Name 2026-01-06 08:10:37 +00:00
  • 57c8cb19d2 Optimize sequence_merge using direct concatenation for small cases Max Podkorytov 2026-01-15 21:32:18 -06:00
  • 991274aaaf Optimize sequence_gen and uniform_sequence_gen using __make_integer_seq Max Podkorytov 2026-01-15 21:15:57 -06:00
  • d9030f5343 Merge commit '644cdbe3c92f9af16067e539edb4a13e6b9e7c86' into develop assistant-librarian[bot] 2026-01-16 02:52:08 +00:00
  • d4990deb79 Merge pull request #3573 from ROCm/jshumway/builder-readme John Shumway 2026-01-15 17:55:04 -08:00
  • 4faf012d92 Merge pull request #3573 from ROCm/jshumway/builder-readme John Shumway 2026-01-15 17:55:04 -08:00
  • 644cdbe3c9 Merge pull request #3573 from ROCm/jshumway/builder-readme John Shumway 2026-01-15 17:55:04 -08:00
  • cece6c0c2c debugging permuteN khuagarw 2026-01-15 22:46:53 +00:00
  • 7b405e44b0 Merge commit '086a1f8861ef8c81db854e7f2749458b69121617' into develop assistant-librarian[bot] 2026-01-15 17:20:33 +00:00
  • ef0227d255 added reflection for grouped_conv_bwd weight_cshuffleV3 kabraham/describe-fn-wmma Kevin Abraham 2026-01-15 16:55:52 +00:00
  • f6d1bb77e0 Add LLM-agnostic Docker and build analysis tools (#3576) Max Podkorytov 2026-01-15 08:30:23 -08:00
  • 79139825a9 Add LLM-agnostic Docker and build analysis tools (#3576) Max Podkorytov 2026-01-15 08:30:23 -08:00
  • 086a1f8861 Add LLM-agnostic Docker and build analysis tools (#3576) Max Podkorytov 2026-01-15 08:30:23 -08:00
  • f0f4dbbffc Merge commit 'f57395689b92ca1f644e6e549e763f6c293ced22' into develop assistant-librarian[bot] 2026-01-15 16:19:30 +00:00
  • 1ecb8b0081 Merge branch 'develop' into meskelin/refactor-makegemmtensorviews Matti Eskelinen 2026-01-15 17:51:56 +02:00
  • fcdc0f7fee Bump rocm-docs-core[api_reference] from 1.31.1 to 1.31.2 in /docs/sphinx (#3577) dependabot[bot] 2026-01-15 07:49:06 -08:00
  • 48becfa5ad Bump rocm-docs-core[api_reference] from 1.31.1 to 1.31.2 in /docs/sphinx (#3577) dependabot[bot] 2026-01-15 07:49:06 -08:00