Commit Graph

  • f57395689b Bump rocm-docs-core[api_reference] from 1.31.1 to 1.31.2 in /docs/sphinx (#3577) dependabot[bot] 2026-01-15 07:49:06 -08:00
  • eb0080ab85 [CK][Examples] Fixing stride issues in ck examples 14/65/68/69 by workaround - Bypassing hostTensor validation -Fixing args num in ck examples 68/69 Michal Kulikowski 2026-01-14 17:24:07 +01:00
  • fc889be2a5 [CK][Examples] Fixing stride issues in ck examples 14/65/68/69 by workaround - Bypassing hostTensor validation -Fixing args num in ck examples 68/69 Michal Kulikowski 2026-01-14 17:24:07 +01:00
  • e1f2a44096 [CK][Examples] Fixing stride issues in ck examples 14/65/68/69 by workaround - Bypassing hostTensor validation -Fixing args num in ck examples 68/69 Michal Kulikowski 2026-01-14 17:24:07 +01:00
  • 767530856a Add overloads of MakeA/B/C/DBlockWindows that accept descriptors Matti Eskelinen 2026-01-15 14:19:27 +00:00
  • 97f2fa2912 Implement device_gemm_universal_preshuffle_instance for RDNA4 (#3429) Yung-sheng Tu 2026-01-15 16:19:31 +01:00
  • 1bf5861e43 Implement device_gemm_universal_preshuffle_instance for RDNA4 (#3429) Yung-sheng Tu 2026-01-15 16:19:31 +01:00
  • 6df2d70143 Implement device_gemm_universal_preshuffle_instance for RDNA4 (#3429) Yung-sheng Tu 2026-01-15 16:19:31 +01:00
  • 43e3e63175 Merge commit 'e30207985aa5d9d0b53dc837904bf2ac3063a412' into develop assistant-librarian[bot] 2026-01-15 15:14:37 +00:00
  • 6d24f25ca5 Merge branch 'develop' into vpietila/ckb-refactor-warp-gemm-descriptors Ville Pietilä 2026-01-15 17:06:02 +02:00
  • 09d084bfb4 Fix error when building with -DCMAKE_BUILD_TYPE=Debug (#3541) Estevan Vedovelli 2026-01-15 09:35:24 -05:00
  • e71d3df441 Fix error when building with -DCMAKE_BUILD_TYPE=Debug (#3541) Estevan Vedovelli 2026-01-15 09:35:24 -05:00
  • e30207985a Fix error when building with -DCMAKE_BUILD_TYPE=Debug (#3541) Estevan Vedovelli 2026-01-15 09:35:24 -05:00
  • 850ca52a91 [CK_BUILDER] Update owners file for more reviews for CK Builder (#3572) jshumway/analyze-build John Shumway 2026-01-14 12:43:55 -08:00
  • c59346f03a Disable ActiveWorkgroupsPerCU for different arch in wmma kernels (#3566) Bartłomiej Kocot 2026-01-14 21:37:12 +01:00
  • 33d4ae859e Fix grouped conv bwd data wmma check (#3562) Bartłomiej Kocot 2026-01-14 20:04:37 +01:00
  • a669f8c969 [CK_Tile] Support for group size 128 for Preshuffle quant for 2d block scale gemm (#3462) Khushbu Agarwal 2026-01-14 10:00:19 -08:00
  • 08b82b66fb Build CK on Windows (#3458) Ville Pietilä 2026-01-14 17:31:45 +02:00
  • 162399f3b1 [CK] Refactor GPU verification kernel to gather error stats on GPU (#3551) Johannes Graner 2026-01-14 16:04:50 +01:00
  • 615178aed9 [CK Profiler] Initialize tensors on GPU in CK profiler (#3550) Johannes Graner 2026-01-14 16:04:14 +01:00
  • 4f6c596a7a [CK_TILE][FMHA] Enable gpt-oss sink (#3490) Linjun-AMD 2026-01-14 21:32:06 +08:00
  • e79a263198 Add support for direct store in epilogue and padding support for wave transfer without transpose (#3465) Enrico Degregori 2026-01-14 11:02:19 +01:00
  • bc5139167f [CK TILE ENGINE] CI fix for Basic Tile Engine (#3554) Thrupti Raj Lakshmana Gowda 2026-01-13 18:20:30 -06:00
  • 42a8d972c3 Shuffle fix for gfx950 (#3491) Thomas Ning 2026-01-14 01:21:29 +08:00
  • c6ed4fa489 [CK_BUILDER] Add bwd weight factories (#3509) Ville Pietilä 2026-01-13 18:12:38 +02:00
  • e0026b2937 fix incorrect List import in reduce_parameter.py (#3555) Po Yen Chen 2026-01-13 22:33:05 +08:00
  • 2e6b7641ef Implement grouped gemm tile loop for RDNA4 (#3304) Erwin Terpstra 2026-01-13 07:14:23 +01:00
  • 8bf8bb30d6 [CK Tile] Fix FMHA LSE calculation and potential division by zero (#3326) Jeff Huang 2026-01-13 13:52:26 +08:00
  • 36d60760e4 [FMHA] Support page_size=1 (linear layout) in batch prefill pipeline (#3545) Jeff Huang 2026-01-13 12:04:43 +08:00
  • 837531d24c fix mxfp8-gemm example failure (#3531) ZheWang 2026-01-13 10:26:45 +08:00
  • d99f0a929f WIP: extract MakeALdsDescriptor() from child to parent class for code readability (#3392) Aviral Goel 2026-01-12 23:21:58 +05:30
  • d50221f65c refactor: remove Default scheduler implementation as it not used anymore (#3542) Aviral Goel 2026-01-12 23:21:06 +05:30
  • 6b951b09aa [CK profiler] Perform verification on GPU when using GPU reference (#3482) Johannes Graner 2026-01-12 12:12:41 +01:00
  • 862139b765 adressed review comments from PR3459 (#3526) kabrahamAMD 2026-01-12 09:47:00 +01:00
  • da5574b616 ck-builder: tensor input/output reflection (#3536) Robin Voetter 2026-01-12 09:45:53 +01:00
  • 0f6a441af2 moe fp8 blockscale use nt (#3524) yadaish 2026-01-12 10:48:10 +08:00
  • a96e998c9e Dlejeune/ck tile 2d multiple reductions (#3147) damien-lejeune 2026-01-09 11:16:37 +01:00
  • e9b3d5ee82 [CK_BUILDER] Debug utilities (#3528) Robin Voetter 2026-01-08 10:14:13 +01:00
  • a9d0954656 Removing memop from chshuffle (#3530) Thrupti Raj Lakshmana Gowda 2026-01-08 01:34:43 -06:00
  • af7a509621 [CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169) Johannes Graner 2026-01-08 08:02:02 +01:00
  • ea4976513a [CK TILE] Fix grouped conv kernels splitk and double lds (#3527) Bartłomiej Kocot 2026-01-08 07:59:38 +01:00
  • 455be8c071 Disable fp32 atomic adds on gfx11 (#3510) Bartłomiej Kocot 2026-01-08 00:32:04 +01:00
  • 327fb9c6e9 Wmma support for gemm_bias_add_reduce (#3316) Enrico Degregori 2026-01-07 19:27:16 +01:00
  • b029c1aff8 Implement grouped gemm fastgelu for RDNA4 (#3303) Erwin Terpstra 2026-01-07 19:20:44 +01:00
  • 275e45d597 Add unit test coverage for conversion to convolution traits (#3515) John Shumway 2026-01-07 07:44:21 -08:00
  • a9f573dc5f [CI, CK examples] Disable time_kernel for CI tests and examples (#3464) Johannes Graner 2026-01-07 16:30:57 +01:00
  • fd8e96061c Enable offload-compress for Windows if avaliable (#3521) BrianHarrisonAMD 2026-01-07 08:05:03 -07:00
  • 6b47e45b78 [CK TILE] Refactor function amd_buffer_load_invalid_element_return_zero (#3512) Cong Ma 2026-01-07 01:05:56 -07:00
  • 1cfc816cc7 [CK_Tile] Support for various group sizes Preshuffle quant for 2d block scale gemm (#3445) Khushbu Agarwal 2026-01-06 12:46:59 -08:00
  • d257eafa15 [CKTILE] Support A/B Quantization in Blockscale Grouped Gemm (#3452) kyle-256 2026-01-07 04:36:04 +08:00
  • 726224168d [CK_TILE] add preshuffleB mode for ABQuant GEMM (#3495) kensclin 2026-01-07 04:35:01 +08:00
  • 6f06f02d61 Fix build error from extra comma (#3516) John Shumway 2026-01-06 11:08:54 -08:00
  • 8c211bbb71 add tabulate package to aiter docker (#3519) Illia Silin 2026-01-06 09:36:54 -08:00
  • 3d85c0f0f5 [CK_BUILDER] Integrate reference conv with testing (#3511) Robin Voetter 2026-01-06 09:29:06 +01:00
  • e5930ae340 Merge some updates for ck_tile headers (#3342) joyeamd 2026-01-06 15:39:00 +08:00
  • 230fe628f3 Joye/revise wp pipeline (#3493) joyeamd 2026-01-06 05:49:26 +08:00
  • cbc03887c9 Add support to gfx1153 and fix gfx115X WMMA config (#3496) Estevan Vedovelli 2026-01-05 13:03:30 -05:00
  • f65eeab1df Fix large tensor grouped conv bwd data test (#3513) Bartłomiej Kocot 2026-01-05 18:42:02 +01:00
  • 1bb9b749c7 [CK_BUILDER] validation (#3471) Robin Voetter 2026-01-05 13:57:34 +01:00
  • c1670e40a7 [FMHA] Batch Prefill Support Improvements: Change KV Cache Layout & Large Page Size Support (#3442) Jeff Huang 2026-01-05 18:41:47 +08:00
  • 2b2250a852 [CK-Tile] move out memory operation from cshuffle epilogue class (#3359) Max Podkorytov 2026-01-04 03:28:14 -08:00
  • fc3835528e Generate the descriptors explicitly as separate tuples Matti Eskelinen 2026-01-15 13:48:44 +00:00
  • 4e0fd5241a Separate tensor descriptor creation from the tensor view creation Matti Eskelinen 2026-01-15 11:32:50 +00:00
  • 445ec888ba [FMHA] Enable page size 16 for batch prefill kernel (#3568) Jeff Huang 2026-01-15 22:11:44 +08:00
  • e3eda32062 [FMHA] Enable page size 16 for batch prefill kernel (#3568) Jeff Huang 2026-01-15 22:11:44 +08:00
  • 993d3e2f0e [FMHA] Enable page size 16 for batch prefill kernel (#3568) Jeff Huang 2026-01-15 22:11:44 +08:00
  • 7c6cac26d9 first implementation with working bdw_weight description Kevin Abraham 2026-01-15 11:58:37 +00:00
  • 3b784ca603 clang-format Ville Pietilä 2026-01-15 05:31:52 -05:00
  • dc6f8f1067 Fix WMMA conv algorithms hierarchy. Ville Pietilä 2026-01-15 05:27:22 -05:00
  • c8030c98c8 added tests for bwd wei Kevin Abraham 2026-01-15 10:05:09 +00:00
  • 6abcb1d5cf refactored helpers to support bwd conv Kevin Abraham 2026-01-15 10:01:05 +00:00
  • ae55803b39 [CK_BUILDER] ALMIOPEN-522: Testing-specific descriptor initialization JH-Leon-KIM-AMD 2026-01-14 12:16:25 +00:00
  • 2451657a14 Merge remote-tracking branch 'origin/develop' into vpietila/ckb-refactor-warp-gemm-descriptors Ville Pietilä 2026-01-15 04:36:46 -05:00
  • eb83c23157 Merge commit '51226372156901aa20a34ed5146d6bd57c63e519' into develop assistant-librarian[bot] 2026-01-15 09:16:31 +00:00
  • 753043b27a [CK_BUILDER] Convert convolution traits to a struct with factory functions (#3547) John Shumway 2026-01-15 01:03:21 -08:00
  • 6ae6b01721 [CK_BUILDER] Convert convolution traits to a struct with factory functions (#3547) John Shumway 2026-01-15 01:03:21 -08:00
  • 5122637215 [CK_BUILDER] Convert convolution traits to a struct with factory functions (#3547) John Shumway 2026-01-15 01:03:21 -08:00
  • c756e421db Update README.md files to match recent code changes John Shumway 2026-01-14 16:41:34 -05:00
  • b4a7cc7524 Update README.md files to match recent code changes John Shumway 2026-01-14 16:41:34 -05:00
  • df7ee270a6 Update README.md files to match recent code changes John Shumway 2026-01-14 16:41:34 -05:00
  • 35ec0097e5 Merge commit '8705fdcb0c738907fea74b7ed39c9f73fb9a5892' into develop assistant-librarian[bot] 2026-01-14 22:14:05 +00:00
  • 3827441343 add aiter test_batch_prefill and simplify jenkins file a bit (#3570) Illia Silin 2026-01-14 14:07:47 -08:00
  • 8b415db3d6 add aiter test_batch_prefill and simplify jenkins file a bit (#3570) Illia Silin 2026-01-14 14:07:47 -08:00
  • 8705fdcb0c add aiter test_batch_prefill and simplify jenkins file a bit (#3570) Illia Silin 2026-01-14 14:07:47 -08:00
  • 5386db55e1 Merge commit '7f912909ca2c3cedfa1c6397d75daba4903a6d0d' into develop assistant-librarian[bot] 2026-01-14 21:07:55 +00:00
  • 8661ee5a16 Disable CK Tile Stream-K reduction tests (#3559) Emily Martins 2026-01-14 14:02:21 -07:00
  • c07c2fa0ab Disable CK Tile Stream-K reduction tests (#3559) Emily Martins 2026-01-14 14:02:21 -07:00
  • 7f912909ca Disable CK Tile Stream-K reduction tests (#3559) Emily Martins 2026-01-14 14:02:21 -07:00
  • c744f9015e [CK_BUILDER] Update owners file for more reviews for CK Builder (#3572) John Shumway 2026-01-14 12:43:55 -08:00
  • 1e9ccb17d1 [CK_BUILDER] Update owners file for more reviews for CK Builder (#3572) John Shumway 2026-01-14 12:43:55 -08:00
  • f08fb3f748 [CK_BUILDER] Update owners file for more reviews for CK Builder (#3572) John Shumway 2026-01-14 12:43:55 -08:00
  • 8c72adabeb Disable ActiveWorkgroupsPerCU for different arch in wmma kernels (#3566) Bartłomiej Kocot 2026-01-14 21:37:12 +01:00
  • 343e0fb21f Disable ActiveWorkgroupsPerCU for different arch in wmma kernels (#3566) Bartłomiej Kocot 2026-01-14 21:37:12 +01:00
  • a346cfa960 Disable ActiveWorkgroupsPerCU for different arch in wmma kernels (#3566) Bartłomiej Kocot 2026-01-14 21:37:12 +01:00
  • 0211adaff3 Merge commit 'a07c8e38bd5152f2582dd0c8c1f8eef72f1086e5' into develop assistant-librarian[bot] 2026-01-14 19:12:59 +00:00
  • 9aea6a52ed Fix grouped conv bwd data wmma check (#3562) Bartłomiej Kocot 2026-01-14 20:04:37 +01:00
  • ced94149c8 Fix grouped conv bwd data wmma check (#3562) Bartłomiej Kocot 2026-01-14 20:04:37 +01:00
  • a07c8e38bd Fix grouped conv bwd data wmma check (#3562) Bartłomiej Kocot 2026-01-14 20:04:37 +01:00
  • 7da4e47a5f [CK_Tile] Support for group size 128 for Preshuffle quant for 2d block scale gemm (#3462) Khushbu Agarwal 2026-01-14 10:00:19 -08:00
  • 13eb0113c0 [CK_Tile] Support for group size 128 for Preshuffle quant for 2d block scale gemm (#3462) Khushbu Agarwal 2026-01-14 10:00:19 -08:00