Commit Graph

  • 0e72f1836b Add more FP8 instances and layouts to ckProfiler. Ville Pietilä 2025-12-11 11:06:57 -05:00
  • d5645ff481 Bf16*fp4 gemm (#2801) eliotwang 2025-12-11 23:20:29 +08:00
  • 4b881deb39 Bf16*fp4 gemm (#2801) eliotwang 2025-12-11 23:20:29 +08:00
  • 715671e419 Bf16*fp4 gemm (#2801) eliotwang 2025-12-11 23:20:29 +08:00
  • 47c2e95b20 Add more fp16 instances. Align ckProfiler and ckTileProfiler. Ville Pietilä 2025-12-11 10:11:49 -05:00
  • 4dacd3340c dispatcher Tianxing Wu 2025-12-11 14:14:54 +00:00
  • 284de6f126 Gemm dispatch Tianxing Wu 2025-12-11 14:13:18 +00:00
  • e13494cc3a refactor Tianxing Wu 2025-12-11 13:34:27 +00:00
  • f6d2243288 [CK_TILE] Add pooling to ckTileEngine part4 fix supported configurations Aleksander Dudek 2025-12-11 12:11:42 +00:00
  • 1ccc76e92d Update gitignore. Ville Pietilä 2025-12-11 06:19:12 -05:00
  • db39b44bab Update in the implementation of GetAlignmentQ/GetAlignmentK/GetAlignmentV Qianfeng Zhang 2025-12-11 10:47:54 +00:00
  • 73aed1b57c remove if statements Tianxing Wu 2025-12-11 09:21:55 +00:00
  • b4c3a1bbcf Added working fp16 and int8 instances. Ville Pietilä 2025-12-11 03:58:34 -05:00
  • b69f9eb589 Merge commit 'ce99cab6056d1ffef5acb6f4ad7ede87a46a3cfc' into develop assistant-librarian[bot] 2025-12-11 08:17:07 +00:00
  • 53dc636c6e Wmma support for gemm_ab_scale (#3314) Enrico Degregori 2025-12-11 09:06:20 +01:00
  • 87cf3a4fe2 Wmma support for gemm_ab_scale (#3314) Enrico Degregori 2025-12-11 09:06:20 +01:00
  • ce99cab605 Wmma support for gemm_ab_scale (#3314) Enrico Degregori 2025-12-11 09:06:20 +01:00
  • fe0fe6f4ad [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382) Ville Pietilä 2025-12-11 09:50:00 +02:00
  • 1742ecb78c [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382) Ville Pietilä 2025-12-11 09:50:00 +02:00
  • d66e5f667c [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382) Ville Pietilä 2025-12-11 09:50:00 +02:00
  • a1037bfc3c Merge commit '6d25525adc2344d5b62b12b9ffddee50f89cd0ff' into develop assistant-librarian[bot] 2025-12-11 07:16:06 +00:00
  • c82e638522 python3 op_tests/test_moe_2stage.py -t 16 -e 1 -k 1 -dim 256,256 ready origin/zan/moe_a8w4 Zzz9990 2025-12-10 19:30:47 -06:00
  • d810876d63 feat(precommit-hooks): add check for correct copyright header (#3302) Aviral Goel 2025-12-11 10:50:43 +04:00
  • e044db7202 feat(precommit-hooks): add check for correct copyright header (#3302) Aviral Goel 2025-12-11 10:50:43 +04:00
  • 6d25525adc feat(precommit-hooks): add check for correct copyright header (#3302) Aviral Goel 2025-12-11 10:50:43 +04:00
  • f38b64ae67 docs: add notes on tile distribution and inline comments (#3297) Aviral Goel 2025-12-11 10:47:19 +04:00
  • 7a514148e7 docs: add notes on tile distribution and inline comments (#3297) Aviral Goel 2025-12-11 10:47:19 +04:00
  • fbbdd36ea8 docs: add notes on tile distribution and inline comments (#3297) Aviral Goel 2025-12-11 10:47:19 +04:00
  • 341d0e31b3 debugging medium grained Agarwal 2025-12-10 20:54:55 -05:00
  • 72cc7dfc77 Merge commit '8270900d606398868e747b7f9097484ee73a4cb4' into develop assistant-librarian[bot] 2025-12-11 01:41:32 +00:00
  • f2a77cf0bd [ci] Bumping TheRock commit hash (#3385) Geo Min 2025-12-10 17:34:41 -08:00
  • bba647c933 [ci] Bumping TheRock commit hash (#3385) Geo Min 2025-12-10 17:34:41 -08:00
  • 8270900d60 [ci] Bumping TheRock commit hash (#3385) Geo Min 2025-12-10 17:34:41 -08:00
  • 988b7e109d Merge commit '15ed65db35e6702593cd8ed1d603222fb11684e4' into develop assistant-librarian[bot] 2025-12-10 21:13:52 +00:00
  • c868964f6a Improve sequence sorting and add unit tests (#3376) John Shumway 2025-12-10 12:25:23 -08:00
  • f55a1bca99 Improve sequence sorting and add unit tests (#3376) John Shumway 2025-12-10 12:25:23 -08:00
  • 15ed65db35 Improve sequence sorting and add unit tests (#3376) John Shumway 2025-12-10 12:25:23 -08:00
  • 9daab9664d Merge commit 'b15df372553e0f80a660124f1b558d9cb276bd08' into develop assistant-librarian[bot] 2025-12-10 16:15:45 +00:00
  • cb83826b52 Add test shapes. Ville Pietilä 2025-12-10 10:34:55 -05:00
  • 813ad5a2ca Enable running int8 instances. Ville Pietilä 2025-12-10 10:34:38 -05:00
  • 224b2b5a94 Fwd conv profiler improvements. Ville Pietilä 2025-12-10 10:10:32 -05:00
  • 737c80d47d fix: python 3.8 compatibility in fmha codegen (#3388) Po Yen Chen 2025-12-10 23:08:41 +08:00
  • fb5c7e0314 fix: python 3.8 compatibility in fmha codegen (#3388) Po Yen Chen 2025-12-10 23:08:41 +08:00
  • b15df37255 fix: python 3.8 compatibility in fmha codegen (#3388) Po Yen Chen 2025-12-10 23:08:41 +08:00
  • 6c381b161d Enable only FP16 and INT8 instances. Ville Pietilä 2025-12-10 09:21:40 -05:00
  • a3ca1e75eb Rename single-character variable jograner/grouped-gemm-issue Graner, Johannes 2025-12-10 14:05:04 +00:00
  • 12a2420994 New gemms tianxing/unified-attention-quantization Tianxing Wu 2025-12-10 13:40:42 +00:00
  • 1b401ca21c Merge remote-tracking branch 'origin/barkocot/ck_tile_conv_benchmark2' into vpietila/int8-perf-on-navi4x Ville Pietilä 2025-12-10 07:52:17 -05:00
  • 26d51907c7 AICK-441 Graner, Johannes 2025-12-10 12:40:55 +00:00
  • 03890e0590 Fix crash for two stage kernel Graner, Johannes 2025-12-10 12:38:47 +00:00
  • 148f1c37c0 update Zzz9990 2025-12-10 06:37:41 -06:00
  • 6f3e040a33 Disable kernel for split-k > 1 with non-contiguous strides Graner, Johannes 2025-12-10 12:13:59 +00:00
  • 8d0951f5e2 Fix clang format for Two Stage implementation kiefer 2025-12-10 11:07:49 +00:00
  • a2359443be fix pre-commit error KenSCLin 2025-12-10 10:11:19 +00:00
  • 9c57ec93ea Merge branch 'develop' into ck_tile/gemm_blockscale_abquant kensclin 2025-12-10 18:10:14 +08:00
  • 4f207de1b8 Sync with develop KenSCLin 2025-12-10 09:59:00 +00:00
  • 636a90b531 Merge commit 'fc22320d783a6b73798a23d8d20fb24e3a5e4040' into develop assistant-librarian[bot] 2025-12-10 08:15:28 +00:00
  • df9d6c2628 update Zzz9990 2025-12-10 02:10:03 -06:00
  • d719c09343 [CK_TILE] Split-K autodeduction (#3351) Ville Pietilä 2025-12-10 09:30:30 +02:00
  • fbf53fb970 [CK_TILE] Split-K autodeduction (#3351) Ville Pietilä 2025-12-10 09:30:30 +02:00
  • fc22320d78 [CK_TILE] Split-K autodeduction (#3351) Ville Pietilä 2025-12-10 09:30:30 +02:00
  • 490d6daf13 Merge commit '1aa93ef551a31405aef5c8c14e869241ba96639d' into develop assistant-librarian[bot] 2025-12-10 02:46:30 +00:00
  • 822da5d3a7 [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377) Zzz9990 2025-12-10 10:03:28 +08:00
  • 09e81b46ba [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377) Zzz9990 2025-12-10 10:03:28 +08:00
  • 1aa93ef551 [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377) Zzz9990 2025-12-10 10:03:28 +08:00
  • ec044f50b6 rebase with develop khushbu agarwal 2025-12-09 19:12:29 -05:00
  • dfeb7a11b9 Merge commit '934ba1208ab7cfc82c20f73b14994b64c3843d2d' into develop assistant-librarian[bot] 2025-12-09 23:12:58 +00:00
  • ee0d92f8fc use hipTensor from monorepo for daily builds (#3386) Illia Silin 2025-12-09 14:39:08 -08:00
  • 2185fc59cb use hipTensor from monorepo for daily builds (#3386) Illia Silin 2025-12-09 14:39:08 -08:00
  • 934ba1208a use hipTensor from monorepo for daily builds (#3386) Illia Silin 2025-12-09 14:39:08 -08:00
  • 6112b22756 updated other invokers Jakub Piasecki 2025-12-09 22:26:02 +00:00
  • d6fe69e6fd Merge commit '0d8259affd4f59eb8b1143b658d83d3800270f43' into develop assistant-librarian[bot] 2025-12-09 20:14:23 +00:00
  • 5f4c14b336 temporarily disable daily builds on gfx1010 and gfx908 (#3384) Illia Silin 2025-12-09 10:37:13 -08:00
  • 25918f26a2 temporarily disable daily builds on gfx1010 and gfx908 (#3384) Illia Silin 2025-12-09 10:37:13 -08:00
  • 0d8259affd temporarily disable daily builds on gfx1010 and gfx908 (#3384) Illia Silin 2025-12-09 10:37:13 -08:00
  • 636bc57ab2 Merge commit '7582c9e73fc3e580a2255988310cb25391f80162' into develop assistant-librarian[bot] 2025-12-09 16:14:29 +00:00
  • ddfea2a784 Merge branch 'develop' into ck_tile/gemm_blockscale_abquant kensclin 2025-12-09 23:54:40 +08:00
  • 6d0b4ea055 Implement review suggested changes KenSCLin 2025-12-09 10:14:21 +00:00
  • cdacf1d5f5 Upgrade to ROCm7.1.1 as default compiler. (#3370) Illia Silin 2025-12-09 07:35:32 -08:00
  • 43b4ec3209 Upgrade to ROCm7.1.1 as default compiler. (#3370) Illia Silin 2025-12-09 07:35:32 -08:00
  • 7582c9e73f Upgrade to ROCm7.1.1 as default compiler. (#3370) Illia Silin 2025-12-09 07:35:32 -08:00
  • e2dd65218d moved ck tile profiler to experimental Jakub Piasecki 2025-12-09 15:12:29 +00:00
  • 821b976ead Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374) dependabot[bot] 2025-12-09 07:10:34 -08:00
  • e416856bf0 Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374) dependabot[bot] 2025-12-09 07:10:34 -08:00
  • 50ca3f83eb Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374) dependabot[bot] 2025-12-09 07:10:34 -08:00
  • 2f5eb26839 compile pass Zzz9990 2025-12-09 07:25:26 -06:00
  • 07c078d5ef [CK_TILE] Add pooling to ckTileEngine part3 Aleksander Dudek 2025-12-09 11:59:37 +00:00
  • 1e173e2ab9 Merge commit '6f0966e1e9fca5c513d16a729237d676b583e266' into develop assistant-librarian[bot] 2025-12-09 10:14:29 +00:00
  • 94496c1737 Implement review suggested changes KenSCLin 2025-12-09 10:14:21 +00:00
  • 77f9a0a615 fix a16w4 moe bugs (#3373) lalala-sh 2025-12-09 17:54:55 +08:00
  • 9691ccf03c fix a16w4 moe bugs (#3373) lalala-sh 2025-12-09 17:54:55 +08:00
  • 6f0966e1e9 fix a16w4 moe bugs (#3373) lalala-sh 2025-12-09 17:54:55 +08:00
  • 04d3a0ced0 Some refactoring for this 1 channel per block kernel ck_tile_batch_norm_forward_blockwise_welford Mohsen Saffari 2025-12-09 09:13:31 +00:00
  • 616ad45cef Print number of valid instances in profiler and tests. kiefer 2025-12-05 13:37:07 +00:00
  • d201572ae4 Actually print the reason when a device implementation is not supported. kiefer 2025-12-04 09:57:55 +00:00
  • 1a822947eb Fix bug in various bwd wei device implementations / profiler where the occupancy based split_k value could not be found because the Argument did not derive from ArgumentSplitK, leading to incorrect error tolerances. kiefer 2025-12-03 15:54:22 +00:00
  • 4cf3e61954 Grab device and gridwise files from bkp branch, this should enable splitK support for convolution and also we no longer ForceThreadTileTransfer for explicit gemm. Also grab some updates from 7e7243783008b11e904f127ecf1df55ef95e9af2 to fix building on clang20. kiefer 2025-10-24 14:26:31 +00:00
  • 3e27e627bb Always ForceThreadTileTransfer for now, WaveTileTransfer does not work for convolution yet. kiefer 2025-10-23 12:18:59 +00:00
  • 29265aa82f Fix add_test_executable Enrico Degregori 2025-10-09 13:25:49 +00:00
  • 4c09ae57bc Disable splitk for 2stage xdl on rdna (bug to be fixed) Enrico Degregori 2025-10-06 12:32:01 +00:00