Commit Graph

40 Commits

Author SHA1 Message Date
Ville Pietilä
fdfbd1e770 Check that number of convolution groups is multiple of merged groups. 2025-10-07 15:05:09 +00:00
Ville Pietilä
d9e9f19ca4 Run clang-formatting. 2025-10-07 13:41:54 +00:00
Ville Pietilä
faf07cc3ab Code clean-up. 2025-10-07 10:41:18 +00:00
Ville Pietilä
9519405b4a Merge remote-tracking branch 'origin/develop' into vpietila/merge-multiple-conv-groups-into-single-wg-in-ck-tile 2025-10-07 08:05:32 +00:00
Ville Pietilä
a3458d38c9 Remove debug output. 2025-10-06 12:58:16 +00:00
Ville Pietilä
24fe5e4f80 Fix bugs in merged conv groups tensor descriptors. 2025-10-06 10:22:23 +00:00
Ville Pietilä
48d22d2b9b Remove the obsolete template parameters. 2025-10-03 14:36:48 +00:00
Ville Pietilä
c3f0c1a866 Add additional check for non-supported c > 1 case. 2025-09-30 07:46:24 +00:00
Ville Pietilä
db835e065c Make MPerGroup and NPerGroup template parameters. 2025-09-30 07:14:28 +00:00
Ville Pietilä
1a6f602c65 Remove debug code. 2025-09-30 05:53:28 +00:00
Ville Pietilä
193907fd85 Fix case k > 1 and c=1. 2025-09-29 16:02:00 +00:00
Ville Pietilä
1764c77fb2 Enable running multiple GEMM batches of merged conv groups. 2025-09-26 07:51:29 +00:00
Khushbu Agarwal
b56e5d1d79 Fix for Add the API to load SGPR (#2913)
* Revert "Revert "[CK-Tile] Add the API to load SGPR  (#2878)" (#2904)"

This reverts commit f161b5b738.

* Fix: sgpr minor issue

* cyclic dependency resolved

* clang formatted

* removing unused variable

* clang formatted

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-09-25 10:32:42 -07:00
Ville Pietilä
0ea3268d5d Remove debug and other dead code. 2025-09-25 09:41:33 +00:00
Ville Pietilä
cc7433efc6 Add more comments, disable debug code. 2025-09-25 09:37:15 +00:00
Ville Pietilä
97f842f2c6 Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding. 2025-09-25 09:30:50 +00:00
Ville Pietilä
625a78b17b WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding. 2025-09-24 15:08:01 +00:00
asleepzzz
f161b5b738 Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904)
This reverts commit 2cbbf5dcb3.
2025-09-23 14:33:51 -07:00
Ville Pietilä
8048d6ff73 Fix build. 2025-09-23 11:17:08 +00:00
Ville Pietilä
e6f6c4a6a3 Working baseline for depthwise convolution with merged conv groups. 2025-09-23 11:14:10 +00:00
Thomas Ning
2cbbf5dcb3 [CK-Tile] Add the API to load SGPR (#2878)
* Have a workable version for SGPR

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

* substitute with the new sgpr read api

* update the CHANGELOG

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

* change to static for logic

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
2025-09-23 01:23:56 -07:00
Ville Pietilä
d7da3d5089 Offset fixes. 2025-09-22 15:37:46 +00:00
jakpiase
624c46866e [CK_TILE] Add conv bwd weight two stage support (#2855)
* resolved conflicts

* add conv bwd weight twostage

* fix one file

* fixes after review

* fixes

* fixes

* Fix

---------

Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-09-22 15:31:25 +02:00
Ville Pietilä
7f52f84167 Fix tile window size for c block. 2025-09-19 08:08:19 +00:00
Ville Pietilä
6bcdb0947e LDS to global memory copy. 2025-09-18 14:59:32 +00:00
Ville Pietilä
4ec81cb95c Add more logging. 2025-09-17 12:27:51 +00:00
JH-Leon-KIM-AMD
804065a36b [CK Tile] Grouped conv fwd splitn support (#2776)
## What's New
Add Split-N support for grouped convolution forward to handle tensors >2GB by splitting the batch dimension.

## Bug Fix
Fixed 32-bit integer overflow that caused crashes with 6+ splits:
- Use `long_index_t` for batch offset calculations
- Remove redundant GemmM initialization in constructors

## How It Works
- Automatically splits batch dimension when tensor exceeds 2GB
- Uses grid.z dimension for parallel processing of splits
- Each split processes a subset of batches independently

## Testing
Verified with tile_example_grouped_conv_fwd:
- n=3000 (6 splits) ✓
- n=3500 (7 splits) ✓
- n=10480 (40 splits) ✓
2025-09-16 16:56:11 +03:00
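A minimal sketch, not taken from the kernel, of why the batch offset needs 64-bit arithmetic once the batch dimension is split: the starting offset of split `i` grows as `i * batches_per_split * batch_stride` and soon passes the 32-bit signed limit. The helper names `split_batch_offset` and `exceeds_int32` and the sizes used below are hypothetical; `long_index_t` is redefined locally on the assumption that it matches CK Tile's 64-bit signed index type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: stands in for CK Tile's long_index_t, taken to be a 64-bit signed integer.
using long_index_t = std::int64_t;

// Hypothetical helper: element offset of the first batch handled by split `split_idx`.
// Every factor is already 64-bit before multiplying, so the product does not wrap
// for realistic tensor sizes.
long_index_t split_batch_offset(long_index_t split_idx,
                                long_index_t batches_per_split,
                                long_index_t batch_stride)
{
    assert(split_idx >= 0 && batches_per_split >= 0 && batch_stride >= 0);
    return split_idx * batches_per_split * batch_stride;
}

// Hypothetical helper: true if the offset cannot be represented by a 32-bit
// signed index, i.e. the case that overflowed before switching to long_index_t.
bool exceeds_int32(long_index_t offset)
{
    return offset > std::numeric_limits<std::int32_t>::max();
}
```

For example, with 500 batches per split and a hypothetical stride of 1,000,000 elements per batch, split 4 starts at element 2,000,000,000 (still within `int32_t`), while split 5 starts at 2,500,000,000, which a 32-bit index cannot represent.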
Ville Pietilä
6d318ab481 Enable running multiple conv groups per batch. 2025-09-12 14:03:04 +00:00
Ville Pietilä
0d5c1b9638 WIP: Merged conv groups epilogue. 2025-09-11 15:24:36 +00:00
linqunAMD
60d3e8f504 [CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12 (#2808)
* [CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12

* fix gemm_splitk_two_stage

* revert .pre-commit-config.yaml
2025-09-11 07:27:33 -07:00
Ville Pietilä
970b40aa6c WIP: Merged conv groups offset calculation. 2025-09-09 11:33:31 +00:00
Ville Pietilä
83f607e2a6 [CK Tile] Fix building grouped conv examples in CK Tile (#2777)
* Fix compilation of the grouped conv examples.

* Fix grouped conv bwd weight example output in CK Tile.
2025-09-05 09:14:21 +03:00
Ville Pietilä
61b3c96273 Add number of groups to merge to ck tile grouped gemm example. 2025-09-04 14:24:23 +00:00
Ville Pietilä
2b1908a375 Fix compilation of the grouped conv examples. 2025-09-04 12:01:49 +00:00
linqunAMD
4a49dac7c6 [Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel (#2728)
* fix copy basic build error

* fix other ck tile test build error
2025-08-28 20:30:30 +08:00
Bartłomiej Kocot
4212bbc170 [CK Tile] Grouped convolution backward data (#2652)
* base working version for single grouped conv bwd data

* Fix 2d descriptor

* fix groups

* Add 3d support

* fixes

* fixes

* fixes

---------

Co-authored-by: Jakub Piasecki <jakpia21@gmail.com>
2025-08-20 05:29:57 -07:00
linqunAMD
9fcc1ee9fd Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1

* remove blocksize in kernel launch

* fix build error

* fix clang format

* fix clang format 2

* fix clang format 3

* fix fmha build error

* fix fmha build 2

* fix fmha build 3

* fix build error 4

* address review comment

* update change log

* replace KernelBlockSize with kBlockSize

* fix CI fail

* fix clang format

* address review comment and rebase code.

* fix universal test fail

---------

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-18 10:08:31 -07:00
Illia Silin
504b101da3 upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18

* update to clang-format-18 in pre-commit-config
2025-07-28 11:34:07 -07:00
jakpiase
6681593864 [CK_TILE] Grouped Convolution Backward Weight Kernel (#2357)
* [CK TILE] Grouped Convolution Forward Kernel

* custom vector size

* fixes

* refactor

* resolved conflicts

* rebase fixes

* fixes

* tmp

* add working support for splitk

* minor fix

* fixes

* fixes

* minor fix

* small fix

* Split K and preprocessing fixes

---------

Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-07-24 10:41:35 +02:00
Bartłomiej Kocot
cebdee4d9e [CK TILE] Grouped Convolution Forward Kernel (#2188)
* [CK TILE] Grouped Convolution Forward Kernel

* custom vector size

* fixes

* refactor

* rebase fixes

* fixes

* fixes
2025-06-20 15:44:36 -07:00