Ville Pietilä
99fe3df99a
Fix tensor descriptors.
2025-10-03 14:23:04 +00:00
Ville Pietilä
9510171377
WIP: Put back the generic tensor descriptors for convolutions.
2025-10-02 15:06:30 +00:00
Ville Pietilä
c3f0c1a866
Add additional check for non-supported c > 1 case.
2025-09-30 07:46:24 +00:00
Ville Pietilä
db835e065c
Make MPerGroup and NPerGroup template parameters.
2025-09-30 07:14:28 +00:00
Ville Pietilä
1a6f602c65
Remove debug code.
2025-09-30 05:53:28 +00:00
Ville Pietilä
193907fd85
Fix case k > 1 and c=1.
2025-09-29 16:02:00 +00:00
Ville Pietilä
558054eadb
WIP: Simplify conv to gemm transformations and handle K > 1 and C > 1 cases.
2025-09-26 13:38:24 +00:00
Ville Pietilä
8babf7195a
Fix strides in 1D conv to gemm transformation.
2025-09-26 09:38:11 +00:00
Ville Pietilä
354dd5039c
Add compile check for assumed row-major layout.
2025-09-26 08:39:39 +00:00
Ville Pietilä
1764c77fb2
Enable running multiple GEMM batches of merged conv groups.
2025-09-26 07:51:29 +00:00
Ville Pietilä
b864c077ed
Code clean-up for bwd tensor transformations.
2025-09-25 15:09:08 +00:00
Ville Pietilä
0ea3268d5d
Remove debug and other dead code.
2025-09-25 09:41:33 +00:00
Ville Pietilä
cc7433efc6
Add more comments, disable debug code.
2025-09-25 09:37:15 +00:00
Ville Pietilä
97f842f2c6
Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding.
2025-09-25 09:30:50 +00:00
Ville Pietilä
625a78b17b
WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding.
2025-09-24 15:08:01 +00:00
Ville Pietilä
7280df1bc3
Add one more unit test for tensor view.
2025-09-24 12:10:26 +00:00
Ville Pietilä
8048d6ff73
Fix build.
2025-09-23 11:17:08 +00:00
Ville Pietilä
e6f6c4a6a3
Working baseline for depthwise convolution with merged conv groups.
2025-09-23 11:14:10 +00:00
Ville Pietilä
29e3112b9b
Epilogue fixes.
2025-09-22 15:38:02 +00:00
Ville Pietilä
d7da3d5089
Offset fixes.
2025-09-22 15:37:46 +00:00
Ville Pietilä
7dfbac5d0b
WIP: Separate epilogue for merged conv groups.
2025-09-19 13:52:33 +00:00
Ville Pietilä
af6838e5dc
Integration test for CShuffle epilogue.
2025-09-19 12:09:08 +00:00
Ville Pietilä
7f52f84167
Fix tile window size for c block.
2025-09-19 08:08:19 +00:00
Ville Pietilä
6bcdb0947e
LDS to global memory copy.
2025-09-18 14:59:32 +00:00
Ville Pietilä
0e09504057
WIP: merged conv groups GEMM epilogue changes.
2025-09-17 14:25:02 +00:00
Ville Pietilä
27a2ceb4f7
Increase the max number of reported errors.
2025-09-17 12:29:12 +00:00
Ville Pietilä
4ec81cb95c
Add more logging.
2025-09-17 12:27:51 +00:00
Ville Pietilä
6d318ab481
Enable running multiple conv groups per batch.
2025-09-12 14:03:04 +00:00
Ville Pietilä
0d5c1b9638
WIP: Merged conv groups epilogue.
2025-09-11 15:24:36 +00:00
Ville Pietilä
970b40aa6c
WIP: Merged conv groups offset calculation.
2025-09-09 11:33:31 +00:00
Ville Pietilä
d9f0a9cdd0
Fully working conv group merging for TransformConvBwdWeightToGemm.
2025-09-09 09:58:43 +00:00
Ville Pietilä
8845b23254
WIP: Tensor transformations.
2025-09-08 15:41:54 +00:00
Ville Pietilä
61b3c96273
Add number of groups to merge to ck tile grouped gemm example.
2025-09-04 14:24:23 +00:00
Ville Pietilä
2b1908a375
Fix compilation of the grouped conv examples.
2025-09-04 12:01:49 +00:00
arai713
0282d98412
[CK TILE] Stream-K tile partitioner ( #2708 )
* initial commit for skeleton code
* replaced skeleton code with old streamk b2c map functions from old CK, still need to clean up the code
* fixed up code to match CK Tile convention: data type changes, naming changes, etc.
* change for num_sk_blocks data type
* formatting fix
* minor fixes
* moved reduction argument to template
* resolved comments from PR review: standardizing naming, pruning unneeded code
* resolve errors from merge of device op PR: moved enum to common file
* switching to uint32_t due to implementation constraints: divmod only takes uint32_t and mixing signed and unsigned types causes problems
* unsigned type fix
* add const qualifier
* added documentation for template parameters
* documentation edit
2025-09-03 13:38:17 -07:00
msaffari-amd
47d020a993
refactor: use snake_case naming in ck_tile/core components ( #2766 )
2025-09-03 09:34:11 +02:00
Cong Ma
e1ab460d2d
[CK TILE GEMM] Fix building issues ( #2772 )
- Add `WarpGemmMfma_f32_16x16x128_[fp8|bf8]_[fp8|bf8]_CTransposed`
- Replace `__gfx950__` with `CK_GFX950_SUPPORT`
2025-09-02 22:40:18 -07:00
Po Yen Chen
9f35cde374
[CK_TILE] Fix fmha_fwd_v3() Default2DEpilogue usage ( #2765 )
* Fix Default2DEpilogue usage
* Fix Default2DEpilogue usage for batch_prefill
2025-09-02 09:51:56 -07:00
Sami Remes
4419fc34a2
Fix formatting problem ( #2768 )
2025-09-02 14:14:10 +03:00
Michael Mcminn
022f369deb
Adding fix for the gfx908 to the GEMM MFMA implementations of WarpGem… ( #2751 )
* Adding fix for the gfx908 to the GEMM MFMA implementations of WarpGemmMfmaBf16Bf16F32M4N64K16 and WarpGemmMfmaBf16Bf16F32M64N4K16
* Adding support for offload target gfx9-4-generic
* This duplication here isn't ideal
2025-09-02 10:35:07 +02:00
Haocong WANG
33418b201f
Fix naming issue ( #2762 )
2025-09-02 11:18:53 +08:00
Po Yen Chen
d876e87fe4
[CK_TILE] Add FAv3 fwd pipeline ( #2731 )
* Add FAv3 fwd pipeline
* Unpack v_pk_mul to hide v_mov
* Avoid compiler moving l compute across phase
* Sync sched_group_barrier() setting for masking cases
2025-09-01 09:16:45 +08:00
Aviral Goel
fcff0043ae
chore(gemm): clang format to pass CI ( #2758 )
2025-08-29 00:38:46 -07:00
Vijay Krish
4208e28988
ck_tile kernel for gemm with groupwise quantized B tensor. ( #2663 )
* This change introduces new pipelines with the Intrawave scheduler and block GEMM primitives that load the scale tensor into registers and perform dequantization on the C tensor in registers after the MFMA.
The scale tensor data (BQ) is spliced across threads in registers and is not stored in LDS.
The following combinations are currently supported; extending support to more formats should be fairly straightforward.
fp8, fp8 -> f32
bf8, bf8 -> f32
fp8, i4 -> f32
bf8, i4 -> f32
The group size can be as small as the K length of the underlying WarpGemm primitive.
* Solve merge conflict
* [CK TILE] Update CHANGELOG.md
---------
Co-authored-by: Vijay Krishnamoorthy <vjkrish@fb.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: Cong Ma <congma13@amd.com>
2025-08-28 23:43:02 -07:00
Cong Ma
428090f749
Support transposed C tile in Aquant ( #2679 )
Enabling transposed C improves the performance of Aquant.
With transposed C, AQ elements no longer need to be exchanged among
lanes, because each thread holds data from only one row.
2025-08-28 13:28:09 -07:00
Mateusz Ozga
0758883fa4
[CK-TILE] Default2DEpilogue, example and adding nullptr_t type for D ( #2752 )
* Init commit
* Quick fix, CI fails
* Remove CDElementWise
* Add CDEELementWise
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-08-28 12:45:50 -07:00
asleepzzz
038ea82315
Revert "[CK_TILE] FMHA BWD Enable Tile 16x192 ( #2741 )" ( #2757 )
This reverts commit ead4447b20.
2025-08-28 22:50:42 +08:00
linqunAMD
4a49dac7c6
[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel ( #2728 )
* fix copy basic build error
* fix other ck tile test build error
2025-08-28 20:30:30 +08:00
Yi DING
ead4447b20
[CK_TILE] FMHA BWD Enable Tile 16x192 ( #2741 )
* 16x192
* Use buffer_load_lds for lse/d
* Dispatch & cleanup
* Avoid zeroing dq & fix
* fix
2025-08-28 18:54:18 +08:00
Linjun-AMD
bf7b458e6e
use iglp to improve dim256 fmha fwd in qr_ks_vs pipeline ( #2711 )
* add k_lds padding and iglp to improve dim256 fmha fwd
* Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update block_fmha_pipeline_qr_ks_vs.hpp
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* Update block_fmha_pipeline_qx_ks_vs_custom_policy.hpp
* clang format
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
* use same naming style
---------
Signed-off-by: JL-underdog <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-28 11:39:39 +08:00