Ville Pietilä
99fe3df99a
Fix tensor descriptors.
2025-10-03 14:23:04 +00:00
Ville Pietilä
9510171377
WIP: Put back the generic tensor descriptors for convolutions.
2025-10-02 15:06:30 +00:00
Ville Pietilä
c3f0c1a866
Add additional check for non-supported c > 1 case.
2025-09-30 07:46:24 +00:00
Ville Pietilä
db835e065c
Make MPerGroup and NPerGroup template parameters.
2025-09-30 07:14:28 +00:00
Ville Pietilä
1a6f602c65
Remove debug code.
2025-09-30 05:53:28 +00:00
Ville Pietilä
193907fd85
Fix case k > 1 and c=1.
2025-09-29 16:02:00 +00:00
Ville Pietilä
558054eadb
WIP: Simplify conv to gemm transformations and handle K > 1 and C > 1 cases.
2025-09-26 13:38:24 +00:00
Ville Pietilä
8babf7195a
Fix strides in 1D conv to gemm transformation.
2025-09-26 09:38:11 +00:00
Ville Pietilä
354dd5039c
Add compile check for assumed row-major layout.
2025-09-26 08:39:39 +00:00
Ville Pietilä
1764c77fb2
Enable running multiple GEMM batches of merged conv groups.
2025-09-26 07:51:29 +00:00
Ville Pietilä
b864c077ed
Code clean-up for bwd tensor transformations.
2025-09-25 15:09:08 +00:00
Ville Pietilä
0ea3268d5d
Remove debug and other dead code.
2025-09-25 09:41:33 +00:00
Ville Pietilä
cc7433efc6
Add more comments, disable debug code.
2025-09-25 09:37:15 +00:00
Ville Pietilä
97f842f2c6
Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding.
2025-09-25 09:30:50 +00:00
Ville Pietilä
625a78b17b
WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding.
2025-09-24 15:08:01 +00:00
Ville Pietilä
7280df1bc3
Add one more unit test for tensor view.
2025-09-24 12:10:26 +00:00
Ville Pietilä
73fb5a026a
Initial unit tests for tensor descriptor.
2025-09-24 08:31:42 +00:00
Ville Pietilä
8048d6ff73
Fix build.
2025-09-23 11:17:08 +00:00
Ville Pietilä
e6f6c4a6a3
Working baseline for depthwise convolution with merged conv groups.
2025-09-23 11:14:10 +00:00
Ville Pietilä
29e3112b9b
Epilogue fixes.
2025-09-22 15:38:02 +00:00
Ville Pietilä
d7da3d5089
Offset fixes.
2025-09-22 15:37:46 +00:00
Ville Pietilä
dafcb39496
Tile example parameter changes to match depthwise conv.
2025-09-22 12:12:58 +00:00
Ville Pietilä
7dfbac5d0b
WIP: Separate epilogue for merged conv groups.
2025-09-19 13:52:33 +00:00
Ville Pietilä
437599c517
Improved CShuffle test.
2025-09-19 13:45:39 +00:00
Ville Pietilä
af6838e5dc
Integration test for CShuffle epilogue.
2025-09-19 12:09:08 +00:00
Ville Pietilä
7f52f84167
Fix tile window size for c block.
2025-09-19 08:08:19 +00:00
Ville Pietilä
6bcdb0947e
LDS to global memory copy.
2025-09-18 14:59:32 +00:00
Ville Pietilä
0e09504057
WIP: merged conv groups GEMM epilogue changes.
2025-09-17 14:25:02 +00:00
Ville Pietilä
27a2ceb4f7
Increase the max number of reported errors.
2025-09-17 12:29:12 +00:00
Ville Pietilä
4ec81cb95c
Add more logging.
2025-09-17 12:27:51 +00:00
Ville Pietilä
9db02f2564
Added integration test for cshuffle epilogue LDS tile distribution.
2025-09-17 11:45:00 +00:00
Ville Pietilä
4eba92c290
Improve test for accessing diagonal blocks.
2025-09-17 08:27:50 +00:00
Ville Pietilä
9175bef679
Improved integration test.
2025-09-16 15:29:01 +00:00
Ville Pietilä
0d802a305f
Integration test for reading diagonal blocks from grouped distributed tensor.
2025-09-16 13:57:57 +00:00
Ville Pietilä
13e4ad093e
Add more tests for tensor view.
2025-09-15 15:32:51 +00:00
Ville Pietilä
e21ce62e53
Change example to optimally match depthwise convolution with merged groups.
2025-09-15 12:56:03 +00:00
Ville Pietilä
ff9732b937
Add tests for tile_distribution_encoding.
2025-09-12 14:03:37 +00:00
Ville Pietilä
6d318ab481
Enable running multiple conv groups per batch.
2025-09-12 14:03:04 +00:00
Ville Pietilä
0d5c1b9638
WIP: Merged conv groups epilogue.
2025-09-11 15:24:36 +00:00
Ville Pietilä
419fd88494
Add unit tests for tensor view.
2025-09-10 13:33:08 +00:00
Ville Pietilä
970b40aa6c
WIP: Merged conv groups offset calculation.
2025-09-09 11:33:31 +00:00
Ville Pietilä
d9f0a9cdd0
Fully working conv group merging for TransformConvBwdWeightToGemm.
2025-09-09 09:58:43 +00:00
Ville Pietilä
bc63757ad6
Add unit tests for coordinate transforms.
2025-09-09 07:09:07 +00:00
Ville Pietilä
8845b23254
WIP: Tensor transformations.
2025-09-08 15:41:54 +00:00
Ville Pietilä
1a2b0dcb44
Added unit tests for TransformConvBwdWeightToGemm when conv groups are merged.
2025-09-05 13:05:25 +00:00
Ville Pietilä
81a617c108
Initial set of tests for TransformConvBwdWeightToGemm.
2025-09-05 12:32:38 +00:00
Ville Pietilä
61b3c96273
Add number of groups to merge to ck tile grouped gemm example.
2025-09-04 14:24:23 +00:00
Ville Pietilä
e8d0c04a1b
Fix grouped conv bwd weight example output in CK Tile.
2025-09-04 12:02:42 +00:00
Ville Pietilä
2b1908a375
Fix compilation of the grouped conv examples.
2025-09-04 12:01:49 +00:00
linqunAMD
e2d28a92af
Extend XDL kernel to Support RDNA3/4 - Part 2 (#2722)
...
Update Blockwise and Gridwise files to support both wave32 & wave64.
1. Calculate WaveSize from a template parameter instead of hard-coding it to 64; some occurrences of "64" are also replaced with WaveSize.
2. Move BN0Shuffled and BK0Shuffled to the device side, because the correct MFMA instruction info is not available on the host side.
3. Update b_thread_offset_n and b_thread_offset_k in gridwise_gemm_xdl_cshuffle_v3_b_scale.hpp for gfx11. On gfx11, input data is duplicated for every 16 threads, which differs from all other targets.
4. Modify a1_threadwise_copy in gridwise_batched_*gemm*gemm for gfx11. On gfx11, we need to duplicate the input and swizzle A if transposeC isn't enabled.
2025-09-04 08:33:40 +08:00