2330 Commits

Author SHA1 Message Date
Ville Pietilä
99fe3df99a Fix tensor descriptors. 2025-10-03 14:23:04 +00:00
Ville Pietilä
9510171377 WIP: Put back the generic tensor descriptors for convolutions. 2025-10-02 15:06:30 +00:00
Ville Pietilä
c3f0c1a866 Add additional check for non-supported c > 1 case. 2025-09-30 07:46:24 +00:00
Ville Pietilä
db835e065c Make MPerGroup and NPerGroup template parameters. 2025-09-30 07:14:28 +00:00
Ville Pietilä
1a6f602c65 Remove debug code. 2025-09-30 05:53:28 +00:00
Ville Pietilä
193907fd85 Fix case k > 1 and c=1. 2025-09-29 16:02:00 +00:00
Ville Pietilä
558054eadb WIP: Simplify conv to gemm transformations and handle K > 1 and C > 1 cases. 2025-09-26 13:38:24 +00:00
Ville Pietilä
8babf7195a Fix strides in 1D conv to gemm transformation. 2025-09-26 09:38:11 +00:00
Ville Pietilä
354dd5039c Add compile check for assumed row-major layout. 2025-09-26 08:39:39 +00:00
Ville Pietilä
1764c77fb2 Enable running multiple GEMM batches of merged conv groups. 2025-09-26 07:51:29 +00:00
Ville Pietilä
b864c077ed Code clean-up for bwd tensor transformations. 2025-09-25 15:09:08 +00:00
Ville Pietilä
0ea3268d5d Remove debug and other dead code. 2025-09-25 09:41:33 +00:00
Ville Pietilä
cc7433efc6 Add more comments, disable debug code. 2025-09-25 09:37:15 +00:00
Ville Pietilä
97f842f2c6 Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding. 2025-09-25 09:30:50 +00:00
Ville Pietilä
625a78b17b WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding. 2025-09-24 15:08:01 +00:00
Ville Pietilä
7280df1bc3 Add one more unit test for tensor view. 2025-09-24 12:10:26 +00:00
Ville Pietilä
73fb5a026a Initial unit tests for tensor descriptor. 2025-09-24 08:31:42 +00:00
Ville Pietilä
8048d6ff73 Fix build. 2025-09-23 11:17:08 +00:00
Ville Pietilä
e6f6c4a6a3 Working baseline for depthwise convolution with merged conv groups. 2025-09-23 11:14:10 +00:00
Ville Pietilä
29e3112b9b Epilogue fixes. 2025-09-22 15:38:02 +00:00
Ville Pietilä
d7da3d5089 Offset fixes. 2025-09-22 15:37:46 +00:00
Ville Pietilä
dafcb39496 Tile example parameters changes to match depthwise conv. 2025-09-22 12:12:58 +00:00
Ville Pietilä
7dfbac5d0b WIP: Separate epilogue for merged conv groups. 2025-09-19 13:52:33 +00:00
Ville Pietilä
437599c517 Improved CShuffle test. 2025-09-19 13:45:39 +00:00
Ville Pietilä
af6838e5dc Integration test for CShuffle epilogue. 2025-09-19 12:09:08 +00:00
Ville Pietilä
7f52f84167 Fix tile window size for c block. 2025-09-19 08:08:19 +00:00
Ville Pietilä
6bcdb0947e LDS to global memory copy. 2025-09-18 14:59:32 +00:00
Ville Pietilä
0e09504057 WIP: merged conv groups GEMM epilogue changes. 2025-09-17 14:25:02 +00:00
Ville Pietilä
27a2ceb4f7 Increase the max number of reported errors. 2025-09-17 12:29:12 +00:00
Ville Pietilä
4ec81cb95c Add more logging. 2025-09-17 12:27:51 +00:00
Ville Pietilä
9db02f2564 Added integration test for cshuffle epilogue LDS tile distribution. 2025-09-17 11:45:00 +00:00
Ville Pietilä
4eba92c290 Improve test for accessing diagonal blocks. 2025-09-17 08:27:50 +00:00
Ville Pietilä
9175bef679 Improved integration test. 2025-09-16 15:29:01 +00:00
Ville Pietilä
0d802a305f Integration test for reading diagonal blocks from grouped distributed tensor. 2025-09-16 13:57:57 +00:00
Ville Pietilä
13e4ad093e Add more tests for tensor view. 2025-09-15 15:32:51 +00:00
Ville Pietilä
e21ce62e53 Change example to match optimally depthwise convolution with merged groups. 2025-09-15 12:56:03 +00:00
Ville Pietilä
ff9732b937 Add tests for tile_distribution_encoding. 2025-09-12 14:03:37 +00:00
Ville Pietilä
6d318ab481 Enable running multiple conv groups per batch. 2025-09-12 14:03:04 +00:00
Ville Pietilä
0d5c1b9638 WIP: Merged conv groups epilogue. 2025-09-11 15:24:36 +00:00
Ville Pietilä
419fd88494 Add unit tests for tensor view. 2025-09-10 13:33:08 +00:00
Ville Pietilä
970b40aa6c WIP: Merged conv groups offset calculation. 2025-09-09 11:33:31 +00:00
Ville Pietilä
d9f0a9cdd0 Fully working conv group merging for TransformConvBwdWeightToGemm. 2025-09-09 09:58:43 +00:00
Ville Pietilä
bc63757ad6 Add unit tests for coordinate transforms. 2025-09-09 07:09:07 +00:00
Ville Pietilä
8845b23254 WIP: Tensor transformations. 2025-09-08 15:41:54 +00:00
Ville Pietilä
1a2b0dcb44 Added unit tests for TransformConvBwdWeightToGemm when conv groups are merged. 2025-09-05 13:05:25 +00:00
Ville Pietilä
81a617c108 Initial set of tests for TransformConvBwdWeightToGemm. 2025-09-05 12:32:38 +00:00
Ville Pietilä
61b3c96273 Add number of groups to merge to ck tile grouped gemm example. 2025-09-04 14:24:23 +00:00
Ville Pietilä
e8d0c04a1b Fix grouped conv bwd weight example output in CK Tile. 2025-09-04 12:02:42 +00:00
Ville Pietilä
2b1908a375 Fix compilation of the grouped conv examples. 2025-09-04 12:01:49 +00:00
linqunAMD
e2d28a92af Extend XDL kernel to Support RDNA3/4 - Part 2 (#2722)
Update Blockwise and Gridwise files to support both wave32 & wave64.

1. Calculate WaveSize from a template parameter instead of hard-coding it to 64; some literal "64" values are also replaced with WaveSize.
2. Move BN0Shuffled and BK0Shuffled to the device side, because the correct mfma instruction info is not available on the host side.
3. Update b_thread_offset_n and b_thread_offset_k in gridwise_gemm_xdl_cshuffle_v3_b_scale.hpp for gfx11. On gfx11, input data is duplicated for each group of 16 threads, unlike all the others.
4. Modify a1_threadwise_copy in gridwise_batched_*gemm*gemm for gfx11. On gfx11, we need to duplicate the input and swizzle A if transposeC isn't enabled.
2025-09-04 08:33:40 +08:00