composable_kernel/include/ck/tensor_operation/gpu/device
linqunAMD 1749c0409e [CK][CONV] Support NCHW in class DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle (#2375)
1. When the conv spec is 1x1, stride 1, pad 0, an NCHW input is equivalent to a column-major matrix A, so only a minor change in the conv transformer is needed to support it.
2. When the output is NKHW, it is equivalent to a column-major matrix C, so we swap A and B to get the best performance.
3. Add the new instance device_grouped_conv_fwd_xdl_f16_nchw_instances for NCHW.
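
The layout equivalences in points 1 and 2 can be illustrated with a small standalone sketch. This is not code from the commit or from the CK library: the names `Strides`, `gemm`, `conv1x1_nchw`, `conv1x1_as_gemm` and `conv1x1_as_swapped_gemm` are hypothetical, and the sketch assumes a single image, a single group, and plain `float` data. It shows that a 1x1, stride 1, pad 0 convolution on NCHW data is a GEMM whose A and C operands are column-major, and that swapping the operands (computing C^T = B^T * A^T) turns every operand into a row-major matrix over the same memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (row, col) of a strided matrix lives at row * row_stride + col * col_stride.
struct Strides { std::size_t row; std::size_t col; };

// Reference GEMM with explicit strides: c(m, n) = sum_k a(m, k) * b(k, n).
void gemm(const float* a, Strides sa, const float* b, Strides sb, float* c, Strides sc,
          std::size_t M, std::size_t N, std::size_t K)
{
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.f;
            for (std::size_t k = 0; k < K; ++k)
                acc += a[m * sa.row + k * sa.col] * b[k * sb.row + n * sb.col];
            c[m * sc.row + n * sc.col] = acc;
        }
}

// Direct 1x1, stride-1, pad-0 convolution for one image (assumption: N = 1, G = 1).
// in: [Cin][HW] (NCHW), wei: [Kout][Cin] (KCYX with Y = X = 1), out: [Kout][HW] (NKHW).
std::vector<float> conv1x1_nchw(const std::vector<float>& in, const std::vector<float>& wei,
                                std::size_t Cin, std::size_t Kout, std::size_t HW)
{
    assert(in.size() == Cin * HW && wei.size() == Kout * Cin);
    std::vector<float> out(Kout * HW, 0.f);
    for (std::size_t k = 0; k < Kout; ++k)
        for (std::size_t p = 0; p < HW; ++p)
            for (std::size_t c = 0; c < Cin; ++c)
                out[k * HW + p] += wei[k * Cin + c] * in[c * HW + p];
    return out;
}

// Point 1: GEMM with M = HW, N = Kout, K = Cin.
// A is the input read as a column-major M x K matrix; C is the output written column-major.
std::vector<float> conv1x1_as_gemm(const std::vector<float>& in, const std::vector<float>& wei,
                                   std::size_t Cin, std::size_t Kout, std::size_t HW)
{
    std::vector<float> out(Kout * HW, 0.f);
    gemm(in.data(), {1, HW}, wei.data(), {1, Cin}, out.data(), {1, HW}, HW, Kout, Cin);
    return out;
}

// Point 2: swap the operands. The weights become the left operand (Kout x Cin, row-major),
// the input becomes the right operand (Cin x HW, row-major), and the NKHW output is a
// row-major Kout x HW matrix. No data is moved; only the interpretation changes.
std::vector<float> conv1x1_as_swapped_gemm(const std::vector<float>& in, const std::vector<float>& wei,
                                           std::size_t Cin, std::size_t Kout, std::size_t HW)
{
    std::vector<float> out(Kout * HW, 0.f);
    gemm(wei.data(), {Cin, 1}, in.data(), {HW, 1}, out.data(), {HW, 1}, Kout, HW, Cin);
    return out;
}
```

All three functions read and write the same buffers and should produce identical results. The swapped form only changes which tensor plays the role of A and which plays B, which is the idea behind point 2.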
2025-06-26 08:32:39 +08:00