mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 18:17:44 +00:00
## Proposed changes Added new FP16/BF16 instances that are optimized for group size = 1. The new instance use the compute optimized block GEMM pipeline. | CK prof command | Baseline (TFLOPs) | New V3 instances (TFLOPs) | |:-----|:------:|------:| | grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 2376 256 3 3 100 100 1 1 1 1 1 1 1 1 | 858.818 | 962.293 | | grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 256 256 3 3 100 100 1 1 1 1 1 1 1 1 | 979.987 | 1121.11 | | grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 2376 256 3 3 50 50 1 1 1 1 1 1 1 1 | 945.951 | 1091.66 | --- 🔁 Imported from [ROCm/composable_kernel#3670](https://github.com/ROCm/composable_kernel/pull/3670) 🧑💻 Originally authored by @vpietila-amd --------- Co-authored-by: Ville Pietilä <> Co-authored-by: Bartłomiej Kocot <barkocot@amd.com> Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>