composable_kernel/library
assistant-librarian[bot] 676ed06e53 [CK] Add new fwd conv fp16/bf16 instances optimized for unit group size. (#4275)
## Proposed changes

Added new FP16/BF16 instances that are optimized for group size = 1. The
new instances use the compute-optimized block GEMM pipeline.

| CK prof command | Baseline (TFLOPs) | New V3 instances (TFLOPs) |
|:-----|:------:|------:|
| grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 2376 256 3 3 100 100 1 1 1 1 1 1 1 1 | 858.818 | 962.293 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 256 256 3 3 100 100 1 1 1 1 1 1 1 1 | 979.987 | 1121.11 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 2376 256 3 3 50 50 1 1 1 1 1 1 1 1 | 945.951 | 1091.66 |
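
For a relative view of the numbers in the table, the following is a minimal sketch (not part of the PR; the names `results` and `speedup` are illustrative) that computes the speedup of the new V3 instances over the baseline for each profiled case.

```python
# Throughput figures (TFLOPs) copied from the table: (baseline, new V3 instances).
results = [
    (858.818, 962.293),
    (979.987, 1121.11),
    (945.951, 1091.66),
]


def speedup(baseline, new):
    """Return the ratio new / baseline; values above 1.0 mean the new instance is faster."""
    return new / baseline


speedups = [speedup(b, n) for b, n in results]

for (b, n), s in zip(results, speedups):
    # Report the relative gain as a percentage of the baseline throughput.
    print(f"{b:.3f} -> {n:.3f} TFLOPs: {(s - 1) * 100:.1f}% faster")
```

On these three cases the gain works out to roughly 12 to 15 percent over the baseline.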



---
🔁 Imported from
[ROCm/composable_kernel#3670](https://github.com/ROCm/composable_kernel/pull/3670)
🧑‍💻 Originally authored by @vpietila-amd

---------

Co-authored-by: Ville Pietilä <>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2026-02-17 17:58:11 -07:00