Files
composable_kernel/library/include
Ville Pietilä 2b2a39be98 [rocm-libraries] ROCm/rocm-libraries#4275 (commit 2e07a39)
[CK] Add new fwd conv fp16/bf16 instances optimized for unit
 group size. (#4275)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Proposed changes

Added new FP16/BF16 instances that are optimized for group size = 1. The
new instance use the compute optimized block GEMM pipeline.

| CK prof command | Baseline (TFLOPs) | New V3 instances (TFLOPs) |
|:-----|:------:|------:|
| grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 2376 256 3 3 100 100 1 1 1 1 1 1
1 1 | 858.818 | 962.293 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 256 256 3 3 100 100 1 1 1 1 1 1
1 1 | 979.987 | 1121.11 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 1 32 2376 256 3 3 50 50 1 1 1 1 1 1 1
1 | 945.951 | 1091.66 |
2026-02-18 00:59:15 +00:00
..