# Client Example: Grouped Convolution with Activation and Fusion

## Theory

This client example demonstrates **grouped convolution fused with various activation and elementwise operations**. Grouped convolution splits the input and weights into groups and applies convolution independently to each group, while fusion with activation and scaling improves efficiency.

**Mathematical Formulation:**
For each group $g$:
- Convolution: $Y^g = \text{Conv}(X^g, W^g)$
- Fused operations: $E^g = f(Y^g, D_0^g, D_1^g, ...)$
  - $f$ can be bilinear, scale, add, relu, etc.

**Algorithmic Background:**
- Grouped convolution is used in efficient CNNs, depthwise separable convolutions, and expert models.
- Fused epilogue operations (scale, add, relu, reduce) are performed in registers before writing to memory.
- Supports 1D, 2D, and 3D grouped convolutions and a variety of fusion patterns.

## How to Run

### Prerequisites

Please follow the instructions in the main [Build Guide](../../README.md#building-ck) section as a prerequisite to building and running this example.

### Build and run
```bash
cd composable_kernel/client_example/24_grouped_conv_activation
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run (grouped conv + scale)
./grouped_convnd_fwd_scale/grouped_convnd_fwd_scale

# Example run (grouped conv + bilinear)
./grouped_convnd_fwd_bilinear/grouped_convnd_fwd_bilinear

# Example run (grouped conv + scale + relu)
./grouped_convnd_fwd_convscale_relu/grouped_convnd_fwd_convscale_relu

# Example run (grouped conv + scale + add + relu)
./grouped_convnd_fwd_scaleadd_scaleadd_relu/grouped_convnd_fwd_scaleadd_scaleadd_relu
```

## Source Code Structure

### Directory Layout
```
client_example/24_grouped_conv_activation/
├── grouped_convnd_fwd_scale/                  # Grouped conv + scale
├── grouped_convnd_fwd_bilinear/               # Grouped conv + bilinear
├── grouped_convnd_fwd_convscale/              # Grouped conv + scale (convscale)
├── grouped_convnd_fwd_convscale_add/          # Grouped conv + scale + add
├── grouped_convnd_fwd_convscale_reduce/       # Grouped conv + scale + reduce
├── grouped_convnd_fwd_convscale_relu/         # Grouped conv + scale + relu
├── grouped_convnd_fwd_convinvscale/           # Grouped conv + inverse scale
├── grouped_convnd_fwd_scaleadd_ab/            # Grouped conv + scale + add (A/B)
├── grouped_convnd_fwd_scaleadd_scaleadd_relu/ # Grouped conv + scale + add + relu
├── grouped_convnd_bwd_data_bilinear/          # Grouped conv bwd data + bilinear
├── grouped_convnd_bwd_data_scale/             # Grouped conv bwd data + scale
├── grouped_convnd_bwd_weight_bilinear/        # Grouped conv bwd weight + bilinear
├── grouped_convnd_bwd_weight_scale/           # Grouped conv bwd weight + scale
├── CMakeLists.txt                             # Build configuration for the example
```

### Key Functions

- **main()** (in each subdirectory's `.cpp`):  
  Sets up input tensors, configures grouped convolution and fusion parameters, launches the kernel, and verifies the result.
- **Grouped convolution kernel invocation**:  
  Uses the Composable Kernel device API to launch grouped convolution with various fused epilogue operations.

---

## Additional Details

- Supports a wide range of fusion patterns (bilinear, scale, add, relu, reduce, etc.).
- Example parameters can be adjusted in the source for different workloads.

---

## Related Examples

- [10_grouped_convnd_bwd_data](../10_grouped_convnd_bwd_data/README.md): Grouped convolution backward data
- [11_grouped_conv_bwd_weight](../11_grouped_conv_bwd_weight/README.md): Grouped convolution backward weight
- [30_grouped_conv_fwd_multiple_d](../../example/30_grouped_conv_fwd_multiple_d/README.md): Grouped convolution forward with multiple D

---
[Back to Client Examples](../README.md)