Add listing of all fwd and bwd device ops and instances.

Ville Pietilä
2025-10-29 13:32:03 +00:00
parent ad8fca0253
commit e391b9d659
13 changed files with 60977 additions and 0 deletions

@@ -0,0 +1,72 @@
# Backward Convolution Device Operations - Comprehensive Summary
**Generated:** October 29, 2025
**Namespace:** `ck::tensor_operation::device`
**Location:** `/library/include` and `/library/src` directories
---
## Overview
This document provides a comprehensive list of all device operations used for backward convolutions (both weight gradients and data gradients) in the Composable Kernel library.
### Total Statistics
- **Unique Device Operation Types:** 15
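- **Backward Weight Operations:** 9 types (885 instantiations)
- **Backward Data Operations:** 7 types (1046 instantiations)
- **Total Template Instantiations:** 1827
- **Note:** `DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` (104 instantiations) is a backward data operation, but it is counted in both categories because its name contains both `Weight` and `Data`. Counting it once gives 8 backward weight types (781 instantiations), and 781 + 1046 = 1827.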
---
## Backward Weight Device Operations
| # | Device Operation Name | Instantiations | Files | Primary Location |
|---|----------------------|----------------|-------|------------------|
| 1 | `DeviceGroupedConvBwdWeight` | 342 | 157 | `grouped_convolution_backward_weight.hpp` |
| 2 | `DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle` | 161 | 2 | `device_grouped_conv_bwd_weight_xdl_bilinear_instance.hpp` |
| 3 | `DeviceGroupedConvBwdWeight_Xdl_CShuffle` | 108 | 1 | `device_grouped_conv_bwd_weight_xdl_instance.hpp` |
| 4 | `DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 104 | 4 | `device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp` |
| 5 | `DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle` | 96 | 1 | `device_grouped_conv_bwd_weight_two_stage_xdl_instance.hpp` |
| 6 | `DeviceGroupedConvBwdWeight_Wmma_CShuffle` | 41 | 1 | `device_grouped_conv_bwd_weight_wmma_instance.hpp` |
| 7 | `DeviceGroupedConvBwdWeight_Xdl_CShuffleV3` | 16 | 1 | `device_grouped_conv_bwd_weight_v3_xdl_instance.hpp` |
| 8 | `DeviceGroupedConvBwdWeightMultipleD` | 14 | 2 | `grouped_convolution_backward_weight_bilinear.hpp` |
| 9 | `DeviceGroupedConvBwdWeight_Dl` | 3 | 1 | `device_grouped_conv_bwd_weight_dl_instance.hpp` |
| **Total** | **Backward Weight Operations** | **885** | **170** | |
---
## Backward Data Device Operations
| # | Device Operation Name | Instantiations | Files | Primary Location |
|---|----------------------|----------------|-------|------------------|
| 1 | `DeviceGroupedConvBwdDataMultipleD_Xdl_CShuffle_v1` | 370 | 4 | `device_grouped_conv_bwd_data_transpose_xdl_instance.hpp` |
| 2 | `DeviceConvNdBwdDataNwcKxcNwk_Xdl` | 312 | 12 | `device_conv1d_bwd_data_xdl_nwc_kxc_nwk_bf16_instance.cpp` |
| 3 | `DeviceGroupedConvBwdDataMultipleD` | 156 | 76 | `grouped_convolution_backward_data.hpp` |
| 4 | `DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 104 | 4 | `device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp` |
| 5 | `DeviceGroupedConvBwdDataMultipleD_Wmma_CShuffle` | 66 | 2 | `device_grouped_conv_bwd_data_wmma_f16_instance.hpp` |
| 6 | `DeviceConvBwdData` | 32 | 16 | `convolution_backward_data.hpp` |
| 7 | `DeviceConvNdBwdDataNwcKxcNwk_Dl` | 6 | 3 | `device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f16_instance.cpp` |
| **Total** | **Backward Data Operations** | **1046** | **117** | |
### Grand Total: 1827 Template Instantiations across 15 Device Operation Types
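
The two category subtotals add up to more than the grand total. The sketch below cross-checks the figures using only the counts printed in the tables above; it is a sanity check on the published numbers, not part of the extraction scripts, and the variable names are illustrative.

```python
# Instantiation counts copied from the two tables in this summary.
bwd_weight = {
    'DeviceGroupedConvBwdWeight': 342,
    'DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle': 161,
    'DeviceGroupedConvBwdWeight_Xdl_CShuffle': 108,
    'DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K': 104,
    'DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle': 96,
    'DeviceGroupedConvBwdWeight_Wmma_CShuffle': 41,
    'DeviceGroupedConvBwdWeight_Xdl_CShuffleV3': 16,
    'DeviceGroupedConvBwdWeightMultipleD': 14,
    'DeviceGroupedConvBwdWeight_Dl': 3,
}
bwd_data = {
    'DeviceGroupedConvBwdDataMultipleD_Xdl_CShuffle_v1': 370,
    'DeviceConvNdBwdDataNwcKxcNwk_Xdl': 312,
    'DeviceGroupedConvBwdDataMultipleD': 156,
    'DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K': 104,
    'DeviceGroupedConvBwdDataMultipleD_Wmma_CShuffle': 66,
    'DeviceConvBwdData': 32,
    'DeviceConvNdBwdDataNwcKxcNwk_Dl': 6,
}

# Operations that appear in both tables are counted twice in the subtotals.
shared = set(bwd_weight) & set(bwd_data)
unique_ops = set(bwd_weight) | set(bwd_data)

naive_total = sum(bwd_weight.values()) + sum(bwd_data.values())
deduplicated_total = naive_total - sum(bwd_weight[name] for name in shared)

print(sorted(shared))
print(len(unique_ops), naive_total, deduplicated_total)
```

The single shared entry accounts for the difference between the sum of the subtotals and the reported grand total.
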
---
## Output Files
### Complete Instantiation Files
1. **`backward_conv_all_instantiations.txt`**
   - COMPLETE listing of ALL template instantiations
   - Human-readable format with line numbers
   - Organized by operation type, then by file
2. **`backward_conv_all_instantiations.json`**
   - Structured JSON for programmatic instantiation generation
   - Separated into `backward_weight_operations` and `backward_data_operations`
   - Each instantiation includes full text and parsed parameters
   - Ready for automated code generation
3. **`BACKWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md`** (This file)
   - Executive summary with tables
   - Quick reference for all backward operations

@@ -0,0 +1,377 @@
# Forward Convolution Device Operations - Comprehensive Summary
**Generated:** October 29, 2025
**Namespace:** `ck::tensor_operation::device`
**Location:** `/library/include` and `/library/src` directories
---
## Overview
This document provides a comprehensive list of all device operations used for forward convolutions in the Composable Kernel library, along with their template instantiations.
### Total Statistics
- **Unique Device Operation Types:** 8
- **Total Template Instantiations:** 1,000
- **Files Analyzed:** 420
- **Header Files:** Located in `/library/include/ck/library/tensor_operation_instance/gpu/`
- **Source Files:** Located in `/library/src/tensor_operation_instance/gpu/`
---
## Device Operations Table
### Grouped Convolution Device Operations
| # | Device Operation Name | Instantiations | Files | Description |
|---|----------------------|----------------|-------|-------------|
| 1 | `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle` | 570 | 9 | XDL-based with multiple ABD inputs, CShuffle optimization |
| 2 | `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3` | 141 | 2 | Version 3 with advanced pipeline scheduling |
| 3 | `DeviceGroupedConvFwdMultipleD_Wmma_CShuffle` | 42 | 1 | WMMA-based (Wave Matrix Multiply Accumulate) |
| 4 | `DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor` | 12 | 1 | Optimized for large tensor dimensions |
| 5 | `DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK` | 6 | 1 | Direct Load implementation |
| **Total** | **Grouped Convolution Operations** | **771** | **14** | |
### Non-Grouped Convolution Device Operations
| # | Device Operation Name | Instantiations | Files | Description |
|---|----------------------|----------------|-------|-------------|
| 1 | `DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 156 | 4 | Standard 2D convolution, explicit layout |
| 2 | `DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 61 | 1 | 2D convolution with C-Shuffle optimization |
| 3 | `DeviceConvFwd` | 12 | 6 | Generic interface type |
| **Total** | **Non-Grouped Operations** | **229** | **11** | |
### Grand Total: 1,000 Template Instantiations across 8 Device Operation Types
---
## Device Operations List
### 1. DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
**Device operation with the most forward-convolution instantiations (570 of 1,000)**
- **Template Instantiations:** 570
- **Files:** 9 header files
- **Primary Header:** `grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp`
- **Description:** XDL-based grouped convolution with multiple auxiliary inputs/outputs and CShuffle optimization
- **Variants:**
  - Standard instances (BF16, F16, F32, INT8, F8, BF8)
  - 16x16 MFMA instances
  - NCHW layout instances
  - Generic instances
  - Compute-friendly instances with FP8
**Key Files:**
- `device_grouped_conv_fwd_xdl_instance.hpp` - Main instantiations (193 instances)
- `device_grouped_conv_fwd_xdl_bilinear_instance.hpp` - With bilinear fusion (80 instances)
- `device_grouped_conv_fwd_xdl_scale_instance.hpp` - With scale operation (80 instances)
- `device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp` - With dynamic operations (64 instances)
- `device_grouped_conv_fwd_xdl_outelementop_instance.hpp` - With custom output ops (80 instances)
- `device_grouped_conv_fwd_xdl_merged_groups_instance.hpp` - Merged groups optimization (21 instances)
- `device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp` - ScaleAdd operations (20 instances)
- `device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp` - Fused operations (16 instances)
- `device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp` - Binary output ops (16 instances)
---
### 2. DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
**Standard 2D convolution forward operation**
- **Template Instantiations:** 156
- **Files:** 4 source files
- **Description:** XDL-based 2D convolution with explicit NHWC layout
**Instantiation Files:**
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp` (39 instances)
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp` (39 instances)
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp` (39 instances)
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp` (39 instances)
---
### 3. DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
**Version 3 of XDL CShuffle implementation**
- **Template Instantiations:** 141
- **Files:** 2 header files
- **Description:** Advanced XDL implementation with compute-friendly scheduling and memory optimizations
**Key Files:**
- `device_grouped_conv_fwd_xdl_comp_instance.hpp` (49 instances) - Compute-optimized variants
- `device_grouped_conv_fwd_xdl_mem_instance.hpp` (92 instances) - Memory-optimized variants
**Features:**
- BlockGemmPipelineScheduler variants (Interwave, Intrawave)
- Multiple pipeline versions (v1, v3, v4, v5)
- Support for BF16, F16, F32, TF32, INT8
---
### 4. DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
**C-Shuffle variant for 2D convolution**
- **Template Instantiations:** 61
- **Files:** 1 source file
- **File:** `conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp`
- **Data Type:** F16 only
- **Description:** CShuffle optimization for channel dimension
---
### 5. DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
**WMMA-based grouped convolution**
- **Template Instantiations:** 42
- **Files:** 1 header file
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp`
- **Description:** Uses Wave Matrix Multiply Accumulate (WMMA) instructions
- **Data Types:** F16, INT8
- **Block Sizes:** 32, 64, 128, 256
---
### 6. DeviceConvFwd
**Generic convolution forward interface**
- **Template Instantiations:** 12
- **Files:** 6 files (headers and sources)
- **Description:** High-level interface type for convolution forward operations
- **Usage:** Factory pattern and API definitions
---
### 7. DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
**Large tensor optimization**
- **Template Instantiations:** 12
- **Files:** 1 header file
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp`
- **Description:** Optimized for large tensor dimensions
- **Data Types:** BF16, F16, F32, TF32, INT8
---
### 8. DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
**Direct Load (DL) implementation**
- **Template Instantiations:** 6
- **Files:** 1 header file
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp`
- **Description:** Direct load convolution implementation for specific layouts
- **Data Types:** F16, F32
---
## File Organization
### Header Files (`/library/include/ck/library/tensor_operation_instance/gpu/`)
#### Grouped Convolution Forward (`grouped_conv_fwd/`)
1. `device_grouped_conv_fwd_dl_instance.hpp` - Direct Load implementations
2. `device_grouped_conv_fwd_wmma_instance.hpp` - WMMA implementations
3. `device_grouped_conv_fwd_xdl_instance.hpp` - **Main XDL implementations**
4. `device_grouped_conv_fwd_xdl_bilinear_instance.hpp` - With bilinear fusion
5. `device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp` - Binary output ops
6. `device_grouped_conv_fwd_xdl_comp_instance.hpp` - Compute-optimized
7. `device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp` - Dynamic operations
8. `device_grouped_conv_fwd_xdl_large_tensor_instance.hpp` - Large tensor optimization
9. `device_grouped_conv_fwd_xdl_mem_instance.hpp` - Memory-optimized
10. `device_grouped_conv_fwd_xdl_merged_groups_instance.hpp` - Merged groups
11. `device_grouped_conv_fwd_xdl_outelementop_instance.hpp` - Custom output operations
12. `device_grouped_conv_fwd_xdl_scale_instance.hpp` - With scale operation
13. `device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp` - ScaleAdd on inputs
14. `device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp` - Fused operations
### Source Files (`/library/src/tensor_operation_instance/gpu/`)
#### Conv2D Forward (`conv2d_fwd/`)
- Standard XDL instances for BF16, F16, F32, INT8
- C-Shuffle variant for F16
#### Grouped Conv1D/2D/3D Forward
Organized by:
- **Algorithm:** `dl/`, `wmma/`, `xdl/`
- **Optimization:** `comp/`, `mem/`, `large_tensor/`, `merged_groups/`
- **Data Type:** Per file (bf16, f16, f32, int8, fp8, bf8)
- **Layout:** Encoded in filename (nhwgc, ngchw, etc.)
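
As an illustration of how this organization could be used programmatically, the sketch below classifies a source path by the directory and filename conventions listed above. The example path is hypothetical and only follows the described pattern; the helper `classify` and the token sets are assumptions made for this sketch, not part of the library or of the extraction scripts.

```python
from pathlib import PurePosixPath

# Directory and filename tokens taken from the organization described above.
ALGORITHMS = {'dl', 'wmma', 'xdl'}
OPTIMIZATIONS = {'comp', 'mem', 'large_tensor', 'merged_groups'}
DATA_TYPES = ('bf16', 'f16', 'f32', 'int8', 'fp8', 'bf8')


def classify(path):
    """Return the algorithm, optimization and data type implied by a source path."""
    p = PurePosixPath(path)
    directories = set(p.parts[:-1])
    # Split the file stem into underscore-separated tokens, e.g. 'bf16'.
    tokens = p.name.split('.')[0].split('_')
    return {
        'algorithm': next(iter(directories & ALGORITHMS), None),
        'optimization': next(iter(directories & OPTIMIZATIONS), None),
        'data_type': next((t for t in tokens if t in DATA_TYPES), None),
    }


# Hypothetical path that follows the conventions above.
example = ('grouped_conv2d_fwd/xdl/comp/'
           'device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_instance.cpp')
print(classify(example))
```

Paths that do not sit under an optimization directory simply yield `None` for that field.
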
---
## Template Instantiation Patterns
### Common Parameters
Template instantiations typically include:
- **Spatial Dimensions:** 1D, 2D, 3D (NDimSpatial)
- **Layouts:** NHWC, NCHW, and grouped variants (NHWGC, NGCHW, etc.)
- **Data Types:** BF16, F16, F32, TF32, INT8, F8, BF8
- **Accumulator Type:** F32, INT32
- **Block Sizes:** 32, 64, 128, 256
- **Thread Tile Sizes:** MPerBlock, NPerBlock, KPerBlock
- **MFMA Sizes:** 16x16, 32x32
- **Pipeline Stages:** 1-2 stages
- **Element-wise Operations:** PassThrough, Scale, ScaleAdd, Bilinear, ReLU, Clamp, etc.
### Example Instantiation (from DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle)
```cpp
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<
    NDimSpatial,      // 2 or 3
    ALayout,          // NHWGC, NGCHW, etc.
    BLayout,          // GKYXC, GKCYX, etc.
    DsLayout,         // Additional inputs layout
    ELayout,          // Output layout
    BF16,             // Input data type
    BF16,             // Weight data type
    F32,              // Accumulator type
    BF16,             // CShuffle data type
    DsDataTypes,      // Additional input types
    BF16,             // Output data type
    PassThrough,      // Input element-wise op
    PassThrough,      // Weight element-wise op
    OutElementOp,     // Output element-wise op
    ConvSpec,         // Convolution specialization
    GemmMNKPadding,   // GEMM specialization
    1,                // NumGemmKPrefetchStage
    256,              // BlockSize
    128,              // MPerBlock
    128,              // NPerBlock
    32,               // KPerBlock
    8,                // AK1
    8,                // BK1
    32,               // MPerXdl
    32,               // NPerXdl
    2,                // MXdlPerWave
    2,                // NXdlPerWave
    // ... additional block transfer parameters
>
```
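
To recover individual arguments from an instantiation like the one above, the argument list has to be split at top-level commas only, since nested templates contain commas of their own. The following is a minimal sketch of that idea; `split_template_args` is a hypothetical helper, not part of the extraction scripts, and it assumes comments have already been stripped and that the text contains no `>>` shift or comparison operators.

```python
def split_template_args(instantiation):
    """Split the arguments of the outermost <...> at top-level commas."""
    start = instantiation.index('<')
    args, current, depth = [], [], 0
    for ch in instantiation[start + 1:]:
        if ch in '<(':
            depth += 1
        elif ch in '>)':
            if depth == 0:
                # This '>' closes the outermost template argument list.
                break
            depth -= 1
        if ch == ',' and depth == 0:
            # A comma outside any nested brackets ends the current argument.
            args.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        args.append(tail)
    return args


# Made-up instantiation with one nested template argument.
sample = "DeviceOp<2, NHWGC, S<4, 64, 1>, F32, 256>"
print(split_template_args(sample))
```

With positional arguments available, values such as the block size could be read by index instead of by pattern matching, provided the parameter order of the specific device operation is known.
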
---
## Key Findings
1. **DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle is the primary device operation** with 570 instantiations across 9 different header files for various use cases and fusion patterns.
2. **Eight distinct device operation types** are used for forward convolutions, each optimized for different scenarios:
   - XDL (Matrix Core) based operations
   - WMMA (Wave Matrix Multiply) based operations
   - Direct Load (DL) based operations
   - Specialized variants for large tensors, merged groups, and memory optimization
3. **Multiple optimization strategies:**
   - Compute-optimized and memory-optimized instance sets (`comp`, `mem`)
   - BlockGemmPipelineScheduler variants (Interwave, Intrawave)
   - Pipeline versions (v1, v3, v4, v5)
   - CShuffle for efficient data movement
4. **Comprehensive data type support:** BF16, F16, F32, TF32, INT8, F8, BF8
5. **Extensive fusion support:** Operations can be fused with Scale, ScaleAdd, Bilinear, ReLU, Clamp, and combinations thereof.
---
## Output Files
### Complete Instantiation Files (Recommended)
1. **`forward_conv_all_instantiations.txt`** (593KB, 4,271 lines)
   - **COMPLETE listing of ALL 1,000 template instantiations**
   - Human-readable format with line numbers
   - Every instantiation shown in full detail
   - Organized by device operation, then by file
2. **`forward_conv_all_instantiations.json`** (1.4MB)
   - **Structured JSON for programmatic instantiation generation**
   - Complete instantiation text for each template
   - Includes parsed parameters (data types, block sizes)
   - Hierarchical organization: device_operation → file → instantiations[]
   - Ready for automated code generation tools
### Summary and Quick Reference Files
3. **`FORWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md`** (This file)
   - Executive summary with tables
   - Device operation descriptions
   - File organization reference
4. **`forward_convolution_device_ops_report.txt`** (33KB)
   - High-level summary of all device operations
   - Lists all files containing each device operation
   - Quick reference guide
5. **`forward_conv_device_ops_detailed_report.txt`** (43KB)
   - Detailed report with sample template instantiations
   - Shows first 3 examples per file
   - Includes line numbers and file locations
6. **`forward_convolution_device_ops_data.json`** (34KB)
   - Basic JSON format with summary data
   - File paths and instantiation counts
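
A short sketch of how the complete JSON file might be consumed is given below. The key names (`device_operations`, `instantiations_by_file`, `count`, `instantiations`, `parsed_parameters`) follow the structure written by the forward extraction script in this commit; the `sample` dictionary is a made-up stand-in with that shape so the example runs without the real 1.4MB file, and the two helper functions are illustrative only.

```python
# Made-up stand-in with the same shape as forward_conv_all_instantiations.json.
sample = {
    'metadata': {'total_device_operations': 1, 'total_instantiations': 2},
    'device_operations': {
        'DeviceConvFwd': {
            'name': 'DeviceConvFwd',
            'total_instantiations': 2,
            'total_files': 1,
            'instantiations_by_file': {
                'hypothetical/path/example_instance.hpp': {
                    'count': 2,
                    'instantiations': [
                        {'line_start': 10, 'line_end': 12,
                         'instantiation_text': 'DeviceConvFwd<...>',
                         'instantiation_compressed': 'DeviceConvFwd<...>',
                         'parsed_parameters': {'uses_F16': True}},
                        {'line_start': 20, 'line_end': 22,
                         'instantiation_text': 'DeviceConvFwd<...>',
                         'instantiation_compressed': 'DeviceConvFwd<...>',
                         'parsed_parameters': {'uses_F32': True, 'block_size': '256'}},
                    ],
                },
            },
        },
    },
}


def count_by_operation(data):
    """Return {operation name: number of instantiations}, summed over files."""
    return {
        name: sum(entry['count'] for entry in op['instantiations_by_file'].values())
        for name, op in data['device_operations'].items()
    }


def select_by_flag(data, flag):
    """List (operation, file, line_start) for instantiations whose parsed parameters set `flag`."""
    hits = []
    for name, op in data['device_operations'].items():
        for path, entry in op['instantiations_by_file'].items():
            for inst in entry['instantiations']:
                if inst['parsed_parameters'].get(flag):
                    hits.append((name, path, inst['line_start']))
    return hits


print(count_by_operation(sample))
print(select_by_flag(sample, 'uses_F32'))
```

For the real file, `sample` would be replaced by the result of `json.load` on `forward_conv_all_instantiations.json`.
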
---
## Usage Examples
### Example 1: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle with BF16
Found in: `device_grouped_conv_fwd_xdl_instance.hpp`
This is the most versatile device operation with:
- Generic instances for all block sizes
- Optimized instances for small conv.K and conv.C
- NCHW layout support
- 16x16 MFMA support
- FP8 compute support
### Example 2: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
Found in: `device_grouped_conv_fwd_dl_instance.hpp`
Direct Load implementation with:
- F16 and F32 support
- Specific block configurations
- NHWC layout only
### Example 3: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
Found in: `device_grouped_conv_fwd_wmma_instance.hpp`
WMMA-based implementation with:
- F16 and INT8 support
- Multiple block sizes (32, 64, 128, 256)
- 16x16 WMMA instruction usage
---
## File Locations Reference
### Main Device Operation Headers
```
library/include/ck/library/tensor_operation_instance/gpu/
├── grouped_conv_fwd/
│   ├── device_grouped_conv_fwd_xdl_instance.hpp          ← PRIMARY FILE
│   ├── device_grouped_conv_fwd_dl_instance.hpp
│   ├── device_grouped_conv_fwd_wmma_instance.hpp
│   ├── device_grouped_conv_fwd_xdl_comp_instance.hpp
│   ├── device_grouped_conv_fwd_xdl_mem_instance.hpp
│   ├── device_grouped_conv_fwd_xdl_large_tensor_instance.hpp
│   └── ... (other variants)
└── convolution_forward.hpp                               ← INTERFACE DEFINITIONS
```
### Source Instantiations
```
library/src/tensor_operation_instance/gpu/
├── conv2d_fwd/                  ← Non-grouped 2D convolutions
├── grouped_conv1d_fwd/          ← 1D grouped convolutions
├── grouped_conv2d_fwd/          ← 2D grouped convolutions
│   ├── dl/                      ← Direct Load variants
│   ├── wmma/                    ← WMMA variants
│   └── xdl/                     ← XDL variants
│       ├── comp/                ← Compute-optimized
│       ├── mem/                 ← Memory-optimized
│       ├── large_tensor/        ← Large tensor optimized
│       └── merged_groups/       ← Merged groups optimized
└── grouped_conv3d_fwd/          ← 3D grouped convolutions
```
---
## Additional Resources
For detailed analysis and full template instantiations, refer to the generated reports:
- `forward_convolution_device_ops_report.txt` - Quick summary
- `forward_conv_device_ops_detailed_report.txt` - Full details with examples
- `forward_convolution_device_ops_data.json` - Machine-readable format
The analysis scripts used to generate this information:
- `extract_conv_fwd_device_ops.py` - Initial device operation extraction
- `extract_detailed_instantiations.py` - Detailed instantiation analysis

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,296 @@
#!/usr/bin/env python3
"""
Extract ALL template instantiations for forward convolution device operations.
Generates both human-readable .txt and programmatic JSON outputs.
"""
import os
import re
from pathlib import Path
from collections import defaultdict
import json


def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files


def extract_complete_instantiation(lines, start_idx, device_op_name):
    """Extract a complete device operation instantiation with balanced brackets."""
    if start_idx >= len(lines):
        return None, start_idx

    line = lines[start_idx]

    # Check if this line contains the device op
    if device_op_name + '<' not in line:
        return None, start_idx

    # Start collecting the full instantiation
    full_lines = [line]
    bracket_count = line.count('<') - line.count('>')

    # Continue to next lines until brackets are balanced
    idx = start_idx + 1
    while bracket_count > 0 and idx < len(lines):
        full_lines.append(lines[idx])
        bracket_count += lines[idx].count('<') - lines[idx].count('>')
        idx += 1

    # Join and clean up
    full_text = '\n'.join(full_lines)

    return {
        'line_start': start_idx + 1,
        'line_end': idx,
        'full_text': full_text.strip(),
        'compressed_text': ' '.join(full_text.split())  # Single line version
    }, idx


def extract_all_instantiations(content, device_op_name):
    """Extract ALL device operation instantiations from content."""
    instantiations = []
    lines = content.split('\n')

    idx = 0
    while idx < len(lines):
        inst, next_idx = extract_complete_instantiation(lines, idx, device_op_name)
        if inst:
            instantiations.append(inst)
            idx = next_idx
        else:
            idx += 1

    return instantiations


def parse_template_parameters(instantiation_text):
    """Parse template parameters from instantiation text (simplified)."""
    # This is a simplified parser - actual parsing would need full C++ template parser
    # For now, just extract key information
    params = {}

    # Extract data types
    type_patterns = {
        'BF16': r'\bBF16\b',
        'F16': r'\bF16\b',
        'F32': r'\bF32\b',
        'TF32': r'\bTF32\b',
        'INT8': r'\bint8_t\b',
        'F8': r'\bF8\b',
        'BF8': r'\bBF8\b',
    }

    for type_name, pattern in type_patterns.items():
        if re.search(pattern, instantiation_text):
            params[f'uses_{type_name}'] = True

    # Extract block sizes if visible
    block_size_match = re.search(r',\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{1,3}),', instantiation_text)
    if block_size_match:
        params['block_size'] = block_size_match.group(1)
        params['m_per_block'] = block_size_match.group(2)
        params['n_per_block'] = block_size_match.group(3)
        params['k_per_block'] = block_size_match.group(4)

    return params


def main():
    # Device operations to extract
    target_device_ops = [
        'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle',
        'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3',
        'DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
        'DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
        'DeviceGroupedConvFwdMultipleD_Wmma_CShuffle',
        'DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK',
        'DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor',
        'DeviceConvFwd',
    ]

    # Define paths
    header_paths = [
        'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd',
        'library/include/ck/library/tensor_operation_instance/gpu',
    ]

    source_paths = [
        'library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd',
        'library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd',
        'library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd',
        'library/src/tensor_operation_instance/gpu/conv2d_fwd',
    ]

    # Find all files
    header_files = find_files(header_paths, ['.hpp', '.h', '.inc'])
    source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])
    all_files = list(set(header_files + source_files))

    print(f"Analyzing {len(all_files)} files for ALL instantiations...\n")

    # Collect data
    all_results = {}

    for device_op in target_device_ops:
        print(f"Extracting all instantiations for {device_op}...")
        device_op_data = {
            'device_operation_name': device_op,
            'files': {},
            'total_instantiations': 0
        }

        for file_path in all_files:
            try:
                with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                    content = f.read()

                if device_op in content:
                    instantiations = extract_all_instantiations(content, device_op)
                    if instantiations:
                        rel_path = str(file_path).replace(os.getcwd() + '/', '')

                        # Process each instantiation
                        processed_insts = []
                        for inst in instantiations:
                            processed_inst = {
                                'line_start': inst['line_start'],
                                'line_end': inst['line_end'],
                                'full_text': inst['full_text'],
                                'compressed_text': inst['compressed_text'],
                                'template_params': parse_template_parameters(inst['full_text'])
                            }
                            processed_insts.append(processed_inst)

                        device_op_data['files'][rel_path] = processed_insts
                        device_op_data['total_instantiations'] += len(instantiations)
            except Exception as e:
                print(f" Error processing {file_path}: {e}")

        all_results[device_op] = device_op_data
        print(f" Found {device_op_data['total_instantiations']} instantiations in {len(device_op_data['files'])} files")

    # Generate comprehensive TXT report
    print("\nGenerating comprehensive TXT report...")
    txt_report = []
    txt_report.append("=" * 120)
    txt_report.append("COMPLETE FORWARD CONVOLUTION DEVICE OPERATIONS AND ALL TEMPLATE INSTANTIATIONS")
    txt_report.append("=" * 120)
    txt_report.append("")
    txt_report.append("Namespace: ck::tensor_operation::device")
    txt_report.append(f"Total Device Operation Types: {len(target_device_ops)}")
    total_insts = sum(data['total_instantiations'] for data in all_results.values())
    txt_report.append(f"Total Template Instantiations: {total_insts}")
    txt_report.append("")

    # Summary table
    txt_report.append("SUMMARY OF DEVICE OPERATIONS")
    txt_report.append("-" * 120)
    for device_op in target_device_ops:
        data = all_results[device_op]
        txt_report.append(f"{data['total_instantiations']:4d} instantiations | {len(data['files']):3d} files | {device_op}")
    txt_report.append("")
    txt_report.append("=" * 120)
    txt_report.append("")

    # Detailed section for each device op - ALL instantiations
    for device_op in target_device_ops:
        data = all_results[device_op]
        txt_report.append("")
        txt_report.append("=" * 120)
        txt_report.append(f"DEVICE OPERATION: {device_op}")
        txt_report.append("=" * 120)
        txt_report.append("")
        txt_report.append(f"Total Instantiations: {data['total_instantiations']}")
        txt_report.append(f"Number of Files: {len(data['files'])}")
        txt_report.append("")

        # List ALL instantiations in each file
        for file_path, instantiations in sorted(data['files'].items()):
            txt_report.append("-" * 120)
            txt_report.append(f"FILE: {file_path}")
            txt_report.append(f"Instantiations: {len(instantiations)}")
            txt_report.append("")

            for idx, inst in enumerate(instantiations, 1):
                txt_report.append(f"[{idx}] Lines {inst['line_start']}-{inst['line_end']}:")
                txt_report.append("")
                # Include full text with proper indentation
                for line in inst['full_text'].split('\n'):
                    txt_report.append(f" {line}")
                txt_report.append("")

            txt_report.append("")

        txt_report.append("")

    # Save TXT report
    txt_output = 'forward_conv_all_instantiations.txt'
    with open(txt_output, 'w') as f:
        f.write('\n'.join(txt_report))

    print(f"✓ TXT report saved: {txt_output}")

    # Generate JSON for programmatic use
    print("Generating JSON for programmatic use...")
    json_data = {
        'metadata': {
            'description': 'Forward convolution device operations and template instantiations',
            'namespace': 'ck::tensor_operation::device',
            'total_device_operations': len(target_device_ops),
            'total_instantiations': total_insts
        },
        'device_operations': {}
    }

    for device_op in target_device_ops:
        data = all_results[device_op]
        json_device_op = {
            'name': device_op,
            'total_instantiations': data['total_instantiations'],
            'total_files': len(data['files']),
            'instantiations_by_file': {}
        }

        for file_path, instantiations in data['files'].items():
            file_insts = []
            for inst in instantiations:
                file_insts.append({
                    'line_start': inst['line_start'],
                    'line_end': inst['line_end'],
                    'instantiation_text': inst['full_text'],
                    'instantiation_compressed': inst['compressed_text'],
                    'parsed_parameters': inst['template_params']
                })

            json_device_op['instantiations_by_file'][file_path] = {
                'count': len(file_insts),
                'instantiations': file_insts
            }

        json_data['device_operations'][device_op] = json_device_op

    # Save JSON
    json_output = 'forward_conv_all_instantiations.json'
    with open(json_output, 'w') as f:
        json.dump(json_data, f, indent=2)

    print(f"✓ JSON file saved: {json_output}")

    # Summary
    print(f"\n{'='*60}")
    print("GENERATION COMPLETE")
    print(f"{'='*60}")
    print(f"TXT File: {txt_output}")
    print(f"JSON File: {json_output}")
    print(f"Total instantiations extracted: {total_insts}")
    print(f"{'='*60}")


if __name__ == '__main__':
    main()

@@ -0,0 +1,484 @@
#!/usr/bin/env python3
"""
Extract ALL backward convolution device operations and template instantiations.
Handles both backward weight and backward data convolutions.
Generates human-readable .txt and programmatic JSON outputs.
"""
import os
import re
from pathlib import Path
from collections import defaultdict
import json


def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files


def extract_complete_instantiation(lines, start_idx, device_op_name):
    """Extract a complete device operation instantiation with balanced brackets."""
    if start_idx >= len(lines):
        return None, start_idx

    line = lines[start_idx]

    # Check if this line contains the device op
    if device_op_name + '<' not in line:
        return None, start_idx

    # Start collecting the full instantiation
    full_lines = [line]
    bracket_count = line.count('<') - line.count('>')

    # Continue to next lines until brackets are balanced
    idx = start_idx + 1
    while bracket_count > 0 and idx < len(lines):
        full_lines.append(lines[idx])
        bracket_count += lines[idx].count('<') - lines[idx].count('>')
        idx += 1

    # Join and clean up
    full_text = '\n'.join(full_lines)

    return {
        'line_start': start_idx + 1,
        'line_end': idx,
        'full_text': full_text.strip(),
        'compressed_text': ' '.join(full_text.split())
    }, idx


def extract_all_instantiations(content, device_op_name):
    """Extract ALL device operation instantiations from content."""
    instantiations = []
    lines = content.split('\n')

    idx = 0
    while idx < len(lines):
        inst, next_idx = extract_complete_instantiation(lines, idx, device_op_name)
        if inst:
            instantiations.append(inst)
            idx = next_idx
        else:
            idx += 1

    return instantiations


def parse_template_parameters(instantiation_text):
    """Parse template parameters from instantiation text."""
    params = {}

    # Extract data types
    type_patterns = {
        'BF16': r'\bBF16\b',
        'F16': r'\bF16\b',
        'F32': r'\bF32\b',
        'TF32': r'\bTF32\b',
        'INT8': r'\bint8_t\b',
        'F8': r'\bF8\b',
        'BF8': r'\bBF8\b',
    }

    for type_name, pattern in type_patterns.items():
        if re.search(pattern, instantiation_text):
            params[f'uses_{type_name}'] = True

    # Extract block sizes if visible
    block_size_match = re.search(r',\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{1,3}),', instantiation_text)
    if block_size_match:
        params['block_size'] = block_size_match.group(1)
        params['m_per_block'] = block_size_match.group(2)
        params['n_per_block'] = block_size_match.group(3)
        params['k_per_block'] = block_size_match.group(4)

    return params


def find_device_operations(all_files):
    """Find all device operations matching backward convolution patterns."""
    device_ops = set()

    # Patterns for backward operations
    patterns = [
        r'(Device\w*Conv\w*Bwd\w*Weight\w*)<',
        r'(Device\w*Conv\w*Bwd\w*Data\w*)<',
        r'(Device\w*ConvBwd\w*)<',
    ]

    for file_path in all_files:
        try:
            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()

            for pattern in patterns:
                matches = re.finditer(pattern, content)
                for match in matches:
                    device_ops.add(match.group(1))
        except OSError:
            # Skip files that cannot be opened or read
            continue

    return sorted(device_ops)
def main():
print("="*80)
print("EXTRACTING BACKWARD CONVOLUTION DEVICE OPERATIONS")
print("="*80)
print()
# Define paths
header_paths = [
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_weight',
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_data',
'library/include/ck/library/tensor_operation_instance/gpu',
]
source_paths = [
'library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight',
'library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight',
'library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight',
'library/src/tensor_operation_instance/gpu/grouped_convnd_bwd_weight',
'library/src/tensor_operation_instance/gpu/conv1d_bwd_data',
'library/src/tensor_operation_instance/gpu/conv2d_bwd_data',
'library/src/tensor_operation_instance/gpu/conv3d_bwd_data',
'library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_data',
'library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data',
'library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data',
]
# Find all files
header_files = find_files(header_paths, ['.hpp', '.h', '.inc'])
source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])
all_files = list(set(header_files + source_files))
print(f"Found {len(all_files)} files to analyze")
# Discover all backward device operations
print("Discovering device operations...")
target_device_ops = find_device_operations(all_files)
print(f"Found {len(target_device_ops)} device operation types")
for op in target_device_ops:
print(f" - {op}")
print()
# Collect data
all_results = {}
for device_op in target_device_ops:
print(f"Extracting all instantiations for {device_op}...")
device_op_data = {
'device_operation_name': device_op,
'files': {},
'total_instantiations': 0
}
for file_path in all_files:
try:
with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
content = f.read()
if device_op in content:
instantiations = extract_all_instantiations(content, device_op)
if instantiations:
rel_path = str(file_path).replace(os.getcwd() + '/', '')
processed_insts = []
for inst in instantiations:
processed_inst = {
'line_start': inst['line_start'],
'line_end': inst['line_end'],
'full_text': inst['full_text'],
'compressed_text': inst['compressed_text'],
'template_params': parse_template_parameters(inst['full_text'])
}
processed_insts.append(processed_inst)
device_op_data['files'][rel_path] = processed_insts
device_op_data['total_instantiations'] += len(instantiations)
except Exception as e:
print(f" Error processing {file_path}: {e}")
all_results[device_op] = device_op_data
print(f" Found {device_op_data['total_instantiations']} instantiations in {len(device_op_data['files'])} files")
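# Separate by type. Backward-data names are identified first by their 'BwdData'
# infix, because layout-encoded names such as
# DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K also
# contain 'Weight' and would otherwise be counted in both groups.
bwd_data_ops = {k: v for k, v in all_results.items() if 'BwdData' in k}
bwd_weight_ops = {k: v for k, v in all_results.items() if k not in bwd_data_ops and ('Weight' in k or 'Wgt' in k)}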
print(f"\nBackward Weight Operations: {len(bwd_weight_ops)}")
print(f"Backward Data Operations: {len(bwd_data_ops)}")
# Generate TXT report
print("\nGenerating comprehensive TXT report...")
txt_report = generate_txt_report(all_results, bwd_weight_ops, bwd_data_ops)
txt_output = 'backward_conv_all_instantiations.txt'
with open(txt_output, 'w') as f:
f.write(txt_report)
print(f"✓ TXT report saved: {txt_output}")
# Generate JSON
print("Generating JSON for programmatic use...")
json_data = generate_json_data(all_results, bwd_weight_ops, bwd_data_ops)
json_output = 'backward_conv_all_instantiations.json'
with open(json_output, 'w') as f:
json.dump(json_data, f, indent=2)
print(f"✓ JSON file saved: {json_output}")
# Generate markdown summary
print("Generating markdown summary...")
md_content = generate_markdown_summary(all_results, bwd_weight_ops, bwd_data_ops)
md_output = 'BACKWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md'
with open(md_output, 'w') as f:
f.write(md_content)
print(f"✓ Markdown summary saved: {md_output}")
# Final summary
total_insts = sum(data['total_instantiations'] for data in all_results.values())
print(f"\n{'='*80}")
print(f"GENERATION COMPLETE")
print(f"{'='*80}")
print(f"Device Operations Found: {len(all_results)}")
print(f" - Backward Weight: {len(bwd_weight_ops)}")
print(f" - Backward Data: {len(bwd_data_ops)}")
print(f"Total Instantiations: {total_insts}")
print(f"\nFiles generated:")
print(f" - {txt_output}")
print(f" - {json_output}")
print(f" - {md_output}")
print(f"{'='*80}")
def generate_txt_report(all_results, bwd_weight_ops, bwd_data_ops):
"""Generate comprehensive TXT report."""
report = []
report.append("=" * 120)
report.append("COMPLETE BACKWARD CONVOLUTION DEVICE OPERATIONS AND ALL TEMPLATE INSTANTIATIONS")
report.append("=" * 120)
report.append("")
report.append(f"Namespace: ck::tensor_operation::device")
report.append(f"Total Device Operation Types: {len(all_results)}")
total_insts = sum(data['total_instantiations'] for data in all_results.values())
report.append(f"Total Template Instantiations: {total_insts}")
report.append("")
# Summary
report.append("SUMMARY OF DEVICE OPERATIONS")
report.append("-" * 120)
report.append("")
report.append("BACKWARD WEIGHT OPERATIONS:")
for device_op, data in sorted(bwd_weight_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True):
report.append(f" {data['total_instantiations']:4d} instantiations | {len(data['files']):3d} files | {device_op}")
report.append("")
report.append("BACKWARD DATA OPERATIONS:")
for device_op, data in sorted(bwd_data_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True):
report.append(f" {data['total_instantiations']:4d} instantiations | {len(data['files']):3d} files | {device_op}")
report.append("")
report.append("=" * 120)
report.append("")
# Detailed sections
for category_name, ops_dict in [("BACKWARD WEIGHT", bwd_weight_ops), ("BACKWARD DATA", bwd_data_ops)]:
report.append("")
report.append("=" * 120)
report.append(f"{category_name} OPERATIONS")
report.append("=" * 120)
report.append("")
for device_op in sorted(ops_dict.keys()):
data = all_results[device_op]
report.append("=" * 120)
report.append(f"DEVICE OPERATION: {device_op}")
report.append("=" * 120)
report.append("")
report.append(f"Total Instantiations: {data['total_instantiations']}")
report.append(f"Number of Files: {len(data['files'])}")
report.append("")
for file_path, instantiations in sorted(data['files'].items()):
report.append("-" * 120)
report.append(f"FILE: {file_path}")
report.append(f"Instantiations: {len(instantiations)}")
report.append("")
for idx, inst in enumerate(instantiations, 1):
report.append(f"[{idx}] Lines {inst['line_start']}-{inst['line_end']}:")
report.append("")
for line in inst['full_text'].split('\n'):
report.append(f" {line}")
report.append("")
report.append("")
report.append("")
return '\n'.join(report)
def generate_json_data(all_results, bwd_weight_ops, bwd_data_ops):
"""Generate JSON data for programmatic use."""
total_insts = sum(data['total_instantiations'] for data in all_results.values())
json_data = {
'metadata': {
'description': 'Backward convolution device operations and template instantiations',
'namespace': 'ck::tensor_operation::device',
'total_device_operations': len(all_results),
'backward_weight_operations': len(bwd_weight_ops),
'backward_data_operations': len(bwd_data_ops),
'total_instantiations': total_insts
},
'backward_weight_operations': {},
'backward_data_operations': {}
}
# Process backward weight operations
for device_op, data in bwd_weight_ops.items():
json_device_op = {
'name': device_op,
'total_instantiations': data['total_instantiations'],
'total_files': len(data['files']),
'instantiations_by_file': {}
}
for file_path, instantiations in data['files'].items():
file_insts = []
for inst in instantiations:
file_insts.append({
'line_start': inst['line_start'],
'line_end': inst['line_end'],
'instantiation_text': inst['full_text'],
'instantiation_compressed': inst['compressed_text'],
'parsed_parameters': inst['template_params']
})
json_device_op['instantiations_by_file'][file_path] = {
'count': len(file_insts),
'instantiations': file_insts
}
json_data['backward_weight_operations'][device_op] = json_device_op
# Process backward data operations
for device_op, data in bwd_data_ops.items():
json_device_op = {
'name': device_op,
'total_instantiations': data['total_instantiations'],
'total_files': len(data['files']),
'instantiations_by_file': {}
}
for file_path, instantiations in data['files'].items():
file_insts = []
for inst in instantiations:
file_insts.append({
'line_start': inst['line_start'],
'line_end': inst['line_end'],
'instantiation_text': inst['full_text'],
'instantiation_compressed': inst['compressed_text'],
'parsed_parameters': inst['template_params']
})
json_device_op['instantiations_by_file'][file_path] = {
'count': len(file_insts),
'instantiations': file_insts
}
json_data['backward_data_operations'][device_op] = json_device_op
return json_data
def generate_markdown_summary(all_results, bwd_weight_ops, bwd_data_ops):
"""Generate markdown summary document."""
total_insts = sum(data['total_instantiations'] for data in all_results.values())
bwd_weight_insts = sum(data['total_instantiations'] for data in bwd_weight_ops.values())
bwd_data_insts = sum(data['total_instantiations'] for data in bwd_data_ops.values())
md = []
md.append("# Backward Convolution Device Operations - Comprehensive Summary")
md.append("")
md.append("**Generated:** October 29, 2025 ")
md.append("**Namespace:** `ck::tensor_operation::device` ")
md.append("**Location:** `/library/include` and `/library/src` directories")
md.append("")
md.append("---")
md.append("")
md.append("## Overview")
md.append("")
md.append("This document provides a comprehensive list of all device operations used for backward convolutions (both weight gradients and data gradients) in the Composable Kernel library.")
md.append("")
md.append("### Total Statistics")
md.append(f"- **Unique Device Operation Types:** {len(all_results)}")
md.append(f"- **Backward Weight Operations:** {len(bwd_weight_ops)} types ({bwd_weight_insts} instantiations)")
md.append(f"- **Backward Data Operations:** {len(bwd_data_ops)} types ({bwd_data_insts} instantiations)")
md.append(f"- **Total Template Instantiations:** {total_insts}")
md.append("")
md.append("---")
md.append("")
# Backward Weight Table
md.append("## Backward Weight Device Operations")
md.append("")
md.append("| # | Device Operation Name | Instantiations | Files | Primary Location |")
md.append("|---|----------------------|----------------|-------|------------------|")
for idx, (device_op, data) in enumerate(sorted(bwd_weight_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True), 1):
primary_file = sorted(data['files'].keys())[0] if data['files'] else "N/A"
primary_file_short = primary_file.split('/')[-1] if '/' in primary_file else primary_file
md.append(f"| {idx} | `{device_op}` | {data['total_instantiations']} | {len(data['files'])} | `{primary_file_short}` |")
md.append(f"| **Total** | **Backward Weight Operations** | **{bwd_weight_insts}** | **{sum(len(d['files']) for d in bwd_weight_ops.values())}** | |")
md.append("")
md.append("---")
md.append("")
# Backward Data Table
md.append("## Backward Data Device Operations")
md.append("")
md.append("| # | Device Operation Name | Instantiations | Files | Primary Location |")
md.append("|---|----------------------|----------------|-------|------------------|")
for idx, (device_op, data) in enumerate(sorted(bwd_data_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True), 1):
primary_file = sorted(data['files'].keys())[0] if data['files'] else "N/A"
primary_file_short = primary_file.split('/')[-1] if '/' in primary_file else primary_file
md.append(f"| {idx} | `{device_op}` | {data['total_instantiations']} | {len(data['files'])} | `{primary_file_short}` |")
md.append(f"| **Total** | **Backward Data Operations** | **{bwd_data_insts}** | **{sum(len(d['files']) for d in bwd_data_ops.values())}** | |")
md.append("")
md.append(f"### Grand Total: {total_insts} Template Instantiations across {len(all_results)} Device Operation Types")
md.append("")
md.append("---")
md.append("")
# Output files section
md.append("## Output Files")
md.append("")
md.append("### Complete Instantiation Files")
md.append("")
md.append("1. **`backward_conv_all_instantiations.txt`**")
md.append(" - COMPLETE listing of ALL template instantiations")
md.append(" - Human-readable format with line numbers")
md.append(" - Organized by operation type, then by file")
md.append("")
md.append("2. **`backward_conv_all_instantiations.json`**")
md.append(" - Structured JSON for programmatic instantiation generation")
md.append(" - Separated into `backward_weight_operations` and `backward_data_operations`")
md.append(" - Each instantiation includes full text and parsed parameters")
md.append(" - Ready for automated code generation")
md.append("")
md.append("3. **`BACKWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md`** (This file)")
md.append(" - Executive summary with tables")
md.append(" - Quick reference for all backward operations")
md.append("")
return '\n'.join(md)
if __name__ == '__main__':
main()
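
The weight/data split in the script above depends only on the class name. A minimal sketch of a classifier that keeps the two groups disjoint, assuming every backward-data name carries the `BwdData` infix; the helper `classify_bwd_op` is hypothetical and is not part of the script:

```python
def classify_bwd_op(name):
    """Return 'data', 'weight', or 'other' for a device operation name.

    'BwdData' is tested first because layout-encoded names mention the
    weight tensor even when the operation computes data gradients.
    """
    if 'BwdData' in name:
        return 'data'
    if 'Weight' in name or 'Wgt' in name:
        return 'weight'
    return 'other'


names = [
    'DeviceGroupedConvBwdWeight_Xdl_CShuffle',
    'DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
]
for n in names:
    print(classify_bwd_op(n), n)
```

With a classifier like this, each operation lands in exactly one group, so the per-group instantiation counts add up to the grand total.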

@@ -0,0 +1,195 @@
#!/usr/bin/env python3
"""
Extract all forward convolution device operations and their template instantiations
from CK library headers and source files.
"""
import os
import re
from pathlib import Path
from collections import defaultdict
import json
# Device operation patterns to search for
DEVICE_OP_PATTERNS = [
r'(Device\w*Conv\w*Fwd\w*)<', # Generic pattern for device ops
]
def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files
def extract_device_operations(file_path):
    """Extract device operations from a file."""
    device_ops = []
    try:
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()
        # Search for device operation class names
        for pattern in DEVICE_OP_PATTERNS:
            matches = re.finditer(pattern, content)
            for match in matches:
                device_op_name = match.group(1)
                # Store the device operation with file location
                device_ops.append({
                    'name': device_op_name,
                    'file': str(file_path),
                    'line_context': None  # We'll extract context later if needed
                })
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
    return device_ops
def extract_template_instantiation(content, device_op_name):
    """Collect every line that mentions a device operation together with a '<'.

    This is a line-based heuristic, not a template parser: nested template
    arguments make a single regex unreliable, so an instantiation that spans
    several lines is recorded by its first line only.
    """
    instantiations = []
    for i, line in enumerate(content.split('\n')):
        if device_op_name in line and '<' in line:
            instantiations.append({
                'line_number': i + 1,
                'text': line.strip()[:200]  # Truncate for readability
            })
    return instantiations
def main():
# Define paths to search
header_paths = [
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd',
'library/include/ck/library/tensor_operation_instance/gpu',
]
source_paths = [
'library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd',
'library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd',
'library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd',
'library/src/tensor_operation_instance/gpu/conv2d_fwd',
]
all_paths = header_paths + source_paths
# Find all relevant files
header_files = find_files(header_paths, ['.hpp', '.h'])
source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])
all_files = list(set(header_files + source_files))
print(f"Found {len(all_files)} files to analyze")
# Collect all device operations
all_device_ops = []
for file_path in all_files:
ops = extract_device_operations(file_path)
all_device_ops.extend(ops)
# Group by device operation name
ops_by_name = defaultdict(list)
for op in all_device_ops:
ops_by_name[op['name']].append(op)
# Count unique device operations
unique_ops = sorted(ops_by_name.keys())
print(f"\nFound {len(unique_ops)} unique device operation types:")
for op_name in unique_ops:
count = len(ops_by_name[op_name])
print(f" {op_name}: {count} occurrences")
# Now extract detailed instantiations for each device op
device_op_details = {}
for device_op_name in unique_ops:
instantiation_files = set()
total_instantiations = 0
# Re-scan files that contain this device op
for file_path in all_files:
try:
with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
content = f.read()
if device_op_name in content:
instantiations = extract_template_instantiation(content, device_op_name)
if instantiations:
instantiation_files.add(str(file_path))
total_instantiations += len(instantiations)
except Exception as e:
print(f" Error processing {file_path}: {e}")
device_op_details[device_op_name] = {
'files': sorted(list(instantiation_files)),
'total_instantiations': total_instantiations
}
# Generate report
report = []
report.append("=" * 80)
report.append("FORWARD CONVOLUTION DEVICE OPERATIONS REPORT")
report.append("=" * 80)
report.append("")
report.append(f"Total Unique Device Operations: {len(unique_ops)}")
report.append("")
for device_op_name in unique_ops:
details = device_op_details[device_op_name]
report.append("-" * 80)
report.append(f"Device Operation: {device_op_name}")
report.append(f"Total Template Instantiations: {details['total_instantiations']}")
report.append(f"Number of Files: {len(details['files'])}")
report.append("")
report.append("Files containing this device operation:")
for file_path in details['files']:
# Make path relative to workspace
rel_path = file_path.replace(os.getcwd() + '/', '')
report.append(f" - {rel_path}")
report.append("")
# Save report
report_text = '\n'.join(report)
with open('forward_convolution_device_ops_report.txt', 'w') as f:
f.write(report_text)
print(f"\nReport saved to: forward_convolution_device_ops_report.txt")
# Also save as JSON for programmatic access
json_data = {
'summary': {
'total_device_operations': len(unique_ops),
'device_operation_names': unique_ops
},
'details': device_op_details
}
with open('forward_convolution_device_ops_data.json', 'w') as f:
json.dump(json_data, f, indent=2)
print(f"JSON data saved to: forward_convolution_device_ops_data.json")
if __name__ == '__main__':
main()
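
To see what the discovery pattern does and does not pick up, it helps to run it on a small input. The sketch below reuses the regular expression from `DEVICE_OP_PATTERNS`; the `sample` text is a hypothetical snippet written in the style of an instance file, not an excerpt from the library:

```python
import re

# Same pattern as DEVICE_OP_PATTERNS in the script above.
PATTERN = re.compile(r'(Device\w*Conv\w*Fwd\w*)<')

# Hypothetical input; the template arguments are illustrative only.
sample = """
using Instances = std::tuple<
    DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<2, NHWGC, GKYXC>,
    DeviceConvFwd<2, NHWC, KYXC, NHWK>
>;
// DeviceConvFwd mentioned without an opening bracket is not matched.
"""

# A set removes repeated mentions of the same class name.
found = sorted({m.group(1) for m in PATTERN.finditer(sample)})
print(found)
```

Because the pattern requires a `<` directly after the name, mentions in comments or include paths are ignored, while declarations, return types, and instantiations are all counted as occurrences.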

@@ -0,0 +1,201 @@
#!/usr/bin/env python3
"""
Extract detailed template instantiations for forward convolution device operations.
This script parses the actual template parameters for each device operation instance.
"""
import os
import re
from pathlib import Path
from collections import defaultdict
def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files
def extract_device_op_instantiations(content, device_op_name):
    """Extract complete device operation instantiations."""
    instantiations = []
    lines = content.split('\n')
    i = 0
    while i < len(lines):
        line = lines[i]
        # Check if this line contains the device op
        if device_op_name + '<' in line:
            # Start collecting the full instantiation
            full_instantiation = line
            bracket_count = line.count('<') - line.count('>')
            # Continue to next lines if brackets aren't balanced
            j = i + 1
            while bracket_count > 0 and j < len(lines):
                full_instantiation += '\n' + lines[j]
                bracket_count += lines[j].count('<') - lines[j].count('>')
                j += 1
            # Clean up and store
            instantiation_clean = full_instantiation.strip()
            if instantiation_clean:
                instantiations.append({
                    'line_start': i + 1,
                    'line_end': j,
                    'text': instantiation_clean
                })
            i = j
        else:
            i += 1
    return instantiations
def main():
# Device operations we're interested in (from previous analysis)
target_device_ops = [
'DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
'DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
'DeviceConvFwd',
'DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK',
'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle',
'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3',
'DeviceGroupedConvFwdMultipleD_Wmma_CShuffle',
'DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor',
]
# Define paths to search
header_paths = [
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd',
'library/include/ck/library/tensor_operation_instance/gpu',
]
source_paths = [
'library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd',
'library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd',
'library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd',
'library/src/tensor_operation_instance/gpu/conv2d_fwd',
]
# Find all relevant files
header_files = find_files(header_paths, ['.hpp', '.h', '.inc'])
source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])
all_files = list(set(header_files + source_files))
print(f"Analyzing {len(all_files)} files for detailed instantiations...\n")
# Process each device operation
results = {}
for device_op in target_device_ops:
print(f"Processing {device_op}...")
device_op_data = {
'name': device_op,
'files_with_instantiations': {},
'total_instantiations': 0
}
for file_path in all_files:
try:
with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
content = f.read()
if device_op in content:
instantiations = extract_device_op_instantiations(content, device_op)
if instantiations:
rel_path = str(file_path).replace(os.getcwd() + '/', '')
device_op_data['files_with_instantiations'][rel_path] = instantiations
device_op_data['total_instantiations'] += len(instantiations)
except Exception as e:
print(f" Error processing {file_path}: {e}")
results[device_op] = device_op_data
print(f" Found {device_op_data['total_instantiations']} instantiations in {len(device_op_data['files_with_instantiations'])} files")
# Generate detailed report
report = []
report.append("=" * 100)
report.append("DETAILED FORWARD CONVOLUTION DEVICE OPERATIONS AND TEMPLATE INSTANTIATIONS")
report.append("=" * 100)
report.append("")
report.append(f"Generated for: ck::tensor_operation::device namespace")
report.append("")
# Summary
report.append("SUMMARY")
report.append("-" * 100)
total_all_instantiations = sum(data['total_instantiations'] for data in results.values())
report.append(f"Total Device Operation Types: {len(target_device_ops)}")
report.append(f"Total Template Instantiations: {total_all_instantiations}")
report.append("")
for device_op in target_device_ops:
data = results[device_op]
report.append(f"{device_op}")
report.append(f" - Instantiations: {data['total_instantiations']}")
report.append(f" - Files: {len(data['files_with_instantiations'])}")
report.append("")
report.append("=" * 100)
report.append("")
# Detailed section for each device op
for device_op in target_device_ops:
data = results[device_op]
report.append("=" * 100)
report.append(f"DEVICE OPERATION: {device_op}")
report.append("=" * 100)
report.append("")
report.append(f"Total Template Instantiations: {data['total_instantiations']}")
report.append(f"Number of Files: {len(data['files_with_instantiations'])}")
report.append("")
# List files and show sample instantiations
for file_path, instantiations in sorted(data['files_with_instantiations'].items()):
report.append("-" * 100)
report.append(f"File: {file_path}")
report.append(f"Instantiations in this file: {len(instantiations)}")
report.append("")
# Show up to 3 sample instantiations from each file
num_samples = min(3, len(instantiations))
if num_samples > 0:
report.append("Sample instantiations:")
for idx, inst in enumerate(instantiations[:num_samples]):
report.append(f" [{idx+1}] Line {inst['line_start']}:")
# Truncate very long instantiations
text = inst['text']
if len(text) > 500:
text = text[:500] + "..."
report.append(f" {text}")
if len(instantiations) > num_samples:
report.append(f" ... and {len(instantiations) - num_samples} more instantiations")
report.append("")
report.append("")
# Save detailed report
report_text = '\n'.join(report)
output_file = 'forward_conv_device_ops_detailed_report.txt'
with open(output_file, 'w') as f:
f.write(report_text)
print(f"\n{'='*60}")
print(f"Detailed report saved to: {output_file}")
print(f"Total instantiations found: {total_all_instantiations}")
print(f"{'='*60}")
if __name__ == '__main__':
main()
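
The script above captures each instantiation as raw text. Turning that text into individual template arguments requires splitting on commas at the outermost bracket level only, so that nested sequences such as `S<4, 64, 1>` stay intact. A minimal sketch follows; `split_top_level_params` is a hypothetical helper, and it assumes the text contains no `>>` shift or `->` operators that would confuse the bracket count:

```python
def split_top_level_params(instantiation):
    """Split the outermost template argument list into its arguments."""
    start = instantiation.index('<')
    depth = 0
    params, current = [], []
    for ch in instantiation[start:]:
        if ch == '<':
            depth += 1
            if depth == 1:
                continue  # skip the outermost opening bracket
        elif ch == '>':
            depth -= 1
            if depth == 0:
                break  # matching close of the outermost bracket
        if ch == ',' and depth == 1:
            # A comma at depth 1 separates two top-level arguments.
            params.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        params.append(tail)
    return params


print(split_top_level_params('Op<F16, S<4, 64, 1>, 2>'))
```

Applied to the `text` field of a captured instantiation, this yields one string per template argument, which can then be compared across instances.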

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,428 @@
====================================================================================================
DETAILED FORWARD CONVOLUTION DEVICE OPERATIONS AND TEMPLATE INSTANTIATIONS
====================================================================================================
Generated for: ck::tensor_operation::device namespace
SUMMARY
----------------------------------------------------------------------------------------------------
Total Device Operation Types: 8
Total Template Instantiations: 1000
• DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
- Instantiations: 61
- Files: 1
• DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
- Instantiations: 156
- Files: 4
• DeviceConvFwd
- Instantiations: 12
- Files: 6
• DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
- Instantiations: 6
- Files: 1
• DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
- Instantiations: 570
- Files: 9
• DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
- Instantiations: 141
- Files: 2
• DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
- Instantiations: 42
- Files: 1
• DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Instantiations: 12
- Files: 1
====================================================================================================
====================================================================================================
DEVICE OPERATION: DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
====================================================================================================
Total Template Instantiations: 61
Number of Files: 1
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 61
Sample instantiations:
[1] Line 49:
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 32, 1, 1, 8>, ...
[2] Line 50:
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 32, 1, 1, 8>, ...
[3] Line 51:
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 16, 1, 1, 8>, ...
... and 58 more instantiations
====================================================================================================
DEVICE OPERATION: DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
====================================================================================================
Total Template Instantiations: 156
Number of Files: 4
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
Instantiations in this file: 39
Sample instantiations:
[1] Line 45:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[2] Line 46:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[3] Line 47:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
... and 36 more instantiations
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 39
Sample instantiations:
[1] Line 45:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[2] Line 46:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[3] Line 47:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
... and 36 more instantiations
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
Instantiations in this file: 39
Sample instantiations:
[1] Line 44:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 4, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
[2] Line 45:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 4, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
[3] Line 46:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 4, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
... and 36 more instantiations
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
Instantiations in this file: 39
Sample instantiations:
[1] Line 42:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 16, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
[2] Line 43:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 16, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
[3] Line 44:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 16, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
... and 36 more instantiations
====================================================================================================
DEVICE OPERATION: DeviceConvFwd
====================================================================================================
Total Template Instantiations: 12
Number of Files: 6
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp
Instantiations in this file: 7
Sample instantiations:
[1] Line 24:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
[2] Line 28:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
[3] Line 33:
std::vector<std::unique_ptr<DeviceConvFwd<2,
NHWC,
KYXC,
NHWK,
BF16,
BF16,
BF16,
PassThrough,
PassThrough,
...
... and 4 more instantiations
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 1
Sample instantiations:
[1] Line 180:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
Instantiations in this file: 1
Sample instantiations:
[1] Line 106:
std::vector<std::unique_ptr<DeviceConvFwd<2,
NHWC,
KYXC,
NHWK,
BF16,
BF16,
BF16,
PassThrough,
PassThrough,
...
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 1
Sample instantiations:
[1] Line 107:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
Instantiations in this file: 1
Sample instantiations:
[1] Line 106:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F32, F32, F32, PassThrough, PassThrough, PassThrough>>>&
----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
Instantiations in this file: 1
Sample instantiations:
[1] Line 103:
std::vector<std::unique_ptr<DeviceConvFwd<2,
NHWC,
KYXC,
NHWK,
int8_t,
int8_t,
int8_t,
PassThrough,
PassThrough,
...
====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
====================================================================================================
Total Template Instantiations: 6
Number of Files: 1
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp
Instantiations in this file: 6
Sample instantiations:
[1] Line 57:
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 8, 16, 4, 2, 1, 1, 2, 1, S<4, 2>, S<1, 1>, S<2, 1, 2, 1>, S<1, 1, 8, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 1, 1, 1>, S<2, 1, 4, 1>,...
[2] Line 58:
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 16, 1, 4, 4, 1, S<8, 2>, S<8, 2>, S<8, 1, 1, 1>, S<2, 1, 128, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<8, 1, 1, 1>, S<2, 1, 128, 1>,...
[3] Line 60:
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 16, 2, 4, 4, 1, S<8, 2>, S<8, 2>, S<8, 1, 1, 2>, S<2, 1, 128, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<4, 1, 1, 2>, S<1, 2, 0, 3>, S<1, 1, 1, 2>, S<8, 1, 1, 2>, S<2, 1, 128, 1>,...
... and 3 more instantiations
====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
====================================================================================================
Total Template Instantiations: 570
Number of Files: 9
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
Instantiations in this file: 80
Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 77 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
Instantiations in this file: 16
Sample instantiations:
[1] Line 53:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 55:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 13 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
Instantiations in this file: 64
Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8,...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8,...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8,...
... and 61 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
Instantiations in this file: 193
Sample instantiations:
[1] Line 68:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 87:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[3] Line 89:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 190 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
Instantiations in this file: 21
Sample instantiations:
[1] Line 55:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
... and 18 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
Instantiations in this file: 80
Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 77 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
Instantiations in this file: 80
Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 77 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
Instantiations in this file: 20
Sample instantiations:
[1] Line 50:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 1, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
[2] Line 52:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
[3] Line 53:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 1, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
... and 17 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
Instantiations in this file: 16
Sample instantiations:
[1] Line 53:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
[2] Line 55:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, ...
[3] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
... and 13 more instantiations
====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
====================================================================================================
Total Template Instantiations: 141
Number of Files: 2
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
Instantiations in this file: 49
Sample instantiations:
[1] Line 71:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 64, 16, 16, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[2] Line 90:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 256, 256, 32, 8, 8, 32, 32, 4, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 91:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 64, 8, 8, 32, 32, 2, 2, S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 46 more instantiations
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
Instantiations in this file: 92
Sample instantiations:
[1] Line 69:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 128, 32, 16, 64, 8, 8, 16, 16, 1, 1, S<8, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[2] Line 70:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 64, 16, 16, 128, 8, 8, 16, 16, 1, 1, S<16, 4, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<16, 4, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 71:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 64, 16, 16, 64, 8, 8, 16, 16, 1, 1, S<8, 8, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 8, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 89 more instantiations
====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
====================================================================================================
Total Template Instantiations: 42
Number of Files: 1
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp
Instantiations in this file: 42
Sample instantiations:
[1] Line 62:
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 128, 64, 64, 32, 8, 16, 16, 2, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 64:
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 16, 16, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 65:
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 256, 64, 256, 32, 8, 16, 16, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 39 more instantiations
====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
====================================================================================================
Total Template Instantiations: 12
Number of Files: 1
----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp
Instantiations in this file: 12
Sample instantiations:
[1] Line 53:
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
[2] Line 55:
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 256, 128, 32, 8, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 2, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 2, ...
[3] Line 56:
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 256, 128, 32, 8, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, ...
... and 9 more instantiations
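
Note: the per-file counts in the report above can be cross-checked against each operation's stated total. The following is a minimal sketch in Python, not part of the original tooling. It assumes the report keeps the exact field labels shown above ("DEVICE OPERATION:", "Total Template Instantiations:", "Instantiations in this file:"). The function name `check_report_totals` and the `SAMPLE` text (with abbreviated file paths) are hypothetical and used only for illustration.

```python
import re

# Field labels copied from the report text; everything else here is an
# illustrative assumption, not the generator's actual implementation.
OP_RE = re.compile(r"^DEVICE OPERATION:\s*(\S+)")
TOTAL_RE = re.compile(r"^Total Template Instantiations:\s*(\d+)")
FILE_RE = re.compile(r"^Instantiations in this file:\s*(\d+)")


def check_report_totals(text):
    """Return {operation: (stated_total, sum_of_per_file_counts)}."""
    stated = {}
    summed = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = OP_RE.match(line)
        if m:
            # A new operation section starts; reset its running sum.
            current = m.group(1)
            summed[current] = 0
            continue
        if current is None:
            continue  # ignore anything before the first section header
        m = TOTAL_RE.match(line)
        if m:
            stated[current] = int(m.group(1))
            continue
        m = FILE_RE.match(line)
        if m:
            summed[current] += int(m.group(1))
    return {op: (stated.get(op), summed[op]) for op in summed}


# Abbreviated excerpt of one section from the report (paths shortened).
SAMPLE = """\
DEVICE OPERATION: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
Total Template Instantiations: 141
Number of Files: 2
File: device_grouped_conv_fwd_xdl_comp_instance.hpp
Instantiations in this file: 49
File: device_grouped_conv_fwd_xdl_mem_instance.hpp
Instantiations in this file: 92
"""

result = check_report_totals(SAMPLE)
for op, (total, per_file) in result.items():
    status = "OK" if total == per_file else "MISMATCH"
    print(f"{op}: stated={total} summed={per_file} {status}")
```

Run over the full report text, a check like this would flag any operation whose per-file counts do not add up to its stated total. For the sections listed above the sums agree (for example, 49 + 92 = 141).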

@@ -0,0 +1,285 @@
{
"summary": {
"total_device_operations": 9,
"device_operation_names": [
"DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K",
"DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K",
"DeviceConvFwd",
"DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK",
"DeviceGroupedConvFwdMultipleABD",
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle",
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3",
"DeviceGroupedConvFwdMultipleD_Wmma_CShuffle",
"DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor"
]
},
"details": {
"DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": {
"files": [
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp"
],
"total_instantiations": 61
},
"DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": {
"files": [
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp"
],
"total_instantiations": 156
},
"DeviceConvFwd": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp"
],
"total_instantiations": 12
},
"DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp"
],
"total_instantiations": 6
},
"DeviceGroupedConvFwdMultipleABD": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_bnorm_clamp.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_clamp.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bilinear.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_clamp.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convinvscale.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_add.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_relu.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_dynamic_op.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scale.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_ab.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_scaleadd_relu.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perchannel_quantization.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perlayer_quantization.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perchannel_quantization.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perlayer_quantization.hpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_fp8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_fp8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_bf8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp"
],
"total_instantiations": 975
},
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp"
],
"total_instantiations": 711
},
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp"
],
"total_instantiations": 141
},
"DeviceGroupedConvFwdMultipleD_Wmma_CShuffle": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp"
],
"total_instantiations": 42
},
"DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp"
],
"total_instantiations": 12
}
}
}
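
Each JSON entry above pairs a device operation name with a `files` list and a `total_instantiations` count, the same fields that appear per operation in the plain-text report. The following is a minimal sketch, not the script used for this commit, of how such a mapping could be rendered in that layout. The function name `render_report`, the `sample` variable, and the alphabetical ordering are assumptions made for illustration; the two sample entries are copied from the listing above.

```python
import json

SEPARATOR = "-" * 80


def render_report(ops):
    """Render {op name: {"files": [...], "total_instantiations": N}} as text.

    Hypothetical helper: the field names match the JSON above, everything
    else (name, ordering, layout details) is an assumption.
    """
    lines = [f"Total Unique Device Operations: {len(ops)}"]
    for name in sorted(ops):
        entry = ops[name]
        files = sorted(entry["files"])
        lines.append(SEPARATOR)
        lines.append(f"Device Operation: {name}")
        lines.append(f"Total Template Instantiations: {entry['total_instantiations']}")
        # The file count is derived from the list, not stored separately.
        lines.append(f"Number of Files: {len(files)}")
        lines.append("Files containing this device operation:")
        lines.extend(f"- {path}" for path in files)
    return "\n".join(lines)


# Two entries copied from the JSON listing above.
sample = json.loads("""
{
  "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle": {
    "files": [
      "library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp"
    ],
    "total_instantiations": 42
  },
  "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor": {
    "files": [
      "library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp"
    ],
    "total_instantiations": 12
  }
}
""")

print(render_report(sample))
```

Deriving `Number of Files` from the length of the list, rather than storing it, keeps the two outputs from drifting apart if the JSON is regenerated.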

@@ -0,0 +1,290 @@
================================================================================
FORWARD CONVOLUTION DEVICE OPERATIONS REPORT
================================================================================
Total Unique Device Operations: 9
--------------------------------------------------------------------------------
Device Operation: DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
Total Template Instantiations: 61
Number of Files: 1
Files containing this device operation:
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
--------------------------------------------------------------------------------
Device Operation: DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
Total Template Instantiations: 156
Number of Files: 4
Files containing this device operation:
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
--------------------------------------------------------------------------------
Device Operation: DeviceConvFwd
Total Template Instantiations: 12
Number of Files: 6
Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
Total Template Instantiations: 6
Number of Files: 1
Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp
--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleABD
Total Template Instantiations: 975
Number of Files: 195
Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_bnorm_clamp.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_clamp.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bilinear.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_clamp.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convinvscale.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_add.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_relu.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_dynamic_op.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scale.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_ab.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_scaleadd_relu.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perchannel_quantization.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perlayer_quantization.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perchannel_quantization.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perlayer_quantization.hpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_comp_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_fp8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_fp8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_bf8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp
--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
Total Template Instantiations: 711
Number of Files: 11
Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
Total Template Instantiations: 141
Number of Files: 2
Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
Total Template Instantiations: 42
Number of Files: 1
Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp
--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
Total Template Instantiations: 12
Number of Files: 1
Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp