mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-07-01 20:27:42 +00:00
Add listing of all fwd and bwd device ops and instances.
72
BACKWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md
Normal file
@@ -0,0 +1,72 @@
# Backward Convolution Device Operations - Comprehensive Summary

**Generated:** October 29, 2025
**Namespace:** `ck::tensor_operation::device`
**Location:** `/library/include` and `/library/src` directories

---

## Overview

This document lists all device operations used for backward convolutions (both weight gradients and data gradients) in the Composable Kernel library.

### Total Statistics
- **Unique Device Operation Types:** 15
- **Backward Weight Operations:** 9 types (885 instantiations)
- **Backward Data Operations:** 7 types (1046 instantiations)
- **Total Template Instantiations:** 1827

Note: `DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` (104 instantiations) is listed in both tables below, so the per-category counts add up to 16 types and 1931 instantiations. The unique totals count it once: 9 + 7 - 1 = 15 types and 885 + 1046 - 104 = 1827 instantiations.

---

## Backward Weight Device Operations

| # | Device Operation Name | Instantiations | Files | Primary Location |
|---|----------------------|----------------|-------|------------------|
| 1 | `DeviceGroupedConvBwdWeight` | 342 | 157 | `grouped_convolution_backward_weight.hpp` |
| 2 | `DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle` | 161 | 2 | `device_grouped_conv_bwd_weight_xdl_bilinear_instance.hpp` |
| 3 | `DeviceGroupedConvBwdWeight_Xdl_CShuffle` | 108 | 1 | `device_grouped_conv_bwd_weight_xdl_instance.hpp` |
| 4 | `DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 104 | 4 | `device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp` |
| 5 | `DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle` | 96 | 1 | `device_grouped_conv_bwd_weight_two_stage_xdl_instance.hpp` |
| 6 | `DeviceGroupedConvBwdWeight_Wmma_CShuffle` | 41 | 1 | `device_grouped_conv_bwd_weight_wmma_instance.hpp` |
| 7 | `DeviceGroupedConvBwdWeight_Xdl_CShuffleV3` | 16 | 1 | `device_grouped_conv_bwd_weight_v3_xdl_instance.hpp` |
| 8 | `DeviceGroupedConvBwdWeightMultipleD` | 14 | 2 | `grouped_convolution_backward_weight_bilinear.hpp` |
| 9 | `DeviceGroupedConvBwdWeight_Dl` | 3 | 1 | `device_grouped_conv_bwd_weight_dl_instance.hpp` |
| **Total** | **Backward Weight Operations** | **885** | **170** | |

---

## Backward Data Device Operations

| # | Device Operation Name | Instantiations | Files | Primary Location |
|---|----------------------|----------------|-------|------------------|
| 1 | `DeviceGroupedConvBwdDataMultipleD_Xdl_CShuffle_v1` | 370 | 4 | `device_grouped_conv_bwd_data_transpose_xdl_instance.hpp` |
| 2 | `DeviceConvNdBwdDataNwcKxcNwk_Xdl` | 312 | 12 | `device_conv1d_bwd_data_xdl_nwc_kxc_nwk_bf16_instance.cpp` |
| 3 | `DeviceGroupedConvBwdDataMultipleD` | 156 | 76 | `grouped_convolution_backward_data.hpp` |
| 4 | `DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 104 | 4 | `device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp` |
| 5 | `DeviceGroupedConvBwdDataMultipleD_Wmma_CShuffle` | 66 | 2 | `device_grouped_conv_bwd_data_wmma_f16_instance.hpp` |
| 6 | `DeviceConvBwdData` | 32 | 16 | `convolution_backward_data.hpp` |
| 7 | `DeviceConvNdBwdDataNwcKxcNwk_Dl` | 6 | 3 | `device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f16_instance.cpp` |
| **Total** | **Backward Data Operations** | **1046** | **117** | |

### Grand Total: 1827 Template Instantiations across 15 Device Operation Types
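
The grand total can be cross-checked against the two tables with a few lines of Python. The sketch below copies the per-operation counts from this summary and assumes that an operation listed in both tables is counted only once in the grand total. The variable names are illustrative and do not come from the extraction scripts.

```python
# Instantiation counts copied from the two tables in this summary.
bwd_weight = {
    "DeviceGroupedConvBwdWeight": 342,
    "DeviceGroupedConvBwdWeightMultipleD_Xdl_CShuffle": 161,
    "DeviceGroupedConvBwdWeight_Xdl_CShuffle": 108,
    "DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": 104,
    "DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle": 96,
    "DeviceGroupedConvBwdWeight_Wmma_CShuffle": 41,
    "DeviceGroupedConvBwdWeight_Xdl_CShuffleV3": 16,
    "DeviceGroupedConvBwdWeightMultipleD": 14,
    "DeviceGroupedConvBwdWeight_Dl": 3,
}
bwd_data = {
    "DeviceGroupedConvBwdDataMultipleD_Xdl_CShuffle_v1": 370,
    "DeviceConvNdBwdDataNwcKxcNwk_Xdl": 312,
    "DeviceGroupedConvBwdDataMultipleD": 156,
    "DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": 104,
    "DeviceGroupedConvBwdDataMultipleD_Wmma_CShuffle": 66,
    "DeviceConvBwdData": 32,
    "DeviceConvNdBwdDataNwcKxcNwk_Dl": 6,
}

# Operations that appear in both tables (assumed to be counted once overall).
shared = bwd_weight.keys() & bwd_data.keys()

unique_types = len(bwd_weight.keys() | bwd_data.keys())
unique_insts = (
    sum(bwd_weight.values())
    + sum(bwd_data.values())
    - sum(bwd_weight[name] for name in shared)
)

print(f"shared operations: {sorted(shared)}")
print(f"unique types: {unique_types}, unique instantiations: {unique_insts}")
```

Under that assumption the per-table totals (885 and 1046) and the deduplicated totals (15 types, 1827 instantiations) are consistent with each other.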

---

## Output Files

### Complete Instantiation Files

1. **`backward_conv_all_instantiations.txt`**
   - COMPLETE listing of ALL template instantiations
   - Human-readable format with line numbers
   - Organized by operation type, then by file

2. **`backward_conv_all_instantiations.json`**
   - Structured JSON for programmatic instantiation generation
   - Separated into `backward_weight_operations` and `backward_data_operations`
   - Each instantiation includes full text and parsed parameters
   - Ready for automated code generation

3. **`BACKWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md`** (This file)
   - Executive summary with tables
   - Quick reference for all backward operations
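
As a rough illustration of what "parsed parameters" can look like, the sketch below applies the block-size heuristic from `extract_all_instantiations_complete.py` (the forward script in this commit) to a shortened, made-up instantiation string. Whether the backward script uses exactly the same pattern is an assumption, and `SomeDeviceOp` and its parameter list are hypothetical.

```python
import re

# Heuristic copied from the forward extraction script: the first run of three
# 2-3 digit integers followed by a 1-3 digit integer is taken to be
# BlockSize, MPerBlock, NPerBlock, KPerBlock.
BLOCK_PATTERN = re.compile(
    r',\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{1,3}),'
)

def parse_block_sizes(text):
    """Return the guessed block parameters, or an empty dict if none match."""
    compressed = ' '.join(text.split())  # collapse newlines and runs of spaces
    match = BLOCK_PATTERN.search(compressed)
    if match is None:
        return {}
    keys = ('block_size', 'm_per_block', 'n_per_block', 'k_per_block')
    # The script stores the matches as strings; integers are used here instead.
    return dict(zip(keys, (int(group) for group in match.groups())))

# Hypothetical instantiation text; not taken from the Composable Kernel sources.
sample = """
SomeDeviceOp<2, F16, F16, F16, F32,
    PassThrough, PassThrough, PassThrough, ConvSpec,
    256, 128, 128, 4, 8, 32, 32, 2, 2>
"""

print(parse_block_sizes(sample))
```

Because the pattern only looks at digit widths, it can pick up an unrelated run of integers when a template lists its parameters in a different order, so the parsed values are best treated as hints rather than ground truth.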
377
FORWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md
Normal file
@@ -0,0 +1,377 @@
# Forward Convolution Device Operations - Comprehensive Summary

**Generated:** October 29, 2025
**Namespace:** `ck::tensor_operation::device`
**Location:** `/library/include` and `/library/src` directories

---

## Overview

This document provides a comprehensive list of all device operations used for forward convolutions in the Composable Kernel library, along with their template instantiations.

### Total Statistics
- **Unique Device Operation Types:** 8
- **Total Template Instantiations:** 1,000
- **Files Analyzed:** 420
- **Header Files:** Located in `/library/include/ck/library/tensor_operation_instance/gpu/`
- **Source Files:** Located in `/library/src/tensor_operation_instance/gpu/`

---

## Device Operations Table

### Grouped Convolution Device Operations

| # | Device Operation Name | Instantiations | Files | Description |
|---|----------------------|----------------|-------|-------------|
| 1 | `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle` | 570 | 9 | XDL-based with multiple ABD inputs, CShuffle optimization |
| 2 | `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3` | 141 | 2 | Version 3 with advanced pipeline scheduling |
| 3 | `DeviceGroupedConvFwdMultipleD_Wmma_CShuffle` | 42 | 1 | WMMA-based (Wave Matrix Multiply Accumulate) |
| 4 | `DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor` | 12 | 1 | Optimized for large tensor dimensions |
| 5 | `DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK` | 6 | 1 | Direct Load implementation |
| **Total** | **Grouped Convolution Operations** | **771** | **14** | |

### Non-Grouped Convolution Device Operations

| # | Device Operation Name | Instantiations | Files | Description |
|---|----------------------|----------------|-------|-------------|
| 1 | `DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 156 | 4 | Standard 2D convolution, explicit layout |
| 2 | `DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 61 | 1 | 2D convolution with C-Shuffle optimization |
| 3 | `DeviceConvFwd` | 12 | 6 | Generic interface type |
| **Total** | **Non-Grouped Operations** | **229** | **11** | |

### Grand Total: 1,000 Template Instantiations across 8 Device Operation Types
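
The table totals can be re-derived from the individual rows. The following sketch copies the (instantiations, files) pairs from the two tables above and sums them; the dictionary layout and the `totals` helper are illustrative only.

```python
# (instantiations, files) per device operation, copied from the tables above.
grouped = {
    "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle": (570, 9),
    "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3": (141, 2),
    "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle": (42, 1),
    "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor": (12, 1),
    "DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK": (6, 1),
}
non_grouped = {
    "DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": (156, 4),
    "DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": (61, 1),
    "DeviceConvFwd": (12, 6),
}

def totals(table):
    """Sum the instantiation and file columns of one table."""
    insts = sum(count for count, _ in table.values())
    files = sum(n_files for _, n_files in table.values())
    return insts, files

grouped_total = totals(grouped)
non_grouped_total = totals(non_grouped)
grand_total = grouped_total[0] + non_grouped_total[0]
n_types = len(grouped) + len(non_grouped)

print(grouped_total, non_grouped_total, grand_total, n_types)
```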

---

## Device Operations List

### 1. DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
**Most commonly used device operation for forward convolutions**
- **Template Instantiations:** 570
- **Files:** 9 header files
- **Primary Header:** `grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp`
- **Description:** XDL-based grouped convolution with multiple auxiliary inputs/outputs and CShuffle optimization
- **Variants:**
  - Standard instances (BF16, F16, F32, INT8, F8, BF8)
  - 16x16 MFMA instances
  - NCHW layout instances
  - Generic instances
  - Compute-friendly instances with FP8

**Key Files:**
- `device_grouped_conv_fwd_xdl_instance.hpp` - Main instantiations (193 instances)
- `device_grouped_conv_fwd_xdl_bilinear_instance.hpp` - With bilinear fusion (80 instances)
- `device_grouped_conv_fwd_xdl_scale_instance.hpp` - With scale operation (80 instances)
- `device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp` - With dynamic operations (64 instances)
- `device_grouped_conv_fwd_xdl_outelementop_instance.hpp` - With custom output ops (80 instances)
- `device_grouped_conv_fwd_xdl_merged_groups_instance.hpp` - Merged groups optimization (21 instances)
- `device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp` - ScaleAdd operations (20 instances)
- `device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp` - Fused operations (16 instances)
- `device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp` - Binary output ops (16 instances)

---

### 2. DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
**Standard 2D convolution forward operation**
- **Template Instantiations:** 156
- **Files:** 4 source files
- **Description:** XDL-based 2D convolution with explicit NHWC layout

**Instantiation Files:**
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp` (39 instances)
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp` (39 instances)
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp` (39 instances)
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp` (39 instances)

---

### 3. DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
**Version 3 of XDL CShuffle implementation**
- **Template Instantiations:** 141
- **Files:** 2 header files
- **Description:** Advanced XDL implementation with compute-friendly scheduling and memory optimizations

**Key Files:**
- `device_grouped_conv_fwd_xdl_comp_instance.hpp` (49 instances) - Compute-optimized variants
- `device_grouped_conv_fwd_xdl_mem_instance.hpp` (92 instances) - Memory-optimized variants

**Features:**
- BlockGemmPipelineScheduler variants (Interwave, Intrawave)
- Multiple pipeline versions (v1, v3, v4, v5)
- Support for BF16, F16, F32, TF32, INT8

---

### 4. DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
**C-Shuffle variant for 2D convolution**
- **Template Instantiations:** 61
- **Files:** 1 source file
- **File:** `conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp`
- **Data Type:** F16 only
- **Description:** CShuffle optimization for channel dimension

---

### 5. DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
**WMMA-based grouped convolution**
- **Template Instantiations:** 42
- **Files:** 1 header file
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp`
- **Description:** Uses Wave Matrix Multiply Accumulate (WMMA) instructions
- **Data Types:** F16, INT8
- **Block Sizes:** 32, 64, 128, 256

---

### 6. DeviceConvFwd
**Generic convolution forward interface**
- **Template Instantiations:** 12
- **Files:** 6 files (headers and sources)
- **Description:** High-level interface type for convolution forward operations
- **Usage:** Factory pattern and API definitions

---

### 7. DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
**Large tensor optimization**
- **Template Instantiations:** 12
- **Files:** 1 header file
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp`
- **Description:** Optimized for large tensor dimensions
- **Data Types:** BF16, F16, F32, TF32, INT8

---

### 8. DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
**Direct Load (DL) implementation**
- **Template Instantiations:** 6
- **Files:** 1 header file
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp`
- **Description:** Direct load convolution implementation for specific layouts
- **Data Types:** F16, F32

---

## File Organization

### Header Files (`/library/include/ck/library/tensor_operation_instance/gpu/`)

#### Grouped Convolution Forward (`grouped_conv_fwd/`)
1. `device_grouped_conv_fwd_dl_instance.hpp` - Direct Load implementations
2. `device_grouped_conv_fwd_wmma_instance.hpp` - WMMA implementations
3. `device_grouped_conv_fwd_xdl_instance.hpp` - **Main XDL implementations**
4. `device_grouped_conv_fwd_xdl_bilinear_instance.hpp` - With bilinear fusion
5. `device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp` - Binary output ops
6. `device_grouped_conv_fwd_xdl_comp_instance.hpp` - Compute-optimized
7. `device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp` - Dynamic operations
8. `device_grouped_conv_fwd_xdl_large_tensor_instance.hpp` - Large tensor optimization
9. `device_grouped_conv_fwd_xdl_mem_instance.hpp` - Memory-optimized
10. `device_grouped_conv_fwd_xdl_merged_groups_instance.hpp` - Merged groups
11. `device_grouped_conv_fwd_xdl_outelementop_instance.hpp` - Custom output operations
12. `device_grouped_conv_fwd_xdl_scale_instance.hpp` - With scale operation
13. `device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp` - ScaleAdd on inputs
14. `device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp` - Fused operations

### Source Files (`/library/src/tensor_operation_instance/gpu/`)

#### Conv2D Forward (`conv2d_fwd/`)
- Standard XDL instances for BF16, F16, F32, INT8
- C-Shuffle variant for F16

#### Grouped Conv1D/2D/3D Forward
Organized by:
- **Algorithm:** `dl/`, `wmma/`, `xdl/`
- **Optimization:** `comp/`, `mem/`, `large_tensor/`, `merged_groups/`
- **Data Type:** Per file (bf16, f16, f32, int8, fp8, bf8)
- **Layout:** Encoded in filename (nhwgc, ngchw, etc.)

---

## Template Instantiation Patterns

### Common Parameters
Template instantiations typically include:
- **Spatial Dimensions:** 1D, 2D, 3D (NDimSpatial)
- **Layouts:** NHWC, NCHW, and grouped variants (NHWGC, NGCHW, etc.)
- **Data Types:** BF16, F16, F32, TF32, INT8, F8, BF8
- **Accumulator Type:** F32, INT32
- **Block Sizes:** 32, 64, 128, 256
- **Block Tile Sizes:** MPerBlock, NPerBlock, KPerBlock
- **MFMA Sizes:** 16x16, 32x32
- **Pipeline Stages:** 1-2 stages
- **Element-wise Operations:** PassThrough, Scale, ScaleAdd, Bilinear, ReLU, Clamp, etc.

### Example Instantiation (from DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle)
```cpp
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<
    NDimSpatial,      // 1, 2, or 3
    ALayout,          // NHWGC, NGCHW, etc.
    BLayout,          // GKYXC, GKCYX, etc.
    DsLayout,         // Additional inputs layout
    ELayout,          // Output layout
    BF16,             // Input data type
    BF16,             // Weight data type
    F32,              // Accumulator type
    BF16,             // CShuffle data type
    DsDataTypes,      // Additional input types
    BF16,             // Output data type
    PassThrough,      // Input element-wise op
    PassThrough,      // Weight element-wise op
    OutElementOp,     // Output element-wise op
    ConvSpec,         // Convolution specialization
    GemmMNKPadding,   // GEMM specialization
    1,                // NumGemmKPrefetchStage
    256,              // BlockSize
    128,              // MPerBlock
    128,              // NPerBlock
    32,               // KPerBlock
    8,                // AK1
    8,                // BK1
    32,               // MPerXdl
    32,               // NPerXdl
    2,                // MXdlPerWave
    2,                // NXdlPerWave
    // ... additional block transfer parameters
>
```
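
The numeric tile parameters in this example are not independent of each other. As a sketch, assuming a wavefront size of 64 and a tiling in which every wave covers `MXdlPerWave` x `NXdlPerWave` XDL tiles of size `MPerXdl` x `NPerXdl`, the number of waves implied by the block tile should match `BlockSize / 64`. The helper below is illustrative and is not part of Composable Kernel.

```python
WAVE_SIZE = 64  # assumption: 64-lane wavefronts, as on MFMA-capable GPUs

def waves_per_block(m_per_block, n_per_block,
                    m_per_xdl, n_per_xdl,
                    m_xdl_per_wave, n_xdl_per_wave):
    """Number of waves needed to cover one MPerBlock x NPerBlock tile."""
    m_per_wave = m_per_xdl * m_xdl_per_wave  # rows handled by one wave
    n_per_wave = n_per_xdl * n_xdl_per_wave  # columns handled by one wave
    if m_per_block % m_per_wave or n_per_block % n_per_wave:
        raise ValueError("block tile is not a multiple of the per-wave tile")
    return (m_per_block // m_per_wave) * (n_per_block // n_per_wave)

# Values taken from the example instantiation above.
block_size = 256
waves = waves_per_block(
    m_per_block=128, n_per_block=128,
    m_per_xdl=32, n_per_xdl=32,
    m_xdl_per_wave=2, n_xdl_per_wave=2,
)

print(f"waves from tile sizes: {waves}")
print(f"waves from BlockSize:  {block_size // WAVE_SIZE}")
```

A check like this can serve as a quick sanity filter when instantiations are generated programmatically, although the library applies its own, more complete validity checks.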

---

## Key Findings

1. **DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle is the primary device operation** with 570 instantiations across 9 different header files for various use cases and fusion patterns.

2. **Eight distinct device operation types** are used for forward convolutions, each optimized for different scenarios:
   - XDL (Matrix Core) based operations
   - WMMA (Wave Matrix Multiply) based operations
   - Direct Load (DL) based operations
   - Specialized variants for large tensors, merged groups, and memory optimization

3. **Multiple optimization strategies:**
   - Block GEMM pipeline scheduling (Interwave, Intrawave)
   - Compute-optimized and memory-optimized instance sets (`comp`, `mem`)
   - Pipeline versions (v1, v3, v4, v5)
   - CShuffle for efficient data movement

4. **Comprehensive data type support:** BF16, F16, F32, TF32, INT8, F8, BF8

5. **Extensive fusion support:** Operations can be fused with Scale, ScaleAdd, Bilinear, ReLU, Clamp, and combinations thereof.

---

## Output Files

### Complete Instantiation Files (Recommended)

1. **`forward_conv_all_instantiations.txt`** (593KB, 4,271 lines)
   - **COMPLETE listing of ALL 1,000 template instantiations**
   - Human-readable format with line numbers
   - Every instantiation shown in full detail
   - Organized by device operation, then by file

2. **`forward_conv_all_instantiations.json`** (1.4MB)
   - **Structured JSON for programmatic instantiation generation**
   - Complete instantiation text for each template
   - Includes parsed parameters (data types, block sizes)
   - Hierarchical organization: device_operation → file → instantiations[]
   - Ready for automated code generation tools
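
The hierarchy described above can be walked with plain dictionary access. The sketch below uses the key names written by `extract_all_instantiations_complete.py` in this commit (`device_operations`, `instantiations_by_file`, `count`, `instantiations`) on a tiny in-memory sample; the sample contents and the `count_by_operation` helper are made up for illustration.

```python
import json

def count_by_operation(data):
    """Map each device operation name to its number of instantiations."""
    counts = {}
    for op_name, op in data["device_operations"].items():
        counts[op_name] = sum(
            len(entry["instantiations"])
            for entry in op["instantiations_by_file"].values()
        )
    return counts

# Hypothetical miniature of the JSON layout; the real file is far larger.
sample_text = json.dumps({
    "metadata": {"total_device_operations": 1, "total_instantiations": 2},
    "device_operations": {
        "DeviceConvFwd": {
            "name": "DeviceConvFwd",
            "total_instantiations": 2,
            "total_files": 1,
            "instantiations_by_file": {
                "some/file.hpp": {
                    "count": 2,
                    "instantiations": [
                        {"line_start": 10, "line_end": 12,
                         "instantiation_text": "DeviceConvFwd<...>",
                         "parsed_parameters": {"uses_F16": True}},
                        {"line_start": 20, "line_end": 22,
                         "instantiation_text": "DeviceConvFwd<...>",
                         "parsed_parameters": {"uses_F32": True}},
                    ],
                }
            },
        }
    },
})

# For the real output, replace this with:
#   with open("forward_conv_all_instantiations.json", encoding="utf-8") as f:
#       data = json.load(f)
data = json.loads(sample_text)
counts = count_by_operation(data)
print(counts)
```

Counting from the `instantiations` lists rather than trusting the stored `count` and `total_instantiations` fields gives a simple consistency check on the generated file.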

### Summary and Quick Reference Files

3. **`FORWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md`** (This file)
   - Executive summary with tables
   - Device operation descriptions
   - File organization reference

4. **`forward_convolution_device_ops_report.txt`** (33KB)
   - High-level summary of all device operations
   - Lists all files containing each device operation
   - Quick reference guide

5. **`forward_conv_device_ops_detailed_report.txt`** (43KB)
   - Detailed report with sample template instantiations
   - Shows first 3 examples per file
   - Includes line numbers and file locations

6. **`forward_convolution_device_ops_data.json`** (34KB)
   - Basic JSON format with summary data
   - File paths and instantiation counts

---

## Usage Examples

### Example 1: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle with BF16
Found in: `device_grouped_conv_fwd_xdl_instance.hpp`

This is the most versatile device operation with:
- Generic instances for all block sizes
- Optimized instances for small conv.K and conv.C
- NCHW layout support
- 16x16 MFMA support
- FP8 compute support

### Example 2: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
Found in: `device_grouped_conv_fwd_dl_instance.hpp`

Direct Load implementation with:
- F16 and F32 support
- Specific block configurations
- NHWC layout only

### Example 3: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
Found in: `device_grouped_conv_fwd_wmma_instance.hpp`

WMMA-based implementation with:
- F16 and INT8 support
- Multiple block sizes (32, 64, 128, 256)
- 16x16 WMMA instruction usage

---

## File Locations Reference

### Main Device Operation Headers
```
library/include/ck/library/tensor_operation_instance/gpu/
├── grouped_conv_fwd/
│   ├── device_grouped_conv_fwd_xdl_instance.hpp          ← PRIMARY FILE
│   ├── device_grouped_conv_fwd_dl_instance.hpp
│   ├── device_grouped_conv_fwd_wmma_instance.hpp
│   ├── device_grouped_conv_fwd_xdl_comp_instance.hpp
│   ├── device_grouped_conv_fwd_xdl_mem_instance.hpp
│   ├── device_grouped_conv_fwd_xdl_large_tensor_instance.hpp
│   └── ... (other variants)
└── convolution_forward.hpp                                ← INTERFACE DEFINITIONS
```

### Source Instantiations
```
library/src/tensor_operation_instance/gpu/
├── conv2d_fwd/                    ← Non-grouped 2D convolutions
├── grouped_conv1d_fwd/            ← 1D grouped convolutions
├── grouped_conv2d_fwd/            ← 2D grouped convolutions
│   ├── dl/                        ← Direct Load variants
│   ├── wmma/                      ← WMMA variants
│   └── xdl/                       ← XDL variants
│       ├── comp/                  ← Compute-optimized
│       ├── mem/                   ← Memory-optimized
│       ├── large_tensor/          ← Large tensor optimized
│       └── merged_groups/         ← Merged groups optimized
└── grouped_conv3d_fwd/            ← 3D grouped convolutions
```

---

## Additional Resources

For detailed analysis and full template instantiations, refer to the generated reports:
- `forward_convolution_device_ops_report.txt` - Quick summary
- `forward_conv_device_ops_detailed_report.txt` - Full details with examples
- `forward_convolution_device_ops_data.json` - Machine-readable format

The analysis scripts used to generate this information:
- `extract_conv_fwd_device_ops.py` - Initial device operation extraction
- `extract_detailed_instantiations.py` - Detailed instantiation analysis
25532
backward_conv_all_instantiations.json
Normal file
File diff suppressed because it is too large
14644
backward_conv_all_instantiations.txt
Normal file
File diff suppressed because it is too large
296
extract_all_instantiations_complete.py
Normal file
@@ -0,0 +1,296 @@
#!/usr/bin/env python3
"""
Extract ALL template instantiations for forward convolution device operations.
Generates both human-readable .txt and programmatic JSON outputs.
"""

import os
import re
from pathlib import Path
from collections import defaultdict
import json

def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files

def extract_complete_instantiation(lines, start_idx, device_op_name):
    """Extract a complete device operation instantiation with balanced brackets."""
    if start_idx >= len(lines):
        return None, start_idx

    line = lines[start_idx]

    # Check if this line contains the device op
    if device_op_name + '<' not in line:
        return None, start_idx

    # Start collecting the full instantiation
    full_lines = [line]
    bracket_count = line.count('<') - line.count('>')

    # Continue to next lines until brackets are balanced
    idx = start_idx + 1
    while bracket_count > 0 and idx < len(lines):
        full_lines.append(lines[idx])
        bracket_count += lines[idx].count('<') - lines[idx].count('>')
        idx += 1

    # Join and clean up
    full_text = '\n'.join(full_lines)

    return {
        'line_start': start_idx + 1,
        'line_end': idx,
        'full_text': full_text.strip(),
        'compressed_text': ' '.join(full_text.split())  # Single line version
    }, idx

def extract_all_instantiations(content, device_op_name):
    """Extract ALL device operation instantiations from content."""
    instantiations = []
    lines = content.split('\n')

    idx = 0
    while idx < len(lines):
        inst, next_idx = extract_complete_instantiation(lines, idx, device_op_name)
        if inst:
            instantiations.append(inst)
            idx = next_idx
        else:
            idx += 1

    return instantiations

def parse_template_parameters(instantiation_text):
    """Parse template parameters from instantiation text (simplified)."""
    # This is a simplified parser - actual parsing would need full C++ template parser
    # For now, just extract key information
    params = {}

    # Extract data types
    type_patterns = {
        'BF16': r'\bBF16\b',
        'F16': r'\bF16\b',
        'F32': r'\bF32\b',
        'TF32': r'\bTF32\b',
        'INT8': r'\bint8_t\b',
        'F8': r'\bF8\b',
        'BF8': r'\bBF8\b',
    }

    for type_name, pattern in type_patterns.items():
        if re.search(pattern, instantiation_text):
            params[f'uses_{type_name}'] = True

    # Extract block sizes if visible
    block_size_match = re.search(r',\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{1,3}),', instantiation_text)
    if block_size_match:
        params['block_size'] = block_size_match.group(1)
        params['m_per_block'] = block_size_match.group(2)
        params['n_per_block'] = block_size_match.group(3)
        params['k_per_block'] = block_size_match.group(4)

    return params

def main():
|
||||
# Device operations to extract
|
||||
target_device_ops = [
|
||||
'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle',
|
||||
'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3',
|
||||
'DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
|
||||
'DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
|
||||
'DeviceGroupedConvFwdMultipleD_Wmma_CShuffle',
|
||||
'DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK',
|
||||
'DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor',
|
||||
'DeviceConvFwd',
|
||||
]
|
||||
|
||||
# Define paths
|
||||
header_paths = [
|
||||
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd',
|
||||
'library/include/ck/library/tensor_operation_instance/gpu',
|
||||
]
|
||||
|
||||
source_paths = [
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd',
|
||||
'library/src/tensor_operation_instance/gpu/conv2d_fwd',
|
||||
]
|
||||
|
||||
# Find all files
|
||||
header_files = find_files(header_paths, ['.hpp', '.h', '.inc'])
|
||||
source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])
|
||||
all_files = list(set(header_files + source_files))
|
||||
|
||||
print(f"Analyzing {len(all_files)} files for ALL instantiations...\n")
|
||||
|
||||
# Collect data
|
||||
all_results = {}
|
||||
|
||||
for device_op in target_device_ops:
|
||||
print(f"Extracting all instantiations for {device_op}...")
|
||||
|
||||
device_op_data = {
|
||||
'device_operation_name': device_op,
|
||||
'files': {},
|
||||
'total_instantiations': 0
|
||||
}
|
||||
|
||||
for file_path in all_files:
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
|
||||
content = f.read()
|
||||
|
||||
if device_op in content:
|
||||
instantiations = extract_all_instantiations(content, device_op)
|
||||
|
||||
if instantiations:
|
||||
rel_path = str(file_path).replace(os.getcwd() + '/', '')
|
||||
|
||||
# Process each instantiation
|
||||
processed_insts = []
|
||||
for inst in instantiations:
|
||||
processed_inst = {
|
||||
'line_start': inst['line_start'],
|
||||
'line_end': inst['line_end'],
|
||||
'full_text': inst['full_text'],
|
||||
'compressed_text': inst['compressed_text'],
|
||||
'template_params': parse_template_parameters(inst['full_text'])
|
||||
}
|
||||
processed_insts.append(processed_inst)
|
||||
|
||||
device_op_data['files'][rel_path] = processed_insts
|
||||
device_op_data['total_instantiations'] += len(instantiations)
|
||||
|
||||
except Exception as e:
|
||||
print(f" Error processing {file_path}: {e}")
|
||||
|
||||
all_results[device_op] = device_op_data
|
||||
print(f" Found {device_op_data['total_instantiations']} instantiations in {len(device_op_data['files'])} files")
|
||||
|
||||
# Generate comprehensive TXT report
|
||||
print("\nGenerating comprehensive TXT report...")
|
||||
txt_report = []
|
||||
txt_report.append("=" * 120)
|
||||
txt_report.append("COMPLETE FORWARD CONVOLUTION DEVICE OPERATIONS AND ALL TEMPLATE INSTANTIATIONS")
|
||||
txt_report.append("=" * 120)
|
||||
txt_report.append("")
|
||||
txt_report.append(f"Namespace: ck::tensor_operation::device")
|
||||
txt_report.append(f"Total Device Operation Types: {len(target_device_ops)}")
|
||||
total_insts = sum(data['total_instantiations'] for data in all_results.values())
|
||||
txt_report.append(f"Total Template Instantiations: {total_insts}")
|
||||
txt_report.append("")
|
||||
|
||||
# Summary table
|
||||
txt_report.append("SUMMARY OF DEVICE OPERATIONS")
|
||||
txt_report.append("-" * 120)
|
||||
for device_op in target_device_ops:
|
||||
data = all_results[device_op]
|
||||
txt_report.append(f"{data['total_instantiations']:4d} instantiations | {len(data['files']):3d} files | {device_op}")
|
||||
txt_report.append("")
|
||||
txt_report.append("=" * 120)
|
||||
txt_report.append("")
|
||||
|
||||
# Detailed section for each device op - ALL instantiations
|
||||
for device_op in target_device_ops:
|
||||
data = all_results[device_op]
|
||||
|
||||
txt_report.append("")
|
||||
txt_report.append("=" * 120)
|
||||
txt_report.append(f"DEVICE OPERATION: {device_op}")
|
||||
txt_report.append("=" * 120)
|
||||
txt_report.append("")
|
||||
txt_report.append(f"Total Instantiations: {data['total_instantiations']}")
|
||||
txt_report.append(f"Number of Files: {len(data['files'])}")
|
||||
txt_report.append("")
|
||||
|
||||
# List ALL instantiations in each file
|
||||
for file_path, instantiations in sorted(data['files'].items()):
|
||||
txt_report.append("-" * 120)
|
||||
txt_report.append(f"FILE: {file_path}")
|
||||
txt_report.append(f"Instantiations: {len(instantiations)}")
|
||||
txt_report.append("")
|
||||
|
||||
for idx, inst in enumerate(instantiations, 1):
|
||||
txt_report.append(f"[{idx}] Lines {inst['line_start']}-{inst['line_end']}:")
|
||||
txt_report.append("")
|
||||
# Include full text with proper indentation
|
||||
for line in inst['full_text'].split('\n'):
|
||||
txt_report.append(f" {line}")
|
||||
txt_report.append("")
|
||||
|
||||
txt_report.append("")
|
||||
|
||||
txt_report.append("")
|
||||
|
||||
# Save TXT report
|
||||
txt_output = 'forward_conv_all_instantiations.txt'
|
||||
with open(txt_output, 'w') as f:
|
||||
f.write('\n'.join(txt_report))
|
||||
print(f"✓ TXT report saved: {txt_output}")
|
||||
|
||||
# Generate JSON for programmatic use
|
||||
print("Generating JSON for programmatic use...")
|
||||
json_data = {
|
||||
'metadata': {
|
||||
'description': 'Forward convolution device operations and template instantiations',
|
||||
'namespace': 'ck::tensor_operation::device',
|
||||
'total_device_operations': len(target_device_ops),
|
||||
'total_instantiations': total_insts
|
||||
},
|
||||
'device_operations': {}
|
||||
}
|
||||
|
||||
for device_op in target_device_ops:
|
||||
data = all_results[device_op]
|
||||
|
||||
json_device_op = {
|
||||
'name': device_op,
|
||||
'total_instantiations': data['total_instantiations'],
|
||||
'total_files': len(data['files']),
|
||||
'instantiations_by_file': {}
|
||||
}
|
||||
|
||||
for file_path, instantiations in data['files'].items():
|
||||
file_insts = []
|
||||
for inst in instantiations:
|
||||
file_insts.append({
|
||||
'line_start': inst['line_start'],
|
||||
'line_end': inst['line_end'],
|
||||
'instantiation_text': inst['full_text'],
|
||||
'instantiation_compressed': inst['compressed_text'],
|
||||
'parsed_parameters': inst['template_params']
|
||||
})
|
||||
|
||||
json_device_op['instantiations_by_file'][file_path] = {
|
||||
'count': len(file_insts),
|
||||
'instantiations': file_insts
|
||||
}
|
||||
|
||||
json_data['device_operations'][device_op] = json_device_op
|
||||
|
||||
# Save JSON
|
||||
json_output = 'forward_conv_all_instantiations.json'
|
||||
with open(json_output, 'w') as f:
|
||||
json.dump(json_data, f, indent=2)
|
||||
print(f"✓ JSON file saved: {json_output}")
|
||||
|
||||
# Summary
|
||||
print(f"\n{'='*60}")
|
||||
print(f"GENERATION COMPLETE")
|
||||
print(f"{'='*60}")
|
||||
print(f"TXT File: {txt_output}")
|
||||
print(f"JSON File: {json_output}")
|
||||
print(f"Total instantiations extracted: {total_insts}")
|
||||
print(f"{'='*60}")
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
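A minimal sketch of how the JSON written above could be consumed, assuming the layout built in `json_data` (`device_operations` → `instantiations_by_file` → `count`). The sample dictionary and the file names `a.hpp` and `b.hpp` are hypothetical stand-ins for the saved file; `check_counts` is an illustrative helper, not part of the script.

```python
import json

# Hypothetical, hand-written sample mirroring the layout the script writes.
sample = {
    "metadata": {"total_device_operations": 1, "total_instantiations": 3},
    "device_operations": {
        "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle": {
            "name": "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle",
            "total_instantiations": 3,
            "total_files": 2,
            "instantiations_by_file": {
                "a.hpp": {"count": 2, "instantiations": []},
                "b.hpp": {"count": 1, "instantiations": []},
            },
        }
    },
}


def check_counts(data):
    """Return {op_name: (reported_total, recomputed_total)} for every operation."""
    result = {}
    for name, op in data["device_operations"].items():
        # Recompute the total from the per-file counts.
        recomputed = sum(entry["count"] for entry in op["instantiations_by_file"].values())
        result[name] = (op["total_instantiations"], recomputed)
    return result


# Round-trip through JSON text to mimic reading the saved file.
counts = check_counts(json.loads(json.dumps(sample)))
print(counts)
```

Comparing the reported total with the per-file sum is a cheap consistency check before feeding the data to any downstream generator.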
|
||||
484
extract_backward_conv_ops.py
Normal file
@@ -0,0 +1,484 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Extract ALL backward convolution device operations and template instantiations.
|
||||
Handles both backward weight and backward data convolutions.
|
||||
Generates human-readable .txt and programmatic JSON outputs.
|
||||
"""
|
||||
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
|
||||
import json
|
||||
|
||||
def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files
|
||||
|
||||
def extract_complete_instantiation(lines, start_idx, device_op_name):
    """Extract a complete device operation instantiation with balanced brackets.

    Returns (instantiation_dict, next_idx), or (None, start_idx) when the line
    at start_idx does not open an instantiation of device_op_name.
    """
    if start_idx >= len(lines):
        return None, start_idx

    line = lines[start_idx]

    # Check if this line contains the device op
    if device_op_name + '<' not in line:
        return None, start_idx

    # Start collecting the full instantiation.
    # Note: raw '<' and '>' characters are counted, so comparison or shift
    # operators inside the span would skew the balance.
    full_lines = [line]
    bracket_count = line.count('<') - line.count('>')

    # Continue to next lines until brackets are balanced
    idx = start_idx + 1
    while bracket_count > 0 and idx < len(lines):
        full_lines.append(lines[idx])
        bracket_count += lines[idx].count('<') - lines[idx].count('>')
        idx += 1

    # Join and clean up
    full_text = '\n'.join(full_lines)

    return {
        'line_start': start_idx + 1,  # 1-based, inclusive
        'line_end': idx,              # 1-based, inclusive
        'full_text': full_text.strip(),
        'compressed_text': ' '.join(full_text.split())
    }, idx
|
||||
|
||||
def extract_all_instantiations(content, device_op_name):
    """Extract ALL device operation instantiations from content."""
    instantiations = []
    lines = content.split('\n')

    idx = 0
    while idx < len(lines):
        inst, next_idx = extract_complete_instantiation(lines, idx, device_op_name)
        if inst:
            instantiations.append(inst)
            idx = next_idx
        else:
            idx += 1

    return instantiations
|
||||
|
||||
def parse_template_parameters(instantiation_text):
    """Parse template parameters from instantiation text."""
    params = {}

    # Extract data types
    type_patterns = {
        'BF16': r'\bBF16\b',
        'F16': r'\bF16\b',
        'F32': r'\bF32\b',
        'TF32': r'\bTF32\b',
        'INT8': r'\bint8_t\b',
        'F8': r'\bF8\b',
        'BF8': r'\bBF8\b',
    }

    for type_name, pattern in type_patterns.items():
        if re.search(pattern, instantiation_text):
            params[f'uses_{type_name}'] = True

    # Extract block sizes if visible
    block_size_match = re.search(r',\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{2,3}),\s*(\d{1,3}),', instantiation_text)
    if block_size_match:
        params['block_size'] = block_size_match.group(1)
        params['m_per_block'] = block_size_match.group(2)
        params['n_per_block'] = block_size_match.group(3)
        params['k_per_block'] = block_size_match.group(4)

    return params
|
||||
|
||||
def find_device_operations(all_files):
    """Find all device operations matching backward convolution patterns."""
    device_ops = set()

    # Patterns for backward operations
    patterns = [
        r'(Device\w*Conv\w*Bwd\w*Weight\w*)<',
        r'(Device\w*Conv\w*Bwd\w*Data\w*)<',
        r'(Device\w*ConvBwd\w*)<',
    ]

    for file_path in all_files:
        try:
            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                content = f.read()
        except OSError:
            # Skip unreadable files; a bare except would also hide real bugs
            continue

        for pattern in patterns:
            for match in re.finditer(pattern, content):
                device_ops.add(match.group(1))

    return sorted(device_ops)
|
||||
|
||||
def main():
|
||||
print("="*80)
|
||||
print("EXTRACTING BACKWARD CONVOLUTION DEVICE OPERATIONS")
|
||||
print("="*80)
|
||||
print()
|
||||
|
||||
# Define paths
|
||||
header_paths = [
|
||||
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_weight',
|
||||
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_data',
|
||||
'library/include/ck/library/tensor_operation_instance/gpu',
|
||||
]
|
||||
|
||||
source_paths = [
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_convnd_bwd_weight',
|
||||
'library/src/tensor_operation_instance/gpu/conv1d_bwd_data',
|
||||
'library/src/tensor_operation_instance/gpu/conv2d_bwd_data',
|
||||
'library/src/tensor_operation_instance/gpu/conv3d_bwd_data',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_data',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data',
|
||||
]
|
||||
|
||||
# Find all files
|
||||
header_files = find_files(header_paths, ['.hpp', '.h', '.inc'])
|
||||
source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])
|
||||
all_files = list(set(header_files + source_files))
|
||||
|
||||
print(f"Found {len(all_files)} files to analyze")
|
||||
|
||||
# Discover all backward device operations
|
||||
print("Discovering device operations...")
|
||||
target_device_ops = find_device_operations(all_files)
|
||||
print(f"Found {len(target_device_ops)} device operation types")
|
||||
for op in target_device_ops:
|
||||
print(f" - {op}")
|
||||
print()
|
||||
|
||||
# Collect data
|
||||
all_results = {}
|
||||
|
||||
for device_op in target_device_ops:
|
||||
print(f"Extracting all instantiations for {device_op}...")
|
||||
|
||||
device_op_data = {
|
||||
'device_operation_name': device_op,
|
||||
'files': {},
|
||||
'total_instantiations': 0
|
||||
}
|
||||
|
||||
for file_path in all_files:
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
|
||||
content = f.read()
|
||||
|
||||
if device_op in content:
|
||||
instantiations = extract_all_instantiations(content, device_op)
|
||||
|
||||
if instantiations:
|
||||
rel_path = str(file_path).replace(os.getcwd() + '/', '')
|
||||
|
||||
processed_insts = []
|
||||
for inst in instantiations:
|
||||
processed_inst = {
|
||||
'line_start': inst['line_start'],
|
||||
'line_end': inst['line_end'],
|
||||
'full_text': inst['full_text'],
|
||||
'compressed_text': inst['compressed_text'],
|
||||
'template_params': parse_template_parameters(inst['full_text'])
|
||||
}
|
||||
processed_insts.append(processed_inst)
|
||||
|
||||
device_op_data['files'][rel_path] = processed_insts
|
||||
device_op_data['total_instantiations'] += len(instantiations)
|
||||
|
||||
            except Exception as e:
                # Report the failure instead of silently dropping the file
                print(f"  Warning: skipping {file_path}: {e}")
|
||||
|
||||
all_results[device_op] = device_op_data
|
||||
print(f" Found {device_op_data['total_instantiations']} instantiations in {len(device_op_data['files'])} files")
|
||||
|
||||
# Separate by type
|
||||
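    # Match on the 'BwdWeight' / 'BwdData' marker. A bare 'Weight' test also matches
    # backward-data ops whose names embed a layout such as ..._Weight_K_Y_X_C_...,
    # which would count them in both categories.
    bwd_weight_ops = {k: v for k, v in all_results.items() if 'BwdWeight' in k or 'BwdWgt' in k}
    bwd_data_ops = {k: v for k, v in all_results.items() if 'BwdData' in k}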
|
||||
|
||||
print(f"\nBackward Weight Operations: {len(bwd_weight_ops)}")
|
||||
print(f"Backward Data Operations: {len(bwd_data_ops)}")
|
||||
|
||||
# Generate TXT report
|
||||
print("\nGenerating comprehensive TXT report...")
|
||||
txt_report = generate_txt_report(all_results, bwd_weight_ops, bwd_data_ops)
|
||||
|
||||
txt_output = 'backward_conv_all_instantiations.txt'
|
||||
with open(txt_output, 'w') as f:
|
||||
f.write(txt_report)
|
||||
print(f"✓ TXT report saved: {txt_output}")
|
||||
|
||||
# Generate JSON
|
||||
print("Generating JSON for programmatic use...")
|
||||
json_data = generate_json_data(all_results, bwd_weight_ops, bwd_data_ops)
|
||||
|
||||
json_output = 'backward_conv_all_instantiations.json'
|
||||
with open(json_output, 'w') as f:
|
||||
json.dump(json_data, f, indent=2)
|
||||
print(f"✓ JSON file saved: {json_output}")
|
||||
|
||||
# Generate markdown summary
|
||||
print("Generating markdown summary...")
|
||||
md_content = generate_markdown_summary(all_results, bwd_weight_ops, bwd_data_ops)
|
||||
|
||||
md_output = 'BACKWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md'
|
||||
with open(md_output, 'w') as f:
|
||||
f.write(md_content)
|
||||
print(f"✓ Markdown summary saved: {md_output}")
|
||||
|
||||
# Final summary
|
||||
total_insts = sum(data['total_instantiations'] for data in all_results.values())
|
||||
print(f"\n{'='*80}")
|
||||
print(f"GENERATION COMPLETE")
|
||||
print(f"{'='*80}")
|
||||
print(f"Device Operations Found: {len(all_results)}")
|
||||
print(f" - Backward Weight: {len(bwd_weight_ops)}")
|
||||
print(f" - Backward Data: {len(bwd_data_ops)}")
|
||||
print(f"Total Instantiations: {total_insts}")
|
||||
print(f"\nFiles generated:")
|
||||
print(f" - {txt_output}")
|
||||
print(f" - {json_output}")
|
||||
print(f" - {md_output}")
|
||||
print(f"{'='*80}")
|
||||
|
||||
def generate_txt_report(all_results, bwd_weight_ops, bwd_data_ops):
|
||||
"""Generate comprehensive TXT report."""
|
||||
report = []
|
||||
report.append("=" * 120)
|
||||
report.append("COMPLETE BACKWARD CONVOLUTION DEVICE OPERATIONS AND ALL TEMPLATE INSTANTIATIONS")
|
||||
report.append("=" * 120)
|
||||
report.append("")
|
||||
report.append(f"Namespace: ck::tensor_operation::device")
|
||||
report.append(f"Total Device Operation Types: {len(all_results)}")
|
||||
total_insts = sum(data['total_instantiations'] for data in all_results.values())
|
||||
report.append(f"Total Template Instantiations: {total_insts}")
|
||||
report.append("")
|
||||
|
||||
# Summary
|
||||
report.append("SUMMARY OF DEVICE OPERATIONS")
|
||||
report.append("-" * 120)
|
||||
report.append("")
|
||||
report.append("BACKWARD WEIGHT OPERATIONS:")
|
||||
for device_op, data in sorted(bwd_weight_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True):
|
||||
report.append(f" {data['total_instantiations']:4d} instantiations | {len(data['files']):3d} files | {device_op}")
|
||||
report.append("")
|
||||
report.append("BACKWARD DATA OPERATIONS:")
|
||||
for device_op, data in sorted(bwd_data_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True):
|
||||
report.append(f" {data['total_instantiations']:4d} instantiations | {len(data['files']):3d} files | {device_op}")
|
||||
report.append("")
|
||||
report.append("=" * 120)
|
||||
report.append("")
|
||||
|
||||
# Detailed sections
|
||||
for category_name, ops_dict in [("BACKWARD WEIGHT", bwd_weight_ops), ("BACKWARD DATA", bwd_data_ops)]:
|
||||
report.append("")
|
||||
report.append("=" * 120)
|
||||
report.append(f"{category_name} OPERATIONS")
|
||||
report.append("=" * 120)
|
||||
report.append("")
|
||||
|
||||
for device_op in sorted(ops_dict.keys()):
|
||||
data = all_results[device_op]
|
||||
|
||||
report.append("=" * 120)
|
||||
report.append(f"DEVICE OPERATION: {device_op}")
|
||||
report.append("=" * 120)
|
||||
report.append("")
|
||||
report.append(f"Total Instantiations: {data['total_instantiations']}")
|
||||
report.append(f"Number of Files: {len(data['files'])}")
|
||||
report.append("")
|
||||
|
||||
for file_path, instantiations in sorted(data['files'].items()):
|
||||
report.append("-" * 120)
|
||||
report.append(f"FILE: {file_path}")
|
||||
report.append(f"Instantiations: {len(instantiations)}")
|
||||
report.append("")
|
||||
|
||||
for idx, inst in enumerate(instantiations, 1):
|
||||
report.append(f"[{idx}] Lines {inst['line_start']}-{inst['line_end']}:")
|
||||
report.append("")
|
||||
for line in inst['full_text'].split('\n'):
|
||||
report.append(f" {line}")
|
||||
report.append("")
|
||||
|
||||
report.append("")
|
||||
|
||||
report.append("")
|
||||
|
||||
return '\n'.join(report)
|
||||
|
||||
def _ops_to_json(ops):
    """Convert a {device_op: data} mapping into its JSON-serializable form."""
    json_ops = {}
    for device_op, data in ops.items():
        instantiations_by_file = {}
        for file_path, instantiations in data['files'].items():
            file_insts = [
                {
                    'line_start': inst['line_start'],
                    'line_end': inst['line_end'],
                    'instantiation_text': inst['full_text'],
                    'instantiation_compressed': inst['compressed_text'],
                    'parsed_parameters': inst['template_params']
                }
                for inst in instantiations
            ]
            instantiations_by_file[file_path] = {
                'count': len(file_insts),
                'instantiations': file_insts
            }

        json_ops[device_op] = {
            'name': device_op,
            'total_instantiations': data['total_instantiations'],
            'total_files': len(data['files']),
            'instantiations_by_file': instantiations_by_file
        }
    return json_ops


def generate_json_data(all_results, bwd_weight_ops, bwd_data_ops):
    """Generate JSON data for programmatic use."""
    total_insts = sum(data['total_instantiations'] for data in all_results.values())

    return {
        'metadata': {
            'description': 'Backward convolution device operations and template instantiations',
            'namespace': 'ck::tensor_operation::device',
            'total_device_operations': len(all_results),
            'backward_weight_operations': len(bwd_weight_ops),
            'backward_data_operations': len(bwd_data_ops),
            'total_instantiations': total_insts
        },
        # Both categories share the same per-operation layout
        'backward_weight_operations': _ops_to_json(bwd_weight_ops),
        'backward_data_operations': _ops_to_json(bwd_data_ops)
    }
|
||||
|
||||
def generate_markdown_summary(all_results, bwd_weight_ops, bwd_data_ops):
|
||||
"""Generate markdown summary document."""
|
||||
total_insts = sum(data['total_instantiations'] for data in all_results.values())
|
||||
bwd_weight_insts = sum(data['total_instantiations'] for data in bwd_weight_ops.values())
|
||||
bwd_data_insts = sum(data['total_instantiations'] for data in bwd_data_ops.values())
|
||||
|
||||
md = []
|
||||
md.append("# Backward Convolution Device Operations - Comprehensive Summary")
|
||||
md.append("")
|
||||
md.append("**Generated:** October 29, 2025 ")
|
||||
md.append("**Namespace:** `ck::tensor_operation::device` ")
|
||||
md.append("**Location:** `/library/include` and `/library/src` directories")
|
||||
md.append("")
|
||||
md.append("---")
|
||||
md.append("")
|
||||
md.append("## Overview")
|
||||
md.append("")
|
||||
md.append("This document provides a comprehensive list of all device operations used for backward convolutions (both weight gradients and data gradients) in the Composable Kernel library.")
|
||||
md.append("")
|
||||
md.append("### Total Statistics")
|
||||
md.append(f"- **Unique Device Operation Types:** {len(all_results)}")
|
||||
md.append(f"- **Backward Weight Operations:** {len(bwd_weight_ops)} types ({bwd_weight_insts} instantiations)")
|
||||
md.append(f"- **Backward Data Operations:** {len(bwd_data_ops)} types ({bwd_data_insts} instantiations)")
|
||||
md.append(f"- **Total Template Instantiations:** {total_insts}")
|
||||
md.append("")
|
||||
md.append("---")
|
||||
md.append("")
|
||||
|
||||
# Backward Weight Table
|
||||
md.append("## Backward Weight Device Operations")
|
||||
md.append("")
|
||||
md.append("| # | Device Operation Name | Instantiations | Files | Primary Location |")
|
||||
md.append("|---|----------------------|----------------|-------|------------------|")
|
||||
|
||||
for idx, (device_op, data) in enumerate(sorted(bwd_weight_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True), 1):
|
||||
primary_file = sorted(data['files'].keys())[0] if data['files'] else "N/A"
|
||||
primary_file_short = primary_file.split('/')[-1] if '/' in primary_file else primary_file
|
||||
md.append(f"| {idx} | `{device_op}` | {data['total_instantiations']} | {len(data['files'])} | `{primary_file_short}` |")
|
||||
|
||||
md.append(f"| **Total** | **Backward Weight Operations** | **{bwd_weight_insts}** | **{sum(len(d['files']) for d in bwd_weight_ops.values())}** | |")
|
||||
md.append("")
|
||||
md.append("---")
|
||||
md.append("")
|
||||
|
||||
# Backward Data Table
|
||||
md.append("## Backward Data Device Operations")
|
||||
md.append("")
|
||||
md.append("| # | Device Operation Name | Instantiations | Files | Primary Location |")
|
||||
md.append("|---|----------------------|----------------|-------|------------------|")
|
||||
|
||||
for idx, (device_op, data) in enumerate(sorted(bwd_data_ops.items(), key=lambda x: x[1]['total_instantiations'], reverse=True), 1):
|
||||
primary_file = sorted(data['files'].keys())[0] if data['files'] else "N/A"
|
||||
primary_file_short = primary_file.split('/')[-1] if '/' in primary_file else primary_file
|
||||
md.append(f"| {idx} | `{device_op}` | {data['total_instantiations']} | {len(data['files'])} | `{primary_file_short}` |")
|
||||
|
||||
md.append(f"| **Total** | **Backward Data Operations** | **{bwd_data_insts}** | **{sum(len(d['files']) for d in bwd_data_ops.values())}** | |")
|
||||
md.append("")
|
||||
md.append(f"### Grand Total: {total_insts} Template Instantiations across {len(all_results)} Device Operation Types")
|
||||
md.append("")
|
||||
md.append("---")
|
||||
md.append("")
|
||||
|
||||
# Output files section
|
||||
md.append("## Output Files")
|
||||
md.append("")
|
||||
md.append("### Complete Instantiation Files")
|
||||
md.append("")
|
||||
md.append("1. **`backward_conv_all_instantiations.txt`**")
|
||||
md.append(" - COMPLETE listing of ALL template instantiations")
|
||||
md.append(" - Human-readable format with line numbers")
|
||||
md.append(" - Organized by operation type, then by file")
|
||||
md.append("")
|
||||
md.append("2. **`backward_conv_all_instantiations.json`**")
|
||||
md.append(" - Structured JSON for programmatic instantiation generation")
|
||||
md.append(" - Separated into `backward_weight_operations` and `backward_data_operations`")
|
||||
md.append(" - Each instantiation includes full text and parsed parameters")
|
||||
md.append(" - Ready for automated code generation")
|
||||
md.append("")
|
||||
md.append("3. **`BACKWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md`** (This file)")
|
||||
md.append(" - Executive summary with tables")
|
||||
md.append(" - Quick reference for all backward operations")
|
||||
md.append("")
|
||||
|
||||
return '\n'.join(md)
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
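The weight/data split in `main()` tests for the substrings `'Weight'` and `'Data'`. The sketch below illustrates why that test is loose: an operation name that embeds a tensor layout (`..._Weight_K_Y_X_C_...`) lands in both groups. That is consistent with the generated summary, which reports 9 + 7 types against 15 unique ones and 885 + 1046 instantiations against a total of 1827; the difference of 104 equals the instantiation count listed for the backward-data op that appears in the weight table. The `classify` helper is hypothetical; matching on the `BwdWeight` / `BwdData` marker is one possible tightening, not the script's current behavior.

```python
def classify(name):
    """Classify by the 'BwdWeight' / 'BwdData' marker instead of a bare substring."""
    if "BwdWeight" in name:
        return "weight"
    if "BwdData" in name:
        return "data"
    return "other"


# Both names appear in the generated summary table.
names = [
    "DeviceGroupedConvBwdWeight_Xdl_CShuffle",
    "DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K",
]

# Loose test, as in main(): the layout suffix makes the data op match 'Weight' too.
loose_weight = [n for n in names if "Weight" in n]
loose_data = [n for n in names if "Data" in n]

# Marker-based test: every name falls into exactly one category.
strict_weight = [n for n in names if classify(n) == "weight"]
strict_data = [n for n in names if classify(n) == "data"]

print(len(loose_weight), len(loose_data))
print(len(strict_weight), len(strict_data))
```

With the marker-based test the per-category counts add up to the number of unique operation types, so the category totals and the grand total stay consistent.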
|
||||
195
extract_conv_fwd_device_ops.py
Normal file
@@ -0,0 +1,195 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Extract all forward convolution device operations and their template instantiations
|
||||
from CK library headers and source files.
|
||||
"""
|
||||
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
from collections import defaultdict
|
||||
import json
|
||||
|
||||
# Device operation patterns to search for
|
||||
DEVICE_OP_PATTERNS = [
|
||||
r'(Device\w*Conv\w*Fwd\w*)<', # Generic pattern for device ops
|
||||
]
|
||||
|
||||
def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files
|
||||
|
||||
def extract_device_operations(file_path):
    """Extract device operations from a file."""
    device_ops = []

    try:
        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()

        # Search for device operation class names
        for pattern in DEVICE_OP_PATTERNS:
            matches = re.finditer(pattern, content)
            for match in matches:
                device_op_name = match.group(1)
                # Store the device operation with file location
                device_ops.append({
                    'name': device_op_name,
                    'file': str(file_path),
                    'line_context': None  # We'll extract context later if needed
                })
    except Exception as e:
        print(f"Error processing {file_path}: {e}")

    return device_ops
|
||||
|
||||
def extract_template_instantiation(content, device_op_name):
    """Find lines that mention a device operation together with a template bracket.

    Nested templates make a single regex unreliable, so this only records each
    matching line (truncated); it does not follow multi-line instantiations.
    """
    instantiations = []

    for i, line in enumerate(content.split('\n')):
        if device_op_name in line and '<' in line:
            instantiations.append({
                'line_number': i + 1,
                'text': line.strip()[:200]  # Truncate for readability
            })

    return instantiations
|
||||
|
||||
def main():
|
||||
# Define paths to search
|
||||
header_paths = [
|
||||
'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd',
|
||||
'library/include/ck/library/tensor_operation_instance/gpu',
|
||||
]
|
||||
|
||||
source_paths = [
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd',
|
||||
'library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd',
|
||||
'library/src/tensor_operation_instance/gpu/conv2d_fwd',
|
||||
]
|
||||
|
||||
|
||||
|
||||
# Find all relevant files
|
||||
header_files = find_files(header_paths, ['.hpp', '.h'])
|
||||
source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])
|
||||
|
||||
all_files = list(set(header_files + source_files))
|
||||
|
||||
print(f"Found {len(all_files)} files to analyze")
|
||||
|
||||
# Collect all device operations
|
||||
all_device_ops = []
|
||||
for file_path in all_files:
|
||||
ops = extract_device_operations(file_path)
|
||||
all_device_ops.extend(ops)
|
||||
|
||||
# Group by device operation name
|
||||
ops_by_name = defaultdict(list)
|
||||
for op in all_device_ops:
|
||||
ops_by_name[op['name']].append(op)
|
||||
|
||||
# Count unique device operations
|
||||
unique_ops = sorted(ops_by_name.keys())
|
||||
|
||||
print(f"\nFound {len(unique_ops)} unique device operation types:")
|
||||
for op_name in unique_ops:
|
||||
count = len(ops_by_name[op_name])
|
||||
print(f" {op_name}: {count} occurrences")
|
||||
|
||||
# Now extract detailed instantiations for each device op
|
||||
device_op_details = {}
|
||||
|
||||
for device_op_name in unique_ops:
|
||||
instantiation_files = set()
|
||||
total_instantiations = 0
|
||||
|
||||
# Re-scan files that contain this device op
|
||||
for file_path in all_files:
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
|
||||
content = f.read()
|
||||
|
||||
if device_op_name in content:
|
||||
instantiations = extract_template_instantiation(content, device_op_name)
|
||||
if instantiations:
|
||||
instantiation_files.add(str(file_path))
|
||||
total_instantiations += len(instantiations)
|
||||
            except Exception as e:
                # Report the failure instead of silently dropping the file
                print(f"  Warning: skipping {file_path}: {e}")
|
||||
|
||||
device_op_details[device_op_name] = {
|
||||
'files': sorted(list(instantiation_files)),
|
||||
'total_instantiations': total_instantiations
|
||||
}
|
||||
|
||||
# Generate report
|
||||
report = []
|
||||
report.append("=" * 80)
|
||||
report.append("FORWARD CONVOLUTION DEVICE OPERATIONS REPORT")
|
||||
report.append("=" * 80)
|
||||
report.append("")
|
||||
report.append(f"Total Unique Device Operations: {len(unique_ops)}")
|
||||
report.append("")
|
||||
|
||||
for device_op_name in unique_ops:
|
||||
details = device_op_details[device_op_name]
|
||||
report.append("-" * 80)
|
||||
report.append(f"Device Operation: {device_op_name}")
|
||||
report.append(f"Total Template Instantiations: {details['total_instantiations']}")
|
||||
report.append(f"Number of Files: {len(details['files'])}")
|
||||
report.append("")
|
||||
report.append("Files containing this device operation:")
|
||||
for file_path in details['files']:
|
||||
# Make path relative to workspace
|
||||
rel_path = file_path.replace(os.getcwd() + '/', '')
|
||||
report.append(f" - {rel_path}")
|
||||
report.append("")
|
||||
|
||||
# Save report
|
||||
report_text = '\n'.join(report)
|
||||
|
||||
with open('forward_convolution_device_ops_report.txt', 'w') as f:
|
||||
f.write(report_text)
|
||||
|
||||
print(f"\nReport saved to: forward_convolution_device_ops_report.txt")
|
||||
|
||||
# Also save as JSON for programmatic access
|
||||
json_data = {
|
||||
'summary': {
|
||||
'total_device_operations': len(unique_ops),
|
||||
'device_operation_names': unique_ops
|
||||
},
|
||||
'details': device_op_details
|
||||
}
|
||||
|
||||
with open('forward_convolution_device_ops_data.json', 'w') as f:
|
||||
json.dump(json_data, f, indent=2)
|
||||
|
||||
print(f"JSON data saved to: forward_convolution_device_ops_data.json")
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
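The multi-line extraction these scripts rely on (counting `<` against `>` until the template argument list closes) can be sketched in isolation. This is a simplified illustration under stated assumptions: `collect_balanced` is a hypothetical helper, and `DeviceOpFoo` with its arguments is made-up sample input, not a real CK operation.

```python
def collect_balanced(lines, start):
    """Collect lines from `start` until the '<' and '>' counts balance.

    Returns (text, next_index). Raw characters are counted, so operators
    such as '->' or '>>=' inside the span would skew the balance.
    """
    depth = lines[start].count("<") - lines[start].count(">")
    collected = [lines[start]]
    idx = start + 1
    while depth > 0 and idx < len(lines):
        collected.append(lines[idx])
        depth += lines[idx].count("<") - lines[idx].count(">")
        idx += 1
    return "\n".join(collected), idx


# Made-up C++-like input: one instantiation spread over four lines, one on a single line.
sample = [
    "DeviceOpFoo<2,",
    "            F16,",
    "            S<1, 4>,",
    "            64>,",
    "DeviceOpFoo<2, F32, S<1, 8>, 128>",
]

first_text, first_next = collect_balanced(sample, 0)
second_text, second_next = collect_balanced(sample, first_next)
print(first_next, second_next)
```

Returning the next index lets the caller resume scanning after the consumed span, which is how the scripts avoid counting the continuation lines of one instantiation a second time.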
|
||||
201
extract_detailed_instantiations.py
Normal file
@@ -0,0 +1,201 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Extract detailed template instantiations for forward convolution device operations.
|
||||
This script parses the actual template parameters for each device operation instance.
|
||||
"""
|
||||
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
from collections import defaultdict
|
||||
|
||||
def find_files(base_paths, extensions):
    """Find all files with given extensions in base paths."""
    files = []
    for base_path in base_paths:
        path_obj = Path(base_path)
        if path_obj.is_dir():
            for ext in extensions:
                files.extend(path_obj.rglob(f'*{ext}'))
    return files
|
||||
|
||||
def extract_device_op_instantiations(content, device_op_name):
    """Extract complete device operation instantiations."""
    instantiations = []
    lines = content.split('\n')

    i = 0
    while i < len(lines):
        line = lines[i]

        # Check if this line contains the device op
        if device_op_name + '<' in line:
            # Start collecting the full instantiation
            full_instantiation = line
            bracket_count = line.count('<') - line.count('>')

            # Continue to next lines if brackets aren't balanced
            j = i + 1
            while bracket_count > 0 and j < len(lines):
                full_instantiation += '\n' + lines[j]
                bracket_count += lines[j].count('<') - lines[j].count('>')
                j += 1

            # Clean up and store
            instantiation_clean = full_instantiation.strip()
            if instantiation_clean:
                instantiations.append({
                    'line_start': i + 1,
                    'line_end': j,
                    'text': instantiation_clean
                })

            i = j
        else:
            i += 1

    return instantiations
|
||||
|
||||
def main():
|
||||
# Device operations we're interested in (from previous analysis)
|
||||
target_device_ops = [
|
||||
'DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
|
||||
        'DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K',
        'DeviceConvFwd',
        'DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK',
        'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle',
        'DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3',
        'DeviceGroupedConvFwdMultipleD_Wmma_CShuffle',
        'DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor',
    ]

    # Define paths to search
    header_paths = [
        'library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd',
        'library/include/ck/library/tensor_operation_instance/gpu',
    ]

    source_paths = [
        'library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd',
        'library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd',
        'library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd',
        'library/src/tensor_operation_instance/gpu/conv2d_fwd',
    ]

    # Find all relevant files
    header_files = find_files(header_paths, ['.hpp', '.h', '.inc'])
    source_files = find_files(source_paths, ['.cpp', '.hpp', '.inc', '.in'])

    all_files = list(set(header_files + source_files))

    print(f"Analyzing {len(all_files)} files for detailed instantiations...\n")

    # Process each device operation
    results = {}

    for device_op in target_device_ops:
        print(f"Processing {device_op}...")

        device_op_data = {
            'name': device_op,
            'files_with_instantiations': {},
            'total_instantiations': 0
        }

        for file_path in all_files:
            try:
                with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                    content = f.read()

                if device_op in content:
                    instantiations = extract_device_op_instantiations(content, device_op)

                    if instantiations:
                        rel_path = str(file_path).replace(os.getcwd() + '/', '')
                        device_op_data['files_with_instantiations'][rel_path] = instantiations
                        device_op_data['total_instantiations'] += len(instantiations)
            except Exception as e:
                print(f" Error processing {file_path}: {e}")

        results[device_op] = device_op_data
        print(f" Found {device_op_data['total_instantiations']} instantiations in {len(device_op_data['files_with_instantiations'])} files")

    # Generate detailed report
    report = []
    report.append("=" * 100)
    report.append("DETAILED FORWARD CONVOLUTION DEVICE OPERATIONS AND TEMPLATE INSTANTIATIONS")
    report.append("=" * 100)
    report.append("")
    report.append("Generated for: ck::tensor_operation::device namespace")
    report.append("")

    # Summary
    report.append("SUMMARY")
    report.append("-" * 100)
    total_all_instantiations = sum(data['total_instantiations'] for data in results.values())
    report.append(f"Total Device Operation Types: {len(target_device_ops)}")
    report.append(f"Total Template Instantiations: {total_all_instantiations}")
    report.append("")

    for device_op in target_device_ops:
        data = results[device_op]
        report.append(f" • {device_op}")
        report.append(f" - Instantiations: {data['total_instantiations']}")
        report.append(f" - Files: {len(data['files_with_instantiations'])}")

    report.append("")
    report.append("=" * 100)
    report.append("")

    # Detailed section for each device op
    for device_op in target_device_ops:
        data = results[device_op]

        report.append("=" * 100)
        report.append(f"DEVICE OPERATION: {device_op}")
        report.append("=" * 100)
        report.append("")
        report.append(f"Total Template Instantiations: {data['total_instantiations']}")
        report.append(f"Number of Files: {len(data['files_with_instantiations'])}")
        report.append("")

        # List files and show sample instantiations
        for file_path, instantiations in sorted(data['files_with_instantiations'].items()):
            report.append("-" * 100)
            report.append(f"File: {file_path}")
            report.append(f"Instantiations in this file: {len(instantiations)}")
            report.append("")

            # Show up to 3 sample instantiations from each file
            num_samples = min(3, len(instantiations))
            if num_samples > 0:
                report.append("Sample instantiations:")
                for idx, inst in enumerate(instantiations[:num_samples]):
                    report.append(f" [{idx+1}] Line {inst['line_start']}:")
                    # Truncate very long instantiations
                    text = inst['text']
                    if len(text) > 500:
                        text = text[:500] + "..."
                    report.append(f" {text}")

                if len(instantiations) > num_samples:
                    report.append(f" ... and {len(instantiations) - num_samples} more instantiations")

            report.append("")

        report.append("")

    # Save detailed report
    report_text = '\n'.join(report)

    output_file = 'forward_conv_device_ops_detailed_report.txt'
    # Write as UTF-8 explicitly: the report contains the non-ASCII bullet "•"
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(report_text)

    print(f"\n{'='*60}")
    print(f"Detailed report saved to: {output_file}")
    print(f"Total instantiations found: {total_all_instantiations}")
    print(f"{'='*60}")

if __name__ == '__main__':
    main()
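The script above writes a SUMMARY block containing a `Total Template Instantiations:` line followed by one `- Instantiations:` line per device operation. The following is a minimal sketch, not part of the commit, of how those per-operation counts might be cross-checked against the stated total. The function name `check_summary_totals` and the abbreviated `sample` text are assumptions made for illustration; the sample uses only two operations, so its total of 18 is not the real report's total.

```python
import re


def check_summary_totals(report_text):
    """Return (stated_total, summed_total) parsed from the SUMMARY block.

    Hypothetical helper: assumes the layout produced by the report script,
    i.e. one "Total Template Instantiations: N" line in the summary and
    one "- Instantiations: N" line per device operation.
    """
    # Keep only the summary: everything before the first per-op detail section.
    summary = report_text.split("DEVICE OPERATION:", 1)[0]
    stated = int(re.search(r"Total Template Instantiations:\s*(\d+)", summary).group(1))
    # The leading "-" keeps this from matching the "Total Template ..." line.
    per_op = [int(n) for n in re.findall(r"-\s*Instantiations:\s*(\d+)", summary)]
    return stated, sum(per_op)


# Abbreviated, illustrative sample in the same layout as the generated report.
sample = """SUMMARY
Total Device Operation Types: 2
Total Template Instantiations: 18

 • DeviceConvFwd
 - Instantiations: 12
 - Files: 6
 • DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
 - Instantiations: 6
 - Files: 1
"""

stated, summed = check_summary_totals(sample)
print(f"stated={stated} summed={summed} consistent={stated == summed}")
```

Running the same function on the full text of `forward_conv_device_ops_detailed_report.txt` would compare the report's own stated total with the sum of its per-operation lines.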
13902
forward_conv_all_instantiations.json
Normal file
File diff suppressed because it is too large
4271
forward_conv_all_instantiations.txt
Normal file
File diff suppressed because it is too large
428
forward_conv_device_ops_detailed_report.txt
Normal file
@@ -0,0 +1,428 @@
====================================================================================================
DETAILED FORWARD CONVOLUTION DEVICE OPERATIONS AND TEMPLATE INSTANTIATIONS
====================================================================================================

Generated for: ck::tensor_operation::device namespace

SUMMARY
----------------------------------------------------------------------------------------------------
Total Device Operation Types: 8
Total Template Instantiations: 1000

• DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
- Instantiations: 61
- Files: 1
• DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
- Instantiations: 156
- Files: 4
• DeviceConvFwd
- Instantiations: 12
- Files: 6
• DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
- Instantiations: 6
- Files: 1
• DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
- Instantiations: 570
- Files: 9
• DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
- Instantiations: 141
- Files: 2
• DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
- Instantiations: 42
- Files: 1
• DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Instantiations: 12
- Files: 1

====================================================================================================

====================================================================================================
DEVICE OPERATION: DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
====================================================================================================

Total Template Instantiations: 61
Number of Files: 1

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 61

Sample instantiations:
[1] Line 49:
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 32, 1, 1, 8>, ...
[2] Line 50:
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 32, 1, 1, 8>, ...
[3] Line 51:
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 16, 1, 1, 8>, ...
... and 58 more instantiations


====================================================================================================
DEVICE OPERATION: DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
====================================================================================================

Total Template Instantiations: 156
Number of Files: 4

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
Instantiations in this file: 39

Sample instantiations:
[1] Line 45:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[2] Line 46:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[3] Line 47:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
... and 36 more instantiations

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 39

Sample instantiations:
[1] Line 45:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[2] Line 46:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
[3] Line 47:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
... and 36 more instantiations

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
Instantiations in this file: 39

Sample instantiations:
[1] Line 44:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 4, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
[2] Line 45:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 4, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
[3] Line 46:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 4, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
... and 36 more instantiations

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
Instantiations in this file: 39

Sample instantiations:
[1] Line 42:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 16, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
[2] Line 43:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 16, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
[3] Line 44:
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 16, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
... and 36 more instantiations


====================================================================================================
DEVICE OPERATION: DeviceConvFwd
====================================================================================================

Total Template Instantiations: 12
Number of Files: 6

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp
Instantiations in this file: 7

Sample instantiations:
[1] Line 24:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
[2] Line 28:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
[3] Line 33:
std::vector<std::unique_ptr<DeviceConvFwd<2,
NHWC,
KYXC,
NHWK,
BF16,
BF16,
BF16,
PassThrough,
PassThrough,
...
... and 4 more instantiations

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 1

Sample instantiations:
[1] Line 180:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
Instantiations in this file: 1

Sample instantiations:
[1] Line 106:
std::vector<std::unique_ptr<DeviceConvFwd<2,
NHWC,
KYXC,
NHWK,
BF16,
BF16,
BF16,
PassThrough,
PassThrough,
...

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
Instantiations in this file: 1

Sample instantiations:
[1] Line 107:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
Instantiations in this file: 1

Sample instantiations:
[1] Line 106:
DeviceConvFwd<2, NHWC, KYXC, NHWK, F32, F32, F32, PassThrough, PassThrough, PassThrough>>>&

----------------------------------------------------------------------------------------------------
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
Instantiations in this file: 1

Sample instantiations:
[1] Line 103:
std::vector<std::unique_ptr<DeviceConvFwd<2,
NHWC,
KYXC,
NHWK,
int8_t,
int8_t,
int8_t,
PassThrough,
PassThrough,
...


====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
====================================================================================================

Total Template Instantiations: 6
Number of Files: 1

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp
Instantiations in this file: 6

Sample instantiations:
[1] Line 57:
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 8, 16, 4, 2, 1, 1, 2, 1, S<4, 2>, S<1, 1>, S<2, 1, 2, 1>, S<1, 1, 8, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 1, 1, 1>, S<2, 1, 4, 1>,...
[2] Line 58:
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 16, 1, 4, 4, 1, S<8, 2>, S<8, 2>, S<8, 1, 1, 1>, S<2, 1, 128, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<8, 1, 1, 1>, S<2, 1, 128, 1>,...
[3] Line 60:
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 16, 2, 4, 4, 1, S<8, 2>, S<8, 2>, S<8, 1, 1, 2>, S<2, 1, 128, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<4, 1, 1, 2>, S<1, 2, 0, 3>, S<1, 1, 1, 2>, S<8, 1, 1, 2>, S<2, 1, 128, 1>,...
... and 3 more instantiations


====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
====================================================================================================

Total Template Instantiations: 570
Number of Files: 9

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
Instantiations in this file: 80

Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 77 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
Instantiations in this file: 16

Sample instantiations:
[1] Line 53:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 55:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 13 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
Instantiations in this file: 64

Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8,...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8,...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8,...
... and 61 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
Instantiations in this file: 193

Sample instantiations:
[1] Line 68:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 87:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[3] Line 89:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 190 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
Instantiations in this file: 21

Sample instantiations:
[1] Line 55:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
... and 18 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
Instantiations in this file: 80

Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 77 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
Instantiations in this file: 80

Sample instantiations:
[1] Line 54:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 57:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
... and 77 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
Instantiations in this file: 20

Sample instantiations:
[1] Line 50:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 1, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
[2] Line 52:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
[3] Line 53:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 1, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
... and 17 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
Instantiations in this file: 16

Sample instantiations:
[1] Line 53:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
[2] Line 55:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, ...
[3] Line 56:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
... and 13 more instantiations


====================================================================================================
|
||||
DEVICE OPERATION: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 141
|
||||
Number of Files: 2
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
|
||||
Instantiations in this file: 49
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 71:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 64, 16, 16, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[2] Line 90:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 256, 256, 32, 8, 8, 32, 32, 4, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 91:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 64, 8, 8, 32, 32, 2, 2, S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 46 more instantiations

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
Instantiations in this file: 92

Sample instantiations:
[1] Line 69:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 128, 32, 16, 64, 8, 8, 16, 16, 1, 1, S<8, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[2] Line 70:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 64, 16, 16, 128, 8, 8, 16, 16, 1, 1, S<16, 4, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<16, 4, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 71:
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 64, 16, 16, 64, 8, 8, 16, 16, 1, 1, S<8, 8, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 8, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 89 more instantiations


====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
====================================================================================================

Total Template Instantiations: 42
Number of Files: 1

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp
Instantiations in this file: 42

Sample instantiations:
[1] Line 62:
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 128, 64, 64, 32, 8, 16, 16, 2, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
[2] Line 64:
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 16, 16, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
[3] Line 65:
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 256, 64, 256, 32, 8, 16, 16, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
... and 39 more instantiations


====================================================================================================
DEVICE OPERATION: DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
====================================================================================================

Total Template Instantiations: 12
Number of Files: 1

----------------------------------------------------------------------------------------------------
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp
Instantiations in this file: 12

Sample instantiations:
[1] Line 53:
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
[2] Line 55:
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 256, 128, 32, 8, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 2, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 2, ...
[3] Line 56:
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 256, 128, 32, 8, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, ...
... and 9 more instantiations
285
forward_convolution_device_ops_data.json
Normal file
@@ -0,0 +1,285 @@
{
"summary": {
"total_device_operations": 9,
"device_operation_names": [
"DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K",
"DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K",
"DeviceConvFwd",
"DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK",
"DeviceGroupedConvFwdMultipleABD",
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle",
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3",
"DeviceGroupedConvFwdMultipleD_Wmma_CShuffle",
"DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor"
]
},
"details": {
"DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": {
"files": [
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp"
],
"total_instantiations": 61
},
"DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K": {
"files": [
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp"
],
"total_instantiations": 156
},
"DeviceConvFwd": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp"
],
"total_instantiations": 12
},
"DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp"
],
"total_instantiations": 6
},
"DeviceGroupedConvFwdMultipleABD": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_bnorm_clamp.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_clamp.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bilinear.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_clamp.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convinvscale.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_add.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_relu.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_dynamic_op.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scale.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_ab.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_scaleadd_relu.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perchannel_quantization.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perlayer_quantization.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perchannel_quantization.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perlayer_quantization.hpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1s1p0_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_oddc_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_2x_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_part2_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_comp_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_fp8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_fp8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_bf8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_int8_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_16x16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_inter_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_intra_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_inter_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_intra_instance.in",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_bf16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f16_instance.cpp",
"library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp"
],
"total_instantiations": 975
},
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp"
],
"total_instantiations": 711
},
"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp",
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp"
],
"total_instantiations": 141
},
"DeviceGroupedConvFwdMultipleD_Wmma_CShuffle": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp"
],
"total_instantiations": 42
},
"DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor": {
"files": [
"library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp"
],
"total_instantiations": 12
}
}
}
290
forward_convolution_device_ops_report.txt
Normal file
@@ -0,0 +1,290 @@
================================================================================
FORWARD CONVOLUTION DEVICE OPERATIONS REPORT
================================================================================

Total Unique Device Operations: 9

--------------------------------------------------------------------------------
Device Operation: DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
Total Template Instantiations: 61
Number of Files: 1

Files containing this device operation:
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp

--------------------------------------------------------------------------------
Device Operation: DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
Total Template Instantiations: 156
Number of Files: 4

Files containing this device operation:
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp

--------------------------------------------------------------------------------
Device Operation: DeviceConvFwd
Total Template Instantiations: 12
Number of Files: 6

Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp

--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
Total Template Instantiations: 6
Number of Files: 1

Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp

--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleABD
Total Template Instantiations: 975
Number of Files: 195

Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_bnorm_clamp.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_clamp.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bilinear.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_clamp.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convinvscale.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_add.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_relu.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_dynamic_op.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scale.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_ab.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_scaleadd_relu.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perchannel_quantization.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perlayer_quantization.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perchannel_quantization.hpp
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perlayer_quantization.hpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_2x_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_part2_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_comp_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1s1p0_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_oddc_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_comp_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_fp8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_fp8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_bf8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_int8_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_16x16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_inter_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_intra_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_inter_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_intra_instance.in
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_bf16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f16_instance.cpp
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp

--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
Total Template Instantiations: 711
Number of Files: 11

Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp

--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
Total Template Instantiations: 141
Number of Files: 2

Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp

--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
Total Template Instantiations: 42
Number of Files: 1

Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp

--------------------------------------------------------------------------------
Device Operation: DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
Total Template Instantiations: 12
Number of Files: 1

Files containing this device operation:
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp