mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-07-01 12:17:00 +00:00
Remove obsolete files.
This commit is contained in:
@@ -1,377 +0,0 @@
|
||||
# Forward Convolution Device Operations - Comprehensive Summary
|
||||
|
||||
**Generated:** October 29, 2025
|
||||
**Namespace:** `ck::tensor_operation::device`
|
||||
**Location:** `/library/include` and `/library/src` directories
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This document provides a comprehensive list of all device operations used for forward convolutions in the Composable Kernel library, along with their template instantiations.
|
||||
|
||||
### Total Statistics
|
||||
- **Unique Device Operation Types:** 8
|
||||
- **Total Template Instantiations:** 1,000
|
||||
- **Files Analyzed:** 420
|
||||
- **Header Files:** Located in `/library/include/ck/library/tensor_operation_instance/gpu/`
|
||||
- **Source Files:** Located in `/library/src/tensor_operation_instance/gpu/`
|
||||
|
||||
---
|
||||
|
||||
## Device Operations Table
|
||||
|
||||
### Grouped Convolution Device Operations
|
||||
|
||||
| # | Device Operation Name | Instantiations | Files | Description |
|
||||
|---|----------------------|----------------|-------|-------------|
|
||||
| 1 | `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle` | 570 | 9 | XDL-based with multiple ABD inputs, CShuffle optimization |
|
||||
| 2 | `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3` | 141 | 2 | Version 3 with advanced pipeline scheduling |
|
||||
| 3 | `DeviceGroupedConvFwdMultipleD_Wmma_CShuffle` | 42 | 1 | WMMA-based (Wave Matrix Multiply Accumulate) |
|
||||
| 4 | `DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor` | 12 | 1 | Optimized for large tensor dimensions |
|
||||
| 5 | `DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK` | 6 | 1 | Direct Load implementation |
|
||||
| **Total** | **Grouped Convolution Operations** | **771** | **14** | |
|
||||
|
||||
### Non-Grouped Convolution Device Operations
|
||||
|
||||
| # | Device Operation Name | Instantiations | Files | Description |
|
||||
|---|----------------------|----------------|-------|-------------|
|
||||
| 1 | `DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 156 | 4 | Standard 2D convolution, explicit layout |
|
||||
| 2 | `DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K` | 61 | 1 | 2D convolution with C-Shuffle optimization |
|
||||
| 3 | `DeviceConvFwd` | 12 | 6 | Generic interface type |
|
||||
| **Total** | **Non-Grouped Operations** | **229** | **11** | |
|
||||
|
||||
### Grand Total: 1,000 Template Instantiations across 8 Device Operation Types
|
||||
|
||||
---
|
||||
|
||||
## Device Operations List
|
||||
|
||||
### 1. DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
|
||||
**Most commonly used device operation for forward convolutions**
|
||||
- **Template Instantiations:** 570
|
||||
- **Files:** 9 header files
|
||||
- **Primary Header:** `grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp`
|
||||
- **Description:** XDL-based grouped convolution with multiple auxiliary inputs/outputs and CShuffle optimization
|
||||
- **Variants:**
|
||||
- Standard instances (BF16, F16, F32, INT8, F8, BF8)
|
||||
- 16x16 MFMA instances
|
||||
- NCHW layout instances
|
||||
- Generic instances
|
||||
- Compute-friendly instances with FP8
|
||||
|
||||
**Key Files:**
|
||||
- `device_grouped_conv_fwd_xdl_instance.hpp` - Main instantiations (193 instances)
|
||||
- `device_grouped_conv_fwd_xdl_bilinear_instance.hpp` - With bilinear fusion (80 instances)
|
||||
- `device_grouped_conv_fwd_xdl_scale_instance.hpp` - With scale operation (80 instances)
|
||||
- `device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp` - With dynamic operations (64 instances)
|
||||
- `device_grouped_conv_fwd_xdl_outelementop_instance.hpp` - With custom output ops (80 instances)
|
||||
- `device_grouped_conv_fwd_xdl_merged_groups_instance.hpp` - Merged groups optimization (21 instances)
|
||||
- `device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp` - ScaleAdd operations (20 instances)
|
||||
- `device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp` - Fused operations (16 instances)
|
||||
- `device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp` - Binary output ops (16 instances)
|
||||
|
||||
---
|
||||
|
||||
### 2. DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
**Standard 2D convolution forward operation**
|
||||
- **Template Instantiations:** 156
|
||||
- **Files:** 4 source files
|
||||
- **Description:** XDL-based 2D convolution with explicit NHWC layout
|
||||
|
||||
**Instantiation Files:**
|
||||
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp` (39 instances)
|
||||
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp` (39 instances)
|
||||
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp` (39 instances)
|
||||
- `conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp` (39 instances)
|
||||
|
||||
---
|
||||
|
||||
### 3. DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
|
||||
**Version 3 of XDL CShuffle implementation**
|
||||
- **Template Instantiations:** 141
|
||||
- **Files:** 2 header files
|
||||
- **Description:** Advanced XDL implementation with compute-friendly scheduling and memory optimizations
|
||||
|
||||
**Key Files:**
|
||||
- `device_grouped_conv_fwd_xdl_comp_instance.hpp` (49 instances) - Compute-optimized variants
|
||||
- `device_grouped_conv_fwd_xdl_mem_instance.hpp` (92 instances) - Memory-optimized variants
|
||||
|
||||
**Features:**
|
||||
- BlockGemmPipelineScheduler variants (Interwave, Intrawave)
|
||||
- Multiple pipeline versions (v1, v3, v4, v5)
|
||||
- Support for BF16, F16, F32, TF32, INT8
|
||||
|
||||
---
|
||||
|
||||
### 4. DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
**C-Shuffle variant for 2D convolution**
|
||||
- **Template Instantiations:** 61
|
||||
- **Files:** 1 source file
|
||||
- **File:** `conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp`
|
||||
- **Data Type:** F16 only
|
||||
- **Description:** CShuffle optimization for channel dimension
|
||||
|
||||
---
|
||||
|
||||
### 5. DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
|
||||
**WMMA-based grouped convolution**
|
||||
- **Template Instantiations:** 42
|
||||
- **Files:** 1 header file
|
||||
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp`
|
||||
- **Description:** Uses Wave Matrix Multiply Accumulate (WMMA) instructions
|
||||
- **Data Types:** F16, INT8
|
||||
- **Block Sizes:** 32, 64, 128, 256
|
||||
|
||||
---
|
||||
|
||||
### 6. DeviceConvFwd
|
||||
**Generic convolution forward interface**
|
||||
- **Template Instantiations:** 12
|
||||
- **Files:** 6 files (headers and sources)
|
||||
- **Description:** High-level interface type for convolution forward operations
|
||||
- **Usage:** Factory pattern and API definitions
|
||||
|
||||
---
|
||||
|
||||
### 7. DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
|
||||
**Large tensor optimization**
|
||||
- **Template Instantiations:** 12
|
||||
- **Files:** 1 header file
|
||||
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp`
|
||||
- **Description:** Optimized for large tensor dimensions
|
||||
- **Data Types:** BF16, F16, F32, TF32, INT8
|
||||
|
||||
---
|
||||
|
||||
### 8. DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
|
||||
**Direct Load (DL) implementation**
|
||||
- **Template Instantiations:** 6
|
||||
- **Files:** 1 header file
|
||||
- **File:** `grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp`
|
||||
- **Description:** Direct load convolution implementation for specific layouts
|
||||
- **Data Types:** F16, F32
|
||||
|
||||
---
|
||||
|
||||
## File Organization
|
||||
|
||||
### Header Files (`/library/include/ck/library/tensor_operation_instance/gpu/`)
|
||||
|
||||
#### Grouped Convolution Forward (`grouped_conv_fwd/`)
|
||||
1. `device_grouped_conv_fwd_dl_instance.hpp` - Direct Load implementations
|
||||
2. `device_grouped_conv_fwd_wmma_instance.hpp` - WMMA implementations
|
||||
3. `device_grouped_conv_fwd_xdl_instance.hpp` - **Main XDL implementations**
|
||||
4. `device_grouped_conv_fwd_xdl_bilinear_instance.hpp` - With bilinear fusion
|
||||
5. `device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp` - Binary output ops
|
||||
6. `device_grouped_conv_fwd_xdl_comp_instance.hpp` - Compute-optimized
|
||||
7. `device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp` - Dynamic operations
|
||||
8. `device_grouped_conv_fwd_xdl_large_tensor_instance.hpp` - Large tensor optimization
|
||||
9. `device_grouped_conv_fwd_xdl_mem_instance.hpp` - Memory-optimized
|
||||
10. `device_grouped_conv_fwd_xdl_merged_groups_instance.hpp` - Merged groups
|
||||
11. `device_grouped_conv_fwd_xdl_outelementop_instance.hpp` - Custom output operations
|
||||
12. `device_grouped_conv_fwd_xdl_scale_instance.hpp` - With scale operation
|
||||
13. `device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp` - ScaleAdd on inputs
|
||||
14. `device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp` - Fused operations
|
||||
|
||||
### Source Files (`/library/src/tensor_operation_instance/gpu/`)
|
||||
|
||||
#### Conv2D Forward (`conv2d_fwd/`)
|
||||
- Standard XDL instances for BF16, F16, F32, INT8
|
||||
- C-Shuffle variant for F16
|
||||
|
||||
#### Grouped Conv1D/2D/3D Forward
|
||||
Organized by:
|
||||
- **Algorithm:** `dl/`, `wmma/`, `xdl/`
|
||||
- **Optimization:** `comp/`, `mem/`, `large_tensor/`, `merged_groups/`
|
||||
- **Data Type:** Per file (bf16, f16, f32, int8, fp8, bf8)
|
||||
- **Layout:** Encoded in filename (nhwgc, ngchw, etc.)
|
||||
|
||||
---
|
||||
|
||||
## Template Instantiation Patterns
|
||||
|
||||
### Common Parameters
|
||||
Template instantiations typically include:
|
||||
- **Spatial Dimensions:** 1D, 2D, 3D (NDimSpatial)
|
||||
- **Layouts:** NHWC, NCHW, and grouped variants (NHWGC, NGCHW, etc.)
|
||||
- **Data Types:** BF16, F16, F32, TF32, INT8, F8, BF8
|
||||
- **Accumulator Type:** F32, INT32
|
||||
- **Block Sizes:** 32, 64, 128, 256
|
||||
- **Thread Tile Sizes:** MPerBlock, NPerBlock, KPerBlock
|
||||
- **MFMA Sizes:** 16x16, 32x32
|
||||
- **Pipeline Stages:** 1-2 stages
|
||||
- **Element-wise Operations:** PassThrough, Scale, ScaleAdd, Bilinear, ReLU, Clamp, etc.
|
||||
|
||||
### Example Instantiation (from DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle)
|
||||
```cpp
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<
|
||||
NDimSpatial, // 2 or 3
|
||||
ALayout, // NHWGC, NGCHW, etc.
|
||||
BLayout, // GKYXC, GKCYX, etc.
|
||||
DsLayout, // Additional inputs layout
|
||||
ELayout, // Output layout
|
||||
BF16, // Input data type
|
||||
BF16, // Weight data type
|
||||
F32, // Accumulator type
|
||||
BF16, // CShuffle data type
|
||||
DsDataTypes, // Additional input types
|
||||
BF16, // Output data type
|
||||
PassThrough, // Input element-wise op
|
||||
PassThrough, // Weight element-wise op
|
||||
OutElementOp, // Output element-wise op
|
||||
ConvSpec, // Convolution specialization
|
||||
GemmMNKPadding, // GEMM specialization
|
||||
1, // NumGemmKPrefetchStage
|
||||
256, // BlockSize
|
||||
128, // MPerBlock
|
||||
128, // NPerBlock
|
||||
32, // KPerBlock
|
||||
8, // AK1
|
||||
8, // BK1
|
||||
32, // MPerXdl
|
||||
32, // NPerXdl
|
||||
2, // MXdlPerWave
|
||||
2, // NXdlPerWave
|
||||
// ... additional block transfer parameters
|
||||
>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Findings
|
||||
|
||||
1. **DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle is the primary device operation** with 570 instantiations across 9 different header files for various use cases and fusion patterns.
|
||||
|
||||
2. **Eight distinct device operation types** are used for forward convolutions, each optimized for different scenarios:
|
||||
- XDL (Matrix Core) based operations
|
||||
- WMMA (Wave Matrix Multiply) based operations
|
||||
- Direct Load (DL) based operations
|
||||
- Specialized variants for large tensors, merged groups, and memory optimization
|
||||
|
||||
3. **Multiple optimization strategies:**
|
||||
- Compute-friendly scheduling (Interwave, Intrawave)
|
||||
- Memory access patterns (Inter-wave, Intra-wave)
|
||||
- Pipeline versions (v1, v3, v4, v5)
|
||||
- CShuffle for efficient data movement
|
||||
|
||||
4. **Comprehensive data type support:** BF16, F16, F32, TF32, INT8, F8, BF8
|
||||
|
||||
5. **Extensive fusion support:** Operations can be fused with Scale, ScaleAdd, Bilinear, ReLU, Clamp, and combinations thereof.
|
||||
|
||||
---
|
||||
|
||||
## Output Files
|
||||
|
||||
### Complete Instantiation Files (Recommended)
|
||||
|
||||
1. **`forward_conv_all_instantiations.txt`** (593KB, 4,271 lines)
|
||||
- **COMPLETE listing of ALL 1,000 template instantiations**
|
||||
- Human-readable format with line numbers
|
||||
- Every instantiation shown in full detail
|
||||
- Organized by device operation, then by file
|
||||
|
||||
2. **`forward_conv_all_instantiations.json`** (1.4MB)
|
||||
- **Structured JSON for programmatic instantiation generation**
|
||||
- Complete instantiation text for each template
|
||||
- Includes parsed parameters (data types, block sizes)
|
||||
- Hierarchical organization: device_operation → file → instantiations[]
|
||||
- Ready for automated code generation tools
|
||||
|
||||
### Summary and Quick Reference Files
|
||||
|
||||
3. **`FORWARD_CONVOLUTION_DEVICE_OPS_SUMMARY.md`** (This file)
|
||||
- Executive summary with tables
|
||||
- Device operation descriptions
|
||||
- File organization reference
|
||||
|
||||
4. **`forward_convolution_device_ops_report.txt`** (33KB)
|
||||
- High-level summary of all device operations
|
||||
- Lists all files containing each device operation
|
||||
- Quick reference guide
|
||||
|
||||
5. **`forward_conv_device_ops_detailed_report.txt`** (43KB)
|
||||
- Detailed report with sample template instantiations
|
||||
- Shows first 3 examples per file
|
||||
- Includes line numbers and file locations
|
||||
|
||||
6. **`forward_convolution_device_ops_data.json`** (34KB)
|
||||
- Basic JSON format with summary data
|
||||
- File paths and instantiation counts
|
||||
|
||||
---
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Example 1: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle with BF16
|
||||
Found in: `device_grouped_conv_fwd_xdl_instance.hpp`
|
||||
|
||||
This is the most versatile device operation with:
|
||||
- Generic instances for all block sizes
|
||||
- Optimized instances for small conv.K and conv.C
|
||||
- NCHW layout support
|
||||
- 16x16 MFMA support
|
||||
- FP8 compute support
|
||||
|
||||
### Example 2: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
|
||||
Found in: `device_grouped_conv_fwd_dl_instance.hpp`
|
||||
|
||||
Direct Load implementation with:
|
||||
- F16 and F32 support
|
||||
- Specific block configurations
|
||||
- NHWC layout only
|
||||
|
||||
### Example 3: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
|
||||
Found in: `device_grouped_conv_fwd_wmma_instance.hpp`
|
||||
|
||||
WMMA-based implementation with:
|
||||
- F16 and INT8 support
|
||||
- Multiple block sizes (32, 64, 128, 256)
|
||||
- 16x16 WMMA instruction usage
|
||||
|
||||
---
|
||||
|
||||
## File Locations Reference
|
||||
|
||||
### Main Device Operation Headers
|
||||
```
|
||||
library/include/ck/library/tensor_operation_instance/gpu/
|
||||
├── grouped_conv_fwd/
|
||||
│ ├── device_grouped_conv_fwd_xdl_instance.hpp ← PRIMARY FILE
|
||||
│ ├── device_grouped_conv_fwd_dl_instance.hpp
|
||||
│ ├── device_grouped_conv_fwd_wmma_instance.hpp
|
||||
│ ├── device_grouped_conv_fwd_xdl_comp_instance.hpp
|
||||
│ ├── device_grouped_conv_fwd_xdl_mem_instance.hpp
|
||||
│ ├── device_grouped_conv_fwd_xdl_large_tensor_instance.hpp
|
||||
│ └── ... (other variants)
|
||||
└── convolution_forward.hpp ← INTERFACE DEFINITIONS
|
||||
```
|
||||
|
||||
### Source Instantiations
|
||||
```
|
||||
library/src/tensor_operation_instance/gpu/
|
||||
├── conv2d_fwd/ ← Non-grouped 2D convolutions
|
||||
├── grouped_conv1d_fwd/ ← 1D grouped convolutions
|
||||
├── grouped_conv2d_fwd/ ← 2D grouped convolutions
|
||||
│ ├── dl/ ← Direct Load variants
|
||||
│ ├── wmma/ ← WMMA variants
|
||||
│ └── xdl/ ← XDL variants
|
||||
│ ├── comp/ ← Compute-optimized
|
||||
│ ├── mem/ ← Memory-optimized
|
||||
│ ├── large_tensor/ ← Large tensor optimized
|
||||
│ └── merged_groups/ ← Merged groups optimized
|
||||
└── grouped_conv3d_fwd/ ← 3D grouped convolutions
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Additional Resources
|
||||
|
||||
For detailed analysis and full template instantiations, refer to the generated reports:
|
||||
- `forward_convolution_device_ops_report.txt` - Quick summary
|
||||
- `forward_conv_device_ops_detailed_report.txt` - Full details with examples
|
||||
- `forward_convolution_device_ops_data.json` - Machine-readable format
|
||||
|
||||
The analysis scripts used to generate this information:
|
||||
- `extract_conv_fwd_device_ops.py` - Initial device operation extraction
|
||||
- `extract_detailed_instantiations.py` - Detailed instantiation analysis
|
||||
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@@ -1,428 +0,0 @@
|
||||
====================================================================================================
|
||||
DETAILED FORWARD CONVOLUTION DEVICE OPERATIONS AND TEMPLATE INSTANTIATIONS
|
||||
====================================================================================================
|
||||
|
||||
Generated for: ck::tensor_operation::device namespace
|
||||
|
||||
SUMMARY
|
||||
----------------------------------------------------------------------------------------------------
|
||||
Total Device Operation Types: 8
|
||||
Total Template Instantiations: 1000
|
||||
|
||||
• DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
- Instantiations: 61
|
||||
- Files: 1
|
||||
• DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
- Instantiations: 156
|
||||
- Files: 4
|
||||
• DeviceConvFwd
|
||||
- Instantiations: 12
|
||||
- Files: 6
|
||||
• DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
|
||||
- Instantiations: 6
|
||||
- Files: 1
|
||||
• DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
|
||||
- Instantiations: 570
|
||||
- Files: 9
|
||||
• DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
|
||||
- Instantiations: 141
|
||||
- Files: 2
|
||||
• DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
|
||||
- Instantiations: 42
|
||||
- Files: 1
|
||||
• DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
|
||||
- Instantiations: 12
|
||||
- Files: 1
|
||||
|
||||
====================================================================================================
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 61
|
||||
Number of Files: 1
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
Instantiations in this file: 61
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 49:
|
||||
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 32, 1, 1, 8>, ...
|
||||
[2] Line 50:
|
||||
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 32, 1, 1, 8>, ...
|
||||
[3] Line 51:
|
||||
DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 1, 1, S<1, 1, 16, 1, 1, 8>, ...
|
||||
... and 58 more instantiations
|
||||
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 156
|
||||
Number of Files: 4
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
|
||||
Instantiations in this file: 39
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 45:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
|
||||
[2] Line 46:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
|
||||
[3] Line 47:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< BF16, BF16, BF16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
|
||||
... and 36 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
Instantiations in this file: 39
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 45:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
|
||||
[2] Line 46:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 8, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
|
||||
[3] Line 47:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F16, F16, F16, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 8, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, true, 7, 1>,
|
||||
... and 36 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
|
||||
Instantiations in this file: 39
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 44:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 4, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
|
||||
[2] Line 45:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 4, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
|
||||
[3] Line 46:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< F32, F32, F32, F32, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 4, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 4, 4, true, 7, 1>,
|
||||
... and 36 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
|
||||
Instantiations in this file: 39
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 42:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 256, 128, 4, 16, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
|
||||
[2] Line 43:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 256, 128, 256, 4, 16, 32, 32, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
|
||||
[3] Line 44:
|
||||
DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K< int8_t, int8_t, int8_t, int32_t, PassThrough, PassThrough, PassThrough, ConvFwdDefault, 128, 128, 128, 4, 16, 32, 32, 4, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 16, 16, true, 7, 1>,
|
||||
... and 36 more instantiations
|
||||
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceConvFwd
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 12
|
||||
Number of Files: 6
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp
|
||||
Instantiations in this file: 7
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 24:
|
||||
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
|
||||
[2] Line 28:
|
||||
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
|
||||
[3] Line 33:
|
||||
std::vector<std::unique_ptr<DeviceConvFwd<2,
|
||||
NHWC,
|
||||
KYXC,
|
||||
NHWK,
|
||||
BF16,
|
||||
BF16,
|
||||
BF16,
|
||||
PassThrough,
|
||||
PassThrough,
|
||||
...
|
||||
... and 4 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
Instantiations in this file: 1
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 180:
|
||||
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
|
||||
Instantiations in this file: 1
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 106:
|
||||
std::vector<std::unique_ptr<DeviceConvFwd<2,
|
||||
NHWC,
|
||||
KYXC,
|
||||
NHWK,
|
||||
BF16,
|
||||
BF16,
|
||||
BF16,
|
||||
PassThrough,
|
||||
PassThrough,
|
||||
...
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
Instantiations in this file: 1
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 107:
|
||||
DeviceConvFwd<2, NHWC, KYXC, NHWK, F16, F16, F16, PassThrough, PassThrough, PassThrough>>>&
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
|
||||
Instantiations in this file: 1
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 106:
|
||||
DeviceConvFwd<2, NHWC, KYXC, NHWK, F32, F32, F32, PassThrough, PassThrough, PassThrough>>>&
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
|
||||
Instantiations in this file: 1
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 103:
|
||||
std::vector<std::unique_ptr<DeviceConvFwd<2,
|
||||
NHWC,
|
||||
KYXC,
|
||||
NHWK,
|
||||
int8_t,
|
||||
int8_t,
|
||||
int8_t,
|
||||
PassThrough,
|
||||
PassThrough,
|
||||
...
|
||||
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 6
|
||||
Number of Files: 1
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp
|
||||
Instantiations in this file: 6
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 57:
|
||||
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 8, 16, 4, 2, 1, 1, 2, 1, S<4, 2>, S<1, 1>, S<2, 1, 2, 1>, S<1, 1, 8, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 1, 1, 1>, S<2, 1, 4, 1>,...
|
||||
[2] Line 58:
|
||||
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 16, 1, 4, 4, 1, S<8, 2>, S<8, 2>, S<8, 1, 1, 1>, S<2, 1, 128, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<1, 2, 0, 3>, S<1, 1, 1, 1>, S<8, 1, 1, 1>, S<2, 1, 128, 1>,...
|
||||
[3] Line 60:
|
||||
DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK< 2, F16, F16, DsDatatype, F16, F32, InLayout, WeiLayout, DsLayout, OutLayout, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 16, 2, 4, 4, 1, S<8, 2>, S<8, 2>, S<8, 1, 1, 2>, S<2, 1, 128, 1>, S<1, 2, 0, 3>, S<1, 2, 0, 3>, S<4, 1, 1, 2>, S<1, 2, 0, 3>, S<1, 1, 1, 2>, S<8, 1, 1, 2>, S<2, 1, 128, 1>,...
|
||||
... and 3 more instantiations
|
||||
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 570
|
||||
Number of Files: 9
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
|
||||
Instantiations in this file: 80
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 54:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
[2] Line 56:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[3] Line 57:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<BF16>, BF16, PassThrough, PassThrough, Bilinear, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
... and 77 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
|
||||
Instantiations in this file: 16
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 53:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
[2] Line 55:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[3] Line 56:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<F32>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
... and 13 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
|
||||
Instantiations in this file: 64
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 54:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8,...
|
||||
[2] Line 56:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8,...
|
||||
[3] Line 57:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, DynamicUnaryOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8,...
|
||||
... and 61 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
|
||||
Instantiations in this file: 193
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 68:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
[2] Line 87:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
[3] Line 89:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
... and 190 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
|
||||
Instantiations in this file: 21
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 55:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
|
||||
[2] Line 56:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
|
||||
[3] Line 57:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 16, 16, 4, 4, 16, 16, 4, 1, S< 4, 16, 1>, S<0, 2, 1>, S<0, 2, 1>, 1, 4, 4, 1, S< 4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 4, 1...
|
||||
... and 18 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
|
||||
Instantiations in this file: 80
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 54:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
[2] Line 56:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[3] Line 57:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, F8, F8, F32, F32, Tuple<>, F8, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
... and 77 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
|
||||
Instantiations in this file: 80
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 54:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
[2] Line 56:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[3] Line 57:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, Tuple<>, BF16, PassThrough, PassThrough, Scale, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
... and 77 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
|
||||
Instantiations in this file: 20
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 50:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 1, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
|
||||
[2] Line 52:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
|
||||
[3] Line 53:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, ck::Tuple<>,ELayout, ck::Tuple<BF16, BF16>, ck::Tuple<BF16, BF16>, F32, BF16, ck::Tuple<>, BF16, ScaleAdd, ScaleAdd, PassThrough, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 1, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, ...
|
||||
... and 17 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
|
||||
Instantiations in this file: 16
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 53:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
|
||||
[2] Line 55:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 64, 64, 32, 32, 8, 8, 32, 32, 2, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, ...
|
||||
[3] Line 56:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, ck::Tuple<BF16, BF16>, BF16, PassThrough, PassThrough, ScaleAddScaleAddRelu, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 8, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
|
||||
... and 13 more instantiations
|
||||
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 141
|
||||
Number of Files: 2
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
|
||||
Instantiations in this file: 49
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 71:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 64, 16, 16, 32, 32, 2, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[2] Line 90:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 256, 256, 32, 8, 8, 32, 32, 4, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[3] Line 91:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 256, 128, 128, 64, 8, 8, 32, 32, 2, 2, S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
... and 46 more instantiations
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
|
||||
Instantiations in this file: 92
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 69:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 128, 32, 16, 64, 8, 8, 16, 16, 1, 1, S<8, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[2] Line 70:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 64, 16, 16, 128, 8, 8, 16, 16, 1, 1, S<16, 4, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<16, 4, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[3] Line 71:
|
||||
DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 64, 16, 16, 64, 8, 8, 16, 16, 1, 1, S<8, 8, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 0, S<8, 8, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
... and 89 more instantiations
|
||||
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 42
|
||||
Number of Files: 1
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp
|
||||
Instantiations in this file: 42
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 62:
|
||||
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 128, 64, 64, 32, 8, 16, 16, 2, 2, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 32, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, ...
|
||||
[2] Line 64:
|
||||
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 256, 128, 128, 32, 8, 16, 16, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
[3] Line 65:
|
||||
DeviceGroupedConvFwdMultipleD_Wmma_CShuffle<NDSpatial, ALayout, BLayout, DsLayout, ELayout, F16, F16, F32, F16, DsDatatype, F16, PassThrough, PassThrough, CDEElementOp, ConvSpec, GemmMNKPadding, 1, 256, 64, 256, 32, 8, 16, 16, 2, 4, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, ...
|
||||
... and 39 more instantiations
|
||||
|
||||
|
||||
====================================================================================================
|
||||
DEVICE OPERATION: DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
|
||||
====================================================================================================
|
||||
|
||||
Total Template Instantiations: 12
|
||||
Number of Files: 1
|
||||
|
||||
----------------------------------------------------------------------------------------------------
|
||||
File: library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp
|
||||
Instantiations in this file: 12
|
||||
|
||||
Sample instantiations:
|
||||
[1] Line 53:
|
||||
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 64, 64, 64, 32, 8, 8, 32, 32, 2, 2, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, 8, 1, S<4, 16, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 1, ...
|
||||
[2] Line 55:
|
||||
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 256, 128, 32, 8, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 2, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 2, ...
|
||||
[3] Line 56:
|
||||
DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor<NDimSpatial,ALayout,BLayout, DsLayout,ELayout, BF16, BF16, F32, BF16, DsDataTypes, BF16, PassThrough, PassThrough, OutElementOp, ConvSpec, GemmMNKPadding, 1, 256, 256, 128, 32, 8, 8, 32, 32, 4, 2, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, 8, 1, S<4, 64, 1>, S<1, 0, 2>, S<1, 0, 2>, 2, 8, ...
|
||||
... and 9 more instantiations
|
||||
|
||||
@@ -1,290 +0,0 @@
|
||||
================================================================================
|
||||
FORWARD CONVOLUTION DEVICE OPERATIONS REPORT
|
||||
================================================================================
|
||||
|
||||
Total Unique Device Operations: 9
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceConv2dFwdXdl_C_Shuffle_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
Total Template Instantiations: 61
|
||||
Number of Files: 1
|
||||
|
||||
Files containing this device operation:
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K
|
||||
Total Template Instantiations: 156
|
||||
Number of Files: 4
|
||||
|
||||
Files containing this device operation:
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceConvFwd
|
||||
Total Template Instantiations: 12
|
||||
Number of Files: 6
|
||||
|
||||
Files containing this device operation:
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/convolution_forward.hpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/conv2d_fwd/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK
|
||||
Total Template Instantiations: 6
|
||||
Number of Files: 1
|
||||
|
||||
Files containing this device operation:
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceGroupedConvFwdMultipleABD
|
||||
Total Template Instantiations: 975
|
||||
Number of Files: 195
|
||||
|
||||
Files containing this device operation:
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_bnorm_clamp.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_clamp.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bilinear.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_clamp.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convinvscale.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_add.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_convscale_relu.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_dynamic_op.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scale.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_ab.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_scaleadd_scaleadd_relu.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perchannel_quantization.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_bias_forward_perlayer_quantization.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perchannel_quantization.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/quantization/grouped_convolution_forward_perlayer_quantization.hpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_int8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_f16_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_gnhwc_gkyxc_gnhwk_i8_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_f16_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/wmma/device_grouped_conv2d_fwd_wmma_nhwgc_gkyxc_nhwgk_i8_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_ngchw_gkyxc_ngkhw_int8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/large_tensor/device_grouped_conv2d_fwd_xdl_large_tensor_nhwgc_gkyxc_nhwgk_int8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_bf16_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f32_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f16_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_tf32_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_inter_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/mem/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_int8_mem_intra_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_ngchw_gkcyx_ngkhw_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_f32_tf32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/merged_groups/device_grouped_conv2d_fwd_xdl_merged_groups_nhwgc_gkyxc_nhwgk_int8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_f16_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_gndhwc_gkzyxc_gndhwk_i8_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_f16_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_1x1s1p0_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/wmma/device_grouped_conv3d_fwd_wmma_ndhwgc_gkzyxc_ndhwgk_i8_oddc_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_2x_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_comp_part2_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/comp/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_comp_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_int8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_fp8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_comp_fp8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_bf8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_fp8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_int8_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_16x16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/large_tensor/device_grouped_conv3d_fwd_xdl_large_tensor_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_bf16_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f16_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_inter_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ndhwgc_gkzyxc_ndhwgk_f32_tf32_mem_intra_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_inter_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_bf16_mem_intra_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_inter_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f16_mem_intra_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_inter_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/mem/device_grouped_conv3d_fwd_xdl_ngcdhw_gkczyx_ngkdhw_f32_mem_intra_instance.in
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ndhwgc_gkzyxc_ndhwgk_f32_tf32_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_bf16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f16_instance.cpp
|
||||
- library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/xdl/merged_groups/device_grouped_conv3d_fwd_xdl_merged_groups_ngcdhw_gkczyx_ngkdhw_f32_instance.cpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle
|
||||
Total Template Instantiations: 711
|
||||
Number of Files: 11
|
||||
|
||||
Files containing this device operation:
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_bilinear_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_binary_outelementop_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_dynamic_op_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_merged_groups_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_outelementop_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scale_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_ab_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_scaleadd_scaleadd_relu_instance.hpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3
|
||||
Total Template Instantiations: 141
|
||||
Number of Files: 2
|
||||
|
||||
Files containing this device operation:
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_mem_instance.hpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceGroupedConvFwdMultipleD_Wmma_CShuffle
|
||||
Total Template Instantiations: 42
|
||||
Number of Files: 1
|
||||
|
||||
Files containing this device operation:
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
Device Operation: DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
|
||||
Total Template Instantiations: 12
|
||||
Number of Files: 1
|
||||
|
||||
Files containing this device operation:
|
||||
- library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp
|
||||
Reference in New Issue
Block a user