Test comprehensive dataset (#2685)

* Add CSV-driven convolution test pipeline
  - Add test_grouped_convnd_fwd_dataset_xdl.cpp with CSV reader functionality
  - Add complete dataset generation toolchain in test_data/
  - Add Jenkins integration with RUN_CONV_COMPREHENSIVE_DATASET parameter
  - Ready for comprehensive convolution testing with scalable datasets
* Update convolution test dataset generation pipeline
* add 2d, 3d dataset csv files
* Remove CSV test dataset files from repository
* Update generate_test_dataset.sh
* Fix channel division for MIOpen to CK conversion
* Remove unnecessary test files
* Fix clang-format-18 formatting issues
* TEST: Enable comprehensive dataset tests by default
* Fix test_data path in Jenkins - build runs from build directory
* Add Python dependencies and debug output for CSV generation
* Remove Python package installation - not needed
* Add better debugging for generate_test_dataset.sh execution
* Fix Jenkinsfile syntax error - escape dollar signs
* Add PyTorch to Docker image for convolution test dataset generation
  - Install PyTorch CPU version for lightweight model execution
  - Fixes Jenkins CI failures where CSV files were empty due to missing PyTorch
  - Model generation scripts require PyTorch to extract convolution parameters
* Add debugging to understand Jenkins directory structure and CSV file status
  - Print current working directory
  - List CSV files in test_data directory
  - Show line counts of CSV files
  - Will help diagnose why tests fail in Jenkins
* Fix clang-format-18 formatting issues
  - Applied clang-format-18 to test file
  - Fixed brace placement and whitespace issues
* Add detailed debugging for CSV dataset investigation
  - Check generated_datasets directory contents
  - List all CSV files with line counts
  - Show first 5 lines of main CSV file
  - Applied clang-format-18 formatting
  - This will help identify why CSV files are empty in Jenkins
* keep testing add pytorch installation in shell script
* Use virtual environment for PyTorch installation
  - Jenkins user doesn't have permission to write to /.local
  - Create virtual environment in current directory (./pytorch_venv)
  - Install PyTorch in virtual environment to avoid permission issues
  - Use PYTHON_CMD variable to run all Python scripts with correct interpreter
  - Virtual environment will be reused if it already exists
* Remove debug code and reduce verbose logging in Jenkins
  - Remove bash -x and debug commands from Jenkinsfile execute_args
  - Remove all debug system() calls and getcwd from C++ test file
  - Remove unistd.h include that was only needed for getcwd
  - Remove debug print in CSV parser
  - Add set +x to generate_test_dataset.sh to disable command echo
  - Redirect Python script stdout to /dev/null for cleaner output

  This makes Jenkins logs much cleaner while still showing progress messages.
* install gpu torch
* Clean up and optimize comprehensive dataset test pipeline
  - Reorder Jenkinsfile execution: build -> generate data -> run test
  - Remove commented-out debug code from generate_test_dataset.sh
  - Ensure all files end with proper newline character (POSIX compliance)
  - Keep useful status messages while removing development debug prints
  - Set MAX_ITERATIONS=0 for unlimited test generation in production
* Add configuration modes to reduce test execution time
  - Add --mode option (half/full) to generate_model_configs.py
  - half mode (default): ~278 configs (224 2D + 54 3D) -> ~1,058 total tests
  - full mode: ~807 configs (672 2D + 135 3D) -> ~3,093 total tests
  - Update generate_test_dataset.sh to use CONFIG_MODE environment variable
  - Keeps all model types but reduces parameter combinations intelligently
  - Fixes Jenkins timeout issue (was running 3,669 tests taking 17+ hours)
  - Default half mode should complete in ~4-5 hours instead of 17+ hours
* Add small mode for quick testing of comprehensive dataset
* jenkins pipeline test done
* jenkins test done
* Trigger CI build
* remove test comment and update data generation option as half

---------

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
[ROCm/composable_kernel commit: 19d5327c45]
2026-07-19 02:01:01 +00:00 · 2025-08-26 23:18:05 +03:00
parent c8047ebb8b
commit 7d6f0107bd
5 changed files with 186 additions and 65 deletions
--- a/11
+++ b/11
@@ -1159,11 +1159,16 @@ pipeline {
                    agent{ label rocmnode("gfx90a")}
                    environment{
                        setup_args = "NO_CK_BUILD"
-                        execute_args = """ cd test_data && \
-                                           ./generate_test_dataset.sh && \
-                                           cd ../script && \
+                        execute_args = """ cd ../build && \
                                           ../script/cmake-ck-dev.sh  ../ gfx90a && \
                                           make -j64 test_grouped_convnd_fwd_dataset_xdl && \
+                                           cd ../test_data && \
+                                           # Dataset generation modes:
+                                           # - small: ~60 test cases (minimal, quick testing - 3 models, 2 batch sizes, 2 image sizes)
+                                           # - half: ~300 test cases (moderate coverage - 16 models, 3 batch sizes, 5 image sizes), ~ 17 hours testing time
+                                           # - full: ~600 test cases (comprehensive - 16 models, 5 batch sizes, 9 image sizes), ~ 40 hours testing time
+                                           ./generate_test_dataset.sh half && \
+                                           cd ../build && \
                                           ./bin/test_grouped_convnd_fwd_dataset_xdl"""
                    }
                    steps{
--- a/test/grouped_convnd_fwd/test_grouped_convnd_fwd_dataset_xdl.cpp
+++ b/test/grouped_convnd_fwd/test_grouped_convnd_fwd_dataset_xdl.cpp
@@ -32,7 +32,6 @@ std::vector<ck::utils::conv::ConvParam> load_csv_test_cases(const std::string& f
    while(std::getline(file, line))
    {
        line_number++;
-        std::cout << "Line " << line_number << ": " << line << std::endl;
        // Skip comment lines (starting with #) and empty lines
        if(line.empty() || line[0] == '#')
        {
--- a/test_data/generate_model_configs.py
+++ b/test_data/generate_model_configs.py
@@ -10,8 +10,12 @@ import csv
 import itertools
 import argparse

-def generate_2d_configs():
-    """Generate all 2D model configuration combinations"""
+def generate_2d_configs(mode='full'):
+    """Generate all 2D model configuration combinations
+    
+    Args:
+        mode: 'small' for minimal set (~50 configs), 'half' for reduced set (~250 configs), 'full' for comprehensive set (~500 configs)
+    """
    
    # Define parameter ranges
    models_2d = [
@@ -24,15 +28,37 @@ def generate_2d_configs():
        'shufflenet_v2_x1_0'
    ]
    
-    batch_sizes = [1, 4, 8, 16, 32]
-    
-    # Input dimensions: (height, width)
-    input_dims = [
-        (64, 64), (128, 128), (224, 224), (256, 256), (512, 512),  # Square
-        (224, 320), (224, 448), (320, 224), (448, 224),            # Rectangular
-        (227, 227),  # AlexNet preferred
-        (299, 299)   # Inception preferred
-    ]
+    if mode == 'small':
+        # Minimal set for quick testing
+        batch_sizes = [1, 8]  # Just two batch sizes
+        # Very limited input dimensions - only 2 key sizes
+        input_dims = [
+            (224, 224),  # Standard (most common)
+            (256, 256),  # Medium
+        ]
+        # Use only first 3 models for minimal testing
+        models_2d = models_2d[:3]  # Only resnet18, resnet34, resnet50
+    elif mode == 'half':
+        # Reduced set for faster testing
+        batch_sizes = [1, 8, 32]  # Small, medium, large
+        # Reduced input dimensions - 5 key sizes
+        input_dims = [
+            (64, 64),    # Small
+            (224, 224),  # Standard (most common)
+            (512, 512),  # Large
+            (224, 320),  # Rectangular
+            (227, 227),  # AlexNet preferred
+        ]
+    else:  # full mode
+        # More comprehensive but still limited
+        batch_sizes = [1, 4, 8, 16, 32]
+        # More dimensions but skip some redundant ones
+        input_dims = [
+            (64, 64), (128, 128), (224, 224), (256, 256), (512, 512),  # Square
+            (224, 320), (320, 224),  # Rectangular (reduced from 4)
+            (227, 227),  # AlexNet preferred
+            (299, 299)   # Inception preferred
+        ]
    
    precisions = ['fp32'] #, 'fp16', 'bf16']
    channels = [3]  # Most models expect RGB
@@ -68,19 +94,44 @@ def generate_2d_configs():
    
    return configs

-def generate_3d_configs():
-    """Generate all 3D model configuration combinations"""
+def generate_3d_configs(mode='full'):
+    """Generate all 3D model configuration combinations
+    
+    Args:
+        mode: 'small' for minimal set (~10 configs), 'half' for reduced set (~50 configs), 'full' for comprehensive set (~100 configs)
+    """
    
    models_3d = ['r3d_18', 'mc3_18', 'r2plus1d_18']
    
-    batch_sizes = [1, 2, 4, 8]  # 3D models are more memory intensive
-    temporal_sizes = [8, 16, 32]
-    
-    # 3D input dimensions: (height, width) 
-    input_dims = [
-        (112, 112), (224, 224), (256, 256),  # Standard sizes
-        (224, 320), (320, 224)               # Rectangular
-    ]
+    if mode == 'small':
+        # Minimal set for quick testing
+        batch_sizes = [1, 4]  # Just two batch sizes
+        temporal_sizes = [8]  # Only smallest temporal size
+        # Very limited spatial dimensions
+        input_dims = [
+            (112, 112),  # Standard for 3D
+        ]
+        # Use only first model for minimal testing
+        models_3d = models_3d[:1]  # Only r3d_18
+    elif mode == 'half':
+        # Reduced set for faster testing
+        batch_sizes = [1, 4, 8]  # Skip batch_size=2
+        temporal_sizes = [8, 16]  # Skip 32 (most expensive)
+        # Reduced spatial dimensions
+        input_dims = [
+            (112, 112),  # Small (common for video)
+            (224, 224),  # Standard
+            (224, 320)   # Rectangular
+        ]
+    else:  # full mode
+        # More comprehensive but still reasonable
+        batch_sizes = [1, 2, 4, 8]  # 3D models are more memory intensive
+        temporal_sizes = [8, 16, 32]
+        # More dimensions
+        input_dims = [
+            (112, 112), (224, 224), (256, 256),  # Standard sizes
+            (224, 320), (320, 224)               # Rectangular
+        ]
    
    precisions = ['fp32'] #, 'fp16']  # Skip bf16 for 3D to reduce combinations
    channels = [3]
@@ -142,19 +193,23 @@ def main():
                       help='Output file for 2D configurations')
    parser.add_argument('--output-3d', type=str, default='model_configs_3d.csv', 
                       help='Output file for 3D configurations')
+    parser.add_argument('--mode', choices=['small', 'half', 'full'], default='full',
+                       help='Configuration mode: small (~60 total), half (~300 total) or full (~600 total) (default: full)')
    parser.add_argument('--limit', type=int, 
                       help='Limit number of configurations per type (for testing)')
    
    args = parser.parse_args()
    
+    print(f"Generating {args.mode} model configurations...")
+    
    print("Generating 2D model configurations...")
-    configs_2d = generate_2d_configs()
+    configs_2d = generate_2d_configs(mode=args.mode)
    if args.limit:
        configs_2d = configs_2d[:args.limit]
    save_configs_to_csv(configs_2d, args.output_2d, "2D")
    
    print("Generating 3D model configurations...")
-    configs_3d = generate_3d_configs()
+    configs_3d = generate_3d_configs(mode=args.mode)
    if args.limit:
        configs_3d = configs_3d[:args.limit]
    save_configs_to_csv(configs_3d, args.output_3d, "3D")
@@ -164,4 +219,4 @@ def main():
    print("  Update generate_test_dataset.sh to read from these CSV files")

 if __name__ == "__main__":
-    main()
+    main()
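
Editorial sketch, not part of the commit: the three modes added to generate_3d_configs are plain parameter lists, so the number of 3D configurations per mode can be estimated as the size of their Cartesian product. The snippet below copies the lists from the hunk above and counts the combinations. It assumes the elided remainder of the function applies no further filtering; `MODES_3D` and `count_3d_configs` are hypothetical names introduced only for illustration. `precisions` and `channels` each hold a single value in the diff, so they do not change the counts.

```python
from itertools import product

# Parameter lists copied from the 3D branch of the diff above.
# MODES_3D is a hypothetical name used only for this illustration.
MODES_3D = {
    "small": {
        "models": ["r3d_18"],
        "batch_sizes": [1, 4],
        "temporal_sizes": [8],
        "input_dims": [(112, 112)],
    },
    "half": {
        "models": ["r3d_18", "mc3_18", "r2plus1d_18"],
        "batch_sizes": [1, 4, 8],
        "temporal_sizes": [8, 16],
        "input_dims": [(112, 112), (224, 224), (224, 320)],
    },
    "full": {
        "models": ["r3d_18", "mc3_18", "r2plus1d_18"],
        "batch_sizes": [1, 2, 4, 8],
        "temporal_sizes": [8, 16, 32],
        "input_dims": [(112, 112), (224, 224), (256, 256), (224, 320), (320, 224)],
    },
}


def count_3d_configs(mode):
    """Hypothetical helper: size of the unfiltered Cartesian product for one mode."""
    params = MODES_3D[mode]
    combos = product(
        params["models"],
        params["batch_sizes"],
        params["temporal_sizes"],
        params["input_dims"],
    )
    return sum(1 for _ in combos)


if __name__ == "__main__":
    for mode in MODES_3D:
        print(mode, count_3d_configs(mode))
```

Under these assumptions the half mode yields 54 combinations, which agrees with the "54 3D" figure quoted in the commit message.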
--- a/test_data/generate_test_dataset.sh
+++ b/test_data/generate_test_dataset.sh
@@ -3,26 +3,71 @@
 # This script captures MIOpen commands from PyTorch models and generates test cases

 set -e  # Exit on error
-
-# Check if target files already exist
-# if [ -f "conv_test_set_2d_dataset.csv" ] && [ -f "conv_test_set_3d_dataset.csv" ]; then
-#     echo "Target files already exist:"
-#     [ -f "conv_test_set_2d_dataset.csv" ] && echo "  - conv_test_set_2d_dataset.csv ($(wc -l < conv_test_set_2d_dataset.csv) lines)"
-#     [ -f "conv_test_set_3d_dataset.csv" ] && echo "  - conv_test_set_3d_dataset.csv ($(wc -l < conv_test_set_3d_dataset.csv) lines)"
-#     echo ""
-#     echo "To regenerate, please remove these files first:"
-#     echo "  rm conv_test_set_2d_dataset.csv conv_test_set_3d_dataset.csv"
-#     exit 0
-# fi
+set +x  # Disable command echo (even if called with bash -x)

 echo "=========================================="
 echo "CK Convolution Test Dataset Generator"
 echo "=========================================="

+# Check if PyTorch is installed, if not create a virtual environment
+echo "Checking for PyTorch installation..."
+if ! python3 -c "import torch" 2>/dev/null; then
+    echo "PyTorch not found. Creating virtual environment..."
+    
+    # Create a virtual environment in the current directory
+    VENV_DIR="./pytorch_venv"
+    if [ ! -d "$VENV_DIR" ]; then
+        python3 -m venv $VENV_DIR || {
+            echo "ERROR: Failed to create virtual environment."
+            echo "Creating empty CSV files as fallback..."
+            echo "# 2D Convolution Test Cases" > conv_test_set_2d_dataset.csv
+            echo "# Combined from multiple models" >> conv_test_set_2d_dataset.csv
+            echo "# 3D Convolution Test Cases" > conv_test_set_3d_dataset.csv
+            echo "# Combined from multiple models" >> conv_test_set_3d_dataset.csv
+            exit 1
+        }
+    fi
+    
+    # Activate virtual environment
+    source $VENV_DIR/bin/activate
+    
+    # Install PyTorch in virtual environment with ROCm support
+    echo "Installing PyTorch and torchvision with ROCm support in virtual environment..."
+    # Since we're in a ROCm 6.4.1 environment, we need compatible PyTorch
+    # PyTorch doesn't have 6.4 wheels yet, so we use 6.2 which should be compatible
+    echo "Installing PyTorch with ROCm 6.2 support (compatible with ROCm 6.4)..."
+    pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/rocm6.2 || {
+        echo "ERROR: Failed to install PyTorch with ROCm support."
+        echo "Creating empty CSV files as fallback..."
+        echo "# 2D Convolution Test Cases" > conv_test_set_2d_dataset.csv
+        echo "# Combined from multiple models" >> conv_test_set_2d_dataset.csv
+        echo "# 3D Convolution Test Cases" > conv_test_set_3d_dataset.csv
+        echo "# Combined from multiple models" >> conv_test_set_3d_dataset.csv
+        exit 1
+    }
+    echo "PyTorch installed successfully in virtual environment!"
+    
+    # Use the virtual environment's Python for the rest of the script
+    export PYTHON_CMD="$VENV_DIR/bin/python3"
+else
+    echo "PyTorch is already installed."
+    export PYTHON_CMD="python3"
+fi
+
+# Verify PyTorch installation and GPU support
+$PYTHON_CMD -c "import torch; print(f'PyTorch version: {torch.__version__}')"
+$PYTHON_CMD -c "import torch; print(f'CUDA/ROCm available: {torch.cuda.is_available()}')"
+if ! $PYTHON_CMD -c "import torch; import sys; sys.exit(0 if torch.cuda.is_available() else 1)"; then
+    echo "WARNING: PyTorch installed but GPU support not available!"
+    echo "MIOpen commands will not be generated without GPU support."
+    echo "Continuing anyway to generate placeholder data..."
+fi
+
 # Configuration
 OUTPUT_DIR="generated_datasets"
 TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
-MAX_ITERATIONS=0  # Maximum number of iterations per model type (set to 0 for unlimited)
+# Get configuration mode from command line argument (default: full)
+CONFIG_MODE="${1:-full}"  # Configuration mode: 'small', 'half' or 'full'

 # Colors
 RED='\033[0;31m'
@@ -42,8 +87,9 @@ echo "Step 1: Generating model configurations"
 echo "-----------------------------------------"

 # Generate model configuration files (with limit for testing)
-echo "Generating model configuration files..."
-python3 generate_model_configs.py \
+echo "Generating model configuration files (mode: $CONFIG_MODE)..."
+$PYTHON_CMD generate_model_configs.py \
+    --mode $CONFIG_MODE \
    --output-2d $OUTPUT_DIR/model_configs_2d.csv \
    --output-3d $OUTPUT_DIR/model_configs_3d.csv 

@@ -55,10 +101,26 @@ fi

 # Check if running on GPU
 if ! command -v rocm-smi &> /dev/null; then
-    echo "WARNING: ROCm not detected. Models will run on CPU (no MIOpen commands)."
-    echo "For actual MIOpen commands, run this on a system with AMD GPU."
+    echo "ERROR: ROCm not detected. Cannot generate MIOpen commands without GPU."
+    echo "This script requires an AMD GPU with ROCm installed."
+    echo "Creating empty CSV files as placeholder..."
+    echo "# 2D Convolution Test Cases (No GPU available)" > conv_test_set_2d_dataset.csv
+    echo "# 3D Convolution Test Cases (No GPU available)" > conv_test_set_3d_dataset.csv
+    exit 1
 fi

+# Check if GPU is actually accessible
+if ! rocm-smi &> /dev/null; then
+    echo "ERROR: rocm-smi failed. GPU may not be accessible."
+    echo "Creating empty CSV files as placeholder..."
+    echo "# 2D Convolution Test Cases (GPU not accessible)" > conv_test_set_2d_dataset.csv
+    echo "# 3D Convolution Test Cases (GPU not accessible)" > conv_test_set_3d_dataset.csv
+    exit 1
+fi
+
+echo "GPU detected. ROCm version:"
+rocm-smi --showdriverversion || true
+

 echo ""
 echo "Step 2: Running 2D/3D models and capturing MIOpen commands"
@@ -85,22 +147,17 @@ while IFS=',' read -r config_name model batch_size channels height width precisi
    # Increment counter
    CURRENT_CONFIG=$((CURRENT_CONFIG + 1))
    
-    # Stop after MAX_ITERATIONS if set
-    if [ $MAX_ITERATIONS -gt 0 ] && [ $CURRENT_CONFIG -gt $MAX_ITERATIONS ]; then
-        echo -e "${RED}Stopping after $MAX_ITERATIONS iterations (testing mode)${NC}"
-        break
-    fi
    
    # Build configuration command
    CONFIG="--model $model --batch-size $batch_size --channels $channels --height $height --width $width --precision $precision"
    CONFIG_NAME="$config_name"
    
-    echo -e "${GREEN}[${CURRENT_CONFIG}/${TOTAL_CONFIGS}]${NC} ${PURPLE}Running MIOpenDriver${NC} ${CYAN}2D${NC} ${YELLOW}$CONFIG_NAME${NC}: ${BLUE}$CONFIG${NC}"
+    echo -e "${GREEN}[${CURRENT_CONFIG}/${TOTAL_CONFIGS}]${NC} ${CYAN}2D${NC} ${YELLOW}$CONFIG_NAME${NC}"
    
-    # Actual run with logging
-    MIOPEN_ENABLE_LOGGING_CMD=1 python3 run_model_with_miopen.py \
+    # Actual run with logging (suppress stdout, only capture stderr with MIOpen commands)
+    MIOPEN_ENABLE_LOGGING_CMD=1 $PYTHON_CMD run_model_with_miopen.py \
        --model $model --batch-size $batch_size --channels $channels --height $height --width $width --precision $precision \
-        2>> $OUTPUT_DIR/${model}_miopen_log_2d.txt || true 
+        > /dev/null 2>> $OUTPUT_DIR/${model}_miopen_log_2d.txt || true 


 done < $OUTPUT_DIR/model_configs_2d.csv
@@ -125,23 +182,18 @@ while IFS=',' read -r config_name model batch_size channels temporal_size height
    # Increment counter
    CURRENT_3D_CONFIG=$((CURRENT_3D_CONFIG + 1))
    
-    # Stop after MAX_ITERATIONS if set
-    if [ $MAX_ITERATIONS -gt 0 ] && [ $CURRENT_3D_CONFIG -gt $MAX_ITERATIONS ]; then
-        echo -e "${RED}Stopping after $MAX_ITERATIONS iterations (testing mode)${NC}"
-        break
-    fi

    # Build configuration command for 3D models
    CONFIG="--model $model --batch-size $batch_size --channels $channels --temporal-size $temporal_size --height $height --width $width --precision $precision"
    CONFIG_NAME="$config_name"
    
-    echo -e "${GREEN}[${CURRENT_3D_CONFIG}/${TOTAL_3D_CONFIGS}]${NC} ${PURPLE}Running MIOpenDriver${NC} ${CYAN}3D${NC} ${YELLOW}$CONFIG_NAME${NC}: ${BLUE}$CONFIG${NC}"
+    echo -e "${GREEN}[${CURRENT_3D_CONFIG}/${TOTAL_3D_CONFIGS}]${NC} ${CYAN}3D${NC} ${YELLOW}$CONFIG_NAME${NC}"
    
    
-    # Actual run with logging
-    MIOPEN_ENABLE_LOGGING_CMD=1 python3 run_model_with_miopen.py \
+    # Actual run with logging (suppress stdout, only capture stderr with MIOpen commands)
+    MIOPEN_ENABLE_LOGGING_CMD=1 $PYTHON_CMD run_model_with_miopen.py \
        --model $model --batch-size $batch_size --channels $channels --temporal-size $temporal_size --height $height --width $width --precision $precision \
-        2>> $OUTPUT_DIR/${model}_miopen_log_3d.txt || true
+        > /dev/null 2>> $OUTPUT_DIR/${model}_miopen_log_3d.txt || true

 done < $OUTPUT_DIR/model_configs_3d.csv

@@ -159,7 +211,7 @@ for log_file in $OUTPUT_DIR/*_miopen_log_2d.txt; do
        output_csv="$OUTPUT_DIR/${base_name}_cases_2d.csv"
        
        echo "  Converting $log_file -> $output_csv"
-        python3 miopen_to_csv.py \
+        $PYTHON_CMD miopen_to_csv.py \
            --input "$log_file" \
            --output-2d "$output_csv" \
            --model-name "$base_name" \
@@ -176,7 +228,7 @@ for log_file in $OUTPUT_DIR/*_miopen_log_3d.txt; do
        output_csv="$OUTPUT_DIR/${base_name}_cases_3d.csv"
        
        echo "  Converting $log_file -> $output_csv"
-        python3 miopen_to_csv.py \
+        $PYTHON_CMD miopen_to_csv.py \
            --input "$log_file" \
            --output-3d "$output_csv" \
            --model-name "$base_name" \
@@ -259,4 +311,4 @@ echo ""
 echo "To use these datasets:"
 echo "  1. Build the test: cd ../script && make -j64 test_grouped_convnd_fwd_dataset_xdl"
 echo "  2. Run the test: ./bin/test_grouped_convnd_fwd_dataset_xdl"
-echo ""
+echo ""
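
Editorial sketch, not part of the commit: the shell script above picks the interpreter stored in PYTHON_CMD by checking whether `import torch` succeeds, and falls back to the virtual environment's interpreter otherwise. The same decision can be expressed in Python roughly as below. `choose_python_cmd` is a hypothetical helper; unlike the script, it inspects the interpreter it runs under rather than spawning `python3`, and it neither creates the virtual environment nor installs anything.

```python
import importlib.util
import os


def choose_python_cmd(module="torch", venv_dir="./pytorch_venv"):
    """Hypothetical helper mirroring the script's PYTHON_CMD selection.

    Returns "python3" when `module` can be located by the running
    interpreter, otherwise the path of the virtual environment's python3.
    """
    # find_spec returns None when a top-level module cannot be located,
    # without importing the module itself.
    if importlib.util.find_spec(module) is not None:
        return "python3"
    return os.path.join(venv_dir, "bin", "python3")


if __name__ == "__main__":
    print(choose_python_cmd())
```

Checking with `find_spec` avoids paying the import cost of a large package just to learn whether it is installed, which is the only reason it is used here instead of a try/except around the import.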
--- a/test_data/run_model_with_miopen.py
+++ b/test_data/run_model_with_miopen.py
@@ -87,6 +87,16 @@ def main():
    else:
        device = torch.device(args.device)
    
+    # Check if actually running on GPU
+    if device.type == 'cpu':
+        import sys
+        print(f"WARNING: Running on CPU, MIOpen commands will not be generated!", file=sys.stderr)
+        print(f"CUDA/ROCm available: {torch.cuda.is_available()}", file=sys.stderr)
+        if torch.cuda.is_available():
+            print(f"GPU device count: {torch.cuda.device_count()}", file=sys.stderr)
+            print(f"GPU name: {torch.cuda.get_device_name(0) if torch.cuda.device_count() > 0 else 'N/A'}", file=sys.stderr)
+        # Continue anyway for testing purposes
+    
    if not args.quiet:
        print(f"Using device: {device}")