composable_kernel/script/gemm_profile.sh
Aviral Goel 1441a0a7ee Integration of a new pipeline for weight preshuffle into gemm examples (#2516)
* something khushbu can help with

* v1 v2 works with flatmm develop

* v0 v1 v2 numerical error gone

* Fixing numerical error, and interchange preshuffle configs to match with flatmm

* Refactor GEMM pipeline configurations and integrate preshuffle support

- Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3).
- Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations.
- Removed obsolete code and comments

* clang format

* fix vectorloadsize bug

* add the Preshuffle3

* update kwarp calculation in gemm utils

* update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch

* fix: add CK_GFX950_SUPPORT macro for gfx950 detection

* default disable rotating buffer

* docs(CHANGELOG): update changelog for rocm 7.0

* Revert "docs(CHANGELOG): update changelog for rocm 7.0"

This reverts commit 2bc16fff84.

* Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files.

* revert example/ck_tile/flatmm to its original state

* remove comment added by second author

* switch to xor ALDSDescriptor

* modify the MakeALdsDescriptor()

* temporary profiling script

* getting rid of line marker compiler error

* UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy

* add a minor fix for the config

* typo fix

* Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2

* revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp

* revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp

* reenable the GemmSpatiallyLocalTilePartitioner

* make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline

* remove hardcoded true for preshuffle bool template argument

* rename script

* remove gemm_profilie.sh script

* merge conflict resolve

* clang formatted

* typo fix

* Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp

* Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy

* Fix missing newline at end of file in run_gemm_example.inc

* Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1

* addressing review comments

* removing debug code

* addressing review comments

* Revert "addressing review comments"

This reverts commit 29c45192ba.

* updating tile_engine code

* addressing review comments

---------

Co-authored-by: amd-khushbu <khuagarw@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-08-01 00:04:54 -07:00


#!/bin/bash
BIN=./bin/tile_example_gemm_weight_preshuffle
PREC=fp8
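# Passed to the binary as -v. Despite the name, this appears to select the
# validation mode: the loop below parses a GPU verification result.
VERBOSITY=2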
# List of all (m, n, k) triplets
ARGS_LIST=(
"1 2048 5120"
"1 5120 1024"
"2 2048 5120"
"2 5120 1024"
"3 2048 5120"
"3 5120 1024"
"4 2048 5120"
"4 5120 1024"
"5 2048 5120"
"5 5120 1024"
"6 2048 5120"
"6 5120 1024"
"7 2048 5120"
"7 5120 1024"
"8 2048 5120"
"8 5120 1024"
"9 2048 5120"
"9 5120 1024"
"10 2048 5120"
"10 5120 1024"
"11 2048 5120"
"11 5120 1024"
"12 2048 5120"
"12 5120 1024"
"13 2048 5120"
"13 5120 1024"
"14 2048 5120"
"14 5120 1024"
"15 2048 5120"
"15 5120 1024"
"16 2048 5120"
"16 5120 1024"
"2048 5120 1024"
"2048 5120 8192"
"2048 7168 8192"
"2048 8192 3584"
"16384 7168 8192"
"16384 8192 3584"
)
# Output file
OUTPUT_FILE="gemm_profile_results.csv"
# Output header
echo "m,n,k,Pipeline,Time_ms,TFlops,GBps,Verification" > "$OUTPUT_FILE"
# Loop over each argument set
for args in "${ARGS_LIST[@]}"; do
    read -r m n k <<< "$args"
    echo "Testing: m=$m, n=$n, k=$k"
    # Capture stdout only; stderr from the binary is discarded
    OUTPUT=$("$BIN" -m="$m" -n="$n" -k="$k" -prec="$PREC" -v="$VERBOSITY" 2>/dev/null)
    # Extract pipeline information
    # Format: "Launching kernel with args: gemm_fp8_pipeline_AGmemBGmemCRegV2_128x256x256x256_16x16x128_16x16_0x0x0"
    PIPELINE=$(echo "$OUTPUT" | sed -n 's/.*Launching kernel with args: \(.*\)/\1/p')
    # Find the performance line, which holds time, TFlops and GB/s
    # Format: "Run Gemm kernel with M=3840 N=4096 K=2048 ... : 0.042338 ms, 1521.67 TFlops, 1126.89 GB/s,"
    PERF_LINE=$(echo "$OUTPUT" | grep "TFlops")
    # Extract verification result
    # Format: "The GPU verification result is: correct"
    VERIFICATION=$(echo "$OUTPUT" | sed -n 's/.*The GPU verification result is: \(.*\)/\1/p')
    if [ -n "$PERF_LINE" ]; then
        # Each value must be a decimal number followed by its unit;
        # the second grep strips the unit and keeps only the number.
        # Extract execution time in ms
        TIME_MS=$(echo "$PERF_LINE" | grep -o '[0-9]\+\.[0-9]\+ ms' | grep -o '[0-9]\+\.[0-9]\+')
        # Extract TFlops value
        TFLOPS=$(echo "$PERF_LINE" | grep -o '[0-9]\+\.[0-9]\+ TFlops' | grep -o '[0-9]\+\.[0-9]\+')
        # Extract GB/s value
        GBPS=$(echo "$PERF_LINE" | grep -o '[0-9]\+\.[0-9]\+ GB/s' | grep -o '[0-9]\+\.[0-9]\+')
        # Use extracted pipeline or default if not found
        if [ -z "$PIPELINE" ]; then
            PIPELINE="gemm_basic"
        fi
        # Print to terminal
        echo " Pipeline: $PIPELINE"
        echo " Time: ${TIME_MS} ms"
        echo " TFlops: ${TFLOPS}"
        echo " GB/s: ${GBPS}"
        # Save to CSV file
        echo "$m,$n,$k,$PIPELINE,$TIME_MS,$TFLOPS,$GBPS,$VERIFICATION" >> "$OUTPUT_FILE"
    else
        echo " ERROR: Could not parse performance data"
        echo ""
        # Keep the row in the CSV with empty performance columns
        echo "$m,$n,$k,$PIPELINE,,,,$VERIFICATION" >> "$OUTPUT_FILE"
    fi
done
echo "=========================================="
echo "Profile completed!"
echo "Results saved to: $OUTPUT_FILE"
echo "Total tests run: ${#ARGS_LIST[@]}"
echo "=========================================="
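
The script extracts each performance value with its own pair of grep calls. As a rough sketch of an alternative, the three values could be pulled from the performance line in one step with Bash's built-in regex matching. The helper name parse_perf_line is hypothetical and not part of gemm_profile.sh, and the line format is assumed to be exactly the one quoted in the script's comments.

```shell
#!/bin/bash
# Hypothetical helper (not part of gemm_profile.sh): parse one performance
# line into "time_ms,tflops,gbps". Returns non-zero if the line does not
# match the assumed "<num> ms, <num> TFlops, <num> GB/s" layout.
parse_perf_line() {
    local line=$1
    # A number with an optional fractional part
    local num='[0-9]+(\.[0-9]+)?'
    local re="(${num}) ms, (${num}) TFlops, (${num}) GB/s"
    if [[ $line =~ $re ]]; then
        # Groups 1, 3 and 5 hold the full numbers; groups 2, 4 and 6 are
        # the optional fractional parts and are ignored here.
        printf '%s,%s,%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[5]}"
    else
        return 1
    fi
}

# Sample line copied from the format comment in the script
sample='Run Gemm kernel with M=3840 N=4096 K=2048 ... : 0.042338 ms, 1521.67 TFlops, 1126.89 GB/s,'
parse_perf_line "$sample"
```

Unlike the grep pipelines in the script, this pattern also accepts values without a decimal point, and a non-zero return status signals a line that could not be parsed.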