mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-04-19 22:39:03 +00:00)
Integration of a new pipeline for weight preshuffle into gemm examples (#2516)
* something khushbu can help with
* v1 v2 works with flatmm develop
* v0 v1 v2 numerical error gone
* Fixing numerical error, and interchange preshuffle configs to match with flatmm
* Refactor GEMM pipeline configurations and integrate preshuffle support
  - Updated preshuffle pipeline definitions to include multiple versions (V1, V2, V3).
  - Changed the pipeline constant from CK_TILE_PIPELINE_PRESHUFFLE to CK_TILE_PIPELINE_PRESHUFFLE_V3 in relevant configurations.
  - Removed obsolete code and comments
* clang format
* fix vectorloadsize bug
* add the Preshuffle3
* update kwarp calculation in gemm utils
* update vector size A and B correctly in V2 pipeline; Added few more changes to align with dteng's branch
* fix: add CK_GFX950_SUPPORT macro for gfx950 detection
* default disable rotating buffer
* docs(CHANGELOG): update changelog for rocm 7.0
* Revert "docs(CHANGELOG): update changelog for rocm 7.0"
  This reverts commit 2bc16fff84.
* Remove unused Preshuffle V3 pipeline and related code; update gemm function to use Preshuffle V2; clean up comments and formatting in various files.
* revert example/ck_tile/flatmm to its original state
* remove comment added by second author
* switch to xor ALDSDescriptor
* modify the MakeALdsDescriptor()
* temporary profiling script
* getting rid of line marker compiler error
* UniversalWeightPreshufflePipelineAgBgCrPolicy now derives from UniversalGemmBasePolicy
* add a minor fix for the config
* typo fix
* Fix formatting in lambda function for WeightPreshufflePipelineAGmemBGmemCRegV2
* revert change in include/ck_tile/ops/flatmm/pipeline/flatmm_pipeline_agmem_bgmem_creg_v1.hpp
* revert change in include/ck_tile/core/arch/amd_buffer_addressing.hpp
* reenable the GemmSpatiallyLocalTilePartitioner
* make GemmConfigPreshuffle_1 for v1 pipeline, GemmConfigPreshuffle_2 for v2 pipeline
* remove hardcoded true for preshuffle bool template argument
* rename script
* remove gemm_profilie.sh script
* merge conflict resolve
* clang formatted
* typo fix
* Remove duplicate include of block_gemm_areg_bsmem_creg_v2r1.hpp in gemm.hpp
* Remove commented-out code in UniversalWeightPreshufflePipelineAgBgCrPolicy
* Fix missing newline at end of file in run_gemm_example.inc
* Remove unused barrier call in BlockWeightPreshuffleASmemBSmemCRegV1
* addressing review comments
* removing debug code
* addressing review comments
* Revert "addressing review comments"
  This reverts commit 29c45192ba.
* updating tile_engine code
* addressing review comments

---------

Co-authored-by: amd-khushbu <khuagarw@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
107
script/gemm_profile.sh
Executable file
@@ -0,0 +1,107 @@
#!/bin/bash

BIN=./bin/tile_example_gemm_weight_preshuffle
PREC=fp8
VERBOSITY=2

# List of all (m, n, k) triplets
ARGS_LIST=(
    "1 2048 5120"
    "1 5120 1024"
    "2 2048 5120"
    "2 5120 1024"
    "3 2048 5120"
    "3 5120 1024"
    "4 2048 5120"
    "4 5120 1024"
    "5 2048 5120"
    "5 5120 1024"
    "6 2048 5120"
    "6 5120 1024"
    "7 2048 5120"
    "7 5120 1024"
    "8 2048 5120"
    "8 5120 1024"
    "9 2048 5120"
    "9 5120 1024"
    "10 2048 5120"
    "10 5120 1024"
    "11 2048 5120"
    "11 5120 1024"
    "12 2048 5120"
    "12 5120 1024"
    "13 2048 5120"
    "13 5120 1024"
    "14 2048 5120"
    "14 5120 1024"
    "15 2048 5120"
    "15 5120 1024"
    "16 2048 5120"
    "16 5120 1024"
    "2048 5120 1024"
    "2048 5120 8192"
    "2048 7168 8192"
    "2048 8192 3584"
    "16384 7168 8192"
    "16384 8192 3584"
)

# Output file
OUTPUT_FILE="gemm_profile_results.csv"

# Output header
echo "m,n,k,Pipeline,Time_ms,TFlops,GBps,Verification" > "$OUTPUT_FILE"

# Loop over each argument set
for args in "${ARGS_LIST[@]}"; do
    read -r m n k <<< "$args"

    echo "Testing: m=$m, n=$n, k=$k"
    OUTPUT=$("$BIN" -m="$m" -n="$n" -k="$k" -prec="$PREC" -v="$VERBOSITY" 2>/dev/null)

    # Extract pipeline information
    # Format: "Launching kernel with args: gemm_fp8_pipeline_AGmemBGmemCRegV2_128x256x256x256_16x16x128_16x16_0x0x0"
    PIPELINE=$(echo "$OUTPUT" | grep "Launching kernel with args:" | sed -n 's/.*Launching kernel with args: \(.*\)/\1/p')

    # Extract TFlops and GB/s from the output
    # Format: "Run Gemm kernel with M=3840 N=4096 K=2048 ... : 0.042338 ms, 1521.67 TFlops, 1126.89 GB/s,"
    PERF_LINE=$(echo "$OUTPUT" | grep "TFlops")

    # Extract verification result
    # Format: "The GPU verification result is: correct"
    VERIFICATION=$(echo "$OUTPUT" | grep "The GPU verification result is:" | sed -n 's/.*The GPU verification result is: \(.*\)/\1/p')

    if [ -n "$PERF_LINE" ]; then
        # Extract execution time in ms
        TIME_MS=$(echo "$PERF_LINE" | grep -o '[0-9]\+\.[0-9]\+ ms' | grep -o '[0-9]\+\.[0-9]\+')
        # Extract TFlops value - more robust regex
        TFLOPS=$(echo "$PERF_LINE" | grep -o '[0-9]\+\.[0-9]\+ TFlops' | grep -o '[0-9]\+\.[0-9]\+')
        # Extract GB/s value - more robust regex
        GBPS=$(echo "$PERF_LINE" | grep -o '[0-9]\+\.[0-9]\+ GB/s' | grep -o '[0-9]\+\.[0-9]\+')

        # Use extracted pipeline or default if not found
        if [ -z "$PIPELINE" ]; then
            PIPELINE="gemm_basic"
        fi

        # Print to terminal
        echo " Pipeline: $PIPELINE"
        echo " Time: ${TIME_MS} ms"
        echo " TFlops: ${TFLOPS}"
        echo " GB/s: ${GBPS}"

        # Save to CSV file
        echo "$m,$n,$k,$PIPELINE,$TIME_MS,$TFLOPS,$GBPS,$VERIFICATION" >> "$OUTPUT_FILE"
    else
        echo " ERROR: Could not parse performance data"
        echo ""
        echo "$m,$n,$k,$PIPELINE,,,,$VERIFICATION" >> "$OUTPUT_FILE"
    fi
done

echo "=========================================="
echo "Profile completed!"
echo "Results saved to: $OUTPUT_FILE"
echo "Total tests run: ${#ARGS_LIST[@]}"
echo "=========================================="
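
The script pulls the time, TFlops and GB/s values out of the performance line with three separate `grep -o` pipelines, each of which requires a decimal point in the number. As a minimal sketch of an alternative, assuming the performance line keeps the `<time> ms, <x> TFlops, <y> GB/s` layout quoted in the script's comment, a single `sed -E` call can capture all three fields at once. `parse_perf_line` is a hypothetical helper name and is not part of this commit.

```shell
#!/bin/bash

# Hypothetical helper (not part of the commit): turn one performance line
# into "time_ms,tflops,gbps". The fractional part of each number is optional.
parse_perf_line() {
    local line="$1"
    # Capture the numbers that precede " ms", " TFlops" and " GB/s".
    printf '%s\n' "$line" | sed -n -E \
        's/.*[^0-9.]([0-9]+(\.[0-9]+)?) ms, ([0-9]+(\.[0-9]+)?) TFlops, ([0-9]+(\.[0-9]+)?) GB\/s.*/\1,\3,\5/p'
}

# Sample line copied from the format comment in gemm_profile.sh.
sample='Run Gemm kernel with M=3840 N=4096 K=2048 ... : 0.042338 ms, 1521.67 TFlops, 1126.89 GB/s,'
parse_perf_line "$sample"   # prints 0.042338,1521.67,1126.89
```

A line that does not match prints nothing, so the caller can keep the existing empty-string check to detect a failed parse.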
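
The CSV stores the TFlops figure exactly as the example binary prints it. A quick cross-check is to recompute it from `m`, `n`, `k` and the measured time. The sketch below assumes the common convention of 2·M·N·K floating-point operations per GEMM; the commit does not state this, although it is consistent with the sample line quoted in the script's comment. `expected_tflops` is a hypothetical helper name.

```shell
#!/bin/bash

# Hypothetical cross-check (not part of the commit): recompute TFlops from
# the problem size and the time in milliseconds.
# Assumption: one GEMM counts as 2 * M * N * K floating-point operations.
expected_tflops() {
    local m="$1" n="$2" k="$3" time_ms="$4"
    awk -v m="$m" -v n="$n" -v k="$k" -v t="$time_ms" \
        'BEGIN { printf "%.2f\n", (2 * m * n * k) / (t * 1e-3) / 1e12 }'
}

# Values taken from the sample performance line in gemm_profile.sh.
expected_tflops 3840 4096 2048 0.042338   # prints 1521.67
```

Comparing this value with the `TFlops` column is one way to spot a row where the time or the throughput was parsed from the wrong line.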