mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 22:39:03 +00:00
feat(grouped_gemm): add preshuffle v2 support to grouped gemm example (#2721)
* docs(README): update readme with new build instructions
* feat(grouped_gemm): add support back for non persistent kernel
* refactor(grouped_gemm): simplify tensor creation
* refactor(grouped_gemm): Persistence is now a GemmConfig value for easier management
* chore(grouped_gemm): add print statements to ease debugging
* WIP(grouped_gemm): add grouped_gemm_preshuffle example and update CMake configuration
* fix(tile_gemm_traits): change default value of Preshuffle_ from 0 to false for clarity
* WIP(grouped_gemm): add dummy variables to compile the preshuffle pipelines
* chore(grouped_gemm): add print statements and variables to debug numerical error with preshuffle
* style: clang format work so far
* BUG!(grouped_gemm_kernel.hpp): figured out a potential bug behind the numerical errors in the preshuffle pipeline
* fix(grouped_gemm_kernel): add function in the kernel code to dynamically calculate tail_number, resolving numerical errors
* refactor(gemm_preshuffle): make preshuffle pipeline v2 compatible with operator() calls from grouped gemm
* chore(grouped_gemm): add/remove debug comments and debug print statements
* feat(grouped_gemm): integrate preshuffle pipeline v2 into grouped gemm for all supported shapes
* chore(gemm_profile): add new argument combinations
* fix: branch cleanup, formatting, refactoring
* fix: branch cleanup, formatting, refactoring
* chore(changelog): update changelog to reflect new feature
* address review comments & nit
@@ -36,8 +36,13 @@ ARGS_LIST=(
"14 5120 1024"
"15 2048 5120"
"15 5120 1024"
"16 64 128"
"16 64 256"
"16 2048 5120"
"16 5120 1024"
"512 768 640"
"1024 1792 896"
"1536 2816 1152"
"2048 5120 1024"
"2048 5120 8192"
"2048 7168 8192"
@@ -68,8 +73,8 @@ for args in "${ARGS_LIST[@]}"; do
PERF_LINE=$(echo "$OUTPUT" | grep "TFlops")

# Extract verification result
-# Format: "The GPU verification result is: correct"
-VERIFICATION=$(echo "$OUTPUT" | grep "The GPU verification result is:" | sed -n 's/.*The GPU verification result is: \(.*\)/\1/p')
+# Format: "The GPU verification result is:correct" (note: no space after colon)
+VERIFICATION=$(echo "$OUTPUT" | grep "The GPU verification result is:" | sed -n 's/.*The GPU verification result is:\(.*\)/\1/p')

if [ -n "$PERF_LINE" ]; then
# Extract execution time in ms
@@ -89,6 +94,7 @@ for args in "${ARGS_LIST[@]}"; do
echo " Time: ${TIME_MS} ms"
echo " TFlops: ${TFLOPS}"
echo " GB/s: ${GBPS}"
echo " Verification: ${VERIFICATION:-N/A}"

# Save to CSV file
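
Note on the verification parsing changed in the second hunk: the `sed` pattern there is tied to one exact spacing after the colon, so it has to be edited whenever the example's output format changes. A minimal sketch of a spacing-tolerant alternative follows. The function name `extract_verification` is hypothetical and not part of this commit; the only assumption about the output is the marker text `The GPU verification result is:` quoted in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the script in this commit.
# Prints whatever follows the verification marker, skipping any
# whitespace (including none) between the colon and the result.
extract_verification() {
    printf '%s\n' "$1" |
        sed -n 's/.*The GPU verification result is:[[:space:]]*\(.*\)/\1/p'
}

# Both output variants seen in the diff yield the same value.
extract_verification "The GPU verification result is:correct"
extract_verification "The GPU verification result is: correct"
```

With a helper like this, the script could assign `VERIFICATION=$(extract_verification "$OUTPUT")` and keep the existing `${VERIFICATION:-N/A}` fallback for runs that print no verification line.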