From b31811538f6dfc8363626f032d1b3329a567b609 Mon Sep 17 00:00:00 2001 From: mirchen-amd Date: Tue, 19 Aug 2025 04:19:17 -0400 Subject: [PATCH] Mirchen/gemm blockscale wp segfault fix (#2638) * Add stride validation to prevent segfault in blockscale GEMM * run clang-format * Update profiler/include/profiler/profile_gemm_blockscale_wp_impl.hpp Co-authored-by: rahjain-amd * added stride length checking to more gemm examples in ckprofiler * ran clang format * added validation header and implement in core gemm operations * remove ck_tile transpose and gemm stages from CI (#2646) * update CK build instruction step 4 (#2563) Co-authored-by: Aviral Goel * Fixes to "General 2D Reduction Kernel" (#2535) (#2656) * fix reduce2d - revret the combine_partial_results() chnages - remove auto from function def * clang-format * enable aiter test_mha in daily CI (#2659) * feat(copy_kernel): add basic copy kernel example with beginner friendly documentation (#2582) * feat(copy_kernel): add basic copy kernel example with documentation * docs(CHANGELOG): Updated changelog * chore: performed clang format * Update example/ck_tile/39_copy/copy_basic.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update example/ck_tile/39_copy/README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update example/ck_tile/39_copy/README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update example/ck_tile/39_copy/README.md Co-authored-by: spolifroni-amd * Update example/ck_tile/39_copy/README.md Co-authored-by: spolifroni-amd * Update example/ck_tile/39_copy/README.md Co-authored-by: spolifroni-amd * fix(terminology): follow amd terms * extract elementwise copy to a new kernel * fix(copy_kernel): bug in verification * add comments about vgpr usage * lint and nits * add notes and comments * print hostTensor via stream * print hostTensor via stream --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: spolifroni-amd * [CK_TILE] FMHA BWD Optimization For GFX950 (#2628) * simplify fmha_bwd_kernel MakeKargs & dq_dram_window * simply duplicate * trload pipeline * Try two-stage * add prefetch * optimize & iglp * Fix num_byte calculations to use nhead_k for K & V size (#2653) Simple fix just to calculate the number of bytes correctly for what's reported in the output. I was getting 6200 GB/s which is past the SoL of MI300. Before: ``` ./bin/tile_example_fmha_fwd -prec=bf16 -b=2 -s=1 -s_k=32768 -h=32 -h_k=8 -d=128 -page_block_size=128 -num_splits=8 -iperm=0 -operm=0 -v=0 -kname=1 [bf16|batch|bshd] b:2, h:32/8, s:1/32768, d:128/128, scale_s:0.0883883, bias:n, p_drop:0, lse:0, squant:0, mask:n, v:r, num_splits:8, page_block_size:128, fmha_fwd_splitkv_d128_bf16_batch_b16x64x64x128x64x128_r1x4x1_r1x4x1_w16x16x16_w16x16x16_qr_nwarp_sshuffle_vr_ps_nlogits_nbias_nmask_lse_nsquant_pagedkv, fmha_fwd_splitkv_combine_d128_bf16_batch_b32_unused_ps_nlse_nsquant, 0.173 ms, 6.20 TFlops, 6202.95 GB/s ``` After: ``` ./bin/tile_example_fmha_fwd -prec=bf16 -b=2 -s=1 -s_k=32768 -h=32 -h_k=8 -d=128 -page_block_size=128 -num_splits=8 -iperm=0 -operm=0 -v=0 -kname=1 [bf16|batch|bshd] b:2, h:32/8, s:1/32768, d:128/128, scale_s:0.0883883, bias:n, p_drop:0, lse:0, squant:0, mask:n, v:r, num_splits:8, page_block_size:128, fmha_fwd_splitkv_d128_bf16_batch_b16x64x64x128x64x128_r1x4x1_r1x4x1_w16x16x16_w16x16x16_qr_nwarp_sshuffle_vr_ps_nlogits_nbias_nmask_lse_nsquant_pagedkv, fmha_fwd_splitkv_combine_d128_bf16_batch_b32_unused_ps_nlse_nsquant, 0.163 ms, 6.58 TFlops, 1644.53 GB/s ``` * [CK_TILE] FMHA BWD Decode Pipeline (#2643) * Fix distr * Duplicate block_fmha_bwd_dq_dk_dv_pipeline_trload_kr_ktr_vr * decode 16x16 o2 * fix (#2668) * Optimize fmha fwd decode & prefill for gfx950 (#2641) * Fix for fwd/bwd kernel build filter * fix bwd code * save an example for __bf16 type * temp save, waiting for debug * tempsave, fmha_decode * temp save, change all instance to 1wave * fix async copytest bug * Add block_sync_lds_direct_load utility * fix the s_waitcnt_imm calculation * Improve s_waitcnt_imm calculation * fix vmcnt shift * add input validation and bug fix * remove unnecessary output * move test_copy into test * temp save * tempsave * compile pass * tempsave, trload+asyncload done * tempsave. asynccopy+trload sanity checked * remove unnecessary features * fix the lds alignment caused performance regression * enable prefill overload operator(). * remove all lds bankconflict with xor layouts * enable larger tile size; upgrade xor pattern * upgrade prefill pipeline; simple iglp; consistent data produce and consume order * small refactor * Load Q through lds, implement xor; * add vmcnt guard before load ktile * Add v_permlaneb32 for block_reduce. Disable it as it will cause un-coexecutable packed math in FA * Add XOR fold strategy for hdim<128, but perf dropped; disable it by default; wait further perf debug * add __restrict__ to tr load * merge fa_decode pipeline into fmha_fwd api * remove unnecessary files; rename some files * Remove unnecessary changes * bug fix, clang format; * remove non-necessary change * fix clangformat with 18.1.3 * fix bugs * fix bug * fix bug on non-gfx950 * fix bugs in gemm * fix bug in pki4 * tempsave, update the blocksync functions * change the warp setting for hdim32 fmha fwd * clang format * fix conflict. disable all v-col instance for fmha fwd * Fix the bug * clang format --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> * Revert "Optimize fmha fwd decode & prefill for gfx950 (#2641)" (#2670) This reverts commit 327bf408dd05b4e4bfb7b72f63f8710f35efa9a4. * added batch stride checking to batched gemm ops in profiler * removed batch stride validation * removed batched stride validation again * Update include/ck/library/utility/profiler_validation_common.hpp Co-authored-by: rahjain-amd * refactor function names * added gemm stride checking to more profiler gemm operations * run clang format * add stride checkign to 01 gemm example * rename from profiler to validation common, used for examples and profiler * build of ckProfiler success * update file headers --------- Co-authored-by: rahjain-amd Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: geozhai <44495440+geozhai@users.noreply.github.com> Co-authored-by: Aviral Goel Co-authored-by: Yashvardhan Agarwal Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: spolifroni-amd Co-authored-by: Yi DING Co-authored-by: Cameron Shinn Co-authored-by: Mateusz Ozga <110818320+mozga-amd@users.noreply.github.com> Co-authored-by: Haocong WANG Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Co-authored-by: asleepzzz [ROCm/composable_kernel commit: 60320e90c11b80411cb2b70c9c5a5976a56abad5] --- example/01_gemm/run_gemm_example.inc | 14 +++++- example/01_gemm/run_gemm_example_v2.inc | 2 +- .../ck/library/utility/validation_common.hpp | 50 +++++++++++++++++++ .../profiler/profile_gemm_ab_scale_impl.hpp | 7 ++- .../profile_gemm_bias_add_reduce_impl.hpp | 6 ++- .../profile_gemm_blockscale_wp_impl.hpp | 5 ++ .../include/profiler/profile_gemm_impl.hpp | 6 ++- .../profiler/profile_gemm_reduce_impl.hpp | 6 ++- .../profiler/profile_gemm_splitk_impl.hpp | 6 ++- .../profiler/profile_gemm_streamk_impl.hpp | 6 ++- .../profiler/profile_gemm_universal_impl.hpp | 4 ++ ...profile_gemm_universal_preshuffle_impl.hpp | 4 ++ .../profile_gemm_universal_reduce_impl.hpp | 6 ++- .../profile_gemm_universal_streamk_impl.hpp | 6 ++- 14 files changed, 118 insertions(+), 10 deletions(-) create mode 100644 include/ck/library/utility/validation_common.hpp mode change 100755 => 100644 profiler/include/profiler/profile_gemm_universal_streamk_impl.hpp diff --git a/example/01_gemm/run_gemm_example.inc b/example/01_gemm/run_gemm_example.inc index 6c5d9f9fba..3e018aad1e 100644 --- a/example/01_gemm/run_gemm_example.inc +++ b/example/01_gemm/run_gemm_example.inc @@ -1,7 +1,8 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once +#include "ck/library/utility/validation_common.hpp" template bool run_gemm(const ProblemType& problem_size, const ExecutionConfig& config) @@ -53,6 +54,17 @@ bool run_gemm(const ProblemType& problem_size, const ExecutionConfig& config) StrideB = f_get_default_stride(K, N, StrideB, BLayout{}); StrideC = f_get_default_stride(M, N, StrideC, CLayout{}); + try + { + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + } + catch(const std::runtime_error& e) + { + std::cerr << "Error: " << e.what() << std::endl; + return false; + } + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); diff --git a/example/01_gemm/run_gemm_example_v2.inc b/example/01_gemm/run_gemm_example_v2.inc index 4adb6f896b..3d8cf32221 100644 --- a/example/01_gemm/run_gemm_example_v2.inc +++ b/example/01_gemm/run_gemm_example_v2.inc @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once diff --git a/include/ck/library/utility/validation_common.hpp b/include/ck/library/utility/validation_common.hpp new file mode 100644 index 0000000000..38933c6d7c --- /dev/null +++ b/include/ck/library/utility/validation_common.hpp @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: MIT +// Copyright (c) 2025, Advanced Micro Devices, Inc. All rights reserved. + +#pragma once + +#include +#include +#include +#include "ck/ck.hpp" +#include "ck/utility/type.hpp" +#include "ck/tensor_operation/gpu/device/tensor_layout.hpp" + +namespace ck { +namespace utils { + +template +inline void +validate_gemm_stride(int M, int N, int stride, const std::string& stride_name = "Stride") +{ + if(ck::is_same_v) + { + if(stride < M) + { + throw std::runtime_error( + "Error: For ColumnMajor layout, " + stride_name + " (" + std::to_string(stride) + + ") must be greater than or equal to dim (" + std::to_string(M) + ")"); + } + } + else // RowMajor + { + if(stride < N) + { + throw std::runtime_error( + "Error: For RowMajor layout, " + stride_name + " (" + std::to_string(stride) + + ") must be greater than or equal to dim (" + std::to_string(N) + ")"); + } + } +} + +// Convenience functions for common GEMM patterns +template +inline void validate_gemm_strides_abc(int M, int N, int K, int StrideA, int StrideB, int StrideC) +{ + validate_gemm_stride(M, K, StrideA, "StrideA"); + validate_gemm_stride(K, N, StrideB, "StrideB"); + validate_gemm_stride(M, N, StrideC, "StrideC"); +} + +} // namespace utils +} // namespace ck diff --git a/profiler/include/profiler/profile_gemm_ab_scale_impl.hpp b/profiler/include/profiler/profile_gemm_ab_scale_impl.hpp index a84ad5269b..d68a1065ab 100644 --- a/profiler/include/profiler/profile_gemm_ab_scale_impl.hpp +++ b/profiler/include/profiler/profile_gemm_ab_scale_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2024, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2024-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -19,6 +19,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -74,6 +75,10 @@ bool profile_gemm_ab_scale_impl(int do_verification, ? ((K + ScaleBlockK - 1) / ScaleBlockK) : ((N + ScaleBlockN - 1) / ScaleBlockN); + ck::utils::validate_gemm_stride(M, K, StrideA, "StrideA"); + ck::utils::validate_gemm_stride(K, N, StrideB, "StrideB"); + ck::utils::validate_gemm_stride(M, N, StrideE, "StrideE"); + Tensor a0_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor a1_m_k(f_host_tensor_descriptor((M + ScaleBlockM - 1) / ScaleBlockM, (K + ScaleBlockK - 1) / ScaleBlockK, diff --git a/profiler/include/profiler/profile_gemm_bias_add_reduce_impl.hpp b/profiler/include/profiler/profile_gemm_bias_add_reduce_impl.hpp index c0ffea8a32..405a2359c2 100644 --- a/profiler/include/profiler/profile_gemm_bias_add_reduce_impl.hpp +++ b/profiler/include/profiler/profile_gemm_bias_add_reduce_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -15,6 +15,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -93,6 +94,9 @@ void profile_gemm_bias_add_reduce_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor2d(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor2d(K, N, StrideB, BLayout{})); diff --git a/profiler/include/profiler/profile_gemm_blockscale_wp_impl.hpp b/profiler/include/profiler/profile_gemm_blockscale_wp_impl.hpp index 53073a6c75..32bdf05771 100644 --- a/profiler/include/profiler/profile_gemm_blockscale_wp_impl.hpp +++ b/profiler/include/profiler/profile_gemm_blockscale_wp_impl.hpp @@ -20,6 +20,7 @@ #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" +#include "ck/library/utility/validation_common.hpp" namespace ck { namespace profiler { @@ -104,6 +105,10 @@ bool profile_gemm_blockscale_weighpreshuffle_impl(int do_verification, ? ((K + ScaleBlockK - 1) / ScaleBlockK) : ((N + ScaleBlockN - 1) / ScaleBlockN); + ck::utils::validate_gemm_stride(M, K, StrideA, "StrideA"); + ck::utils::validate_gemm_stride(K, N, StrideB, "StrideB"); + ck::utils::validate_gemm_stride(M, N, StrideE, "StrideE"); + Tensor a0_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor a1_m_k(f_host_tensor_descriptor((M + ScaleBlockM - 1) / ScaleBlockM, (K + ScaleBlockK - 1) / ScaleBlockK, diff --git a/profiler/include/profiler/profile_gemm_impl.hpp b/profiler/include/profiler/profile_gemm_impl.hpp index d2a38b2a81..fdcb3ad128 100644 --- a/profiler/include/profiler/profile_gemm_impl.hpp +++ b/profiler/include/profiler/profile_gemm_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -24,6 +24,7 @@ #include "ck/library/utility/literals.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" #include "ck/library/utility/fill.hpp" +#include "ck/library/utility/validation_common.hpp" namespace ck { namespace profiler { @@ -64,6 +65,9 @@ int profile_gemm_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); Tensor c_m_n_host_result(f_host_tensor_descriptor(M, N, StrideC, CLayout{})); diff --git a/profiler/include/profiler/profile_gemm_reduce_impl.hpp b/profiler/include/profiler/profile_gemm_reduce_impl.hpp index ff801e8afd..a74d2a01d9 100644 --- a/profiler/include/profiler/profile_gemm_reduce_impl.hpp +++ b/profiler/include/profiler/profile_gemm_reduce_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -15,6 +15,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -88,6 +89,9 @@ bool profile_gemm_reduce_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); diff --git a/profiler/include/profiler/profile_gemm_splitk_impl.hpp b/profiler/include/profiler/profile_gemm_splitk_impl.hpp index 5d5ae1ad15..0640e95aba 100644 --- a/profiler/include/profiler/profile_gemm_splitk_impl.hpp +++ b/profiler/include/profiler/profile_gemm_splitk_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -19,6 +19,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -62,6 +63,9 @@ bool profile_gemm_splitk_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); Tensor c_m_n_host_result(f_host_tensor_descriptor(M, N, StrideC, CLayout{})); diff --git a/profiler/include/profiler/profile_gemm_streamk_impl.hpp b/profiler/include/profiler/profile_gemm_streamk_impl.hpp index 71b54c1f47..d24ee1c7ea 100644 --- a/profiler/include/profiler/profile_gemm_streamk_impl.hpp +++ b/profiler/include/profiler/profile_gemm_streamk_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2022, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -19,6 +19,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -59,6 +60,9 @@ bool profile_gemm_streamk_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); Tensor c_m_n_host_result(f_host_tensor_descriptor(M, N, StrideC, CLayout{})); diff --git a/profiler/include/profiler/profile_gemm_universal_impl.hpp b/profiler/include/profiler/profile_gemm_universal_impl.hpp index ed62828158..feb75c9660 100644 --- a/profiler/include/profiler/profile_gemm_universal_impl.hpp +++ b/profiler/include/profiler/profile_gemm_universal_impl.hpp @@ -19,6 +19,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -63,6 +64,9 @@ bool profile_gemm_universal_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); Tensor b_k_n_permute(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); diff --git a/profiler/include/profiler/profile_gemm_universal_preshuffle_impl.hpp b/profiler/include/profiler/profile_gemm_universal_preshuffle_impl.hpp index e218143857..271bc6ef59 100644 --- a/profiler/include/profiler/profile_gemm_universal_preshuffle_impl.hpp +++ b/profiler/include/profiler/profile_gemm_universal_preshuffle_impl.hpp @@ -19,6 +19,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -91,6 +92,9 @@ bool profile_gemm_universal_preshuffle_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); Tensor b_k_n_permute(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); diff --git a/profiler/include/profiler/profile_gemm_universal_reduce_impl.hpp b/profiler/include/profiler/profile_gemm_universal_reduce_impl.hpp index d600de0978..a0ee6a6674 100644 --- a/profiler/include/profiler/profile_gemm_universal_reduce_impl.hpp +++ b/profiler/include/profiler/profile_gemm_universal_reduce_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2018-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -19,6 +19,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" namespace ck { @@ -64,6 +65,9 @@ bool profile_gemm_universal_reduce_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); Tensor c_m_n_host_result(f_host_tensor_descriptor(M, N, StrideC, CLayout{})); diff --git a/profiler/include/profiler/profile_gemm_universal_streamk_impl.hpp b/profiler/include/profiler/profile_gemm_universal_streamk_impl.hpp old mode 100755 new mode 100644 index 640b192baf..5c859b830d --- a/profiler/include/profiler/profile_gemm_universal_streamk_impl.hpp +++ b/profiler/include/profiler/profile_gemm_universal_streamk_impl.hpp @@ -1,5 +1,5 @@ // SPDX-License-Identifier: MIT -// Copyright (c) 2024, Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2024-2025, Advanced Micro Devices, Inc. All rights reserved. #pragma once @@ -21,6 +21,7 @@ #include "ck/library/utility/host_tensor.hpp" #include "ck/library/utility/host_tensor_generator.hpp" #include "ck/library/utility/literals.hpp" +#include "ck/library/utility/validation_common.hpp" #include "ck/library/reference_tensor_operation/cpu/reference_gemm.hpp" #include "ck/library/reference_tensor_operation/gpu/reference_gemm.hpp" @@ -67,6 +68,9 @@ bool profile_gemm_universal_streamk_impl(int do_verification, } }; + ck::utils::validate_gemm_strides_abc( + M, N, K, StrideA, StrideB, StrideC); + Tensor a_m_k(f_host_tensor_descriptor(M, K, StrideA, ALayout{})); Tensor b_k_n(f_host_tensor_descriptor(K, N, StrideB, BLayout{})); Tensor c_m_n_host_result(f_host_tensor_descriptor(M, N, StrideC, CLayout{}));