Fix the fp8 gemm for large tensors on MI300. (#1011)

* Fix the fp8 conversion
* Try clipping value before conversion
* Fix return
* Simplify with a const
* reduce the gemm input tensor values to reduce round-off error
* replace if-else with lambda
* fix syntax

---------

Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
2026-05-25 07:14:37 +00:00 · 2023-10-27 21:10:47 -07:00
parent 6fe0bc7e72
commit f46a6ffad8
4 changed files with 10 additions and 8 deletions
--- a/profiler/include/profiler/profile_gemm_impl.hpp
+++ b/profiler/include/profiler/profile_gemm_impl.hpp
@@ -75,8 +75,8 @@ int profile_gemm_impl(int do_verification,
        b_k_n.GenerateTensorValue(GeneratorTensor_2<BDataType>{-5, 5});
        break;
    default:
-        a_m_k.GenerateTensorValue(GeneratorTensor_3<ADataType>{0.0, 1.0});
-        b_k_n.GenerateTensorValue(GeneratorTensor_3<BDataType>{-0.5, 0.5});
+        a_m_k.GenerateTensorValue(GeneratorTensor_3<ADataType>{0.0, 0.1});
+        b_k_n.GenerateTensorValue(GeneratorTensor_3<BDataType>{-0.05, 0.05});
    }

    using AElementOp = ck::tensor_operation::element_wise::PassThrough;