composable_kernel/test/convnd_fwd/conv3d_fwd.cpp
Adam Osewski b32c3df45d Convolution FWD profiler refactor. (#183)
* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensions

* Rename conv nd example directory to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than 2GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initialization and relax rtol for fp16

* Move test_util.hpp to check_err.hpp

* Refine weights initialization and relax rtol for fp16

* Refactor common part of test conv utils.

* Move utility function to single common place.

* Add additional common functions to utility.

* Refactor convnd_fwd_xdl examples.

* Remove redundant files.
* Unify structure.

* Add constructor to ConvParams.

* And add input parameters validation.

* Modify conv examples to use single utility file.

* Remove check_error from host_tensor.hpp

* Get rid of check_indices function.

* Remove bf16_to_f32 function overload for scalars.

* Fix namespace.

* Add half_float::half for check_err.

* Fix conv params size in UT.

* Fix weights initialization for int8.

* Fix weights initialization for int8.

* Add type_convert when store output in ref conv 1D.

* Get back old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err

* Formatting.

* Fix merge artifacts.

* Remove deleted header.

* Fix some includes and use ck::utils::check_err.

* Remove unused check_indices restored by previous merge.

* Fix namespaces after merge.

* Fix compilation error.

* Small fixes.

* Use common functions.
* Fix filename
* Fix namespaces.

* Fix merge artifact - retrieve removed by accident fun.

* Fix ConvForwardSpecialization.

* Working example of OpInstanceRunEngine for conv2dfwd UT.

* Adhere to coding style rules.

* Formatting and adhere to coding style rules.

* Fix merge artifacts.

* Utility for collecting conv fwd instances.

+ Plus common part for parsing cmdline params.

* Refactor FillUniform because of segfault for int8_t.

* Naming convention.

* Elegant version of device mem allocation.

* Use OpInstanceRunEngine in conv fwd nd tests.

* Multiple refinements.

* conditional init
* don't run reference op if not provided.

* Use OpInstanceRunEngine for ckProfiler conv_fwd

* Refactor common tensor fill function to separate file.

* Clean up unused functions.

* Support different init methods.

* Create CMake target for conv_fwd_util.

* Add header for profile_convnd_fwd.cpp

* Fix CMakefiles to link with conv_fwd_util where needed.

* Fix some clutter.

Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

[ROCm/composable_kernel commit: 1a0cd5d160]
2022-04-21 17:39:39 -05:00


#include <cstdint>
#include <functional>
#include <half.hpp>
#include <iostream>
#include <stdexcept>
#include <tuple>
#include <vector>
#include "data_type.hpp"
#include "element_wise_operation.hpp"
#include "conv_fwd_util.hpp"
#include "conv_util.hpp"
namespace {
bool test_conv3d_ndhwc()
{
using namespace std::placeholders;
using namespace ck::utils;
namespace ctl = ck::tensor_layout::convolution;
conv::ConvParams params;
params.num_dim_spatial = 3;
params.N = 2;
params.K = 16;
params.C = 4;
params.filter_spatial_lengths = std::vector<ck::index_t>{3, 3, 3};
params.input_spatial_lengths = std::vector<ck::index_t>{16, 16, 16};
params.conv_filter_strides = std::vector<ck::index_t>{1, 1, 1};
params.conv_filter_dilations = std::vector<ck::index_t>{1, 1, 1};
params.input_left_pads = std::vector<ck::index_t>{1, 1, 1};
params.input_right_pads = std::vector<ck::index_t>{1, 1, 1};
std::vector<test::conv::DeviceConvFwdNoOpPtr> conv_ptrs;
test::conv::get_test_convolution_fwd_instance<3>(conv_ptrs);
conv::ConvFwdOpInstance<float, float, float, ctl::NDHWC, ctl::KZYXC, ctl::NDHWK> conv_instance(
params);
auto reference_conv_fwd_fun = std::bind(
conv::run_reference_convolution_forward<3, float, float, float>, params, _1, _2, _3);
OpInstanceRunEngine<float, float, float> run_engine(conv_instance, reference_conv_fwd_fun);
run_engine.SetAtol(1e-5);
run_engine.SetRtol(1e-4);
return run_engine.Test(conv_ptrs);
}
bool test_conv3d_ndhwc_2gb_input()
{
using PassThrough = ck::tensor_operation::element_wise::PassThrough;
using namespace ck::utils;
// Input tensor larger than 2GB: the test passes only if the instance rejects the argument.
conv::ConvParams params;
params.num_dim_spatial = 3;
params.N = 2;
params.K = 16;
params.C = 32;
params.filter_spatial_lengths = std::vector<ck::index_t>{3, 3, 3};
params.input_spatial_lengths = std::vector<ck::index_t>{32, 1000, 1000};
params.conv_filter_strides = std::vector<ck::index_t>{1, 1, 1};
params.conv_filter_dilations = std::vector<ck::index_t>{1, 1, 1};
params.input_left_pads = std::vector<ck::index_t>{1, 1, 1};
params.input_right_pads = std::vector<ck::index_t>{1, 1, 1};
std::vector<test::conv::DeviceConvFwdNoOpPtr> conv_ptrs;
test::conv::get_test_convolution_fwd_instance<3>(conv_ptrs);
auto arg = conv_ptrs.back()->MakeArgumentPointer(nullptr,
nullptr,
nullptr,
params.N,
params.K,
params.C,
params.input_spatial_lengths,
params.filter_spatial_lengths,
params.GetOutputSpatialLengths(),
params.conv_filter_strides,
params.conv_filter_dilations,
params.input_left_pads,
params.input_right_pads,
PassThrough{},
PassThrough{},
PassThrough{});
return !(conv_ptrs.back()->IsSupportedArgument(arg.get()));
}
bool test_conv3d_ndhwc_2gb_filters()
{
using PassThrough = ck::tensor_operation::element_wise::PassThrough;
using namespace ck::utils;
// Filter tensor larger than 2GB: the test passes only if the instance rejects the argument.
conv::ConvParams params;
params.num_dim_spatial = 3;
params.N = 2;
params.K = 16;
params.C = 32;
params.filter_spatial_lengths = std::vector<ck::index_t>{4, 1000, 1000};
params.input_spatial_lengths = std::vector<ck::index_t>{16, 16, 16};
params.conv_filter_strides = std::vector<ck::index_t>{1, 1, 1};
params.conv_filter_dilations = std::vector<ck::index_t>{1, 1, 1};
params.input_left_pads = std::vector<ck::index_t>{1, 1, 1};
params.input_right_pads = std::vector<ck::index_t>{1, 1, 1};
std::vector<test::conv::DeviceConvFwdNoOpPtr> conv_ptrs;
test::conv::get_test_convolution_fwd_instance<3>(conv_ptrs);
auto arg = conv_ptrs.back()->MakeArgumentPointer(nullptr,
nullptr,
nullptr,
params.N,
params.K,
params.C,
params.input_spatial_lengths,
params.filter_spatial_lengths,
params.GetOutputSpatialLengths(),
params.conv_filter_strides,
params.conv_filter_dilations,
params.input_left_pads,
params.input_right_pads,
PassThrough{},
PassThrough{},
PassThrough{});
return !(conv_ptrs.back()->IsSupportedArgument(arg.get()));
}
bool test_conv3d_ndhwc_2gb_output()
{
using PassThrough = ck::tensor_operation::element_wise::PassThrough;
using namespace ck::utils;
// Output tensor larger than 2GB: the test passes only if the instance rejects the argument.
conv::ConvParams params;
params.num_dim_spatial = 3;
params.N = 2;
params.K = 16;
params.C = 2;
params.filter_spatial_lengths = std::vector<ck::index_t>{1, 1, 1};
params.input_spatial_lengths = std::vector<ck::index_t>{1000, 1000, 30};
params.conv_filter_strides = std::vector<ck::index_t>{1, 1, 1};
params.conv_filter_dilations = std::vector<ck::index_t>{1, 1, 1};
params.input_left_pads = std::vector<ck::index_t>{2, 2, 2};
params.input_right_pads = std::vector<ck::index_t>{2, 2, 2};
std::vector<test::conv::DeviceConvFwdNoOpPtr> conv_ptrs;
test::conv::get_test_convolution_fwd_instance<3>(conv_ptrs);
auto arg = conv_ptrs.back()->MakeArgumentPointer(nullptr,
nullptr,
nullptr,
params.N,
params.K,
params.C,
params.input_spatial_lengths,
params.filter_spatial_lengths,
params.GetOutputSpatialLengths(),
params.conv_filter_strides,
params.conv_filter_dilations,
params.input_left_pads,
params.input_right_pads,
PassThrough{},
PassThrough{},
PassThrough{});
return !(conv_ptrs.back()->IsSupportedArgument(arg.get()));
}
template <typename T>
bool test_conv3d_ndhwc_instances(const std::vector<test::conv::DeviceConvFwdNoOpPtr>& conv_ptrs)
{
using namespace std::placeholders;
using namespace ck::utils;
namespace ctl = ck::tensor_layout::convolution;
conv::ConvParams params;
// K and C are not set here, so they keep whatever values ConvParams default-initializes them to.
params.N = 64;
params.num_dim_spatial = 3;
params.filter_spatial_lengths = std::vector<ck::index_t>{3, 3, 2};
params.input_spatial_lengths = std::vector<ck::index_t>{32, 32, 2};
params.conv_filter_strides = std::vector<ck::index_t>{2, 2, 2};
params.conv_filter_dilations = std::vector<ck::index_t>{1, 1, 1};
params.input_left_pads = std::vector<ck::index_t>{1, 1, 1};
params.input_right_pads = std::vector<ck::index_t>{1, 1, 1};
conv::ConvFwdOpInstance<T, T, T, ctl::NDHWC, ctl::KZYXC, ctl::NDHWK> conv_instance(params);
auto reference_conv_fwd_fun =
std::bind(conv::run_reference_convolution_forward<3, T, T, T>, params, _1, _2, _3);
OpInstanceRunEngine<T, T, T> run_engine(conv_instance, reference_conv_fwd_fun);
return run_engine.Test(conv_ptrs);
}
bool test_conv3d_ndhwc_bf16_instances()
{
return test_conv3d_ndhwc_instances<ck::bhalf_t>(
ck::utils::conv::ConvolutionFwdInstances<ck::bhalf_t, ck::bhalf_t, ck::bhalf_t>::Get<3>());
}
bool test_conv3d_ndhwc_f16_instances()
{
return test_conv3d_ndhwc_instances<ck::half_t>(
ck::utils::conv::ConvolutionFwdInstances<ck::half_t, ck::half_t, ck::half_t>::Get<3>());
}
bool test_conv3d_ndhwc_f32_instances()
{
return test_conv3d_ndhwc_instances<float>(
ck::utils::conv::ConvolutionFwdInstances<float, float, float>::Get<3>());
}
bool test_conv3d_ndhwc_int8_instances()
{
return test_conv3d_ndhwc_instances<int8_t>(
ck::utils::conv::ConvolutionFwdInstances<int8_t, int8_t, int8_t>::Get<3>());
}
} // anonymous namespace
int main()
{
// Accumulate results so that any failing test yields a non-zero exit code,
// rather than only the last test deciding the outcome.
bool all_passed{true};
bool res{true};
res = test_conv3d_ndhwc();
all_passed = all_passed && res;
std::cout << "test_conv3d_ndhwc ..... " << (res ? "SUCCESS" : "FAILURE") << std::endl;
res = test_conv3d_ndhwc_2gb_input();
all_passed = all_passed && res;
std::cout << "\ntest_conv3d_ndhwc_2gb_input ..... " << (res ? "SUCCESS" : "FAILURE")
<< std::endl;
res = test_conv3d_ndhwc_2gb_filters();
all_passed = all_passed && res;
std::cout << "\ntest_conv3d_ndhwc_2gb_filters ..... " << (res ? "SUCCESS" : "FAILURE")
<< std::endl;
res = test_conv3d_ndhwc_2gb_output();
all_passed = all_passed && res;
std::cout << "\ntest_conv3d_ndhwc_2gb_output ..... " << (res ? "SUCCESS" : "FAILURE")
<< std::endl;
res = test_conv3d_ndhwc_bf16_instances();
all_passed = all_passed && res;
std::cout << "\ntest_conv3d_ndhwc_bf16_instances ..... " << (res ? "SUCCESS" : "FAILURE")
<< std::endl;
res = test_conv3d_ndhwc_f16_instances();
all_passed = all_passed && res;
std::cout << "\ntest_conv3d_ndhwc_f16_instances ..... " << (res ? "SUCCESS" : "FAILURE")
<< std::endl;
res = test_conv3d_ndhwc_f32_instances();
all_passed = all_passed && res;
std::cout << "\ntest_conv3d_ndhwc_f32_instances ..... " << (res ? "SUCCESS" : "FAILURE")
<< std::endl;
res = test_conv3d_ndhwc_int8_instances();
all_passed = all_passed && res;
std::cout << "\ntest_conv3d_ndhwc_int8_instances ..... " << (res ? "SUCCESS" : "FAILURE")
<< std::endl;
return all_passed ? 0 : 1;
}
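
The three 2GB tests above choose problem sizes whose input, filter, or output tensor should exceed the limit. The standalone sketch below redoes that arithmetic with the standard convolution output-length formula. It is not part of the library: `conv_output_length`, `tensor_bytes_5d` and `two_gib` are hypothetical helper names, the element size of 4 bytes (float) is an assumption about the instance under test, and "2GB" is assumed to mean 2^31 bytes.

```cpp
#include <cstdint>

// Output length along one spatial dimension of a forward convolution
// (standard formula; integer division rounds down).
constexpr std::int64_t conv_output_length(std::int64_t in,
                                          std::int64_t filter,
                                          std::int64_t stride,
                                          std::int64_t dilation,
                                          std::int64_t left_pad,
                                          std::int64_t right_pad)
{
    return (in + left_pad + right_pad - dilation * (filter - 1) - 1) / stride + 1;
}

// Size in bytes of a densely packed 5-D tensor, e.g. NDHWC, KZYXC or NDHWK.
// The order of the five extents does not affect the total.
constexpr std::int64_t tensor_bytes_5d(std::int64_t a,
                                       std::int64_t b,
                                       std::int64_t d,
                                       std::int64_t h,
                                       std::int64_t w,
                                       std::int64_t elem_size)
{
    return a * b * d * h * w * elem_size;
}

// Assumption: the "2GB" limit is 2^31 bytes.
constexpr std::int64_t two_gib = std::int64_t{1} << 31;

// Input case: N=2, C=32, spatial 32 x 1000 x 1000, assumed 4-byte elements.
static_assert(tensor_bytes_5d(2, 32, 32, 1000, 1000, 4) > two_gib, "input exceeds 2 GiB");

// Filter case: K=16, C=32, spatial 4 x 1000 x 1000.
static_assert(tensor_bytes_5d(16, 32, 4, 1000, 1000, 4) > two_gib, "filters exceed 2 GiB");

// Output case: N=2, K=16, input 1000 x 1000 x 30, 1x1x1 filter, pads of 2 on each side.
constexpr std::int64_t out_d = conv_output_length(1000, 1, 1, 1, 2, 2); // 1004
constexpr std::int64_t out_h = conv_output_length(1000, 1, 1, 1, 2, 2); // 1004
constexpr std::int64_t out_w = conv_output_length(30, 1, 1, 1, 2, 2);   // 34
static_assert(tensor_bytes_5d(2, 16, out_d, out_h, out_w, 4) > two_gib, "output exceeds 2 GiB");
```

All checks run at compile time, so the snippet can be pasted into any translation unit. Under the 4-byte assumption the output case comes to 4,386,885,632 bytes. With 1-byte elements it would stay below 2^31 bytes, so the element type matters for that case.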