Dynamic tensor descriptor (#24)

* support dynamic tensor descriptor

* use buffer load OOB feature for padding case

* add navi support

* add int8x4 inference kernel

Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>
Chao Liu
2021-03-25 13:51:11 -05:00
committed by GitHub
parent bbcb67d0aa
commit fcbb978828
85 changed files with 14129 additions and 2532 deletions

@@ -158,7 +158,7 @@ struct ParallelTensorFunctor
         return indices;
     }
-    void operator()(std::size_t num_thread) const
+    void operator()(std::size_t num_thread = std::thread::hardware_concurrency()) const
     {
         std::size_t work_per_thread = (mN1d + num_thread - 1) / num_thread;