Restructure gridwise and blockwise GEMM, add tensor contraction and FWD-v4r5 (#36)

* experimenting with magic number division

* overhauling fwd-v4r4 to clearly reflect transformation graph

* added fwd-v4r5

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* bug fix and added sanity-check in transform_dynamic_tensor_descriptor

* added conv_driver_v2
Chao Liu
2021-06-09 23:53:08 -05:00
committed by GitHub
parent 71d6b19d18
commit 30072aec37
38 changed files with 4791 additions and 2050 deletions


@@ -74,7 +74,7 @@ __host__ __device__ constexpr auto integer_divide_floor(X x, Y y)
template <class X, class Y>
__host__ __device__ constexpr auto integer_divide_ceil(X x, Y y)
{
-    return (x + y - 1) / y;
+    return (x + y - Number<1>{}) / y;
}
template <class X, class Y>
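
Review note on the hunk above: the likely motivation for replacing the literal `1` with `Number<1>{}` is to keep the result a compile-time constant type when both arguments are compile-time constants. The sketch below illustrates that effect under stated assumptions. The `Number` template and its operators here are a hypothetical stand-in written for this note, not the repository's actual definitions, and the `_old`/`_new` suffixes are illustrative names only. The rounding formula also assumes non-negative `x` and positive `y`.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for a compile-time integer type (assumption:
// the real Number<> in the repository may be defined differently).
template <int N>
struct Number
{
    static constexpr int value = N;
    constexpr operator int() const { return N; }
};

// Arithmetic between two Number<> values stays in the type system.
template <int A, int B>
constexpr auto operator+(Number<A>, Number<B>) { return Number<A + B>{}; }

template <int A, int B>
constexpr auto operator-(Number<A>, Number<B>) { return Number<A - B>{}; }

template <int A, int B>
constexpr auto operator/(Number<A>, Number<B>)
{
    static_assert(B != 0, "division by zero");
    return Number<A / B>{};
}

// Before the change: the literal 1 forces a conversion to int.
template <class X, class Y>
constexpr auto integer_divide_ceil_old(X x, Y y)
{
    return (x + y - 1) / y;
}

// After the change: Number<1>{} keeps Number<> operands as Number<>.
template <class X, class Y>
constexpr auto integer_divide_ceil_new(X x, Y y)
{
    return (x + y - Number<1>{}) / y;
}

// Plain integers behave identically in both versions: ceil(7 / 2) == 4.
static_assert(integer_divide_ceil_old(7, 2) == 4, "");
static_assert(integer_divide_ceil_new(7, 2) == 4, "");

// With Number<> operands, the old form decays to int ...
static_assert(std::is_same<decltype(integer_divide_ceil_old(Number<7>{}, Number<2>{})),
                           int>::value, "");

// ... while the new form yields a Number<> carrying the value in its type.
static_assert(std::is_same<decltype(integer_divide_ceil_new(Number<7>{}, Number<2>{})),
                           Number<4>>::value, "");

} // namespace sketch
```

All checks are `static_assert`s, so the sketch is verified by compiling it (C++14 or later). For runtime integer arguments the two versions agree, because `Number<1>{}` converts implicitly to `1` in mixed expressions.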