cutlass 2.4 documentation only update

Manish Gupta
2020-11-22 18:11:37 -08:00
committed by Dustyn Blasig
parent e6bcdc60cf
commit ccb697bac7
6 changed files with 279 additions and 104 deletions

@@ -51,7 +51,7 @@ f(p, r) = p * stride_h + R - r - 1 + pad_h
g(q, s) = h * stride_w + S - s - 1 + pad_w
```
-A [host](/tools/util/include/reference/host/convolution.h) and [device](/tools/util/include/reference/device/convolution.h)
+A [host](/tools/util/include/cutlass/util/reference/host/convolution.h) and [device](/tools/util/include/cutlass/util/reference/device/convolution.h)
reference implementation are provided in the CUTLASS Utilities.
This computation may be mapped to the elements of a matrix product as follows.
@@ -347,7 +347,7 @@ creating GEMM-B tile in shared memory.
The improvements covered by optimized iterators are:
- (a) Precomputing kernel-invariant pointer deltas on the host
- (b) Computing cta-invariant mask predicates on device-side iterator ctors
-- (c) Use of [fast divmod](include/cutlass/fast_math.h) to map GEMM dimenstions to convolution tensors.
+- (c) Use of [fast divmod](/include/cutlass/fast_math.h) to map GEMM dimenstions to convolution tensors.
For example, _optimized_ activation iterator uses fast divmod to map GEMM _M_ to NPQ
for activation iterator
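
The mapping from GEMM _M_ to NPQ mentioned above can be illustrated with a minimal sketch. It uses plain integer division and modulo; a fast divmod would replace these two operations with a precomputed multiply-and-shift, which is not shown here. The names `OutputCoord`, `gemm_m_to_npq`, and `npq_to_gemm_m` are hypothetical and are not part of the CUTLASS API, and the layout (q varying fastest, then p, then n) is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical coordinate type for illustration; not a CUTLASS type.
struct OutputCoord {
  int n;  // batch index
  int p;  // output row
  int q;  // output column
};

// Decompose a linear GEMM row index m in [0, N*P*Q) into (n, p, q),
// assuming q varies fastest, then p, then n.
// Each '/' and '%' pair is the kind of operation a fast divmod would replace.
inline OutputCoord gemm_m_to_npq(int m, int P, int Q) {
  int n = m / (P * Q);
  int residual = m % (P * Q);
  int p = residual / Q;
  int q = residual % Q;
  return OutputCoord{n, p, q};
}

// Inverse mapping under the same assumed layout, useful for checking the decomposition.
inline int npq_to_gemm_m(OutputCoord c, int P, int Q) {
  return (c.n * P + c.p) * Q + c.q;
}
```

Because the divisors `P * Q` and `Q` are fixed for the lifetime of a kernel, any constants needed to divide by them can be computed once on the host rather than per thread.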