Files
blis/docs/MixedDatatypes.md
Field G. Van Zee 5fec95b99f Implemented mixed-datatype support for gemm.
Details:
- Implemented support for gemm where A, B, and C may have different
  storage datatypes, as well as a computational precision (and implied
  computation domain) that may be different from the storage precision
  of either A or B. This results in 128 different combinations, all
  which are implemented within this commit. (For now, the mixed-datatype
  functionality is only supported via the object API.) If desired, the
  mixed-datatype support may be disabled at configure-time.
- Added a memory-intensive optimization to certain mixed-datatype cases
  that requires a single m-by-n matrix be allocated (temporarily) per
  call to gemm. This optimization aims to avoid the overhead involved in
  repeatedly updating C with general stride, or updating C after a
  typecast from the computation precision. This memory optimization may
  be disabled at configure-time (provided that the mixed-datatype
  support is enabled in the first place).
- Added support for testing mixed-datatype combinations to testsuite.
  The user may test gemm with mixed domains, precisions, both, or
  neither.
- Added a standalone test driver directory for building and running
  mixed-datatype performance experiments.
- Defined a new variation of castm, castnzm, which operates like castm
  except that imaginary values are not touched when casting a real
  operand to a complex operand. (By contrast, in these situations castm
  sets the imaginary components of the destination matrix to zero.)
- Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
  usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
  also simplified the implementation of bli_obj_imag_equals().
- Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
  when given BLIS_CONSTANT objects.
- Disabled dt_on_output field in auxinfo_t structure as well as all
  accessor functions. Also commented out all usage of accessor
  functions within macrokernels. (Typecasting in the microkernel is
  still feasible, though probably unrealistic for now given the
  additional complexity required.)
- Use void function pointer type (instead of void*) for storing function
  pointers in bli_l0_fpa.c.
- Added documentation for using gemm with mixed datatypes in
  docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
- Defined level-1d operation xpbyd and level-1m operation xpbym.
- Added xpbym test module to testsuite.
- Updated frame/include/bli_x86_asm_macros.h with additional macros
  (courtsey of Devin Matthews).
2018-10-15 16:37:39 -05:00

10 KiB

Contents

Introduction

This document serves as a guide to users interested in taking advantage of BLIS's support for performing the gemm operation on operands of differing types.

Categories of mixed datatypes

Before going any further, we find it useful to categorize mixed datatype support into four categories:

  1. Fully identical datatypes. This is what people generally think of when they think about the gemm operation: all operands are stored in the same datatype (precision and domain), and the matrix product computation is performed in the arithmetic represented by that datatype. (This category doesn't actually involve mixing datatypes, but it's still worthwhile to define.) Example: matrix C updated by the product of matrix A and matrix B (all matrices double-precision real).

  2. Mixed domain with identical precisions. This category includes all combinations of datatypes where the domain (real or complex) of each operand may vary while the precisions (single or double precision) are held constant across all operands. Example: complex matrix C updated by the product of real matrix A and complex matrix B (all matrices single-precision).

  3. Mixed precision within a single domain. Here, all operands are stored in the same domain (real or complex), however, the precision of each operand may vary. Example: double-precision real matrix C updated by the product of single-precision real matrix A and single-precision real matrix B.

  4. Mixed precision and mixed domain. This category allows both domains and precision of each matrix operand to vary. Example: double-precision complex matrix C updated by the product of single-precision complex matrix A and single-precision real matrix B.

BLIS's implementation of mixed-datatype gemm supports all combinations within all four categories.

Computation precision

Because categories 3 and 4 involve mixing precisions, they come with an added parameter: the computation precision. This parameter specifies the precision in which the matrix multiplication (product) takes place. This precision can be different than the storage precision of matrices A or B, and/or the storage precision of matrix C.

When the computation precision differs from the storage precision of matrix A, it implies that a typecast must occur when BLIS packs matrix A to contiguous storage. Similarly, B may also need to be typecast during packing.

When the computation precision differs from the storage precision of C, it means the result of the matrix product A*B must be typecast just before it is accumulated back into matrix C.

Computation domain

In addition to the computation precision, we also track a computation domain. (Together, they form the computation datatype.) However, for now we do not allow the user to explicitly specify the computation domain. Instead, the computation domain is implied by the domains of A, B, and C. The following table enumerates the six cases where there is at least one operand of each domain, along with the corresponding same-domain cases from category 1 for reference. We also list the total number of floating-point operations performed in each case. In the table, an 'R' denotes a real domain matrix operand while a 'C' denotes a matrix in the complex domain. The R's and C's appear in the following format of C += A * B, where A, B, and C are the matrix operands of gemm.

Case # Mixed domain case Implied computation domain flops performed
1 R += R * R real 2mnk
2 R += R * C real 2mnk
3 R += C * R real 2mnk
4 R += C * C complex 4mnk
5 C += R * R real 2mnk
6 C += R * C complex 4mnk
7 C += C * R complex 4mnk
8 C += C * C complex 8mnk

The computation domain is implied in cases 1 and 8 in the same way that it would be if mixed datatype support were absent entirely. These cases execute 2mnk and 8mnk flops, respectively, as any traditional implementation would.

In cases 2 and 3, we assume the computation domain is real because only B or A, respectively, is complex. Thus, in these cases, the imaginary components of the complex matrix are ignored, allowing us to perform only 2mnk flops.

In case 5, we take the computation domain to be real because A and B are both real, and thus it makes no sense to compute in the complex domain. This means that we need only update the real components of C, leaving the imaginary components untouched. This also results in 2mnk flops being performed.

In case 4, we have complex A and B, allowing us to compute a complex product. However, we can only save the real part of that complex product since the output matrix C is real. Since we cannot update the imaginary component of C (since it is not stored), we avoid computing that half of the update entirely, reducing the flops performed to 4mnk. (Alternatively, one may wish to request real domain computation, in which case the imaginary components of A and B were ignored prior to computing the matrix product. This approach would result in only 2mnk flops being performed.)

In case 6, we wish for both the real and imaginary parts of B to participate in the multiplication by A, with the result updating the corresponding real and imaginary parts of C. Granted, the imaginary part of A is zero, and this is taken advantage of in the computation to optimize performance, as indicated by the 4mnk flop count. But fundamentally this computation executes in the complex domain because both the real and imaginary parts of C are updated. A similar story can be told about case 7.

Performing gemm with mixed datatypes

In BLIS, performing a mixed-datatype gemm operation is easy. However, it will require that the user call gemm through BLIS's object API. For a basic series of examples for using the object-based API, please see the example codes in the examples/oapi directory of the BLIS source distribution.

The first step is to ensure that BLIS is configured with mixed datatype support. Please consult with your current distribution's configure script for the current semantics:

$ ./configure --help

As of this writing, mixed datatype support is enabled by default, and thus no additional options are needed.

With mixed datatype support enabled in BLIS, using the functionality is simply a matter of creating and initializing matrices of different precisions and/or domains.

dim_t  m = 5, n = 4, k = 2;
obj_t  a, b, c;
obj_t* alpha;
obj_t* beta;

bli_obj_create( BLIS_DOUBLE,   m, k, 0, 0, &a );
bli_obj_create( BLIS_FLOAT,    k, n, 0, 0, &b );
bli_obj_create( BLIS_SCOMPLEX, m, n, 0, 0, &c );

alpha = &BLIS_ONE;
beta  = &BLIS_ONE;

bli_randm( &a );
bli_randm( &b );
bli_randm( &c );

Then, you specify the computation precision by setting the computation precision property of matrix C.

bli_obj_set_comp_prec( BLIS_DOUBLE_PREC, &c );

If you do not explicitly specify the computation precision, it will default to the storage precision of C.

With the objects created and the computation precision specified, call bli_gemm() just as you would if the datatypes were identical:

bli_gemm( alpha, &a, &b, beta, &c );

For more examples of using BLIS's object-based API, including methods of initializing an matrix object with arbitrary values, please review the example code found in the examples/oapi directory of the BLIS source distribution.

Known Issues

While BLIS implements 128 mixed-datatype combinations of gemm, there may be odd behavior in the current implementation that does not conform to the reader's expectations. Below is a list of issues that BLIS developers are aware of in the context of mixed-datatype gemm. If any of these issues poses a problem for your application, please contact us by opening an issue.

  • alpha with non-zero imaginary components. Currently, there are many cases of mixed-datatype gemm that do not yet support computing with alpha scalars that have non-zero imaginary components--in other words, values of alpha that are not in the real domain. (By contrast, non-real values for beta are fully supported.) In order to support these use cases, additional code complexity and logic would be required. Thus, we have chosen, for now, to not implement them. If mixed-datatype gemm is invoked with a non-real valued alpha scalar, a runtime error message will be printed and the linked program will abort.

  • Manually specifying the computation domain. As mentioned in the section discussing the computation domain, the computation domain of any case of mixed domain gemm is implied by the operands and thus fixed; the user may not specify a different computation domain, even if the mixed-domain case would reasonably allow for computing in either domain.

Conclusion

For more information and documentation on BLIS, please visit the BLIS github page.

If you found a bug or wish to request a feature, please open an issue.

For general discussion or questions, please join and post a message to the blis-devel mailing list.

Thanks for your interest in BLIS!