## Contents

* **[Contents](MixedDatatypes.md#contents)**
* **[Introduction](MixedDatatypes.md#introduction)**
* **[Categories of mixed datatypes](MixedDatatypes.md#categories-of-mixed-datatypes)**
  * **[Computation precision](MixedDatatypes.md#computation-precision)**
  * **[Computation domain](MixedDatatypes.md#computation-domain)**
* **[Performing gemm with mixed datatypes](MixedDatatypes.md#performing-gemm-with-mixed-datatypes)**
* **[Running the testsuite for gemm with mixed datatypes](MixedDatatypes.md#running-the-testsuite-for-gemm-with-mixed-datatypes)**
* **[Known issues](MixedDatatypes.md#known-issues)**
* **[Conclusion](MixedDatatypes.md#conclusion)**

## Introduction

This document serves as a guide to users interested in taking advantage of BLIS's support for performing the `gemm` operation on operands of differing datatypes (domain and/or precision). For further details on the implementation present in BLIS, please see the latest draft of our paper "Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework" available in the [Citations section](https://github.com/flame/blis/#citations) of the main [BLIS webpage](https://github.com/flame/blis).

## Categories of mixed datatypes

Before going any further, we find it useful to categorize mixed datatype support into four categories:

1. **Fully identical datatypes.** This is what people generally think of when they think about the `gemm` operation: all operands are stored in the same datatype (precision and domain), and the matrix product computation is performed in the arithmetic represented by that datatype. (This category doesn't actually involve mixing datatypes, but it's still worthwhile to define.) Example: matrix C updated by the product of matrix A and matrix B (all matrices double-precision real).

2.
   **Mixed domain with identical precisions.** This category includes all combinations of datatypes where the domain (real or complex) of each operand may vary while the precision (single or double) is held constant across all operands. Example: complex matrix C updated by the product of real matrix A and complex matrix B (all matrices single-precision).

3. **Mixed precision within a single domain.** Here, all operands are stored in the same domain (real or complex); however, the precision of each operand may vary. Example: double-precision real matrix C updated by the product of single-precision real matrix A and single-precision real matrix B.

4. **Mixed precision and mixed domain.** This category allows both the domain and the precision of each matrix operand to vary. Example: double-precision complex matrix C updated by the product of single-precision complex matrix A and single-precision real matrix B.

BLIS's implementation of mixed-datatype `gemm` supports all combinations within all four categories.

### Computation precision

Because categories 3 and 4 involve mixing precisions, they come with an added parameter: the *computation precision*. This parameter specifies the precision in which the matrix multiplication (product) takes place. This precision can differ from the storage precision of matrix A or B, and/or from the storage precision of matrix C. When the computation precision differs from the storage precision of matrix A, a typecast must occur when BLIS packs matrix A to contiguous storage. Similarly, B may also need to be typecast during packing. When the computation precision differs from the storage precision of C, the result of the matrix product A*B must be typecast just before it is accumulated back into matrix C.

### Computation domain

In addition to the computation precision, we also track a computation domain. (Together, they form the computation datatype.)
However, for now we do not allow the user to explicitly specify the computation domain. Instead, the computation domain is implied by the domains of A, B, and C. The following table enumerates the six cases where there is at least one operand of each domain, along with the corresponding same-domain cases from category 1 for reference. We also list the total number of floating-point operations performed in each case. In the table, an 'R' denotes a real domain matrix operand while a 'C' denotes a matrix in the complex domain. The R's and C's appear in the format C += A * B, where A, B, and C are the matrix operands of `gemm`.

| Case # | Mixed domain case | Implied computation domain | flops performed |
|--------|:-----------------:|:--------------------------:|:---------------:|
| 1      | R += R * R        | real                       | 2mnk            |
| 2      | R += R * C        | real                       | 2mnk            |
| 3      | R += C * R        | real                       | 2mnk            |
| 4      | R += C * C        | complex                    | 4mnk            |
| 5      | C += R * R        | real                       | 2mnk            |
| 6      | C += R * C        | complex                    | 4mnk            |
| 7      | C += C * R        | complex                    | 4mnk            |
| 8      | C += C * C        | complex                    | 8mnk            |

The computation domain is implied in cases 1 and 8 in the same way that it would be if mixed datatype support were absent entirely. These cases execute 2mnk and 8mnk flops, respectively, as any traditional implementation would.

In cases 2 and 3, we assume the computation domain is real because only B or A, respectively, is complex. Thus, in these cases, the imaginary components of the complex matrix are ignored, allowing us to perform only 2mnk flops.

In case 5, we take the computation domain to be real because A and B are both real, and thus it makes no sense to compute in the complex domain. This means that we need only update the real components of C, leaving the imaginary components untouched. This also results in 2mnk flops being performed.

In case 4, we have complex A and B, allowing us to compute a complex product.
However, we can only save the real part of that complex product since the output matrix C is real. Because the imaginary component of C is not stored and therefore cannot be updated, we avoid computing that half of the update entirely, reducing the flops performed to 4mnk. (Alternatively, one may wish to request real domain computation, in which case the imaginary components of A and B would be ignored *prior* to computing the matrix product. This approach would result in only 2mnk flops being performed.)

In case 6, we wish for both the real and imaginary parts of B to participate in the multiplication by A, with the result updating the corresponding real and imaginary parts of C. Granted, the imaginary part of A is zero, and this is taken advantage of in the computation to optimize performance, as indicated by the 4mnk flop count. But fundamentally this computation executes in the complex domain because both the real and imaginary parts of C are updated. A similar story can be told about case 7.

## Performing gemm with mixed datatypes

In BLIS, performing a mixed-datatype `gemm` operation is easy. However, it requires that the user call `gemm` through BLIS's object API. For a basic series of examples for using the object-based API, please see the example codes in the `examples/oapi` directory of the BLIS source distribution.

The first step is to ensure that BLIS is configured with mixed datatype support. Please consult your distribution's `configure` script for the current semantics:

```
$ ./configure --help
```

As of this writing, mixed datatype support is enabled by default, and thus no additional options are needed.

With mixed datatype support enabled in BLIS, using the functionality is simply a matter of creating and initializing matrices of different precisions and/or domains.
```c
dim_t  m = 5, n = 4, k = 2;
obj_t  a, b, c;
obj_t* alpha;
obj_t* beta;

// A is m x k double-precision real, B is k x n single-precision real,
// and C is m x n single-precision complex.
bli_obj_create( BLIS_DOUBLE,   m, k, 0, 0, &a );
bli_obj_create( BLIS_FLOAT,    k, n, 0, 0, &b );
bli_obj_create( BLIS_SCOMPLEX, m, n, 0, 0, &c );

// Use the built-in constant 1 for both scalars.
alpha = &BLIS_ONE;
beta  = &BLIS_ONE;

// Fill all three matrices with random values.
bli_randm( &a );
bli_randm( &b );
bli_randm( &c );
```

Next, specify the computation precision by setting the computation precision property of matrix C.

```c
bli_obj_set_comp_prec( BLIS_DOUBLE_PREC, &c );
```

If you do not explicitly specify the computation precision, it will default to the *storage* precision of C.

With the objects created and the computation precision specified, call `bli_gemm()` just as you would if the datatypes were identical:

```c
bli_gemm( alpha, &a, &b, beta, &c );
```

For more examples of using BLIS's object-based API, including methods of initializing a matrix object with arbitrary values, please review the example code found in the `examples/oapi` directory of the BLIS source distribution.

## Running the testsuite for gemm with mixed datatypes

The BLIS testsuite has been retrofitted to test all combinations of datatypes for each matrix operand. For more information on enabling mixed-datatype tests for the `gemm` operation, please see the explanations of the relevant options in the [Testsuite](Testsuite.md) documentation.

## Known issues

There may be odd behavior in the current implementation of mixed-datatype `gemm` that does not conform to the reader's expectations. Below is a list of issues that BLIS developers are aware of. If any of these issues poses a problem for your application, please contact us by [opening an issue](https://github.com/flame/blis/issues).

* **alpha with non-zero imaginary components.** Currently, there are many cases of mixed-datatype `gemm` that do not yet support computing with `alpha` scalars that have non-zero imaginary components--in other words, values of `alpha` that are not in the real domain.
  (By contrast, non-real values for `beta` are fully supported.) Supporting these use cases would require additional code complexity and logic. Thus, we have chosen, for now, not to implement them. If mixed-datatype `gemm` is invoked with a non-real valued `alpha` scalar, a runtime error message will be printed and the linked program will abort.
* **Manually specifying the computation domain.** As mentioned in the section discussing the [computation domain](MixedDatatypes.md#computation-domain), the computation domain of any case of mixed domain `gemm` is implied by the operands and thus fixed; the user may not specify a different computation domain, even if the mixed-domain case would reasonably allow for computing in either domain.
* **Sandboxes should be used with caution.** When building a `gemm` sandbox in BLIS, please consider either (a) disabling mixed datatype support, or (b) consciously **never** running the testsuite with mixed domain or precision computation enabled. Even the reference `ref99` sandbox implementation in BLIS does not support mixing datatypes. If you do choose to enable a sandbox while also keeping mixed datatype support enabled in BLIS, make sure that the mixing of datatypes is disabled in the testsuite's `input.general` file (unless, of course, you decide to implement all mixed datatype cases within your sandbox). This issue is also discussed in the documentation for [Sandboxes](Sandboxes.md#known-issues).

## Conclusion

For more information and documentation on BLIS, please visit the [BLIS GitHub page](https://github.com/flame/blis/). If you found a bug or wish to request a feature, please [open an issue](https://github.com/flame/blis/issues). For general discussion or questions, please join and post a message to the [blis-devel mailing list](http://groups.google.com/group/blis-devel).

Thanks for your interest in BLIS!