Files
blis/kernels
Shubham Sharma 4bca7f6f4a DGEMMT optimizations
Details:

1. For lower and upper, non-transpose variants of gemmt, new kernels
   are developed and optimized to compute only the required outputs in
   the diagonal blocks.
2. In the previous implementation, all the 48 outputs of the given
   6x8 block of C matrix are computed and stored into a temporary
   buffer. Later,the required elements are copied into the final C
   output buffer.
3. Changes are made to compute only the required outputs of the 6x8
   block of C matrix and directly stored in the final C output buffer.
4. With this optimization, we are avoiding copy operation and also
   reducing the number of computations.
5. Kernels specific to compute Lower and Upper Variant diagonal
   outputs have been added.
6. SUP Framework changes to integrate the new kernels have been added.
7. These kernels are part of the SUP framework.

AMD-Internal: [CPUPL-2341]
Change-Id: I0ec8f24a0fb19d9b1ef7254732b8e09f06e1486a
2022-08-11 06:16:33 -04:00
..
2022-08-11 06:16:33 -04:00
2020-09-29 16:52:18 -05:00
2022-07-22 03:42:17 -04:00
2022-08-11 01:39:40 -04:00
2021-04-27 11:09:48 +05:30
2020-07-22 18:24:26 +05:30
2022-06-13 10:52:53 +05:30