blis/kernels
Harsh Dave f48ced0811 Optimized dher2 implementation
- Implemented her2 framework calls for transposed and
  non-transposed kernel variants.

- The dher2 kernel operates on 4 columns at a time. It computes
  the 4x4 triangular part of the matrix first; the remaining part
  is computed in 4x4 tiles up to m rows.

- Remainder cases (m < 4) are handled serially.
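
The traversal order described above can be sketched in plain scalar C.
This is only an illustration, not the committed kernel: the function
name dher2_lower_sketch, the lower-triangular column-major layout, and
the handling of leftover rows and columns are all assumptions, and the
real kernel presumably uses SIMD code rather than scalar loops. For
real doubles the her2 update reduces to
A := A + alpha*x*y^T + alpha*y*x^T.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar sketch (not the BLIS kernel): updates the lower
 * triangle of a column-major m x m matrix A with
 * alpha*x*y^T + alpha*y*x^T, walking it in the order the commit
 * message describes. */
static void dher2_lower_sketch(size_t m, double alpha,
                               const double *x, const double *y,
                               double *a, size_t lda)
{
    assert(m == 0 || lda >= m);

    size_t j = 0;
    /* Process 4 columns at a time. */
    for (; j + 4 <= m; j += 4) {
        /* 4x4 triangular block on the diagonal. */
        for (size_t c = 0; c < 4; ++c)
            for (size_t r = c; r < 4; ++r)
                a[(j + r) + (j + c) * lda] +=
                    alpha * (x[j + r] * y[j + c] + y[j + r] * x[j + c]);

        /* Rows below the diagonal block, in full 4x4 tiles. */
        size_t i = j + 4;
        for (; i + 4 <= m; i += 4)
            for (size_t c = 0; c < 4; ++c)
                for (size_t r = 0; r < 4; ++r)
                    a[(i + r) + (j + c) * lda] +=
                        alpha * (x[i + r] * y[j + c] + y[i + r] * x[j + c]);

        /* Assumed: fewer than 4 leftover rows are handled one at a time. */
        for (; i < m; ++i)
            for (size_t c = 0; c < 4; ++c)
                a[i + (j + c) * lda] +=
                    alpha * (x[i] * y[j + c] + y[i] * x[j + c]);
    }

    /* Assumed: fewer than 4 leftover columns are handled serially. */
    for (; j < m; ++j)
        for (size_t i = j; i < m; ++i)
            a[i + j * lda] += alpha * (x[i] * y[j] + y[i] * x[j]);
}
```

Entries above the diagonal are never touched in this sketch, which
matches a lower-triangular update; an upper-triangular variant would
mirror the loops.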

AMD-Internal: [CPUPL-1968]

Change-Id: I12ae97b2ad673a7fd9b733c607f27b1089142313
2022-05-17 18:13:07 +05:30