- Impplemented her2 framework calls for transposed and non
transposed kernel variants.
- dher2 kernel operate over 4 columns at a time. It computes
4x4 triangular part of matrix first and remainder part is
computed in chunk of 4x4 tile upto m rows.
- remainder cases(m < 4) are handled serially.
AMD-Internal: [CPUPL-1968]
Change-Id: I12ae97b2ad673a7fd9b733c607f27b1089142313