blis/frame
Hari Govind S eacad443e3 Optimization for DCOPY and SCOPY API
-  Replaced "vmovupd" with "vmovups" for "bli_scopyv_zen4_asm_avx512"
   kernel.

-  Optimization of loop unrolling for "bli_dcopyv_zen4_asm_avx512"
   and "bli_scopyv_zen4_asm_avx512" kernels.

-  Replaced existing load balancing algorithm for dcopy API with
   "bli_thread_range_sub" algorithm.

-  Included AOCL-dynamic values for the optimal number of threads
   for the zen5 architecture.

AMD-Internal: [CPUPL-5238]
Change-Id: Ic82bdfad9478c8f75dc5a3dcfed0df85fbcae957
2024-07-24 08:23:07 -04:00
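
The load-balancing change in the commit message above can be pictured with a small sketch. This is not the BLIS implementation of "bli_thread_range_sub"; it is a simplified illustration, assuming the goal is to split n vector elements across threads so that per-thread counts differ by at most one. The name partition_range and its parameters are hypothetical and do not appear in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): compute the half-open range
   [start, end) of elements that thread `tid` should copy when `n`
   elements are shared among `n_threads` threads. */
void partition_range(size_t n_threads, size_t tid, size_t n,
                     size_t *start, size_t *end)
{
    assert(n_threads > 0 && tid < n_threads);

    size_t base  = n / n_threads;  /* elements every thread receives  */
    size_t extra = n % n_threads;  /* first `extra` threads get one more */

    /* Threads before `tid` hold `base` elements each, plus one extra
       element for each of them that falls among the first `extra`. */
    size_t lo  = tid * base + (tid < extra ? tid : extra);
    size_t len = base + (tid < extra ? 1 : 0);

    *start = lo;
    *end   = lo + len;
}
```

With n = 10 and 4 threads, this sketch assigns the ranges [0, 3), [3, 6), [6, 8) and [8, 10). The actual BLIS routine also takes a blocking factor and an edge-handling flag, which are omitted here for brevity.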