blis/addon
mkadavil 1e266bbcbc LPGEMM framework updates to avoid unnecessary pack buffer allocation.
-Currently, when any of the downscale APIs is called, each thread
allocates a temporary pack buffer (with bli_membrk_acquire_m). The
buffer persists the intermediate higher-precision output accumulated by
the micro-kernel across the pc loop when the number of pc iterations is
more than 1 (k > KC). bli_membrk_acquire_m is a thread-safe operation
and uses locks (pthread_mutex) to ensure thread-safe checkout of a
memory block from the memory pool.
-However, when k <= KC this temporary buffer is not required. Because
the downscale APIs allocate it by default, the lock overhead affects
performance when k <= KC, m or n is sufficiently small, and the number
of threads involved is high. This default allocation is removed, and
the pack buffer is now allocated only if k > KC.
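
The conditional allocation described above can be sketched roughly as
follows. This is a minimal illustration, not the LPGEMM code: toy_pool_t,
toy_pool_acquire, toy_pool_release and acquire_temp_buf_if_needed are
hypothetical names, and the toy pool only stands in for a mutex-protected
memory pool such as the one behind bli_membrk_acquire_m. The point it
shows is that the k <= KC path returns before the lock is ever taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock-protected memory pool.
   This is NOT the BLIS membrk API. */
typedef struct {
    pthread_mutex_t lock;
    size_t checkouts; /* number of locked acquire calls so far */
} toy_pool_t;

static void toy_pool_init(toy_pool_t *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pool->checkouts = 0;
}

/* Every checkout takes the mutex; this is the cost that every thread
   pays when the buffer is allocated by default. */
static void *toy_pool_acquire(toy_pool_t *pool, size_t bytes)
{
    pthread_mutex_lock(&pool->lock);
    pool->checkouts++;
    void *buf = malloc(bytes);
    pthread_mutex_unlock(&pool->lock);
    return buf;
}

static void toy_pool_release(toy_pool_t *pool, void *buf)
{
    assert(buf != NULL);
    pthread_mutex_lock(&pool->lock);
    free(buf);
    pthread_mutex_unlock(&pool->lock);
}

/* Returns a temporary accumulation buffer only when the pc loop runs
   more than once (k > kc). Otherwise returns NULL without touching
   the pool lock. */
static void *acquire_temp_buf_if_needed(toy_pool_t *pool, size_t k,
                                        size_t kc, size_t bytes)
{
    if (k <= kc) {
        return NULL; /* single pc iteration: no intermediate buffer */
    }
    return toy_pool_acquire(pool, bytes);
}
```

A caller in this sketch would treat a NULL return as "write the result
directly" and would call toy_pool_release only for a non-NULL buffer.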

AMD-Internal: [CPUPL-3430]
Change-Id: I492586ff4c47bc7480d364efb7af3674e31bd2c1
2023-05-17 19:16:02 +05:30