- In level-1 kernels, with multi-threading enabled, only part of the
job was being executed.
- The bug was in bli_thread_vector_partition and occurred only when
the minimum work per thread was >= 1, i.e., when the number of threads
launched is less than the number of elements and the number of
elements is not a multiple of the number of threads launched.
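
For illustration only, here is a minimal sketch of a remainder-aware
vector partition. It is not the code in bli_thread_vector_partition;
the names partition_range and covered_elements are hypothetical, and
the choice to give the leftover elements to the lowest-numbered threads
is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the BLIS implementation: computes the
 * half-open range [start, end) of elements owned by thread `tid`
 * when `n` elements are split across `nt` threads. */
static void partition_range(size_t n, size_t nt, size_t tid,
                            size_t *start, size_t *end)
{
    assert(nt > 0 && tid < nt);

    size_t base = n / nt; /* minimum work per thread */
    size_t rem  = n % nt; /* leftover when n is not a multiple of nt */

    /* Assumption for this sketch: the first `rem` threads take one
     * extra element each, so no leftover element is dropped. */
    size_t extra_before = (tid < rem) ? tid : rem;

    *start = tid * base + extra_before;
    *end   = *start + base + ((tid < rem) ? 1 : 0);
}

/* Sums the range sizes over all threads. The result equals n when
 * the partition covers the whole vector. */
static size_t covered_elements(size_t n, size_t nt)
{
    size_t total = 0;
    for (size_t tid = 0; tid < nt; ++tid) {
        size_t s, e;
        partition_range(n, nt, tid, &s, &e);
        total += e - s;
    }
    return total;
}
```

With n = 10 and nt = 4, this yields the ranges [0,3), [3,6), [6,8) and
[8,10), covering all 10 elements. A scheme that handed every thread
only n / nt = 2 elements would cover 8 and leave 2 unprocessed, which
matches the kind of partial execution described above.
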
AMD-Internal: [CPUPL-3231]
Change-Id: Ie20abb93468282cd6ac2372267714fb80c26d7cc