blis/frame/thread at c3d1a3878ca07bcc2660db8d2e87675e5bb9ca8f - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 18:15:37 +00:00

Eashan Dash c3d1a3878c Parallelized Pack and Compute Extension APIs

1. OpenMP-based multithreading is added for the BLAS extension
   APIs Pack and Compute.

2. Both pack and compute APIs are parallelized.

3. Running the pack and compute APIs with different numbers of
   threads can lead to inconsistent results, because the full
   packed matrix buffer can differ when it is packed with a
   different number of threads.

4. In multi-threaded execution, we ensure output of packed buffer
   is exactly the same as in single threaded execution.

5. Similarly for compute API, read of packed buffer in multi-
   threaded execution is exactly the same as in single-threaded
   execution.

6. Routines are added to compute the offsets for thread workload
   distribution for MT execution.
   1. The offsets are calculated so that they follow the reorder
      buffer traversal of single-threaded reordering.
   2. The panel boundaries (KCxNC) remain as they are accessed in
      a single thread; as a consequence, a thread with jc_start
      inside a panel cannot consider the full NC range for
      reorder.
   3. It has to work with NC' < NC, and the offset is calculated
      as the previous NC panels spanning the k dim + the current
      NC panel spanning the pc loop up to the current iteration
      + (NC - NC') spanning the current kc0 (<= KC).

7. Routines that ensure this for MT execution are added in:
   1. frame/base/bli_pack_compute_utils.c
   2. frame/base/bli_pack_compute_utils.h
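
The offset rule in item 6 can be sketched in C as follows. This is a minimal illustration, not the code in bli_pack_compute_utils.c: the function name `packed_offset`, its parameters, and the assumed buffer layout (NC panels stored one after another, each split into KC blocks along k, with jc_start aligned to the micro-panel width) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Hypothetical helper (not a BLIS routine): element offset into a
 * fully packed k x n buffer for a thread whose first column is
 * jc_start and whose current pc-loop iteration starts at row pc.
 * Assumed layout: NC panels in jc order, each panel split into
 * kc0 x panel_width blocks in pc order.
 */
size_t packed_offset(size_t jc_start, size_t pc,
                     size_t k, size_t n,
                     size_t NC, size_t KC)
{
    assert(NC > 0 && KC > 0 && jc_start < n && pc < k);

    /* Start of the NC panel that contains jc_start. */
    size_t panel_start = (jc_start / NC) * NC;
    /* The last panel may be narrower than NC. */
    size_t panel_width = min_sz(NC, n - panel_start);
    /* Rows in the current pc iteration: kc0 <= KC. */
    size_t kc0 = min_sz(KC, k - pc);
    /* Columns of this panel that precede jc_start, i.e. the
       (NC - NC') term for a full-width panel. */
    size_t cols_before = jc_start - panel_start;

    return panel_start * k      /* previous NC panels, full k dim  */
         + pc * panel_width     /* current panel, earlier pc blocks */
         + cols_before * kc0;   /* skipped columns in current kc0   */
}
```

Under these assumptions, with k = 8, n = 12, NC = 8 and KC = 4, a thread starting at jc_start = 10 in the pc = 4 iteration gets offset 64 + 16 + 8 = 88, the same position a single-threaded traversal would reach for that column and row block.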

AMD-Internal: [CPUPL-3560]
Change-Id: I0dad33e0062519de807c32f6071e61fba976d9ac

2023-11-03 08:47:17 -04:00

..

This check in has changes w.r.t Copyright information, which is changed to (start year) - 2019

2019-05-27 16:24:43 +05:30

bli_l3_compute_decor_openmp.c

Parallelized Pack and Compute Extension APIs

2023-11-03 08:47:17 -04:00

bli_l3_compute_decor_openmp.h

BLAS Extension API - ?gemm_compute()

2023-10-16 08:18:52 -04:00

bli_l3_compute_decor_single.c

Parallelized Pack and Compute Extension APIs

2023-11-03 08:47:17 -04:00

bli_l3_compute_decor_single.h

BLAS Extension API - ?gemm_compute()

2023-10-16 08:18:52 -04:00

bli_l3_compute_decor.h

BLAS Extension API - ?gemm_compute()

2023-10-16 08:18:52 -04:00

bli_l3_decor_openmp.c

Merge commit 'b683d01b' into amd-main

2023-08-21 07:01:38 -04:00

bli_l3_decor_openmp.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_l3_decor_pthreads.c

Add err_t* "return" parameter to malloc functions.

2021-03-31 17:09:36 -05:00

bli_l3_decor_pthreads.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_l3_decor_single.c

Merge commit 'b683d01b' into amd-main

2023-08-21 07:01:38 -04:00

bli_l3_decor_single.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_l3_decor.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_l3_sup_decor_openmp.c

Renamed membrk files/vars/functions to pba.

2021-03-27 17:22:14 -05:00

bli_l3_sup_decor_openmp.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_l3_sup_decor_pthreads.c

Add err_t* "return" parameter to malloc functions.

2021-03-31 17:09:36 -05:00

bli_l3_sup_decor_pthreads.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_l3_sup_decor_single.c

Renamed membrk files/vars/functions to pba.

2021-03-27 17:22:14 -05:00

bli_l3_sup_decor_single.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_l3_sup_decor.h

Support multithreading within the sup framework.

2020-08-06 10:09:28 +05:30

bli_pack_full_decor_openmp.c

Parallelized Pack and Compute Extension APIs

2023-11-03 08:47:17 -04:00

bli_pack_full_decor_openmp.h

Added BLAS Extension APIs - Get Size and Pack API

2023-10-04 06:43:59 -04:00

bli_pack_full_decor_single.c

Parallelized Pack and Compute Extension APIs

2023-11-03 08:47:17 -04:00

bli_pack_full_decor_single.h

Added BLAS Extension APIs - Get Size and Pack API

2023-10-04 06:43:59 -04:00

bli_pack_full_decor.h

Added BLAS Extension APIs - Get Size and Pack API

2023-10-04 06:43:59 -04:00

bli_pthread.c

Disabled _self() and _equal() in bli_pthread API.

2021-03-12 19:47:39 -06:00

bli_pthread.h

Disabled _self() and _equal() in bli_pthread API.

2021-03-12 19:47:39 -06:00

bli_thrcomm_openmp.c

Add err_t* "return" parameter to malloc functions.

2021-03-31 17:09:36 -05:00

bli_thrcomm_openmp.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_thrcomm_pthreads.c

Switch allocator mutexes to static initialization.

2021-03-27 15:15:09 -05:00

bli_thrcomm_pthreads.h

Switch allocator mutexes to static initialization.

2021-03-27 15:15:09 -05:00

bli_thrcomm_single.c

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_thrcomm_single.h

Replaced use of bool_t type with C99 bool.

2020-08-03 11:27:13 +05:30

bli_thrcomm.c

Cleaned up bool_t usage and various typecasts.

2020-08-03 11:23:40 +05:30

bli_thrcomm.h

"Merge Selective Packing code from amd branch flame/blis"

2020-08-06 10:09:28 +05:30

bli_thread.c

Improvements to xerbla functionality

2023-10-16 08:48:51 -04:00

bli_thread.h

BLAS Extension API - ?gemm_compute()

2023-10-16 08:18:52 -04:00

bli_thrinfo_sup.c

Merge commit 'b683d01b' into amd-main

2023-08-21 07:01:38 -04:00

bli_thrinfo_sup.h

Support multithreading within the sup framework.

2020-03-13 01:09:29 -04:00

bli_thrinfo.c

Add err_t* "return" parameter to malloc functions.

2021-03-31 17:09:36 -05:00

bli_thrinfo.h

Removed export macros from all internal prototypes.

2020-08-03 11:47:18 +05:30

CMakeLists.txt

BLAS Extension API - ?gemm_compute()

2023-10-16 08:18:52 -04:00