Added support for systemless build (no pthreads).

Details:
- Added a configure option, --[enable|disable]-system, which determines
  whether the modest operating system dependencies in BLIS are included.
  The most notable example of this on Linux and BSD/OSX is the use of
  POSIX threads to ensure thread safety for when application-level
  threads call BLIS. When --disable-system is given, the bli_pthreads
  implementation is dummied out entirely, allowing the calling code
  within BLIS to remain unchanged. Why would anyone want to build BLIS
  like this? The motivating example was submitted via #454 in which a
  user wanted to build BLIS for a simulator such as gem5 where thread
  safety may not be a concern (and where the operating system is largely
  absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
  the implementation of bli_clock() unconditionally returns 0.0 instead
  of the time elapsed since some fixed point in the past. The reasoning
  for this is that if the operating system is truly minimal, the system
  function call upon which bli_clock() would normally be implemented
  (e.g. clock_gettime()) may not be available.
- Refactored preprocessor-guarded code in bli_pthread.c and bli_pthread.h
  to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
  from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
  and BLISTypedAPI.md, with a note that both are non-functional when
  BLIS is configured with --disable-system.
This commit is contained in:
Field G. Van Zee
2020-11-16 15:55:45 -06:00
parent 88ad841434
commit 9bb23e6c2a
11 changed files with 416 additions and 143 deletions

@@ -31,6 +31,7 @@
* [Specific configuration](BLISObjectAPI.md#specific-configuration)
* [General configuration](BLISObjectAPI.md#general-configuration)
* [Kernel information](BLISObjectAPI.md#kernel-information)
* [Clock functions](BLISObjectAPI.md#clock-functions)
* **[Example code](BLISObjectAPI.md#example-code)**
@@ -2235,6 +2236,54 @@ Possible microkernel types (ie: the return values for `bli_info_get_*_ukr_impl_s
* `BLIS_OPTIMIZED_UKERNEL` (`"optimzd"`): This value is returned when the queried microkernel is provided by an implementation that is neither reference nor virtual, and thus we assume the kernel author would deem it to be "optimized". Such a microkernel may not be optimal in the literal sense of the word, but nonetheless is _intended_ to be optimized, at least relative to the reference microkernels.
* `BLIS_NOTAPPLIC_UKERNEL` (`"notappl"`): This value is usually returned when performing a `gemmtrsm` or `trsm` microkernel type query for any `method` value that is not `BLIS_NAT` (ie: native). That is, induced methods cannot be (purely) used on `trsm`-based microkernels because these microkernels perform more of a triangular inversion, which is not matrix multiplication.
## Clock functions
---
#### clock
```c
double bli_clock
(
void
);
```
Return the amount of time that has elapsed since some fixed time in the past. The return values of `bli_clock()` typically feature nanosecond precision, though this is not guaranteed.
**Note:** On Linux, `bli_clock()` is implemented in terms of `clock_gettime()` using the `clockid_t` value of `CLOCK_MONOTONIC`. On OS X, `bli_clock()` is implemented in terms of `mach_absolute_time()`. On Windows, `bli_clock()` is implemented in terms of `QueryPerformanceCounter()` and `QueryPerformanceFrequency()`. Please see [frame/base/bli_clock.c](https://github.com/flame/blis/blob/master/frame/base/bli_clock.c) for more details.
**Note:** When BLIS is configured with `--disable-system`, this function unconditionally returns 0.0, so its return values are meaningless for timing.
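As a rough illustration of the Linux case described above, the following sketch shows how a timer of this kind can be built on `clock_gettime()` with `CLOCK_MONOTONIC`. The names `my_clock` and `MY_DISABLE_SYSTEM` are hypothetical and are not part of BLIS; the constant-return branch only mimics the documented behavior of a systemless build and is not taken from the BLIS source.

```c
/* Ask for the POSIX.1b declarations if nothing has requested them yet. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Hypothetical switch standing in for a systemless configuration.
   Uncomment to exercise the constant-return branch. */
/* #define MY_DISABLE_SYSTEM */

/* Return the number of seconds elapsed since an arbitrary fixed point
   in the past (hypothetical helper, not the BLIS implementation). */
double my_clock( void )
{
#ifdef MY_DISABLE_SYSTEM
    /* Assume no OS timer exists, so report a constant value. */
    return 0.0;
#else
    struct timespec ts;

    /* CLOCK_MONOTONIC is unaffected by wall-clock adjustments. */
    clock_gettime( CLOCK_MONOTONIC, &ts );

    return ( double )ts.tv_sec + ( double )ts.tv_nsec * 1.0e-9;
#endif
}
```

Only the difference between two return values is meaningful, since the starting point of a monotonic clock is unspecified.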
---
#### clock_min_diff
```c
double bli_clock_min_diff
(
double time_prev_min,
double time_start
);
```
This function computes an intermediate value, `time_diff`, equal to `bli_clock() - time_start`, and then takes the minimum of `time_diff` and `time_prev_min`. If that minimum is extremely small (close to zero), the function returns `time_prev_min` instead; otherwise it returns the minimum.
This function is meant to be used in conjunction with `bli_clock()` for
performance timing within applications--specifically in loops where only
the fastest timing is of interest. For example:
```c
double t_save = DBL_MAX;
for( i = 0; i < 3; ++i )
{
double t = bli_clock();
bli_gemm( ... );
t_save = bli_clock_min_diff( t_save, t );
}
double gflops = ( 2.0 * m * k * n ) / ( t_save * 1.0e9 );
```
This code calls `bli_gemm()` three times and computes the performance, in GFLOPS, of the fastest of the three executions.
---
# Example code
BLIS provides lots of example code in the [examples/oapi](https://github.com/flame/blis/tree/master/examples/oapi) directory of the BLIS source distribution. The example code in this directory is set up like a tutorial, and so we recommend starting from the beginning. Topics include creating and managing objects, printing vectors and matrices, setting and querying object properties, and calling a representative subset of the computational level-1v, -1m, -2, -3, and utility operations documented above. Please read the `README` contained within the `examples/oapi` directory for further details.

@@ -26,6 +26,7 @@
* [Specific configuration](BLISTypedAPI.md#specific-configuration)
* [General configuration](BLISTypedAPI.md#general-configuration)
* [Kernel information](BLISTypedAPI.md#kernel-information)
* [Clock functions](BLISTypedAPI.md#clock-functions)
* **[Example code](BLISTypedAPI.md#example-code)**
@@ -1902,6 +1903,54 @@ char* bli_info_get_trmm3_impl_string( num_t dt );
char* bli_info_get_trsm_impl_string( num_t dt );
```
## Clock functions
---
#### clock
```c
double bli_clock
(
void
);
```
Return the amount of time that has elapsed since some fixed time in the past. The return values of `bli_clock()` typically feature nanosecond precision, though this is not guaranteed.
**Note:** On Linux, `bli_clock()` is implemented in terms of `clock_gettime()` using the `clockid_t` value of `CLOCK_MONOTONIC`. On OS X, `bli_clock()` is implemented in terms of `mach_absolute_time()`. On Windows, `bli_clock()` is implemented in terms of `QueryPerformanceCounter()` and `QueryPerformanceFrequency()`. Please see [frame/base/bli_clock.c](https://github.com/flame/blis/blob/master/frame/base/bli_clock.c) for more details.
**Note:** When BLIS is configured with `--disable-system`, this function unconditionally returns 0.0, so its return values are meaningless for timing.
---
#### clock_min_diff
```c
double bli_clock_min_diff
(
double time_prev_min,
double time_start
);
```
This function computes an intermediate value, `time_diff`, equal to `bli_clock() - time_start`, and then takes the minimum of `time_diff` and `time_prev_min`. If that minimum is extremely small (close to zero), the function returns `time_prev_min` instead; otherwise it returns the minimum.
This function is meant to be used in conjunction with `bli_clock()` for
performance timing within applications--specifically in loops where only
the fastest timing is of interest. For example:
```c
double t_save = DBL_MAX;
for( i = 0; i < 3; ++i )
{
double t = bli_clock();
bli_gemm( ... );
t_save = bli_clock_min_diff( t_save, t );
}
double gflops = ( 2.0 * m * k * n ) / ( t_save * 1.0e9 );
```
This code calls `bli_gemm()` three times and computes the performance, in GFLOPS, of the fastest of the three executions.
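To make the minimum-tracking logic concrete, here is a small sketch that follows the behavior described above without depending on BLIS. The helpers `my_min_diff` and `gemm_gflops` are hypothetical. Unlike `bli_clock_min_diff()`, `my_min_diff` takes the current time as an explicit third argument so that it can be exercised with fixed values, and the one-nanosecond cutoff is an assumed interpretation of "extremely small".

```c
#include <float.h>

/* Hypothetical stand-in for the documented behavior: keep the smaller of
   the previous minimum and the newly measured interval, but reject
   intervals too small to be trustworthy. */
double my_min_diff( double time_prev_min, double time_start, double time_now )
{
    double time_diff = time_now - time_start;
    double time_min  = ( time_diff < time_prev_min ) ? time_diff
                                                     : time_prev_min;

    /* Assumed cutoff: anything under one nanosecond (including zero or
       negative values) is treated as a garbled measurement. */
    if ( time_min < 1.0e-9 ) return time_prev_min;

    return time_min;
}

/* Convert the fastest time (in seconds) of an m x n x k matrix multiply
   into GFLOPS, using the same formula as the example above. */
double gemm_gflops( double m, double n, double k, double seconds )
{
    return ( 2.0 * m * k * n ) / ( seconds * 1.0e9 );
}
```

Starting the running minimum at `DBL_MAX`, as the example above does, ensures that the first valid measurement always replaces it.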
---
# Example code
BLIS provides lots of example code in the [examples/tapi](https://github.com/flame/blis/tree/master/examples/tapi) directory of the BLIS source distribution. The example code in this directory is set up like a tutorial, and so we recommend starting from the beginning. Topics include printing vectors and matrices and calling a representative subset of the computational level-1v, -1m, -2, -3, and utility operations documented above. Please read the `README` contained within the `examples/tapi` directory for further details.