Data Race and Barrier Fixes

* Replaced omp barrier with bli_thread_barrier and added similar fix fo… (#248)

* Replaced omp barrier with bli_thread_barrier and added similar fix for pack compute routine

## **The Sequence is Critical:**

1. **All threads execute `func(...)`** - may access shared resources including communicators
2. **Barrier ** - ensures ALL threads finish their work
3. **Then each thread calls `bli_l3_sup_thrinfo_free()`** - only chief actually frees gl_comm
4. **Safe cleanup** - no use-after-free because all threads are done using gl_comm

## **What Would Happen Without the Barrier:**

```c
// Thread execution timeline WITHOUT barrier:
Time 1: Thread 0 finishes func() early → immediately calls bli_l3_sup_thrinfo_free()
Time 2: Thread 0 (chief) frees gl_comm
Time 3: Thread 1,2,3... still in func() → try to use freed gl_comm → CRASH!
```

## **With the Barrier (Current Safe Code):**

```c
// Thread execution timeline WITH barrier:
Time 1: All threads finish func() at different times
Time 2: ALL threads reach barrier → wait for slowest thread
Time 3: ALL threads proceed past barrier together  
Time 4: Chief frees gl_comm → safe because no one using it anymore
```

* Fixed Data Race in Native code-path (#251)

```c
_Pragma( "omp parallel num_threads(n_threads)" )
{
    // ... thread work ...

    // Free the current thread's thrinfo_t structure.
    bli_l3_thrinfo_free( rntm_p, thread );  // Line 183
}
// *** MISSING BARRIER HERE! ***

// Check the array_t back into the small block allocator...
bli_sba_checkin_array( array );  // Line 200
```

```c
// DANGEROUS execution timeline:
Thread 0 (chief):
    completes func()
    calls bli_l3_cntl_free()
    calls bli_l3_thrinfo_free() → frees gl_comm ✓
    exits OpenMP parallel region
    calls bli_sba_checkin_array(array) → frees array ✗

Thread 1,2,3 (still executing):
    still in func() or bli_l3_cntl_free()
    trying to access freed gl_comm → CRASH!
    trying to access freed array pools → CRASH!
```

This is **exactly the same issue** that PR #702 fixed in other files! The function needs a barrier before threads exit the parallel region to ensure:

1. **All threads complete their work** before any cleanup starts
2. **Global communicator isn't freed** while other threads are using it
3. **Array pools aren't freed** while other threads are accessing them
This commit is contained in:
Varaganti, Kiran
2025-11-07 21:55:24 +05:30
committed by GitHub
parent 7b4e665273
commit 5a73108953
3 changed files with 18 additions and 3 deletions

View File

@@ -112,7 +112,12 @@ void bli_l3_compute_thread_decorator
thread
);
// Free the current thread's thrinfo_t structure.
// NOTE: The barrier here is very important as it prevents memory being
// released by the chief of some thread sub-group before its peers are done
// using it. See PR #702 for more info [1].
// [1] https://github.com/flame/blis/pull/702
bli_thread_barrier( thread );
// Free the current thread's thrinfo_t structure
bli_l3_sup_thrinfo_free( rntm_p, thread );
}

View File

@@ -179,6 +179,13 @@ void bli_l3_thread_decorator
#ifdef PRINT_THRINFO
threads[tid] = thread;
#else
// NOTE: The barrier here is very important as it prevents memory being
// released by the chief of some thread sub-group before its peers are done
// using it. See PR #702 for more info [1].
// [1] https://github.com/flame/blis/pull/702
bli_thread_barrier( thread );
// Free the current thread's thrinfo_t structure.
bli_l3_thrinfo_free( rntm_p, thread );
#endif

View File

@@ -121,8 +121,11 @@ err_t bli_l3_sup_thread_decorator
thread
);
/* Synchronize all threads */
#pragma omp barrier
// NOTE: The barrier here is very important as it prevents memory being
// released by the chief of some thread sub-group before its peers are done
// using it. See PR #702 for more info [1].
// [1] https://github.com/flame/blis/pull/702
bli_thread_barrier( thread );
// Free the current thread's thrinfo_t structure.
bli_l3_sup_thrinfo_free( rntm_p, thread );
}