Files
blis/bench/UnitTests
Varaganti, Kiran bb6545a46b Added new thread control API with global and thread-local variants
CPUPL-7578: New thread control API with global and thread-local variants

Summary: Add new BLIS thread control APIs that provide fine-grained control over threading with proper global and thread-local (TLS) semantics. Fix several correctness issues where set_num_threads() and set_ways() did not properly override each other's state.

New/Modified APIs:

bli_thread_set_num_threads() — Sets thread count globally (updates both global_rntm and tl_rntm)
bli_thread_set_num_threads_local() — Sets thread count for calling thread only (tl_rntm)
bli_thread_get_num_threads() — Returns effective thread count, deriving from ways if set
bli_thread_reset() — Resyncs tl_rntm from global_rntm
bli_thread_set_ways() — Sets loop factorization (jc, pc, ic, jr, ir)
bli_thread_get_is_parallel() — Returns whether parallelism is enabled
bli_thread_get_jc_nt/ic_nt/pc_nt/jr_nt/ir_nt() — Returns individual way values
b77_thread_set_num_threads_local_() — Fortran-compatible wrapper
Bug fixes:

bli_thread_set_num_threads() now clears ways (-1) and sets auto_factor=TRUE on both global_rntm and tl_rntm, so it properly overrides prior BLIS_JC_NT/BLIS_IC_NT environment settings
bli_thread_set_ways() now propagates to global_rntm (inside mutex) and clears stale num_threads on both global_rntm and tl_rntm, so get_num_threads() returns the product of ways instead of a stale value
Fix data race in bli_thread_init_rntm_from_global_rntm() — copy global_rntm under mutex before debug printing
Fix data race in set_num_threads_local() debug print
Test suite (43 tests, 106 assertions):

test_thread_control.c (OpenMP, 23 tests): environment inheritance, global propagation, thread-local isolation, local precedence, per-thread local, reset, nested parallel, edge cases, set_ways, is_parallel, concurrent updates, DGEMM with threads, interleaved settings, persistence, parallel DGEMM, thread pool, reset-to-sync, env ways vs set_num_threads, ways→set_nt→reset, ways→local→reset, round-trip, set_nt→set_ways override, set_ways propagation to new threads
test_thread_control_pthread.c (pthread, 20 tests): equivalent coverage plus concurrent set/reset race condition test, set_nt→set_ways override, set_ways propagation via pthread_create
Files changed (9 files, +2630/-29 lines):

bli_thread.c — Core API implementations and fixes
bli_thread.h — New function declarations
b77_thread.c — Fortran wrapper
test_thread_control.c — OpenMP test suite (23 tests)
test_thread_control_pthread.c — pthread test suite (20 tests)
TEST_THREAD_CONTROL_README.md — Documentation
AMD-Internal: CPUPL-7578
2026-03-06 12:16:17 +05:30
..