mirror of
https://github.com/amd/blis.git
synced 2026-06-17 02:59:32 +00:00
CPUPL-7578: New thread control API with global and thread-local variants

Summary:
Add new BLIS thread control APIs that provide fine-grained control over
threading with proper global and thread-local (TLS) semantics. Fix several
correctness issues where set_num_threads() and set_ways() did not properly
override each other's state.

New/Modified APIs:
- bli_thread_set_num_threads(): Sets thread count globally (updates both
  global_rntm and tl_rntm)
- bli_thread_set_num_threads_local(): Sets thread count for calling thread
  only (tl_rntm)
- bli_thread_get_num_threads(): Returns effective thread count, deriving
  from ways if set
- bli_thread_reset(): Resyncs tl_rntm from global_rntm
- bli_thread_set_ways(): Sets loop factorization (jc, pc, ic, jr, ir)
- bli_thread_get_is_parallel(): Returns whether parallelism is enabled
- bli_thread_get_jc_nt/ic_nt/pc_nt/jr_nt/ir_nt(): Returns individual way
  values
- b77_thread_set_num_threads_local_(): Fortran-compatible wrapper

Bug fixes:
- bli_thread_set_num_threads() now clears ways (-1) and sets
  auto_factor=TRUE on both global_rntm and tl_rntm, so it properly
  overrides prior BLIS_JC_NT/BLIS_IC_NT environment settings
- bli_thread_set_ways() now propagates to global_rntm (inside mutex) and
  clears stale num_threads on both global_rntm and tl_rntm, so
  get_num_threads() returns the product of ways instead of a stale value
- Fix data race in bli_thread_init_rntm_from_global_rntm(): copy
  global_rntm under mutex before debug printing
- Fix data race in set_num_threads_local() debug print

Test suite (43 tests, 106 assertions):
- test_thread_control.c (OpenMP, 23 tests): environment inheritance,
  global propagation, thread-local isolation, local precedence, per-thread
  local, reset, nested parallel, edge cases, set_ways, is_parallel,
  concurrent updates, DGEMM with threads, interleaved settings,
  persistence, parallel DGEMM, thread pool, reset-to-sync, env ways vs
  set_num_threads, ways→set_nt→reset, ways→local→reset, round-trip,
  set_nt→set_ways override, set_ways propagation to new threads
- test_thread_control_pthread.c (pthread, 20 tests): equivalent coverage
  plus concurrent set/reset race condition test, set_nt→set_ways override,
  set_ways propagation via pthread_create

Files changed (9 files, +2630/-29 lines):
- bli_thread.c: Core API implementations and fixes
- bli_thread.h: New function declarations
- b77_thread.c: Fortran wrapper
- test_thread_control.c: OpenMP test suite (23 tests)
- test_thread_control_pthread.c: pthread test suite (20 tests)
- TEST_THREAD_CONTROL_README.md: Documentation

AMD-Internal: CPUPL-7578
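
The override rules described above can be sketched with a small standalone
model. This is not the BLIS implementation and does not call BLIS: every
model_* name, the struct layout, and the choice to clear auto_factor in
set_ways are assumptions made for illustration. It only mirrors the
semantics stated in this message: a mutex-guarded global state, a
thread-local copy, set_num_threads clearing ways, set_ways clearing
num_threads, and reset resyncing the local copy from the global one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_NUM_WAYS 5 /* jc, pc, ic, jr, ir */

/* Hypothetical stand-in for a runtime object; -1 means "not set". */
typedef struct {
    long num_threads;
    long ways[MODEL_NUM_WAYS];
    bool auto_factor;
} model_rntm_t;

#define MODEL_RNTM_UNSET { -1, { -1, -1, -1, -1, -1 }, false }

static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;
static model_rntm_t model_global = MODEL_RNTM_UNSET;           /* shared  */
static _Thread_local model_rntm_t model_tl = MODEL_RNTM_UNSET; /* per thread */

/* Setting a thread count clears the ways and enables auto factorization. */
static void model_apply_num_threads(model_rntm_t *r, long nt) {
    r->num_threads = nt;
    for (int i = 0; i < MODEL_NUM_WAYS; i++) r->ways[i] = -1;
    r->auto_factor = true;
}

/* Setting ways clears any stale thread count.
   Assumption: auto_factor is turned off when explicit ways are given. */
static void model_apply_ways(model_rntm_t *r, const long w[MODEL_NUM_WAYS]) {
    for (int i = 0; i < MODEL_NUM_WAYS; i++) r->ways[i] = w[i];
    r->num_threads = -1;
    r->auto_factor = false;
}

/* Global variant: updates the shared state under the mutex, then the local copy. */
void model_set_num_threads(long nt) {
    pthread_mutex_lock(&model_mutex);
    model_apply_num_threads(&model_global, nt);
    pthread_mutex_unlock(&model_mutex);
    model_apply_num_threads(&model_tl, nt);
}

/* Local variant: touches only the calling thread's copy.
   Assumption: it clears the local ways the same way the global variant does. */
void model_set_num_threads_local(long nt) {
    model_apply_num_threads(&model_tl, nt);
}

void model_set_ways(long jc, long pc, long ic, long jr, long ir) {
    const long w[MODEL_NUM_WAYS] = { jc, pc, ic, jr, ir };
    pthread_mutex_lock(&model_mutex);
    model_apply_ways(&model_global, w);
    pthread_mutex_unlock(&model_mutex);
    model_apply_ways(&model_tl, w);
}

/* Effective count: explicit num_threads if set, else the product of the
   ways that are set (assumption: unset ways count as 1). */
long model_get_num_threads(void) {
    if (model_tl.num_threads > 0) return model_tl.num_threads;
    long product = 1;
    for (int i = 0; i < MODEL_NUM_WAYS; i++)
        if (model_tl.ways[i] > 0) product *= model_tl.ways[i];
    return product;
}

/* Resync the thread-local copy from the global state (copied under the mutex). */
void model_reset(void) {
    pthread_mutex_lock(&model_mutex);
    model_tl = model_global;
    pthread_mutex_unlock(&model_mutex);
}

/* Walks through the override sequence on the calling thread. */
void model_demo(void) {
    model_set_ways(2, 1, 4, 1, 1);
    assert(model_get_num_threads() == 8); /* product of ways */
    model_set_num_threads(6);
    assert(model_get_num_threads() == 6); /* ways were cleared */
    model_set_num_threads_local(3);
    assert(model_get_num_threads() == 3); /* local value takes precedence */
    model_reset();
    assert(model_get_num_threads() == 6); /* back in sync with global */
}
```

Calling model_demo() from a main() runs the sequence on a single thread.
Per-thread isolation, which the actual test suites exercise with OpenMP and
pthread_create, is not covered by this sketch.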