Fix concurrency consistency for internals_pp_manager under multiple-interpreters (#5947)

* Add per-interpreter storage for `gil_safe_call_once_and_store`

* Disable thread local cache for `internals_pp_manager`

* Disable thread local cache for `internals_pp_manager` for multi-interpreter only

* Use anonymous namespace to separate these type_ids from other tests with the same class names.

* style: pre-commit fixes

* Revert internals_pp_manager changes

* This is the crux of fix for the subinterpreter_before_main failure.

The pre_init needs to check if it is in a subinterpreter or not. But in 3.13+ this static initializer runs in the main interpreter.  So we need to check this later, during the exec phase.

* Continue to do the ensure in both places, there might be a reason it was where it was...

Should not hurt anything to do it extra times here.

* Change get_num_interpreters_seen to a boolean flag instead.

The count was not used, it was just checked for > 1, we now accomplish this by setting the flag.

* Spelling typo

* Work around older python versions, only need this check for newish versions

* Add more comments for test case

* Add more comments for test case

* Stop traceback propagation

* Re-enable subinterpreter support on ubuntu 3.14 builds

Was disabled in e4873e8

* As suggested, don't use an anonymous namespace.

* Typo in test assert format string

* Use a more appropriate function name

* Fix mod_per_interpreter_gil* output directory on Windows/MSVC

On Windows with MSVC (multi-configuration generators), CMake uses
config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG when
set, otherwise falls back to LIBRARY_OUTPUT_DIRECTORY/<Config>/.

The main test modules (pybind11_tests, etc.) correctly set both
LIBRARY_OUTPUT_DIRECTORY and the config-specific variants (lines
517-528), so they output directly to tests/.

However, the mod_per_interpreter_gil* modules only copied the base
LIBRARY_OUTPUT_DIRECTORY property, causing them to be placed in
tests/Debug/ instead of tests/.

This mismatch caused test_import_in_subinterpreter_concurrently and
related tests to fail with ModuleNotFoundError on Windows Python 3.14,
because the test code sets sys.path based on pybind11_tests.__file__
(which is in tests/) but tries to import mod_per_interpreter_gil_with_singleton
(which ended up in tests/Debug/).

This bug was previously masked by @pytest.mark.xfail decorators on
these tests. Now that the underlying "Duplicate C++ type registration"
issue is fixed and the xfails are removed, this path issue surfaced.

The fix mirrors the same pattern used for main test targets: also set
LIBRARY_OUTPUT_DIRECTORY_<CONFIG> for each configuration type.

* Remove unneeded `pytest.importorskip`

* Remove comment

---------

Co-authored-by: b-pass <b-pass@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
This commit is contained in:
Xuehai Pan
2025-12-27 02:59:11 +08:00
committed by GitHub
parent 0057e4945d
commit fee2527dfa
10 changed files with 72 additions and 56 deletions

View File

@@ -441,12 +441,11 @@ Note that this is run once for each (sub-)interpreter the module is imported int
possibly concurrently. The PyModuleDef is allowed to be static, but the PyObject* resulting from
PyModuleDef_Init should be treated like any other PyObject (so not shared across interpreters).
*/
#define PYBIND11_MODULE_PYINIT(name, pre_init, ...) \
#define PYBIND11_MODULE_PYINIT(name, ...) \
static int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject *); \
PYBIND11_PLUGIN_IMPL(name) { \
PYBIND11_CHECK_PYTHON_VERSION \
pre_init; \
PYBIND11_ENSURE_INTERNALS_READY \
pybind11::detail::ensure_internals(); \
static ::pybind11::detail::slots_array mod_def_slots = ::pybind11::detail::init_slots( \
&PYBIND11_CONCAT(pybind11_exec_, name), ##__VA_ARGS__); \
static PyModuleDef def{/* m_base */ PyModuleDef_HEAD_INIT, \
@@ -465,6 +464,7 @@ PyModuleDef_Init should be treated like any other PyObject (so not shared across
static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &); \
int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) { \
try { \
pybind11::detail::ensure_internals(); \
auto m = pybind11::reinterpret_borrow<::pybind11::module_>(pm); \
if (!pybind11::detail::get_cached_module(m.attr("__spec__").attr("name"))) { \
PYBIND11_CONCAT(pybind11_init_, name)(m); \
@@ -518,8 +518,7 @@ PyModuleDef_Init should be treated like any other PyObject (so not shared across
\endrst */
#define PYBIND11_MODULE(name, variable, ...) \
PYBIND11_MODULE_PYINIT( \
name, (pybind11::detail::get_num_interpreters_seen() += 1), ##__VA_ARGS__) \
PYBIND11_MODULE_PYINIT(name, ##__VA_ARGS__) \
PYBIND11_MODULE_EXEC(name, variable)
// pop gnu-zero-variadic-macro-arguments

View File

@@ -418,11 +418,12 @@ inline PyThreadState *get_thread_state_unchecked() {
#endif
}
/// We use this counter to figure out if there are or have been multiple subinterpreters active at
/// any point. This must never decrease while any interpreter may be running in any thread!
inline std::atomic<int64_t> &get_num_interpreters_seen() {
static std::atomic<int64_t> counter(0);
return counter;
/// We use this to figure out if there are or have been multiple subinterpreters active at any
/// point. This must never go from true to false while any interpreter may be running in any
/// thread!
inline std::atomic_bool &has_seen_non_main_interpreter() {
static std::atomic_bool multi(false);
return multi;
}
template <class T,
@@ -649,7 +650,7 @@ public:
/// acquire the GIL. Will never return nullptr.
std::unique_ptr<InternalsType> *get_pp() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) {
if (has_seen_non_main_interpreter()) {
// Whenever the interpreter changes on the current thread we need to invalidate the
// internals_pp so that it can be pulled from the interpreter's state dict. That is
// slow, so we use the current PyThreadState to check if it is necessary.
@@ -675,7 +676,7 @@ public:
/// Drop all the references we're currently holding.
void unref() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) {
if (has_seen_non_main_interpreter()) {
last_istate_tls() = nullptr;
internals_p_tls() = nullptr;
return;
@@ -686,7 +687,7 @@ public:
void destroy() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) {
if (has_seen_non_main_interpreter()) {
auto *tstate = get_thread_state_unchecked();
// this could be called without an active interpreter, just use what was cached
if (!tstate || tstate->interp == last_istate_tls()) {
@@ -791,6 +792,16 @@ PYBIND11_NOINLINE internals &get_internals() {
return *internals_ptr;
}
inline void ensure_internals() {
pybind11::detail::get_internals_pp_manager().unref();
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (PyInterpreterState_Get() != PyInterpreterState_Main()) {
has_seen_non_main_interpreter() = true;
}
#endif
pybind11::detail::get_internals();
}
inline internals_pp_manager<local_internals> &get_local_internals_pp_manager() {
// Use the address of this static itself as part of the key, so that the value is uniquely tied
// to where the module is loaded in memory

View File

@@ -58,7 +58,7 @@
PYBIND11_WARNING_PUSH
PYBIND11_WARNING_DISABLE_CLANG("-Wgnu-zero-variadic-macro-arguments")
#define PYBIND11_EMBEDDED_MODULE(name, variable, ...) \
PYBIND11_MODULE_PYINIT(name, {}, ##__VA_ARGS__) \
PYBIND11_MODULE_PYINIT(name, ##__VA_ARGS__) \
::pybind11::detail::embedded_module PYBIND11_CONCAT(pybind11_module_, name)( \
PYBIND11_TOSTRING(name), PYBIND11_CONCAT(PyInit_, name)); \
PYBIND11_MODULE_EXEC(name, variable)
@@ -202,7 +202,7 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
#endif
// There is exactly one interpreter alive currently.
detail::get_num_interpreters_seen() = 1;
detail::has_seen_non_main_interpreter() = false;
}
/** \rst
@@ -242,12 +242,12 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
\endrst */
inline void finalize_interpreter() {
// get rid of any thread-local interpreter cache that currently exists
if (detail::get_num_interpreters_seen() > 1) {
if (detail::has_seen_non_main_interpreter()) {
detail::get_internals_pp_manager().unref();
detail::get_local_internals_pp_manager().unref();
// We know there can be no other interpreter alive now, so we can lower the count
detail::get_num_interpreters_seen() = 1;
// We know there can be no other interpreter alive now
detail::has_seen_non_main_interpreter() = false;
}
// Re-fetch the internals pointer-to-pointer (but not the internals itself, which might not
@@ -265,8 +265,8 @@ inline void finalize_interpreter() {
// avoid undefined behaviors when initializing another interpreter
detail::get_local_internals_pp_manager().destroy();
// We know there is no interpreter alive now, so we can reset the count
detail::get_num_interpreters_seen() = 0;
// We know there is no interpreter alive now, so we can reset the multi-flag
detail::has_seen_non_main_interpreter() = false;
}
/** \rst

View File

@@ -232,7 +232,7 @@ private:
// Indicator of fast path for single-interpreter case.
bool is_last_storage_valid() const {
return is_initialized_by_at_least_one_interpreter_
&& detail::get_num_interpreters_seen() == 1;
&& !detail::has_seen_non_main_interpreter();
}
// Get the unique key for this storage instance in the interpreter's state dict.

View File

@@ -109,7 +109,7 @@ public:
// upon success, the new interpreter is activated in this thread
result.istate_ = result.creation_tstate_->interp;
detail::get_num_interpreters_seen() += 1; // there are now many interpreters
detail::has_seen_non_main_interpreter() = true;
detail::get_internals(); // initialize internals.tstate, amongst other things...
// In 3.13+ this state should be deleted right away, and the memory will be reused for