Fix concurrency consistency for internals_pp_manager under multiple-interpreters (#5947)

* Add per-interpreter storage for `gil_safe_call_once_and_store`

* Disable thread local cache for `internals_pp_manager`

* Disable thread local cache for `internals_pp_manager` for multi-interpreter only

* Use anonymous namespace to separate these type_ids from other tests with the same class names.

* style: pre-commit fixes

* Revert internals_pp_manager changes

* This is the crux of fix for the subinterpreter_before_main failure.

The pre_init needs to check if it is in a subinterpreter or not. But in 3.13+ this static initializer runs in the main interpreter.  So we need to check this later, during the exec phase.

* Continue to do the ensure in both places, there might be a reason it was where it was...

Should not hurt anything to do it extra times here.

* Change get_num_interpreters_seen to a boolean flag instead.

The count was not used, it was just checked for > 1, we now accomplish this by setting the flag.

* Spelling typo

* Work around older python versions, only need this check for newish versions

* Add more comments for test case

* Add more comments for test case

* Stop traceback propagation

* Re-enable subinterpreter support on ubuntu 3.14 builds

Was disabled in e4873e8

* As suggested, don't use an anonymous namespace.

* Typo in test assert format string

* Use a more appropriate function name

* Fix mod_per_interpreter_gil* output directory on Windows/MSVC

On Windows with MSVC (multi-configuration generators), CMake uses
config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG when
set, otherwise falls back to LIBRARY_OUTPUT_DIRECTORY/<Config>/.

The main test modules (pybind11_tests, etc.) correctly set both
LIBRARY_OUTPUT_DIRECTORY and the config-specific variants (lines
517-528), so they output directly to tests/.

However, the mod_per_interpreter_gil* modules only copied the base
LIBRARY_OUTPUT_DIRECTORY property, causing them to be placed in
tests/Debug/ instead of tests/.

This mismatch caused test_import_in_subinterpreter_concurrently and
related tests to fail with ModuleNotFoundError on Windows Python 3.14,
because the test code sets sys.path based on pybind11_tests.__file__
(which is in tests/) but tries to import mod_per_interpreter_gil_with_singleton
(which ended up in tests/Debug/).

This bug was previously masked by @pytest.mark.xfail decorators on
these tests. Now that the underlying "Duplicate C++ type registration"
issue is fixed and the xfails are removed, this path issue surfaced.

The fix mirrors the same pattern used for main test targets: also set
LIBRARY_OUTPUT_DIRECTORY_<CONFIG> for each configuration type.

* Remove unneeded `pytest.importorskip`

* Remove comment

---------

Co-authored-by: b-pass <b-pass@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
This commit is contained in:
Xuehai Pan
2025-12-27 02:59:11 +08:00
committed by GitHub
parent 0057e4945d
commit fee2527dfa
10 changed files with 72 additions and 56 deletions

View File

@@ -88,7 +88,7 @@ jobs:
cmake-args: -DCMAKE_CXX_STANDARD=20 -DPYBIND11_DISABLE_HANDLE_TYPE_NAME_DEFAULT_IMPLEMENTATION=ON cmake-args: -DCMAKE_CXX_STANDARD=20 -DPYBIND11_DISABLE_HANDLE_TYPE_NAME_DEFAULT_IMPLEMENTATION=ON
- runs-on: ubuntu-latest - runs-on: ubuntu-latest
python-version: '3.14' python-version: '3.14'
cmake-args: -DCMAKE_CXX_STANDARD=14 -DCMAKE_CXX_FLAGS="-DPYBIND11_HAS_SUBINTERPRETER_SUPPORT=0" cmake-args: -DCMAKE_CXX_STANDARD=14
- runs-on: ubuntu-latest - runs-on: ubuntu-latest
python-version: 'pypy-3.10' python-version: 'pypy-3.10'
cmake-args: -DCMAKE_CXX_STANDARD=14 cmake-args: -DCMAKE_CXX_STANDARD=14

View File

@@ -441,12 +441,11 @@ Note that this is run once for each (sub-)interpreter the module is imported int
possibly concurrently. The PyModuleDef is allowed to be static, but the PyObject* resulting from possibly concurrently. The PyModuleDef is allowed to be static, but the PyObject* resulting from
PyModuleDef_Init should be treated like any other PyObject (so not shared across interpreters). PyModuleDef_Init should be treated like any other PyObject (so not shared across interpreters).
*/ */
#define PYBIND11_MODULE_PYINIT(name, pre_init, ...) \ #define PYBIND11_MODULE_PYINIT(name, ...) \
static int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject *); \ static int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject *); \
PYBIND11_PLUGIN_IMPL(name) { \ PYBIND11_PLUGIN_IMPL(name) { \
PYBIND11_CHECK_PYTHON_VERSION \ PYBIND11_CHECK_PYTHON_VERSION \
pre_init; \ pybind11::detail::ensure_internals(); \
PYBIND11_ENSURE_INTERNALS_READY \
static ::pybind11::detail::slots_array mod_def_slots = ::pybind11::detail::init_slots( \ static ::pybind11::detail::slots_array mod_def_slots = ::pybind11::detail::init_slots( \
&PYBIND11_CONCAT(pybind11_exec_, name), ##__VA_ARGS__); \ &PYBIND11_CONCAT(pybind11_exec_, name), ##__VA_ARGS__); \
static PyModuleDef def{/* m_base */ PyModuleDef_HEAD_INIT, \ static PyModuleDef def{/* m_base */ PyModuleDef_HEAD_INIT, \
@@ -465,6 +464,7 @@ PyModuleDef_Init should be treated like any other PyObject (so not shared across
static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &); \ static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &); \
int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) { \ int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) { \
try { \ try { \
pybind11::detail::ensure_internals(); \
auto m = pybind11::reinterpret_borrow<::pybind11::module_>(pm); \ auto m = pybind11::reinterpret_borrow<::pybind11::module_>(pm); \
if (!pybind11::detail::get_cached_module(m.attr("__spec__").attr("name"))) { \ if (!pybind11::detail::get_cached_module(m.attr("__spec__").attr("name"))) { \
PYBIND11_CONCAT(pybind11_init_, name)(m); \ PYBIND11_CONCAT(pybind11_init_, name)(m); \
@@ -518,8 +518,7 @@ PyModuleDef_Init should be treated like any other PyObject (so not shared across
\endrst */ \endrst */
#define PYBIND11_MODULE(name, variable, ...) \ #define PYBIND11_MODULE(name, variable, ...) \
PYBIND11_MODULE_PYINIT( \ PYBIND11_MODULE_PYINIT(name, ##__VA_ARGS__) \
name, (pybind11::detail::get_num_interpreters_seen() += 1), ##__VA_ARGS__) \
PYBIND11_MODULE_EXEC(name, variable) PYBIND11_MODULE_EXEC(name, variable)
// pop gnu-zero-variadic-macro-arguments // pop gnu-zero-variadic-macro-arguments

View File

@@ -418,11 +418,12 @@ inline PyThreadState *get_thread_state_unchecked() {
#endif #endif
} }
/// We use this counter to figure out if there are or have been multiple subinterpreters active at /// We use this to figure out if there are or have been multiple subinterpreters active at any
/// any point. This must never decrease while any interpreter may be running in any thread! /// point. This must never go from true to false while any interpreter may be running in any
inline std::atomic<int64_t> &get_num_interpreters_seen() { /// thread!
static std::atomic<int64_t> counter(0); inline std::atomic_bool &has_seen_non_main_interpreter() {
return counter; static std::atomic_bool multi(false);
return multi;
} }
template <class T, template <class T,
@@ -649,7 +650,7 @@ public:
/// acquire the GIL. Will never return nullptr. /// acquire the GIL. Will never return nullptr.
std::unique_ptr<InternalsType> *get_pp() { std::unique_ptr<InternalsType> *get_pp() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT #ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) { if (has_seen_non_main_interpreter()) {
// Whenever the interpreter changes on the current thread we need to invalidate the // Whenever the interpreter changes on the current thread we need to invalidate the
// internals_pp so that it can be pulled from the interpreter's state dict. That is // internals_pp so that it can be pulled from the interpreter's state dict. That is
// slow, so we use the current PyThreadState to check if it is necessary. // slow, so we use the current PyThreadState to check if it is necessary.
@@ -675,7 +676,7 @@ public:
/// Drop all the references we're currently holding. /// Drop all the references we're currently holding.
void unref() { void unref() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT #ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) { if (has_seen_non_main_interpreter()) {
last_istate_tls() = nullptr; last_istate_tls() = nullptr;
internals_p_tls() = nullptr; internals_p_tls() = nullptr;
return; return;
@@ -686,7 +687,7 @@ public:
void destroy() { void destroy() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT #ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) { if (has_seen_non_main_interpreter()) {
auto *tstate = get_thread_state_unchecked(); auto *tstate = get_thread_state_unchecked();
// this could be called without an active interpreter, just use what was cached // this could be called without an active interpreter, just use what was cached
if (!tstate || tstate->interp == last_istate_tls()) { if (!tstate || tstate->interp == last_istate_tls()) {
@@ -791,6 +792,16 @@ PYBIND11_NOINLINE internals &get_internals() {
return *internals_ptr; return *internals_ptr;
} }
inline void ensure_internals() {
pybind11::detail::get_internals_pp_manager().unref();
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (PyInterpreterState_Get() != PyInterpreterState_Main()) {
has_seen_non_main_interpreter() = true;
}
#endif
pybind11::detail::get_internals();
}
inline internals_pp_manager<local_internals> &get_local_internals_pp_manager() { inline internals_pp_manager<local_internals> &get_local_internals_pp_manager() {
// Use the address of this static itself as part of the key, so that the value is uniquely tied // Use the address of this static itself as part of the key, so that the value is uniquely tied
// to where the module is loaded in memory // to where the module is loaded in memory

View File

@@ -58,7 +58,7 @@
PYBIND11_WARNING_PUSH PYBIND11_WARNING_PUSH
PYBIND11_WARNING_DISABLE_CLANG("-Wgnu-zero-variadic-macro-arguments") PYBIND11_WARNING_DISABLE_CLANG("-Wgnu-zero-variadic-macro-arguments")
#define PYBIND11_EMBEDDED_MODULE(name, variable, ...) \ #define PYBIND11_EMBEDDED_MODULE(name, variable, ...) \
PYBIND11_MODULE_PYINIT(name, {}, ##__VA_ARGS__) \ PYBIND11_MODULE_PYINIT(name, ##__VA_ARGS__) \
::pybind11::detail::embedded_module PYBIND11_CONCAT(pybind11_module_, name)( \ ::pybind11::detail::embedded_module PYBIND11_CONCAT(pybind11_module_, name)( \
PYBIND11_TOSTRING(name), PYBIND11_CONCAT(PyInit_, name)); \ PYBIND11_TOSTRING(name), PYBIND11_CONCAT(PyInit_, name)); \
PYBIND11_MODULE_EXEC(name, variable) PYBIND11_MODULE_EXEC(name, variable)
@@ -202,7 +202,7 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
#endif #endif
// There is exactly one interpreter alive currently. // There is exactly one interpreter alive currently.
detail::get_num_interpreters_seen() = 1; detail::has_seen_non_main_interpreter() = false;
} }
/** \rst /** \rst
@@ -242,12 +242,12 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
\endrst */ \endrst */
inline void finalize_interpreter() { inline void finalize_interpreter() {
// get rid of any thread-local interpreter cache that currently exists // get rid of any thread-local interpreter cache that currently exists
if (detail::get_num_interpreters_seen() > 1) { if (detail::has_seen_non_main_interpreter()) {
detail::get_internals_pp_manager().unref(); detail::get_internals_pp_manager().unref();
detail::get_local_internals_pp_manager().unref(); detail::get_local_internals_pp_manager().unref();
// We know there can be no other interpreter alive now, so we can lower the count // We know there can be no other interpreter alive now
detail::get_num_interpreters_seen() = 1; detail::has_seen_non_main_interpreter() = false;
} }
// Re-fetch the internals pointer-to-pointer (but not the internals itself, which might not // Re-fetch the internals pointer-to-pointer (but not the internals itself, which might not
@@ -265,8 +265,8 @@ inline void finalize_interpreter() {
// avoid undefined behaviors when initializing another interpreter // avoid undefined behaviors when initializing another interpreter
detail::get_local_internals_pp_manager().destroy(); detail::get_local_internals_pp_manager().destroy();
// We know there is no interpreter alive now, so we can reset the count // We know there is no interpreter alive now, so we can reset the multi-flag
detail::get_num_interpreters_seen() = 0; detail::has_seen_non_main_interpreter() = false;
} }
/** \rst /** \rst

View File

@@ -232,7 +232,7 @@ private:
// Indicator of fast path for single-interpreter case. // Indicator of fast path for single-interpreter case.
bool is_last_storage_valid() const { bool is_last_storage_valid() const {
return is_initialized_by_at_least_one_interpreter_ return is_initialized_by_at_least_one_interpreter_
&& detail::get_num_interpreters_seen() == 1; && !detail::has_seen_non_main_interpreter();
} }
// Get the unique key for this storage instance in the interpreter's state dict. // Get the unique key for this storage instance in the interpreter's state dict.

View File

@@ -109,7 +109,7 @@ public:
// upon success, the new interpreter is activated in this thread // upon success, the new interpreter is activated in this thread
result.istate_ = result.creation_tstate_->interp; result.istate_ = result.creation_tstate_->interp;
detail::get_num_interpreters_seen() += 1; // there are now many interpreters detail::has_seen_non_main_interpreter() = true;
detail::get_internals(); // initialize internals.tstate, amongst other things... detail::get_internals(); // initialize internals.tstate, amongst other things...
// In 3.13+ this state should be deleted right away, and the memory will be reused for // In 3.13+ this state should be deleted right away, and the memory will be reused for

View File

@@ -592,6 +592,14 @@ if(NOT PYBIND11_CUDA_TESTS)
foreach(mod IN LISTS PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES) foreach(mod IN LISTS PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES)
set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY
"${pybind11_tests_output_directory}") "${pybind11_tests_output_directory}")
# Also set config-specific output directories for multi-configuration generators (MSVC)
if(DEFINED CMAKE_CONFIGURATION_TYPES)
foreach(config ${CMAKE_CONFIGURATION_TYPES})
string(TOUPPER ${config} config)
set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY_${config}
"${pybind11_tests_output_directory}")
endforeach()
endif()
endforeach() endforeach()
if(PYBIND11_TEST_SMART_HOLDER) if(PYBIND11_TEST_SMART_HOLDER)

View File

@@ -9,6 +9,8 @@ namespace py = pybind11;
# include <pybind11/native_enum.h> # include <pybind11/native_enum.h>
#endif #endif
namespace pybind11_tests {
namespace mod_per_interpreter_gil_with_singleton {
// A singleton class that holds references to certain Python objects // A singleton class that holds references to certain Python objects
// This singleton is per-interpreter using gil_safe_call_once_and_store // This singleton is per-interpreter using gil_safe_call_once_and_store
class MySingleton { class MySingleton {
@@ -95,11 +97,15 @@ enum class MyEnum : int {
TWO = 2, TWO = 2,
THREE = 3, THREE = 3,
}; };
} // namespace mod_per_interpreter_gil_with_singleton
} // namespace pybind11_tests
PYBIND11_MODULE(mod_per_interpreter_gil_with_singleton, PYBIND11_MODULE(mod_per_interpreter_gil_with_singleton,
m, m,
py::mod_gil_not_used(), py::mod_gil_not_used(),
py::multiple_interpreters::per_interpreter_gil()) { py::multiple_interpreters::per_interpreter_gil()) {
using namespace pybind11_tests::mod_per_interpreter_gil_with_singleton;
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT #ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
m.attr("defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT") = true; m.attr("defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT") = true;
#else #else

View File

@@ -90,7 +90,7 @@ def test_independent_subinterpreters():
run_string, create = get_interpreters(modern=True) run_string, create = get_interpreters(modern=True)
m = pytest.importorskip("mod_per_interpreter_gil") import mod_per_interpreter_gil as m
if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT: if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT:
pytest.skip("Does not have subinterpreter support compiled in") pytest.skip("Does not have subinterpreter support compiled in")
@@ -139,7 +139,7 @@ def test_independent_subinterpreters_modern():
sys.path.insert(0, os.path.dirname(pybind11_tests.__file__)) sys.path.insert(0, os.path.dirname(pybind11_tests.__file__))
m = pytest.importorskip("mod_per_interpreter_gil") import mod_per_interpreter_gil as m
if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT: if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT:
pytest.skip("Does not have subinterpreter support compiled in") pytest.skip("Does not have subinterpreter support compiled in")
@@ -187,7 +187,7 @@ def test_dependent_subinterpreters():
run_string, create = get_interpreters(modern=False) run_string, create = get_interpreters(modern=False)
m = pytest.importorskip("mod_shared_interpreter_gil") import mod_shared_interpreter_gil as m
if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT: if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT:
pytest.skip("Does not have subinterpreter support compiled in") pytest.skip("Does not have subinterpreter support compiled in")
@@ -222,15 +222,27 @@ PREAMBLE_CODE = textwrap.dedent(
objects = m.get_objects_in_singleton() objects = m.get_objects_in_singleton()
expected = [ expected = [
type(None), type(None), # static type: shared between interpreters
tuple, tuple, # static type: shared between interpreters
list, list, # static type: shared between interpreters
dict, dict, # static type: shared between interpreters
collections.OrderedDict, collections.OrderedDict, # static type: shared between interpreters
collections.defaultdict, collections.defaultdict, # heap type: dynamically created per interpreter
collections.deque, collections.deque, # heap type: dynamically created per interpreter
] ]
assert objects == expected, f"Expected {{expected!r}}, got {{objects!r}}." # Check that we have the expected objects. Avoid IndexError by checking lengths first.
assert len(objects) == len(expected), (
f"Expected {{expected!r}} ({{len(expected)}}), got {{objects!r}} ({{len(objects)}})."
)
# The first ones are static types shared between interpreters.
assert objects[:-2] == expected[:-2], (
f"Expected static objects {{expected[:-2]!r}}, got {{objects[:-2]!r}}."
)
# The last two are heap types created per-interpreter.
# The expected objects are dynamically imported from `collections`.
assert objects[-2:] == expected[-2:], (
f"Expected heap objects {{expected[-2:]!r}}, got {{objects[-2:]!r}}."
)
assert hasattr(m, 'MyClass'), "Module missing MyClass" assert hasattr(m, 'MyClass'), "Module missing MyClass"
assert hasattr(m, 'MyGlobalError'), "Module missing MyGlobalError" assert hasattr(m, 'MyGlobalError'), "Module missing MyGlobalError"
@@ -240,11 +252,6 @@ PREAMBLE_CODE = textwrap.dedent(
).lstrip() ).lstrip()
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
# raises=interpreters.ExecutionFailed, # need to import the module
strict=False,
)
@pytest.mark.skipif( @pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules" sys.platform.startswith("emscripten"), reason="Requires loadable modules"
) )
@@ -278,14 +285,9 @@ def check_script_success_in_subprocess(code: str, *, rerun: int = 8) -> None:
f"```\n\n" f"```\n\n"
f"Output:\n" f"Output:\n"
f"{ex.output}" f"{ex.output}"
) from ex ) from None
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
raises=RuntimeError,
strict=False,
)
@pytest.mark.skipif( @pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules" sys.platform.startswith("emscripten"), reason="Requires loadable modules"
) )
@@ -342,11 +344,6 @@ def test_import_in_subinterpreter_after_main():
) )
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
raises=RuntimeError,
strict=False,
)
@pytest.mark.skipif( @pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules" sys.platform.startswith("emscripten"), reason="Requires loadable modules"
) )
@@ -427,11 +424,6 @@ def test_import_in_subinterpreter_before_main():
) )
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
raises=RuntimeError,
strict=False,
)
@pytest.mark.skipif( @pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules" sys.platform.startswith("emscripten"), reason="Requires loadable modules"
) )

View File

@@ -31,7 +31,7 @@ void unsafe_reset_internals_for_single_interpreter() {
py::detail::get_local_internals_pp_manager().unref(); py::detail::get_local_internals_pp_manager().unref();
// we know there are no other interpreters, so we can lower this. SUPER DANGEROUS // we know there are no other interpreters, so we can lower this. SUPER DANGEROUS
py::detail::get_num_interpreters_seen() = 1; py::detail::has_seen_non_main_interpreter() = false;
// now we unref the static global singleton internals // now we unref the static global singleton internals
py::detail::get_internals_pp_manager().unref(); py::detail::get_internals_pp_manager().unref();