Fix concurrency consistency for internals_pp_manager under multiple-interpreters (#5947)

* Add per-interpreter storage for `gil_safe_call_once_and_store`

* Disable thread local cache for `internals_pp_manager`

* Disable thread local cache for `internals_pp_manager` for multi-interpreter only

* Use anonymous namespace to separate these type_ids from other tests with the same class names.

* style: pre-commit fixes

* Revert internals_pp_manager changes

* This is the crux of fix for the subinterpreter_before_main failure.

The pre_init needs to check if it is in a subinterpreter or not. But in 3.13+ this static initializer runs in the main interpreter.  So we need to check this later, during the exec phase.

* Continue to do the ensure in both places, there might be a reason it was where it was...

Should not hurt anything to do it extra times here.

* Change get_num_interpreters_seen to a boolean flag instead.

The count was not used, it was just checked for > 1, we now accomplish this by setting the flag.

* Spelling typo

* Work around older python versions, only need this check for newish versions

* Add more comments for test case

* Add more comments for test case

* Stop traceback propagation

* Re-enable subinterpreter support on ubuntu 3.14 builds

Was disabled in e4873e8

* As suggested, don't use an anonymous namespace.

* Typo in test assert format string

* Use a more appropriate function name

* Fix mod_per_interpreter_gil* output directory on Windows/MSVC

On Windows with MSVC (multi-configuration generators), CMake uses
config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG when
set, otherwise falls back to LIBRARY_OUTPUT_DIRECTORY/<Config>/.

The main test modules (pybind11_tests, etc.) correctly set both
LIBRARY_OUTPUT_DIRECTORY and the config-specific variants (lines
517-528), so they output directly to tests/.

However, the mod_per_interpreter_gil* modules only copied the base
LIBRARY_OUTPUT_DIRECTORY property, causing them to be placed in
tests/Debug/ instead of tests/.

This mismatch caused test_import_in_subinterpreter_concurrently and
related tests to fail with ModuleNotFoundError on Windows Python 3.14,
because the test code sets sys.path based on pybind11_tests.__file__
(which is in tests/) but tries to import mod_per_interpreter_gil_with_singleton
(which ended up in tests/Debug/).

This bug was previously masked by @pytest.mark.xfail decorators on
these tests. Now that the underlying "Duplicate C++ type registration"
issue is fixed and the xfails are removed, this path issue surfaced.

The fix mirrors the same pattern used for main test targets: also set
LIBRARY_OUTPUT_DIRECTORY_<CONFIG> for each configuration type.

* Remove unneeded `pytest.importorskip`

* Remove comment

---------

Co-authored-by: b-pass <b-pass@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
This commit is contained in:
Xuehai Pan
2025-12-27 02:59:11 +08:00
committed by GitHub
parent 0057e4945d
commit fee2527dfa
10 changed files with 72 additions and 56 deletions

View File

@@ -88,7 +88,7 @@ jobs:
cmake-args: -DCMAKE_CXX_STANDARD=20 -DPYBIND11_DISABLE_HANDLE_TYPE_NAME_DEFAULT_IMPLEMENTATION=ON
- runs-on: ubuntu-latest
python-version: '3.14'
cmake-args: -DCMAKE_CXX_STANDARD=14 -DCMAKE_CXX_FLAGS="-DPYBIND11_HAS_SUBINTERPRETER_SUPPORT=0"
cmake-args: -DCMAKE_CXX_STANDARD=14
- runs-on: ubuntu-latest
python-version: 'pypy-3.10'
cmake-args: -DCMAKE_CXX_STANDARD=14

View File

@@ -441,12 +441,11 @@ Note that this is run once for each (sub-)interpreter the module is imported int
possibly concurrently. The PyModuleDef is allowed to be static, but the PyObject* resulting from
PyModuleDef_Init should be treated like any other PyObject (so not shared across interpreters).
*/
#define PYBIND11_MODULE_PYINIT(name, pre_init, ...) \
#define PYBIND11_MODULE_PYINIT(name, ...) \
static int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject *); \
PYBIND11_PLUGIN_IMPL(name) { \
PYBIND11_CHECK_PYTHON_VERSION \
pre_init; \
PYBIND11_ENSURE_INTERNALS_READY \
pybind11::detail::ensure_internals(); \
static ::pybind11::detail::slots_array mod_def_slots = ::pybind11::detail::init_slots( \
&PYBIND11_CONCAT(pybind11_exec_, name), ##__VA_ARGS__); \
static PyModuleDef def{/* m_base */ PyModuleDef_HEAD_INIT, \
@@ -465,6 +464,7 @@ PyModuleDef_Init should be treated like any other PyObject (so not shared across
static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &); \
int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) { \
try { \
pybind11::detail::ensure_internals(); \
auto m = pybind11::reinterpret_borrow<::pybind11::module_>(pm); \
if (!pybind11::detail::get_cached_module(m.attr("__spec__").attr("name"))) { \
PYBIND11_CONCAT(pybind11_init_, name)(m); \
@@ -518,8 +518,7 @@ PyModuleDef_Init should be treated like any other PyObject (so not shared across
\endrst */
#define PYBIND11_MODULE(name, variable, ...) \
PYBIND11_MODULE_PYINIT( \
name, (pybind11::detail::get_num_interpreters_seen() += 1), ##__VA_ARGS__) \
PYBIND11_MODULE_PYINIT(name, ##__VA_ARGS__) \
PYBIND11_MODULE_EXEC(name, variable)
// pop gnu-zero-variadic-macro-arguments

View File

@@ -418,11 +418,12 @@ inline PyThreadState *get_thread_state_unchecked() {
#endif
}
/// We use this counter to figure out if there are or have been multiple subinterpreters active at
/// any point. This must never decrease while any interpreter may be running in any thread!
inline std::atomic<int64_t> &get_num_interpreters_seen() {
static std::atomic<int64_t> counter(0);
return counter;
/// We use this to figure out if there are or have been multiple subinterpreters active at any
/// point. This must never go from true to false while any interpreter may be running in any
/// thread!
inline std::atomic_bool &has_seen_non_main_interpreter() {
static std::atomic_bool multi(false);
return multi;
}
template <class T,
@@ -649,7 +650,7 @@ public:
/// acquire the GIL. Will never return nullptr.
std::unique_ptr<InternalsType> *get_pp() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) {
if (has_seen_non_main_interpreter()) {
// Whenever the interpreter changes on the current thread we need to invalidate the
// internals_pp so that it can be pulled from the interpreter's state dict. That is
// slow, so we use the current PyThreadState to check if it is necessary.
@@ -675,7 +676,7 @@ public:
/// Drop all the references we're currently holding.
void unref() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) {
if (has_seen_non_main_interpreter()) {
last_istate_tls() = nullptr;
internals_p_tls() = nullptr;
return;
@@ -686,7 +687,7 @@ public:
void destroy() {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (get_num_interpreters_seen() > 1) {
if (has_seen_non_main_interpreter()) {
auto *tstate = get_thread_state_unchecked();
// this could be called without an active interpreter, just use what was cached
if (!tstate || tstate->interp == last_istate_tls()) {
@@ -791,6 +792,16 @@ PYBIND11_NOINLINE internals &get_internals() {
return *internals_ptr;
}
inline void ensure_internals() {
pybind11::detail::get_internals_pp_manager().unref();
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
if (PyInterpreterState_Get() != PyInterpreterState_Main()) {
has_seen_non_main_interpreter() = true;
}
#endif
pybind11::detail::get_internals();
}
inline internals_pp_manager<local_internals> &get_local_internals_pp_manager() {
// Use the address of this static itself as part of the key, so that the value is uniquely tied
// to where the module is loaded in memory

View File

@@ -58,7 +58,7 @@
PYBIND11_WARNING_PUSH
PYBIND11_WARNING_DISABLE_CLANG("-Wgnu-zero-variadic-macro-arguments")
#define PYBIND11_EMBEDDED_MODULE(name, variable, ...) \
PYBIND11_MODULE_PYINIT(name, {}, ##__VA_ARGS__) \
PYBIND11_MODULE_PYINIT(name, ##__VA_ARGS__) \
::pybind11::detail::embedded_module PYBIND11_CONCAT(pybind11_module_, name)( \
PYBIND11_TOSTRING(name), PYBIND11_CONCAT(PyInit_, name)); \
PYBIND11_MODULE_EXEC(name, variable)
@@ -202,7 +202,7 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
#endif
// There is exactly one interpreter alive currently.
detail::get_num_interpreters_seen() = 1;
detail::has_seen_non_main_interpreter() = false;
}
/** \rst
@@ -242,12 +242,12 @@ inline void initialize_interpreter(bool init_signal_handlers = true,
\endrst */
inline void finalize_interpreter() {
// get rid of any thread-local interpreter cache that currently exists
if (detail::get_num_interpreters_seen() > 1) {
if (detail::has_seen_non_main_interpreter()) {
detail::get_internals_pp_manager().unref();
detail::get_local_internals_pp_manager().unref();
// We know there can be no other interpreter alive now, so we can lower the count
detail::get_num_interpreters_seen() = 1;
// We know there can be no other interpreter alive now
detail::has_seen_non_main_interpreter() = false;
}
// Re-fetch the internals pointer-to-pointer (but not the internals itself, which might not
@@ -265,8 +265,8 @@ inline void finalize_interpreter() {
// avoid undefined behaviors when initializing another interpreter
detail::get_local_internals_pp_manager().destroy();
// We know there is no interpreter alive now, so we can reset the count
detail::get_num_interpreters_seen() = 0;
// We know there is no interpreter alive now, so we can reset the multi-flag
detail::has_seen_non_main_interpreter() = false;
}
/** \rst

View File

@@ -232,7 +232,7 @@ private:
// Indicator of fast path for single-interpreter case.
bool is_last_storage_valid() const {
return is_initialized_by_at_least_one_interpreter_
&& detail::get_num_interpreters_seen() == 1;
&& !detail::has_seen_non_main_interpreter();
}
// Get the unique key for this storage instance in the interpreter's state dict.

View File

@@ -109,7 +109,7 @@ public:
// upon success, the new interpreter is activated in this thread
result.istate_ = result.creation_tstate_->interp;
detail::get_num_interpreters_seen() += 1; // there are now many interpreters
detail::has_seen_non_main_interpreter() = true;
detail::get_internals(); // initialize internals.tstate, amongst other things...
// In 3.13+ this state should be deleted right away, and the memory will be reused for

View File

@@ -592,6 +592,14 @@ if(NOT PYBIND11_CUDA_TESTS)
foreach(mod IN LISTS PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES)
set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY
"${pybind11_tests_output_directory}")
# Also set config-specific output directories for multi-configuration generators (MSVC)
if(DEFINED CMAKE_CONFIGURATION_TYPES)
foreach(config ${CMAKE_CONFIGURATION_TYPES})
string(TOUPPER ${config} config)
set_target_properties("${mod}" PROPERTIES LIBRARY_OUTPUT_DIRECTORY_${config}
"${pybind11_tests_output_directory}")
endforeach()
endif()
endforeach()
if(PYBIND11_TEST_SMART_HOLDER)

View File

@@ -9,6 +9,8 @@ namespace py = pybind11;
# include <pybind11/native_enum.h>
#endif
namespace pybind11_tests {
namespace mod_per_interpreter_gil_with_singleton {
// A singleton class that holds references to certain Python objects
// This singleton is per-interpreter using gil_safe_call_once_and_store
class MySingleton {
@@ -95,11 +97,15 @@ enum class MyEnum : int {
TWO = 2,
THREE = 3,
};
} // namespace mod_per_interpreter_gil_with_singleton
} // namespace pybind11_tests
PYBIND11_MODULE(mod_per_interpreter_gil_with_singleton,
m,
py::mod_gil_not_used(),
py::multiple_interpreters::per_interpreter_gil()) {
using namespace pybind11_tests::mod_per_interpreter_gil_with_singleton;
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
m.attr("defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT") = true;
#else

View File

@@ -90,7 +90,7 @@ def test_independent_subinterpreters():
run_string, create = get_interpreters(modern=True)
m = pytest.importorskip("mod_per_interpreter_gil")
import mod_per_interpreter_gil as m
if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT:
pytest.skip("Does not have subinterpreter support compiled in")
@@ -139,7 +139,7 @@ def test_independent_subinterpreters_modern():
sys.path.insert(0, os.path.dirname(pybind11_tests.__file__))
m = pytest.importorskip("mod_per_interpreter_gil")
import mod_per_interpreter_gil as m
if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT:
pytest.skip("Does not have subinterpreter support compiled in")
@@ -187,7 +187,7 @@ def test_dependent_subinterpreters():
run_string, create = get_interpreters(modern=False)
m = pytest.importorskip("mod_shared_interpreter_gil")
import mod_shared_interpreter_gil as m
if not m.defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT:
pytest.skip("Does not have subinterpreter support compiled in")
@@ -222,15 +222,27 @@ PREAMBLE_CODE = textwrap.dedent(
objects = m.get_objects_in_singleton()
expected = [
type(None),
tuple,
list,
dict,
collections.OrderedDict,
collections.defaultdict,
collections.deque,
type(None), # static type: shared between interpreters
tuple, # static type: shared between interpreters
list, # static type: shared between interpreters
dict, # static type: shared between interpreters
collections.OrderedDict, # static type: shared between interpreters
collections.defaultdict, # heap type: dynamically created per interpreter
collections.deque, # heap type: dynamically created per interpreter
]
assert objects == expected, f"Expected {{expected!r}}, got {{objects!r}}."
# Check that we have the expected objects. Avoid IndexError by checking lengths first.
assert len(objects) == len(expected), (
f"Expected {{expected!r}} ({{len(expected)}}), got {{objects!r}} ({{len(objects)}})."
)
# The first ones are static types shared between interpreters.
assert objects[:-2] == expected[:-2], (
f"Expected static objects {{expected[:-2]!r}}, got {{objects[:-2]!r}}."
)
# The last two are heap types created per-interpreter.
# The expected objects are dynamically imported from `collections`.
assert objects[-2:] == expected[-2:], (
f"Expected heap objects {{expected[-2:]!r}}, got {{objects[-2:]!r}}."
)
assert hasattr(m, 'MyClass'), "Module missing MyClass"
assert hasattr(m, 'MyGlobalError'), "Module missing MyGlobalError"
@@ -240,11 +252,6 @@ PREAMBLE_CODE = textwrap.dedent(
).lstrip()
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
# raises=interpreters.ExecutionFailed, # need to import the module
strict=False,
)
@pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules"
)
@@ -278,14 +285,9 @@ def check_script_success_in_subprocess(code: str, *, rerun: int = 8) -> None:
f"```\n\n"
f"Output:\n"
f"{ex.output}"
) from ex
) from None
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
raises=RuntimeError,
strict=False,
)
@pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules"
)
@@ -342,11 +344,6 @@ def test_import_in_subinterpreter_after_main():
)
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
raises=RuntimeError,
strict=False,
)
@pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules"
)
@@ -427,11 +424,6 @@ def test_import_in_subinterpreter_before_main():
)
@pytest.mark.xfail(
reason="Duplicate C++ type registration under multiple-interpreters, needs investigation.",
raises=RuntimeError,
strict=False,
)
@pytest.mark.skipif(
sys.platform.startswith("emscripten"), reason="Requires loadable modules"
)

View File

@@ -31,7 +31,7 @@ void unsafe_reset_internals_for_single_interpreter() {
py::detail::get_local_internals_pp_manager().unref();
// we know there are no other interpreters, so we can lower this. SUPER DANGEROUS
py::detail::get_num_interpreters_seen() = 1;
py::detail::has_seen_non_main_interpreter() = false;
// now we unref the static global singleton internals
py::detail::get_internals_pp_manager().unref();