pybind11/tests/mod_per_interpreter_gil_with_singleton.cpp
Xuehai Pan 0057e4945d Add per-interpreter storage for gil_safe_call_once_and_store (#5933)
* Add new argument to `gil_safe_call_once_and_store::call_once_and_store_result`

* Add per-interpreter storage for `gil_safe_call_once_and_store`

* Make `~gil_safe_call_once_and_store` a no-op

* Fix C++11 compatibility

* Improve thread-safety and add default finalizer

* Try fix thread-safety

* Try fix thread-safety

* Add a warning comment

* Simplify `PYBIND11_INTERNALS_VERSION >= 12`

* Try fix thread-safety

* Try fix thread-safety

* Revert get_pp()

* Update comments

* Move call-once storage out of internals

* Revert internal version bump

* Cleanup outdated comments

* Move atomic_bool alias into pybind11::detail namespace

The `using atomic_bool = ...` declaration was at global scope,
polluting the global namespace. Move it into pybind11::detail
to avoid potential conflicts with user code.

* Add explicit #include <unordered_map> for subinterpreter support

The subinterpreter branch uses std::unordered_map but relied on
transitive includes. Add an explicit include for robustness.

* Remove extraneous semicolon after destructor definition

Style fix: remove trailing semicolon after ~call_once_storage()
destructor body.

* Add comment explaining unused finalize parameter

Clarify why the finalize callback parameter is intentionally ignored
when subinterpreter support is disabled: the storage is process-global
and leaked to avoid destructor calls after interpreter finalization.

* Add comment explaining error_scope usage

Clarify why error_scope is used: to preserve any existing Python
error state that might be cleared or modified by dict_getitemstringref.

* Improve exception safety in get_or_create_call_once_storage_map()

Use std::unique_ptr to hold the newly allocated storage map until
the capsule is successfully created. This prevents a memory leak
if capsule creation throws an exception.
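
The ownership handoff described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the pybind11 code: `StorageMap`, `Capsule`, `make_capsule`, and `create_storage` are hypothetical stand-ins, and capsule creation is simulated by a function that may throw.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the storage map.
using StorageMap = std::unordered_map<std::string, int>;

// Hypothetical stand-in for a capsule: it owns the raw pointer it was given
// and frees it when destroyed.
struct Capsule {
    StorageMap *payload;
    explicit Capsule(StorageMap *p) : payload(p) {}
    Capsule(const Capsule &) = delete;
    Capsule &operator=(const Capsule &) = delete;
    ~Capsule() { delete payload; }
};

// Simulated capsule creation that can fail before taking ownership.
std::unique_ptr<Capsule> make_capsule(StorageMap *raw, bool fail) {
    if (fail) {
        throw std::runtime_error("capsule creation failed");
    }
    return std::unique_ptr<Capsule>(new Capsule(raw));
}

std::unique_ptr<Capsule> create_storage(bool fail) {
    // The unique_ptr owns the map while capsule creation can still throw,
    // so a failure here frees the map instead of leaking it.
    std::unique_ptr<StorageMap> map(new StorageMap());
    std::unique_ptr<Capsule> capsule = make_capsule(map.get(), fail);
    assert(capsule->payload == map.get());
    // Creation succeeded: hand ownership to the capsule. release() gives up
    // ownership without deleting; reset() would delete the map that the
    // capsule now points to.
    (void) map.release();
    return capsule;
}
```

The same sketch shows why `release()` rather than `reset()` is the right call once the capsule owns the pointer.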

* Add timeout-minutes: 3 to cpptest workflow steps

Add a 3-minute timeout to all C++ test (cpptest) steps across all
platforms to detect hangs early. This uses GitHub Actions' built-in
timeout-minutes property which works on Linux, macOS, and Windows.

* Add progress reporter for test_with_catch Catch2 runner

Add a custom Catch2 streaming reporter that prints one line per test
case as it starts and ends, with immediate flushing to keep CI logs
current. This makes it easy to see where the embedded/interpreter
tests are spending time and to pinpoint which test case is stuck
when builds hang (e.g., free-threading issues).

The reporter:
- Prints "[ RUN      ]" when each test starts
- Prints "[       OK ]" or "[  FAILED  ]" when each test ends
- Prints the Python version once at the start via Py_GetVersion()
- Uses StreamingReporterBase for immediate output (not buffered)
- Is set as the default reporter via CATCH_CONFIG_DEFAULT_REPORTER

This approach gives visibility into all tests without changing their
behavior, turning otherwise opaque 90-minute CI timeouts into
locatable issues in the Catch output.

* clang-format auto-fix (overlooked before)

* Disable "Move Subinterpreter" test on free-threaded Python 3.14+

This test hangs in Py_EndInterpreter() when the subinterpreter is
destroyed from a different thread than it was created on.

The hang was observed:
- Intermittently on macOS with Python 3.14.0t
- Predictably on macOS, Ubuntu, and Windows with Python 3.14.1t and 3.14.2t

Root cause analysis points to an interaction between pybind11's
subinterpreter creation code and CPython's free-threaded runtime,
specifically around PyThreadState_Swap() after PyThreadState_DeleteCurrent().

See detailed analysis: https://github.com/pybind/pybind11/pull/5933

* style: pre-commit fixes

* Add test for gil_safe_call_once_and_store per-interpreter isolation

This test verifies that gil_safe_call_once_and_store provides separate
storage for each interpreter when subinterpreter support is enabled.

The test caches the interpreter ID in the main interpreter, then creates
a subinterpreter and verifies it gets its own cached value (not the main
interpreter's). Without per-interpreter storage, the subinterpreter would
incorrectly see the main interpreter's cached object.
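
The property under test can be modeled without Python at all. The following is a toy sketch, not the pybind11 implementation: `PerInterpreterOnce` and `get_or_init` are hypothetical names, and the interpreter ID is passed in explicitly, whereas real code would presumably query the running interpreter.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>

using interpid_t = std::int64_t; // alias name taken from this commit message

// Toy model: one cached value per interpreter ID instead of one per process.
template <typename T>
class PerInterpreterOnce {
public:
    // Runs `init` at most once per interpreter ID and returns the cached value.
    T &get_or_init(interpid_t id, const std::function<T()> &init) {
        assert(init); // the initializer must be callable
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = values_.find(id);
        if (it == values_.end()) {
            it = values_.emplace(id, init()).first;
        }
        // References into std::unordered_map stay valid across rehashing.
        return it->second;
    }

private:
    std::mutex mutex_;
    std::unordered_map<interpid_t, T> values_;
};
```

With a single process-wide slot, the second interpreter would get the first interpreter's value back; keying by ID is what keeps the two apart.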

* Add STARTING/DONE timestamps to test_with_catch output

Print UTC timestamps at the beginning and end of the test run to make
it immediately clear when tests started and whether they ran to
completion. The DONE message includes the Catch session result value.

Example output:
  [ STARTING ] 2025-12-21 03:23:20.497Z
  [ PYTHON   ] 3.14.2 ...
  [ RUN      ] Threads
  [       OK ] Threads
  [ DONE     ] 2025-12-21 03:23:20.512Z (result 0)
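
A timestamp in the format shown above could be produced along these lines. This is a sketch only: `format_utc_ms` is a hypothetical helper name, and it assumes `std::chrono::system_clock` counts from the Unix epoch (guaranteed since C++20, true in practice before that).

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <ctime>
#include <string>

// Format a time point as "YYYY-MM-DD HH:MM:SS.mmmZ" in UTC.
// std::gmtime returns a pointer to shared static storage, so this sketch
// is not thread-safe.
std::string format_utc_ms(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    const long long total_ms = duration_cast<milliseconds>(tp.time_since_epoch()).count();
    assert(total_ms >= 0); // this sketch only handles times at or after the epoch
    const std::time_t secs = static_cast<std::time_t>(total_ms / 1000);
    const long long ms = total_ms % 1000;

    const std::tm *utc = std::gmtime(&secs);
    if (utc == nullptr) {
        return std::string();
    }
    char date[32];
    if (std::strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", utc) == 0) {
        return std::string();
    }
    char out[48];
    std::snprintf(out, sizeof(out), "%s.%03lldZ", date, ms);
    return out;
}
```

Calling `format_utc_ms(std::chrono::system_clock::now())` at the start and end of a run gives the two values for the STARTING and DONE lines.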

* Disable stdout buffering in test_with_catch

Ensure test output appears immediately in CI logs by disabling stdout
buffering. Without this, output may be lost if the process is killed
by a timeout, making it difficult to diagnose which test was hanging.

* EXPERIMENT: Re-enable hanging test to verify CI log buffering fix

This is a temporary commit to verify that the unbuffered stdout fix
makes the hanging test visible in CI logs. REVERT THIS COMMIT after
confirming the output appears.

* Revert "Disable stdout buffering in test_with_catch"

This reverts commit 0f8f32a92a.

* Use USES_TERMINAL for cpptest to show output immediately

Ninja buffers subprocess output until completion. When a test hangs,
the output is never shown, making it impossible to diagnose which test
is hanging. USES_TERMINAL gives the command direct terminal access,
bypassing ninja's buffering.

This explains why Windows CI showed test progress but Linux/macOS did
not - Windows uses MSBuild which doesn't buffer the same way.

* Fix clang-tidy performance-avoid-endl warning

Use '\n' instead of std::endl since USES_TERMINAL now handles
output buffering at the CMake level.

* Add SIGTERM handler to show when test is killed by timeout

When a test hangs and is killed by `timeout`, Catch2 marks it as failed
but the process exits before printing [ DONE ]. This made it unclear
whether the test failed normally or was terminated.

The signal handler prints a clear message when SIGTERM is received,
making timeout-related failures obvious in CI logs.

* Fix typo: atleast -> at_least

* Fix GCC warn_unused_result error for write() in signal handler

Assign the return value to a variable to satisfy GCC's warn_unused_result
attribute, then cast to void to suppress unused variable warning.
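
The handler and the `write()` workaround described in the last two items can be sketched as follows. This is a POSIX-only illustration, not the code from the test runner: the message text, `on_sigterm`, `install_sigterm_handler`, and the `g_sigterm_seen` flag are all hypothetical, and a real handler would likely terminate the process after printing.

```cpp
#include <cassert>
#include <csignal>
#include <unistd.h>

// Hypothetical message text.
static const char kSigtermMessage[] = "\n[  SIGNAL  ] received SIGTERM\n";

// Flag of a type that is safe to modify from a signal handler.
volatile std::sig_atomic_t g_sigterm_seen = 0;

extern "C" void on_sigterm(int) {
    // write() is async-signal-safe; stdio and iostream are not.
    // GCC's warn_unused_result is not silenced by a plain (void) cast on the
    // call, so the result is stored first and the variable is cast instead.
    const ssize_t rc = write(STDERR_FILENO, kSigtermMessage, sizeof(kSigtermMessage) - 1);
    (void) rc;
    g_sigterm_seen = 1;
}

void install_sigterm_handler() {
    const auto previous = std::signal(SIGTERM, on_sigterm);
    assert(previous != SIG_ERR);
    (void) previous; // unused when NDEBUG disables the assert
}
```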

* Add USES_TERMINAL to other C++ test targets

Apply the same ninja output buffering fix to test_cross_module_rtti
and test_pure_cpp targets. Also add explanatory comments to all
USES_TERMINAL usages.

* Revert "EXPERIMENT: Re-enable hanging test to verify CI log buffering fix"

This reverts commit a3abdeea89.

* Update comment to reference PR #5940 for Move Subinterpreter fix

* Add alias `interpid_t = std::int64_t`

* Add isolation and gc test for `gil_safe_call_once_and_store`

* Add thread local cache for gil_safe_call_once_and_store

* Revert "Add thread local cache for gil_safe_call_once_and_store"

This reverts commit 5d6681956d2d326fe74c7bf80e845c8e8ddb2a7c.

* Revert changes according to code review

* Relocate multiple-interpreters tests

* Add more tests for multiple interpreters

* Remove copy constructor

* Apply suggestions from code review

* Refactor to use per-storage capsule instead

* Update comments

* Update singleton tests

* Use interpreter id type for `get_num_interpreters_seen()`

* Suppress unused variable warning

* HACKING

* Revert "HACKING"

This reverts commit 534235ea55.

* Try fix concurrency

* Test even harder

* Reorg code to avoid duplicates

* Fix unique_ptr::reset -> unique_ptr::release

* Extract reusable functions

* Fix indentation

* Appease warnings for MSVC

* Appease warnings for MSVC

* Appease warnings for MSVC

* Try fix concurrency by not using `get_num_interpreters_seen() > 1`

* Try fix tests

* Make Python path handling more robust

* Update comments and assertion messages

* Revert changes according to code review

* Disable flaky tests

* Use `@pytest.mark.xfail` rather than `pytest.skip`

* Retrigger CI

* Retrigger CI

* Revert file moves

* Refactor atomic_get_or_create_in_state_dict: improve API and fix on_fetch_ bug

Three improvements to atomic_get_or_create_in_state_dict:

1. Return std::pair<Payload*, bool> instead of just Payload*
   - The bool indicates whether storage was newly created (true) or
     already existed (false), following std::map::insert convention.
   - This fixes a bug where on_fetch_ was called even for newly created
     internals, when it should only run for fetched (existing) ones.
     (Identified by @b-pass in code review)

2. Change LeakOnInterpreterShutdown from template param to runtime arg
   - Renamed to `clear_destructor` to describe what it does locally,
     rather than embedding assumptions about why it's used.
   - Reduces template instantiations (header-only library benefits).
   - The check is in the slow path (create) anyway, so negligible cost.

3. Remove unnecessary braces around the fast-path lookup
   - The braces created a nested scope but declared no local variables
     that would benefit from scoping.
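
The return convention from item 1 can be illustrated with a small stand-alone sketch. The names here (`Payload`, `StateDict`, `get_or_create`, `acquire`) are hypothetical and the map stands in for the interpreter state dict; this is not the signature of `atomic_get_or_create_in_state_dict`.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical payload type.
struct Payload {
    int fetch_count = 0;
};

// Hypothetical registry standing in for the interpreter state dict.
class StateDict {
public:
    // Returns {pointer, created}. `created` is true only when this call
    // inserted the payload, following the std::map::insert convention.
    std::pair<Payload *, bool> get_or_create(const std::string &key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it != entries_.end()) {
            return {it->second.get(), false};
        }
        std::unique_ptr<Payload> fresh(new Payload());
        Payload *raw = fresh.get();
        entries_.emplace(key, std::move(fresh));
        return {raw, true};
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, std::unique_ptr<Payload>> entries_;
};

// Caller side: the fetch hook runs only for payloads that already existed.
Payload *acquire(StateDict &dict, const std::string &key) {
    const std::pair<Payload *, bool> result = dict.get_or_create(key);
    assert(result.first != nullptr);
    if (!result.second) {
        ++result.first->fetch_count; // stands in for an on_fetch_ callback
    }
    return result.first;
}
```

Returning only the pointer would leave the caller unable to tell the two cases apart, which is how a fetch hook can end up running on freshly created storage.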

* Remove unused PYBIND11_MULTIPLE_INTERPRETERS_TEST_FILES variable

This variable was defined but never used.

---------

Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-24 23:33:02 -08:00

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

#include <cassert>   // assert
#include <stdexcept> // std::runtime_error
#include <vector>

namespace py = pybind11;

#ifdef PYBIND11_HAS_NATIVE_ENUM
#    include <pybind11/native_enum.h>
#endif

// A singleton class that holds references to certain Python objects
// This singleton is per-interpreter using gil_safe_call_once_and_store
class MySingleton {
public:
    MySingleton() = default;
    ~MySingleton() = default;
    MySingleton(const MySingleton &) = delete;
    MySingleton &operator=(const MySingleton &) = delete;
    MySingleton(MySingleton &&) = default;
    MySingleton &operator=(MySingleton &&) = default;

    static MySingleton &get_instance() {
        PYBIND11_CONSTINIT static py::gil_safe_call_once_and_store<MySingleton> storage;
        return storage
            .call_once_and_store_result([]() -> MySingleton {
                MySingleton instance{};

                auto emplace = [&instance](const py::handle &obj) -> void {
                    obj.inc_ref(); // Ensure the object is not GC'd while interpreter is alive
                    instance.objects.emplace_back(obj);
                };

                // Example objects to store in the singleton
                emplace(py::type::handle_of(py::none()));                        // static type
                emplace(py::type::handle_of(py::tuple()));                       // static type
                emplace(py::type::handle_of(py::list()));                        // static type
                emplace(py::type::handle_of(py::dict()));                        // static type
                emplace(py::module_::import("collections").attr("OrderedDict")); // static type
                emplace(py::module_::import("collections").attr("defaultdict")); // heap type
                emplace(py::module_::import("collections").attr("deque"));       // heap type

                assert(instance.objects.size() == 7);
                return instance;
            })
            .get_stored();
    }

    std::vector<py::handle> &get_objects() { return objects; }

    static void init() {
        // Ensure the singleton is created
        auto &instance = get_instance();
        (void) instance; // suppress unused variable warning
        assert(instance.objects.size() == 7);
        // Register cleanup at interpreter exit
        py::module_::import("atexit").attr("register")(py::cpp_function(&MySingleton::clear));
    }

    static void clear() {
        auto &instance = get_instance();
        (void) instance; // suppress unused variable warning
        assert(instance.objects.size() == 7);
        for (const auto &obj : instance.objects) {
            obj.dec_ref();
        }
        instance.objects.clear();
    }

private:
    std::vector<py::handle> objects;
};

class MyClass {
public:
    explicit MyClass(py::ssize_t v) : value(v) {}
    py::ssize_t get_value() const { return value; }

private:
    py::ssize_t value;
};

class MyGlobalError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

class MyLocalError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

enum class MyEnum : int {
    ONE = 1,
    TWO = 2,
    THREE = 3,
};

PYBIND11_MODULE(mod_per_interpreter_gil_with_singleton,
                m,
                py::mod_gil_not_used(),
                py::multiple_interpreters::per_interpreter_gil()) {
#ifdef PYBIND11_HAS_SUBINTERPRETER_SUPPORT
    m.attr("defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT") = true;
#else
    m.attr("defined_PYBIND11_HAS_SUBINTERPRETER_SUPPORT") = false;
#endif

    MySingleton::init();

    // Ensure py::multiple_interpreters::per_interpreter_gil() works with singletons using
    // py::gil_safe_call_once_and_store
    m.def(
        "get_objects_in_singleton",
        []() -> std::vector<py::handle> { return MySingleton::get_instance().get_objects(); },
        "Get the list of objects stored in the singleton");

    // Ensure py::multiple_interpreters::per_interpreter_gil() works with class bindings
    py::class_<MyClass>(m, "MyClass")
        .def(py::init<py::ssize_t>())
        .def("get_value", &MyClass::get_value);

    // Ensure py::multiple_interpreters::per_interpreter_gil() works with global exceptions
    py::register_exception<MyGlobalError>(m, "MyGlobalError");
    // Ensure py::multiple_interpreters::per_interpreter_gil() works with local exceptions
    py::register_local_exception<MyLocalError>(m, "MyLocalError");

#ifdef PYBIND11_HAS_NATIVE_ENUM
    // Ensure py::multiple_interpreters::per_interpreter_gil() works with native_enum
    py::native_enum<MyEnum>(m, "MyEnum", "enum.IntEnum")
        .value("ONE", MyEnum::ONE)
        .value("TWO", MyEnum::TWO)
        .value("THREE", MyEnum::THREE)
        .finalize();
#else
    py::enum_<MyEnum>(m, "MyEnum")
        .value("ONE", MyEnum::ONE)
        .value("TWO", MyEnum::TWO)
        .value("THREE", MyEnum::THREE);
#endif
}