Commit Graph

7 Commits

Author SHA1 Message Date
Sam Gross
4f81a12507 Fix deadlock in test with free threading (#5973)
Importing "widget_module" re-enables the GIL. In current versions of
CPython, this requires pausing all threads attached to all interpreters.
The spinning on sync/num without a py::gil_scoped_release causes
occasional deadlocks.
2026-01-28 23:02:08 -08:00
Xuehai Pan
fee2527dfa Fix concurrency consistency for internals_pp_manager under multiple-interpreters (#5947)
* Add per-interpreter storage for `gil_safe_call_once_and_store`

* Disable thread local cache for `internals_pp_manager`

* Disable thread local cache for `internals_pp_manager` for multi-interpreter only

* Use anonymous namespace to separate these type_ids from other tests with the same class names.

* style: pre-commit fixes

* Revert internals_pp_manager changes

* This is the crux of fix for the subinterpreter_before_main failure.

The pre_init needs to check if it is in a subinterpreter or not. But in 3.13+ this static initializer runs in the main interpreter.  So we need to check this later, during the exec phase.

* Continue to do the ensure in both places, there might be a reason it was where it was...

Should not hurt anything to do it extra times here.

* Change get_num_interpreters_seen to a boolean flag instead.

The count was not used, it was just checked for > 1, we now accomplish this by setting the flag.

* Spelling typo

* Work around older python versions, only need this check for newish versions

* Add more comments for test case

* Add more comments for test case

* Stop traceback propagation

* Re-enable subinterpreter support on ubuntu 3.14 builds

Was disabled in e4873e8

* As suggested, don't use an anonymous namespace.

* Typo in test assert format string

* Use a more appropriate function name

* Fix mod_per_interpreter_gil* output directory on Windows/MSVC

On Windows with MSVC (multi-configuration generators), CMake uses
config-specific properties like LIBRARY_OUTPUT_DIRECTORY_DEBUG when
set, otherwise falls back to LIBRARY_OUTPUT_DIRECTORY/<Config>/.

The main test modules (pybind11_tests, etc.) correctly set both
LIBRARY_OUTPUT_DIRECTORY and the config-specific variants (lines
517-528), so they output directly to tests/.

However, the mod_per_interpreter_gil* modules only copied the base
LIBRARY_OUTPUT_DIRECTORY property, causing them to be placed in
tests/Debug/ instead of tests/.

This mismatch caused test_import_in_subinterpreter_concurrently and
related tests to fail with ModuleNotFoundError on Windows Python 3.14,
because the test code sets sys.path based on pybind11_tests.__file__
(which is in tests/) but tries to import mod_per_interpreter_gil_with_singleton
(which ended up in tests/Debug/).

This bug was previously masked by @pytest.mark.xfail decorators on
these tests. Now that the underlying "Duplicate C++ type registration"
issue is fixed and the xfails are removed, this path issue surfaced.

The fix mirrors the same pattern used for main test targets: also set
LIBRARY_OUTPUT_DIRECTORY_<CONFIG> for each configuration type.

* Remove unneeded `pytest.importorskip`

* Remove comment

---------

Co-authored-by: b-pass <b-pass@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
2025-12-26 13:59:11 -05:00
Xuehai Pan
0057e4945d Add per-interpreter storage for gil_safe_call_once_and_store (#5933)
* Add new argument to `gil_safe_call_once_and_store::call_once_and_store_result`

* Add per-interpreter storage for `gil_safe_call_once_and_store`

* Make `~gil_safe_call_once_and_store` a no-op

* Fix C++11 compatibility

* Improve thread-safety and add default finalizer

* Try fix thread-safety

* Try fix thread-safety

* Add a warning comment

* Simplify `PYBIND11_INTERNALS_VERSION >= 12`

* Try fix thread-safety

* Try fix thread-safety

* Revert get_pp()

* Update comments

* Move call-once storage out of internals

* Revert internal version bump

* Cleanup outdated comments

* Move atomic_bool alias into pybind11::detail namespace

The `using atomic_bool = ...` declaration was at global scope,
polluting the global namespace. Move it into pybind11::detail
to avoid potential conflicts with user code.

* Add explicit #include <unordered_map> for subinterpreter support

The subinterpreter branch uses std::unordered_map but relied on
transitive includes. Add an explicit include for robustness.

* Remove extraneous semicolon after destructor definition

Style fix: remove trailing semicolon after ~call_once_storage()
destructor body.

* Add comment explaining unused finalize parameter

Clarify why the finalize callback parameter is intentionally ignored
when subinterpreter support is disabled: the storage is process-global
and leaked to avoid destructor calls after interpreter finalization.

* Add comment explaining error_scope usage

Clarify why error_scope is used: to preserve any existing Python
error state that might be cleared or modified by dict_getitemstringref.

* Improve exception safety in get_or_create_call_once_storage_map()

Use std::unique_ptr to hold the newly allocated storage map until
the capsule is successfully created. This prevents a memory leak
if capsule creation throws an exception.

* Add timeout-minutes: 3 to cpptest workflow steps

Add a 3-minute timeout to all C++ test (cpptest) steps across all
platforms to detect hangs early. This uses GitHub Actions' built-in
timeout-minutes property which works on Linux, macOS, and Windows.

* Add progress reporter for test_with_catch Catch2 runner

Add a custom Catch2 streaming reporter that prints one line per test
case as it starts and ends, with immediate flushing to keep CI logs
current. This makes it easy to see where the embedded/interpreter
tests are spending time and to pinpoint which test case is stuck
when builds hang (e.g., free-threading issues).

The reporter:
- Prints "[ RUN      ]" when each test starts
- Prints "[       OK ]" or "[  FAILED  ]" when each test ends
- Prints the Python version once at the start via Py_GetVersion()
- Uses StreamingReporterBase for immediate output (not buffered)
- Is set as the default reporter via CATCH_CONFIG_DEFAULT_REPORTER

This approach gives visibility into all tests without changing their
behavior, turning otherwise opaque 90-minute CI timeouts into
locatable issues in the Catch output.

* clang-format auto-fix (overlooked before)

* Disable "Move Subinterpreter" test on free-threaded Python 3.14+

This test hangs in Py_EndInterpreter() when the subinterpreter is
destroyed from a different thread than it was created on.

The hang was observed:
- Intermittently on macOS with Python 3.14.0t
- Predictably on macOS, Ubuntu, and Windows with Python 3.14.1t and 3.14.2t

Root cause analysis points to an interaction between pybind11's
subinterpreter creation code and CPython's free-threaded runtime,
specifically around PyThreadState_Swap() after PyThreadState_DeleteCurrent().

See detailed analysis: https://github.com/pybind/pybind11/pull/5933

* style: pre-commit fixes

* Add test for gil_safe_call_once_and_store per-interpreter isolation

This test verifies that gil_safe_call_once_and_store provides separate
storage for each interpreter when subinterpreter support is enabled.

The test caches the interpreter ID in the main interpreter, then creates
a subinterpreter and verifies it gets its own cached value (not the main
interpreter's). Without per-interpreter storage, the subinterpreter would
incorrectly see the main interpreter's cached object.

* Add STARTING/DONE timestamps to test_with_catch output

Print UTC timestamps at the beginning and end of the test run to make
it immediately clear when tests started and whether they ran to
completion. The DONE message includes the Catch session result value.

Example output:
  [ STARTING ] 2025-12-21 03:23:20.497Z
  [ PYTHON   ] 3.14.2 ...
  [ RUN      ] Threads
  [       OK ] Threads
  [ DONE     ] 2025-12-21 03:23:20.512Z (result 0)

* Disable stdout buffering in test_with_catch

Ensure test output appears immediately in CI logs by disabling stdout
buffering. Without this, output may be lost if the process is killed
by a timeout, making it difficult to diagnose which test was hanging.

* EXPERIMENT: Re-enable hanging test to verify CI log buffering fix

This is a temporary commit to verify that the unbuffered stdout fix
makes the hanging test visible in CI logs. REVERT THIS COMMIT after
confirming the output appears.

* Revert "Disable stdout buffering in test_with_catch"

This reverts commit 0f8f32a92a.

* Use USES_TERMINAL for cpptest to show output immediately

Ninja buffers subprocess output until completion. When a test hangs,
the output is never shown, making it impossible to diagnose which test
is hanging. USES_TERMINAL gives the command direct terminal access,
bypassing ninja's buffering.

This explains why Windows CI showed test progress but Linux/macOS did
not - Windows uses MSBuild which doesn't buffer the same way.

* Fix clang-tidy performance-avoid-endl warning

Use '\n' instead of std::endl since USES_TERMINAL now handles
output buffering at the CMake level.

* Add SIGTERM handler to show when test is killed by timeout

When a test hangs and is killed by `timeout`, Catch2 marks it as failed
but the process exits before printing [ DONE ]. This made it unclear
whether the test failed normally or was terminated.

The signal handler prints a clear message when SIGTERM is received,
making timeout-related failures obvious in CI logs.

* Fix typo: atleast -> at_least

* Fix GCC warn_unused_result error for write() in signal handler

Assign the return value to a variable to satisfy GCC's warn_unused_result
attribute, then cast to void to suppress unused variable warning.

* Add USES_TERMINAL to other C++ test targets

Apply the same ninja output buffering fix to test_cross_module_rtti
and test_pure_cpp targets. Also add explanatory comments to all
USES_TERMINAL usages.

* Revert "EXPERIMENT: Re-enable hanging test to verify CI log buffering fix"

This reverts commit a3abdeea89.

* Update comment to reference PR #5940 for Move Subinterpreter fix

* Add alias `interpid_t = std::int64_t`

* Add isolation and gc test for `gil_safe_call_once_and_store`

* Add thread local cache for gil_safe_call_once_and_store

* Revert "Add thread local cache for gil_safe_call_once_and_store"

This reverts commit 5d6681956d2d326fe74c7bf80e845c8e8ddb2a7c.

* Revert changes according to code review

* Relocate multiple-interpreters tests

* Add more tests for multiple interpreters

* Remove copy constructor

* Apply suggestions from code review

* Refactor to use per-storage capsule instead

* Update comments

* Update singleton tests

* Use interpreter id type for `get_num_interpreters_seen()`

* Suppress unused variable warning

* HACKING

* Revert "HACKING"

This reverts commit 534235ea55.

* Try fix concurrency

* Test even harder

* Reorg code to avoid duplicates

* Fix unique_ptr::reset -> unique_ptr::release

* Extract reusable functions

* Fix indentation

* Appease warnings for MSVC

* Appease warnings for MSVC

* Appease warnings for MSVC

* Try fix concurrency by not using `get_num_interpreters_seen() > 1`

* Try fix tests

* Make Python path handling more robust

* Update comments and assertion messages

* Revert changes according to code review

* Disable flaky tests

* Use `@pytest.mark.xfail` rather than `pytest.skip`

* Retrigger CI

* Retrigger CI

* Revert file moves

* Refactor atomic_get_or_create_in_state_dict: improve API and fix on_fetch_ bug

Three improvements to atomic_get_or_create_in_state_dict:

1. Return std::pair<Payload*, bool> instead of just Payload*
   - The bool indicates whether storage was newly created (true) or
     already existed (false), following std::map::insert convention.
   - This fixes a bug where on_fetch_ was called even for newly created
     internals, when it should only run for fetched (existing) ones.
     (Identified by @b-pass in code review)

2. Change LeakOnInterpreterShutdown from template param to runtime arg
   - Renamed to `clear_destructor` to describe what it does locally,
     rather than embedding assumptions about why it's used.
   - Reduces template instantiations (header-only library benefits).
   - The check is in the slow path (create) anyway, so negligible cost.

3. Remove unnecessary braces around the fast-path lookup
   - The braces created a nested scope but declared no local variables
     that would benefit from scoping.

* Remove unused PYBIND11_MULTIPLE_INTERPRETERS_TEST_FILES variable

This variable was defined but never used.

---------

Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-24 23:33:02 -08:00
Ralf W. Grosse-Kunstleve
799f591ec3 Re-enable Move Subinterpreter test for free-threaded Python 3.14 (#5940)
* Remove skip for Move Subinterpreter test on free-threaded Python 3.14+

* Fix deadlock by detaching from the main interpreter before joining the thread.

* style: pre-commit fixes

---------

Co-authored-by: b-pass <b-pass@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-12-22 21:32:41 -08:00
Ralf W. Grosse-Kunstleve
78381e5e28 Improve C++ test infrastructure: progress reporter, timeouts, and skip hanging Move Subinterpreter test (#5942)
* Improve C++ test infrastructure and disable hanging test

This commit improves the C++ test infrastructure to ensure test output
is visible in CI logs, and disables a test that hangs on free-threaded
Python 3.14+.

Changes:

## CI/test infrastructure improvements

- .github/workflows: Added `timeout-minutes: 3` to all C++ test steps
  to prevent indefinite hangs.

- tests/**/CMakeLists.txt: Added `USES_TERMINAL` to C++ test targets
  (cpptest, test_cross_module_rtti, test_pure_cpp) to ensure output is
  shown immediately rather than buffered and possibly lost on crash/timeout.

- tests/test_with_catch/catch.cpp: Added a custom Catch2 progress reporter
  with timestamps, Python version info, and a SIGTERM handler to make test
  execution and failures clearly visible in CI logs.

## Disabled hanging test

- The "Move Subinterpreter" test is disabled on free-threaded Python 3.14+
  due to a hang in Py_EndInterpreter() when the subinterpreter is destroyed
  from a different thread than it was created on. Work on fixing the
  underlying issue will continue under PR #5940.

Context: We were in the dark for months (since we started testing with
Python 3.14t) because CI logs gave no clue about the root cause of hangs.
This led to ignoring intermittent hangs (mostly on macOS). Our hand was
forced only with the Python 3.14.1 release, when hangs became predictable
on all platforms.

For the full development history of these changes, see PR #5933.

* Add test summary to progress reporter

Print the total number of test cases and assertions at the end of the
test run, making it easy to spot if tests are disabled or added.

Example output:
  [  PASSED  ] 20 test cases, 1589 assertions.

* Add PYBIND11_CATCH2_SKIP_IF macro to skip tests at runtime

Catch2 v2 doesn't have native skip support (v3 does with SKIP()).
This macro allows tests to be skipped with a visible message while
still appearing in the test list.

Use this for the Move Subinterpreter test on free-threaded Python 3.14+
so it shows as skipped rather than being conditionally compiled out.

Example output:
  [ RUN      ] Move Subinterpreter
  [ SKIPPED ] Skipped on free-threaded Python 3.14+ (see PR #5940)
  [       OK ] Move Subinterpreter

* Fix clang-tidy bugprone-macro-parentheses warning in PYBIND11_CATCH2_SKIP_IF
2025-12-21 21:25:06 -08:00
Scott Wolchok
3262000195 Add fast_type_map, use it authoritatively for local types and as a hint for global types (ABI breaking) (#5842)
* Add fast_type_map, use it authoritatively for local types and as a hint for global types

nanobind has a similar two-level lookup strategy, added and explained
by
b515b1f7f2

In this PR I've ported this approach to pybind11. To avoid an ABI
break, I've kept the fast maps to the `local_internals`. I think this
should be safe because any particular module should see its
`local_internals` reset at least as often as the global `internals`,
and misses in the fast "hint" map for global types fall back to the
global `internals`.

Performance seems to have improved. Using my patched fork of
pybind11_benchmark
(https://github.com/swolchok/pybind11_benchmark/tree/benchmark-updates,
specifically commit hash b6613d12607104d547b1c10a8145d1b3e9937266), I
run bench.py and observe the MyInt case. Each time, I do 3 runs and
just report all 3.

master, Mac: 75.9, 76.9, 75.3 nsec/loop
this PR, Mac: 73.8, 73.8, 73.6 nsec/loop
master, Linux box: 188, 187, 188 nsec/loop
this PR, Linux box: 164, 165, 164 nsec/loop

Note that the "real" percentage improvement is larger than implied by the
above because master does not yet include #5824.

* simplify unsafe_reset_local_internals in test

* pre-implement PYBIND11_INTERNALS_VERSION 12

* use PYBIND11_INTERNALS_VERSION 12 on Python 3.14 per suggestion

* Implement reviewer comments: revert PY_VERSION_HEX change, fix REVIEW comment, add two-level lookup comments. ci.yml coming separately

* Use the inplace build to smoke test ABI bump?

* [skip ci] Remove "smoke" from comment. This is full testing, just only on a few platforms.

---------

Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
2025-10-05 11:07:25 -07:00
Scott Wolchok
30748f863f Avoid heap allocation for function calls with a small number of args (#5824)
* Avoid heap allocation for function calls with a small number of arguments

We don't have access to llvm::SmallVector or similar, but given the
limited subset of the `std::vector` API that
`function_call::args{,_convert}` need and the "reserve-then-fill"
usage pattern, it is relatively straightforward to implement custom
containers that get the job done.

Seems to improves time to call the collatz function in
pybind/pybind11_benchmark significantly; numbers are a little noisy
but there's a clear improvement from "about 60 ns per call" to "about
45 ns per call" on my machine (M4 Max Mac), as measured with
`timeit.repeat('collatz(4)', 'from pybind11_benchmark import
collatz')`.

* clang-tidy

* more clang-tidy

* clang-tidy NOLINTBEGIN/END instead of NOLINTNEXTLINE

* forgot to increase inline size after removing std::variant

* constexpr arg_vector_small_size, use move instead of swap to hopefully clarify second_pass_convert

* rename test_embed to test_low_level

* rename test_low_level to test_with_catch

* Be careful to NOINLINE slow paths

* rename array/vector members to iarray/hvector. Move comment per request. Add static_asserts for our untagged union implementation per request.

* drop is_standard_layout assertions; see https://github.com/pybind/pybind11/pull/5824#issuecomment-3308616072
2025-09-19 13:44:40 -07:00