Python may leave behind temporary `.pyc.*` files inside `__pycache__`
on some filesystems (e.g. WSL2 mounts). Adding `__pycache__/` ensures
these directories and any leftover files are consistently ignored.
Background: Python writes bytecode to a temp file with an extra suffix
before renaming it to `.pyc`. If the process is interrupted or the
filesystem rename isn’t fully atomic, those temp files may remain.
See: https://docs.python.org/3/library/py_compile.html#py_compile.compile
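The corresponding ignore rule is a one-line pattern; the directory form (trailing slash) matches `__pycache__` directories at any depth together with their contents, including any leftover `.pyc.*` temp files. This is a sketch of the entry, assuming a standard `.gitignore`:

```
# compiled-bytecode caches, including leftover .pyc.* temp files
__pycache__/
```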
Occasionally a test will get stuck and run for 6 hours until GitHub cancels the workflow.
This reduces the timeout to 90 minutes so as not to waste resources.
pybind11's tests seem to run in about 30 minutes, so this should be plenty of time.
* Revert type hint changes to int_ and float_
These two types do not support casting from int-like and float-like types.
* Fix tests
* Add a custom py::float_ caster
The default py::object caster only works if the object is an instance of the type.
py::float_ should accept Python int objects as well as float.
This caster passes float through as usual and casts int to float.
The caster handles the type name so the custom one is not required.
* style: pre-commit fixes
* Fix name
* Fix variable
* Try satisfying the formatter
* Rename test function
* Simplify type caster
* Fix reference counting issue
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Avoid heap allocation for function calls with a small number of arguments
We don't have access to llvm::SmallVector or similar, but given the
limited subset of the `std::vector` API that
`function_call::args{,_convert}` need and the "reserve-then-fill"
usage pattern, it is relatively straightforward to implement custom
containers that get the job done.
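The idea above can be sketched as a fixed-capacity container with inline storage, falling back to the heap only for calls with many arguments. This is a minimal illustration under the "reserve-then-fill" assumption, not pybind11's actual container; the names `small_args` and `N` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Sketch: up to N elements live in an inline buffer (no heap allocation);
// larger sizes fall back to a single heap allocation. Capacity is fixed at
// construction and never grows, matching the "reserve-then-fill" pattern.
template <typename T, std::size_t N>
class small_args {
public:
    explicit small_args(std::size_t n)
        : heap_(n > N ? new T[n]() : nullptr),
          data_(heap_ ? heap_.get() : inline_),
          size_(n) {}

    T &operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

private:
    T inline_[N] = {};          // inline storage for the common small case
    std::unique_ptr<T[]> heap_; // used only when n > N
    T *data_;
    std::size_t size_;
};
```

A real implementation also needs to handle non-default-constructible element types (placement new into raw storage), but the small/large dispatch is the part that removes the allocation from the hot path.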
This seems to improve the time to call the collatz function in
pybind/pybind11_benchmark significantly; the numbers are a little noisy,
but there's a clear improvement from about 60 ns per call to about
45 ns per call on my machine (M4 Max Mac), as measured with
`timeit.repeat('collatz(4)', 'from pybind11_benchmark import collatz')`.
* clang-tidy
* more clang-tidy
* clang-tidy NOLINTBEGIN/END instead of NOLINTNEXTLINE
* forgot to increase inline size after removing std::variant
* constexpr arg_vector_small_size, use move instead of swap to hopefully clarify second_pass_convert
* rename test_embed to test_low_level
* rename test_low_level to test_with_catch
* Be careful to NOINLINE slow paths
* rename array/vector members to iarray/hvector. Move comment per request. Add static_asserts for our untagged union implementation per request.
* drop is_standard_layout assertions; see https://github.com/pybind/pybind11/pull/5824#issuecomment-3308616072
* Use thread_local instead of thread_specific_storage for internals management
thread_local is faster.
* Make the pp manager a singleton.
Strictly speaking, since the members are static, the instances must also be singletons or this wouldn't work. They already are, but we can make the class enforce it to be more 'self-documenting'.
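The "enforce it in the class" idea is the classic function-local-static singleton: a private constructor plus deleted copy operations make it impossible to create a second instance. A minimal sketch, assuming a hypothetical `pp_manager` class (the real class's members differ):

```cpp
#include <cassert>

// Singleton enforced by the class itself: the only way to obtain an
// instance is through instance(), and copies are disabled.
class pp_manager {
public:
    static pp_manager &instance() {
        static pp_manager mgr; // constructed once, on first use
        return mgr;
    }
    pp_manager(const pp_manager &) = delete;
    pp_manager &operator=(const pp_manager &) = delete;

    int counter = 0; // illustrative state

private:
    pp_manager() = default; // only instance() can construct
};
```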
* Revert "s/windows-2022/windows-latest/ in .github/workflows/{ci,pip}.yml (#5826)"
This reverts commit 852a4b5010.
* Add module-level skip for Windows build >= 26100 in test_iostream.py
* Changes suggested by at-henryiii
* Use thread_local for loader_life_support to improve performance
As explained in a new code comment, `loader_life_support` needs to be
`thread_local` but does not need to be isolated to a particular
interpreter because any given function call is already going to only
happen on a single interpreter by definition.
Performance before:
- on M4 Max using pybind/pybind11_benchmark unmodified repo:
```
python -m timeit --setup 'from pybind11_benchmark import collatz' 'collatz(4)'
5000000 loops, best of 5: 63.8 nsec per loop
```
- Linux server:
```
python -m timeit --setup 'from pybind11_benchmark import collatz' 'collatz(4)'
2000000 loops, best of 5: 120 nsec per loop
```
After:
- M4 Max:
```
python -m timeit --setup 'from pybind11_benchmark import collatz' 'collatz(4)'
5000000 loops, best of 5: 53.1 nsec per loop
```
- Linux server:
```
python -m timeit --setup 'from pybind11_benchmark import collatz' 'collatz(4)'
2000000 loops, best of 5: 101 nsec per loop
```
A quick profile with perf shows that pthread_setspecific and pthread_getspecific are gone.
Open questions:
- How do we determine whether we can safely use `thread_local`? I see
concerns about old iOS versions on
https://github.com/pybind/pybind11/pull/5705#issuecomment-2922858880
and https://github.com/pybind/pybind11/pull/5709; is there anything
else?
- Do we have a test that covers "function called in one interpreter
calls a C++ function that causes a function call in another
interpreter"? I think it's fine, but can it happen?
- Are we happy with what we think will happen in the case where
multiple extensions compiled with and without this PR interoperate?
I think it's fine -- each dispatch pushes and cleans up its own
state -- but a second opinion is certainly welcome.
* Remove PYBIND11_CAN_USE_THREAD_LOCAL
* clarify comment
* Simplify loader_life_support TLS storage
Replace the `fake_thread_specific_storage` struct with a direct
thread-local pointer managed via a function-local static:
static loader_life_support *& tls_current_frame()
This retains the "stack of frames" behavior via the `parent` link. It also
reduces indirection and clarifies intent.
Note: this form is C++11-compatible; once pybind11 requires C++17, the
helper can be simplified to:
inline static thread_local loader_life_support *tls_current_frame = nullptr;
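The "stack of frames via the `parent` link" pattern described above can be sketched with a generic RAII guard; this is an illustration of the mechanism, not `loader_life_support` itself, and `frame_guard` is an invented name.

```cpp
#include <cassert>

// Thread-local stack of frames: each guard saves the current top-of-stack
// in `parent` on construction and restores it on destruction, so nesting
// behaves like a per-thread call stack.
struct frame_guard {
    frame_guard() : parent(tls_current_frame()) { tls_current_frame() = this; }
    ~frame_guard() { tls_current_frame() = parent; }

    frame_guard *parent;

    // C++11-compatible form; with C++17 this could instead be an
    // `inline static thread_local frame_guard *` data member.
    static frame_guard *&tls_current_frame() {
        static thread_local frame_guard *current = nullptr;
        return current;
    }
};
```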
* loader_life_support: avoid duplicate tls_current_frame() calls
Replace repeated calls with a single local reference:
auto &frame = tls_current_frame();
This ensures the thread_local initialization guard is checked only once
per constructor/destructor call site, avoids potential clang-tidy
complaints, and makes the code more readable. Functional behavior is
unchanged.
* Add REMINDER for next version bump in internals.h
---------
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
* pytypes.h: constrain accessor::operator= templates so that they do not match calls that should use the special member functions.
Found by an experimental, new clang-tidy check. While we may not know the exact design decisions now, it seems unlikely that the special members were deliberately meant to not be selected (for otherwise they could have been defined differently to make this clear). Rather, it seems like an oversight that the operator templates win in overload resolution, and we should restore the intended resolution.
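The pitfall is that an unconstrained `operator=` template is an exact match for a non-const lvalue of the class's own type and therefore beats the copy assignment operator in overload resolution. A minimal illustration of the problem and the `enable_if` fix, using an invented `setter` class rather than pybind11's `accessor`:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct setter {
    std::string last;

    setter() = default;
    setter(const setter &) = default;
    setter &operator=(const setter &) = default;

    // Without the enable_if constraint, `a = b` for two setters would
    // deduce T = setter& (an exact match) and pick this template instead
    // of the copy assignment operator above.
    template <typename T,
              typename = typename std::enable_if<!std::is_same<
                  setter, typename std::decay<T>::type>::value>::type>
    setter &operator=(T &&value) {
        last = std::to_string(std::forward<T>(value));
        return *this;
    }
};
```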
* Use C++11-compatible facilities
* Use C++11-compatible facilities
* style: pre-commit fixes
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
When comparing buffer types, some platforms have edge cases where the types are equivalent but the format strings are not identical.
item_type_is_equivalent_to is more forgiving than a direct string comparison.
* Make function_record type subinterpreter safe
* Get rid of static state in implicit conversion
* style: pre-commit fixes
* Fix lambda
* Bump ABI because we added an internals member
* Set __module__ on the type instance to get rid of DeprecationWarning
* Work around internal compiler error in CUDA by not using typedef
hopefully
* Make clang-tidy happy
* Use the same __module__ as pybind11_static_property
* style: pre-commit fixes
* Oops, find-replace error
* style: pre-commit fixes
* Move the once initialization to happen more behind the scenes
* Oops, need those casts...
* Undo implicit conversion change, will do a separate PR
* Use local_internals for function_record pointer to avoid ABI bump
* style: pre-commit fixes
* Get rid of this auto for readability
* Change back to using unqualified tp_name, set __module__ attribute, explicitly add Py_TPFLAGS_HEAPTYPE → does not resolve DeprecationWarning :-(
* Revert "Change back to using unqualified tp_name, set __module__ attribute, explicitly add Py_TPFLAGS_HEAPTYPE → does not resolve DeprecationWarning :-("
This reverts commit 9ccd6de9b7.
* Add Py_TPFLAGS_HEAPTYPE to be explicit (more readable).
* Remove obsolete PYBIND11_WARNING_DISABLE_...
* Make tp_plainname_impl, tp_qualname_impl more DRY
* Change PYBIND11_INTERNAL_MODULE_NAME → PYBIND11_DUMMY_MODULE_NAME
* Add a long comment to explain the tp_qualname_impl workaround.
* Rename local_internals::function_record → function_record_py_type
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rwgkio@gmail.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
* Failing unit test
* Potential fix for the issue of re-importing a multi-phase module
- When a module is successfully imported and exec'd, save its handle in a dict in the interpreter state
- Use a special Py_mod_create slot to look in the cache and return the cached handle if it is in the cache
- Don't re-run the user exec function if the module is in the interpreter's cache (implying it was already successfully imported)
* Oops, need to inline these.
* Clang-Tidy fixes
* Oops, debug code
* Add xfail for this GraalPy bug
* Remove static from these function defs, it was a cut-and-paste error in the first place.
* Fix test comment
* Proper error handling
* Oops
* Split up this line, but still just ignore failure: if the module doesn't have the right properties to check the cache, then just allow exec to run.
* Clean up - already looked up the name, just use that.
* Some compilers complain if the pointer isn't taken here, weird.
* Allow attribute errors to be thrown here, will be converted to import errors by the exception handler.
* Remove bogus incref, unconditionally expect a __spec__.name on the module
* Add PR to test comment
* style: pre-commit fixes
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Replace static bool with thread-specific-storage to make this code sub-interpreter and free-threading safe.
* Make sure there is only one tss value in existence for this.
The previous code had multiple (one for every type pair, as this is a template function), which may have posed a problem for some platforms.
* Make set_flag in implicitly_convertible() non-copyable/movable
set_flag is an RAII guard for a thread-specific reentrancy flag.
Copying or moving it would risk double-resetting or rearming the flag,
breaking the protection. Disable copy/move constructors and assignment
operators to make this explicit.
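The guard described above can be sketched as a non-copyable, non-movable RAII type over a thread-local flag; this is an illustration of the pattern, with `set_flag` simplified from the real code (which stores the flag in thread-specific storage rather than a plain `thread_local`).

```cpp
#include <cassert>

// RAII reentrancy guard: arms a thread-local flag on construction and
// disarms it on destruction. Copy and move are deleted so the flag can
// never be reset or rearmed twice by a stray copy.
class set_flag {
public:
    static bool &flag() {
        static thread_local bool value = false;
        return value;
    }

    set_flag() { flag() = true; }
    ~set_flag() { flag() = false; }

    set_flag(const set_flag &) = delete;
    set_flag(set_flag &&) = delete;
    set_flag &operator=(const set_flag &) = delete;
    set_flag &operator=(set_flag &&) = delete;
};
```

A caller checks `set_flag::flag()` first and bails out if it is already set, which is what breaks the implicit-conversion recursion.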
* Minor cleanup to avoid venturing into UB territory.
* Experiment: Disable `~thread_specific_storage()` body when using GraalPy.
* Try the suggestion to only call TSS_free if the python interpreter is still active.
* Add IsFinalizing check
* Put this back to having a per-template-instance static
---------
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rwgkio@gmail.com>
* It seems like 3.12 has concurrency issues when creating a subinterpreter; the easiest workaround is just to lock during it.
* Only need this for PER_INTERPRETER_GIL
Created using [mini-swe-agent](https://mini-swe-agent.com) and the prompt:
I'd like to find usages of PYBIND11_MODULE in the docs folder and add py::mod_gil_not_used() as a third argument if there are only two arguments. These are examples, and it's really a good idea to always include that now.
I removed a few of the changes.
Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>
C++20 can be enabled while the C++ runtime is still much older, so use the feature macro to check for it.
For example, we are using the latest clang with C++23 on SLES, while the gcc version is 7.
When compiling an application using pybind11 3.0.0, GCC 13.3.0 and
python 3.11.13 the following warning is emitted [1]:
In function 'PyObject* PyCFunction_GET_SELF(PyObject*)',
inlined from 'void pybind11::cpp_function::initialize_generic(unique_function_record&&, const char*, const std::type_info* const*, pybind11::size_t)' at /opt/conda/lib/python3.11/site-packages/pybind11/include/pybind11/pybind11.h:605:30:
/opt/conda/include/python3.11/cpython/methodobject.h:50:16: error: potential null pointer dereference [-Werror=null-dereference]
50 | return _Py_NULL;
| ^~~~~~~~
It stems from the fact that PyCFunction_GET_SELF can return a nullptr.
Let's fail in this case.
[1]: https://gitlab.com/tango-controls/pytango/-/jobs/10671972312#L570
* [skip ci] Small docs/release.rst update, mainly to warn about `git push --tags`.
* Remove mention of `git push --tags`
Co-authored-by: Henry Schreiner <HenrySchreinerIII@gmail.com>
---------
Co-authored-by: Henry Schreiner <HenrySchreinerIII@gmail.com>
* Update docs/changelog.md and change version to v3.0.0 (final)
* [skip ci] Add `|SPEC 4 — Using and Creating Nightly Wheels|` badge in main README.rst