* Add fast_type_map, use it authoritatively for local types and as a hint for global types
nanobind has a similar two-level lookup strategy, added and explained
by
b515b1f7f2
In this PR I've ported this approach to pybind11. To avoid an ABI
break, I've kept the fast maps to the `local_internals`. I think this
should be safe because any particular module should see its
`local_internals` reset at least as often as the global `internals`,
and misses in the fast "hint" map for global types fall back to the
global `internals`.
Performance seems to have improved. Using my patched fork of
pybind11_benchmark
(https://github.com/swolchok/pybind11_benchmark/tree/benchmark-updates,
specifically commit hash b6613d12607104d547b1c10a8145d1b3e9937266), I
run bench.py and observe the MyInt case. Each time, I do 3 runs and
just report all 3.
master, Mac: 75.9, 76.9, 75.3 nsec/loop
this PR, Mac: 73.8, 73.8, 73.6 nsec/loop
master, Linux box: 188, 187, 188 nsec/loop
this PR, Linux box: 164, 165, 164 nsec/loop
Note that the "real" percentage improvement is larger than implied by the
above because master does not yet include #5824.
* simplify unsafe_reset_local_internals in test
* pre-implement PYBIND11_INTERNALS_VERSION 12
* use PYBIND11_INTERNALS_VERSION 12 on Python 3.14 per suggestion
* Implement reviewer comments: revert PY_VERSION_HEX change, fix REVIEW comment, add two-level lookup comments. ci.yml coming separately
* Use the inplace build to smoke test ABI bump?
* [skip ci] Remove "smoke" from comment. This is full testing, just only on a few platforms.
---------
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
* Avoid heap allocation for function calls with a small number of arguments
We don't have access to llvm::SmallVector or similar, but given the
limited subset of the `std::vector` API that
`function_call::args{,_convert}` need and the "reserve-then-fill"
usage pattern, it is relatively straightforward to implement custom
containers that get the job done.
Seems to improves time to call the collatz function in
pybind/pybind11_benchmark significantly; numbers are a little noisy
but there's a clear improvement from "about 60 ns per call" to "about
45 ns per call" on my machine (M4 Max Mac), as measured with
`timeit.repeat('collatz(4)', 'from pybind11_benchmark import
collatz')`.
* clang-tidy
* more clang-tidy
* clang-tidy NOLINTBEGIN/END instead of NOLINTNEXTLINE
* forgot to increase inline size after removing std::variant
* constexpr arg_vector_small_size, use move instead of swap to hopefully clarify second_pass_convert
* rename test_embed to test_low_level
* rename test_low_level to test_with_catch
* Be careful to NOINLINE slow paths
* rename array/vector members to iarray/hvector. Move comment per request. Add static_asserts for our untagged union implementation per request.
* drop is_standard_layout assertions; see https://github.com/pybind/pybind11/pull/5824#issuecomment-3308616072