mirror of
https://github.com/pybind/pybind11.git
synced 2026-03-14 20:27:47 +00:00
Change function calls to use vectorcall (#5948)
* Make argument_vector re-usable for other types.
* Attempt to collect args into array for vectorcall
* Revert "Attempt to collect args into array for vectorcall"

  This reverts commit 418a034195.

* Implement vectorcall args collector
* pre-commit fixes
* Checkpoint in moving to METH_FASTCALL
* pre-commit fixes
* Use the names tuple directly, cleaner code and less reference counting
* Fix unit test, the code now holds more references

  It cannot re-use the incoming tuple as before, because it is no longer a tuple at all. So a new tuple must be created, which then holds references for each member.

* Make clang-tidy happy
* Oops, _v is C++14
* style: pre-commit fixes
* Minor code cleanup
* Fix signed conversions
* Fix args expansion

  This would be easier with `if constexpr`.

* style: pre-commit fixes
* Code cleanup
* fix(tests): Install multiple-interpreter test modules into wheel

  The `mod_per_interpreter_gil`, `mod_shared_interpreter_gil`, and `mod_per_interpreter_gil_with_singleton` modules were being built but not installed into the wheel when using scikit-build-core (SKBUILD=true). This caused iOS (and potentially Android) CIBW tests to fail with ModuleNotFoundError.

  Root cause analysis:
  - The main test targets have install() commands (line 531)
  - The PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES were missing equivalent install() commands
  - For regular CMake builds, this wasn't a problem because LIBRARY_OUTPUT_DIRECTORY places the modules next to pybind11_tests
  - For wheel builds, only targets with explicit install() commands are included in the wheel

  This issue was latent until commit fee2527d changed the test imports from `pytest.importorskip()` (graceful skip) to direct `import` statements (hard failure), which exposed the missing modules.

  Failing tests:
  - test_multiple_interpreters.py::test_independent_subinterpreters
  - test_multiple_interpreters.py::test_dependent_subinterpreters

  Error: ModuleNotFoundError: No module named 'mod_per_interpreter_gil'

* tests: Pin numpy 2.4.0 for Python 3.14 CI tests

  Add numpy==2.4.0 requirement for Python 3.14 (both default and free-threaded builds). NumPy 2.4.0 is the first version to provide official PyPI wheels for Python 3.14:
  - numpy-2.4.0-cp314-cp314-manylinux_2_27_x86_64...whl (default)
  - numpy-2.4.0-cp314-cp314t-manylinux_2_27_x86_64...whl (free-threaded)

  Previously, CI was skipping all numpy-dependent tests for Python 3.14 because PIP_ONLY_BINARY was set and no wheels were available:

  SKIPPED [...] test_numpy_array.py:8: could not import 'numpy': No module named 'numpy'

  With this change, the full numpy test suite will run on Python 3.14, providing better test coverage for the newest Python version.

  Note: Using exact pin (==2.4.0) rather than compatible release (~=2.4.0) to ensure reproducible CI results with the first known-working version.

* tests: Add verbose flag to CIBW pytest command

  Add `-v` to the pytest command in tests/pyproject.toml to help diagnose hanging tests in CIBW jobs (particularly iOS). This will show each test name as it runs, making it easier to identify which specific test is hanging.

* tests: Skip subinterpreter tests on iOS, add pytest timeout

  - Add `IOS` platform constant to `tests/env.py` for consistency with existing `ANDROID`, `LINUX`, `MACOS`, `WIN`, `FREEBSD` constants.
  - Skip `test_multiple_interpreters.py` module on iOS. Subinterpreters are not supported in the iOS simulator environment. These tests were previously skipped implicitly because the modules weren't installed in the wheel; now that they are (commit 6ed6d5a8), we need an explicit skip.
  - Change pytest timeout from 0 (disabled) to 120 seconds. This provides a safety net to catch hanging tests before the CI job times out after hours. Normal test runs complete in 33-55 seconds total (~1100 tests), so 120 seconds per test is very generous.
  - Add `-v` flag for verbose output to help diagnose any future issues.

* More cleanups in argument vector, per comments.
* Per Cursor, move all versions to Vectorcall since it has been supported since 3.8.

  This means getting rid of simple_collector; we can do the same with a constexpr if in the unpacking_collector.

* Switch to a bool vec for the used_kwargs flag

  This makes more sense and saves a sort, and the small_vector implementation means it will actually take less space than a vector of size_t elements. The most common case is that all kwargs are used.

* Fix signedness for clang
* Another signedness issue
* tests: Disable pytest-timeout for Pyodide (no signal.setitimer)

  Pyodide runs in a WebAssembly sandbox without POSIX signals, so `signal.setitimer` is not available. This causes pytest-timeout to crash with `AttributeError: module 'signal' has no attribute 'setitimer'` when timeout > 0. Override the test-command for Pyodide to keep timeout=0 (disabled).

* Combine temp storage and args into one vector

  It's a good bit faster at the cost of this one scary reinterpret_cast.

* Phrasing
* Delete incorrect comment

  At 6, the struct is 144 bytes (not 128 bytes as the comment said).

* Fix push_back
* Update push_back in argument_vector.h

  Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>

* style: pre-commit fixes
* Use real types for these instead of object

  They can be null if you "steal" a null handle.

* refactor: Replace small_vector<object> with ref_small_vector for explicit ownership

  Introduce `ref_small_vector` to manage PyObject* references in `unpacking_collector`, replacing the previous `small_vector<object>` approach.

  Primary goals:
  1. **Maintainability**: The previous implementation relied on `sizeof(object) == sizeof(PyObject*)` and used a reinterpret_cast to pass the object array to vectorcall. This coupling to py::object's internal layout could break if someone adds a debug field or other member to py::handle/py::object in the future.
  2. **Readability**: The new `push_back_steal()` vs `push_back_borrow()` API makes reference counting intent explicit at each call site, rather than relying on implicit py::object semantics.
  3. **Intuitive code**: Storing `PyObject*` directly and passing it to `_PyObject_Vectorcall` without casts is straightforward and matches what the C API expects. No "scary" reinterpret_cast needed.

  Additional benefits:
  - `PyObject*` is trivially copyable, simplifying vector operations
  - Batch decref in destructor (tight loop vs N individual object destructors)
  - Self-documenting ownership semantics

  Design consideration: We considered folding the ref-counting functionality directly into `small_vector` via template specialization for `PyObject*`. We decided against this because:
  - It would give `small_vector<PyObject*, N>` a different interface than the generic `small_vector<T, N>` (steal/borrow vs push_back)
  - Someone might want a non-ref-counting `small_vector<PyObject*, N>`
  - The specialization behavior could surprise users expecting uniform semantics

  A separate `ref_small_vector` type makes the ref-counting behavior explicit and self-documenting, while keeping `small_vector` generic and predictable.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
@@ -2078,8 +2078,7 @@ using is_pos_only = std::is_same<intrinsic_t<T>, pos_only>;
 // forward declaration (definition in attr.h)
 struct function_record;
 
-/// (Inline size chosen mostly arbitrarily; 6 should pad function_call out to two cache lines
-/// (16 pointers) in size.)
+/// Inline size chosen mostly arbitrarily.
 constexpr std::size_t arg_vector_small_size = 6;
 
 /// Internal data associated with a single function call
@@ -2191,86 +2190,121 @@ private:
     std::tuple<make_caster<Args>...> argcasters;
 };
 
-/// Helper class which collects only positional arguments for a Python function call.
-/// A fancier version below can collect any argument, but this one is optimal for simple calls.
-template <return_value_policy policy>
-class simple_collector {
-public:
-    template <typename... Ts>
-    explicit simple_collector(Ts &&...values)
-        : m_args(pybind11::make_tuple<policy>(std::forward<Ts>(values)...)) {}
-
-    const tuple &args() const & { return m_args; }
-    dict kwargs() const { return {}; }
-
-    tuple args() && { return std::move(m_args); }
-
-    /// Call a Python function and pass the collected arguments
-    object call(PyObject *ptr) const {
-        PyObject *result = PyObject_CallObject(ptr, m_args.ptr());
-        if (!result) {
-            throw error_already_set();
-        }
-        return reinterpret_steal<object>(result);
-    }
-
-private:
-    tuple m_args;
-};
+// [workaround(intel)] Separate function required here
+// We need to put this into a separate function because the Intel compiler
+// fails to compile enable_if_t<!all_of<is_positional<Args>...>::value>
+// (tested with ICC 2021.1 Beta 20200827).
+template <typename... Args>
+constexpr bool args_has_keyword_or_ds() {
+    return any_of<is_keyword_or_ds<Args>...>::value;
+}
 
 /// Helper class which collects positional, keyword, * and ** arguments for a Python function call
 template <return_value_policy policy>
 class unpacking_collector {
 public:
     template <typename... Ts>
-    explicit unpacking_collector(Ts &&...values) {
-        // Tuples aren't (easily) resizable so a list is needed for collection,
-        // but the actual function call strictly requires a tuple.
-        auto args_list = list();
-        using expander = int[];
-        (void) expander{0, (process(args_list, std::forward<Ts>(values)), 0)...};
+    explicit unpacking_collector(Ts &&...values)
+        : m_names(reinterpret_steal<tuple>(
+              handle())) // initialize to null to avoid useless allocation of 0-length tuple
+    {
+        /*
+           Python can sometimes utilize an extra space before the arguments to prepend `self`.
+           This is important enough that there is a special flag for it:
+           PY_VECTORCALL_ARGUMENTS_OFFSET.
+           All we have to do is allocate an extra space at the beginning of this array, and set the
+           flag. Note that the extra space is not passed directly in to vectorcall.
+        */
+        m_args.reserve(sizeof...(values) + 1);
+        m_args.push_back_null();
 
-        m_args = std::move(args_list);
+        if (args_has_keyword_or_ds<Ts...>()) {
+            list names_list;
+
+            // collect_arguments guarantees this can't be constructed with kwargs before the last
+            // positional so we don't need to worry about Ts... being in anything but normal python
+            // order.
+            using expander = int[];
+            (void) expander{0, (process(names_list, std::forward<Ts>(values)), 0)...};
+
+            m_names = reinterpret_steal<tuple>(PyList_AsTuple(names_list.ptr()));
+        } else {
+            auto not_used
+                = reinterpret_steal<list>(handle()); // initialize as null (to avoid an allocation)
+
+            using expander = int[];
+            (void) expander{0, (process(not_used, std::forward<Ts>(values)), 0)...};
+        }
     }
 
-    const tuple &args() const & { return m_args; }
-    const dict &kwargs() const & { return m_kwargs; }
-
-    tuple args() && { return std::move(m_args); }
-    dict kwargs() && { return std::move(m_kwargs); }
-
     /// Call a Python function and pass the collected arguments
     object call(PyObject *ptr) const {
-        PyObject *result = PyObject_Call(ptr, m_args.ptr(), m_kwargs.ptr());
+        size_t nargs = m_args.size() - 1; // -1 for PY_VECTORCALL_ARGUMENTS_OFFSET (see ctor)
+        if (m_names) {
+            nargs -= m_names.size();
+        }
+        PyObject *result = _PyObject_Vectorcall(
+            ptr, m_args.data() + 1, nargs | PY_VECTORCALL_ARGUMENTS_OFFSET, m_names.ptr());
         if (!result) {
             throw error_already_set();
         }
         return reinterpret_steal<object>(result);
     }
 
+    tuple args() const {
+        size_t nargs = m_args.size() - 1; // -1 for PY_VECTORCALL_ARGUMENTS_OFFSET (see ctor)
+        if (m_names) {
+            nargs -= m_names.size();
+        }
+        tuple val(nargs);
+        for (size_t i = 0; i < nargs; ++i) {
+            // +1 for PY_VECTORCALL_ARGUMENTS_OFFSET (see ctor)
+            val[i] = reinterpret_borrow<object>(m_args[i + 1]);
+        }
+        return val;
+    }
+
+    dict kwargs() const {
+        dict val;
+        if (m_names) {
+            size_t offset = m_args.size() - m_names.size();
+            for (size_t i = 0; i < m_names.size(); ++i, ++offset) {
+                val[m_names[i]] = reinterpret_borrow<object>(m_args[offset]);
+            }
+        }
+        return val;
+    }
+
 private:
     // normal argument, possibly needing conversion
     template <typename T>
-    void process(list &args_list, T &&x) {
-        auto o = reinterpret_steal<object>(
-            detail::make_caster<T>::cast(std::forward<T>(x), policy, {}));
-        if (!o) {
+    void process(list & /*names_list*/, T &&x) {
+        handle h = detail::make_caster<T>::cast(std::forward<T>(x), policy, {});
+        if (!h) {
 #if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
-            throw cast_error_unable_to_convert_call_arg(std::to_string(args_list.size()));
+            throw cast_error_unable_to_convert_call_arg(std::to_string(m_args.size() - 1));
 #else
-            throw cast_error_unable_to_convert_call_arg(std::to_string(args_list.size()),
+            throw cast_error_unable_to_convert_call_arg(std::to_string(m_args.size() - 1),
                                                         type_id<T>());
 #endif
         }
-        args_list.append(std::move(o));
+        m_args.push_back_steal(h.ptr()); // cast returns a new reference
     }
 
-    void process(list &args_list, detail::args_proxy ap) {
+    // * unpacking
+    void process(list & /*names_list*/, detail::args_proxy ap) {
         if (!ap) {
             return;
         }
         for (auto a : ap) {
-            args_list.append(a);
+            m_args.push_back_borrow(a.ptr());
         }
     }
 
-    void process(list & /*args_list*/, arg_v a) {
+    // named argument
+    // NOLINTNEXTLINE(performance-unnecessary-value-param)
+    void process(list &names_list, arg_v a) {
+        assert(names_list);
         if (!a.name) {
 #if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
             nameless_argument_error();
@@ -2278,7 +2312,8 @@ private:
             nameless_argument_error(a.type);
 #endif
         }
-        if (m_kwargs.contains(a.name)) {
+        auto name = str(a.name);
+        if (names_list.contains(name)) {
 #if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
             multiple_values_error();
 #else
@@ -2292,22 +2327,27 @@ private:
             throw cast_error_unable_to_convert_call_arg(a.name, a.type);
 #endif
         }
-        m_kwargs[a.name] = std::move(a.value);
+        names_list.append(std::move(name));
+        m_args.push_back_borrow(a.value.ptr());
     }
 
-    void process(list & /*args_list*/, detail::kwargs_proxy kp) {
+    // ** unpacking
+    void process(list &names_list, detail::kwargs_proxy kp) {
         if (!kp) {
             return;
         }
-        for (auto k : reinterpret_borrow<dict>(kp)) {
-            if (m_kwargs.contains(k.first)) {
+        assert(names_list);
+        for (auto &&k : reinterpret_borrow<dict>(kp)) {
+            auto name = str(k.first);
+            if (names_list.contains(name)) {
 #if !defined(PYBIND11_DETAILED_ERROR_MESSAGES)
                 multiple_values_error();
 #else
-                multiple_values_error(str(k.first));
+                multiple_values_error(name);
 #endif
             }
-            m_kwargs[k.first] = k.second;
+            names_list.append(std::move(name));
+            m_args.push_back_borrow(k.second.ptr());
         }
     }
@@ -2333,39 +2373,20 @@ private:
     }
 
 private:
-    tuple m_args;
-    dict m_kwargs;
+    ref_small_vector<arg_vector_small_size> m_args;
+    tuple m_names;
 };
 
-// [workaround(intel)] Separate function required here
-// We need to put this into a separate function because the Intel compiler
-// fails to compile enable_if_t<!all_of<is_positional<Args>...>::value>
-// (tested with ICC 2021.1 Beta 20200827).
-template <typename... Args>
-constexpr bool args_are_all_positional() {
-    return all_of<is_positional<Args>...>::value;
-}
-
-/// Collect only positional arguments for a Python function call
-template <return_value_policy policy,
-          typename... Args,
-          typename = enable_if_t<args_are_all_positional<Args...>()>>
-simple_collector<policy> collect_arguments(Args &&...args) {
-    return simple_collector<policy>(std::forward<Args>(args)...);
-}
-
-/// Collect all arguments, including keywords and unpacking (only instantiated when needed)
-template <return_value_policy policy,
-          typename... Args,
-          typename = enable_if_t<!args_are_all_positional<Args...>()>>
+/// Collect all arguments, including keywords and unpacking
+template <return_value_policy policy, typename... Args>
 unpacking_collector<policy> collect_arguments(Args &&...args) {
     // Following argument order rules for generalized unpacking according to PEP 448
-    static_assert(constexpr_last<is_positional, Args...>()
-                      < constexpr_first<is_keyword_or_ds, Args...>()
-                  && constexpr_last<is_s_unpacking, Args...>()
-                         < constexpr_first<is_ds_unpacking, Args...>(),
-                  "Invalid function call: positional args must precede keywords and ** unpacking; "
-                  "* unpacking must precede ** unpacking");
+    static_assert(
+        constexpr_last<is_positional, Args...>() < constexpr_first<is_keyword_or_ds, Args...>(),
+        "Invalid function call: positional args must precede keywords and */** unpacking;");
+    static_assert(constexpr_last<is_s_unpacking, Args...>()
+                      < constexpr_first<is_ds_unpacking, Args...>(),
+                  "Invalid function call: * unpacking must precede ** unpacking");
     return unpacking_collector<policy>(std::forward<Args>(args)...);
 }
 
@@ -66,24 +66,23 @@ union inline_array_or_vector {
     inline_array iarray;
     heap_vector hvector;
 
-    static_assert(std::is_trivially_move_constructible<ArrayT>::value,
-                  "ArrayT must be trivially move constructible");
-    static_assert(std::is_trivially_destructible<ArrayT>::value,
-                  "ArrayT must be trivially destructible");
-
     inline_array_or_vector() : iarray() {}
 
     ~inline_array_or_vector() {
-        if (!is_inline()) {
+        if (is_inline()) {
+            iarray.~inline_array();
+        } else {
             hvector.~heap_vector();
         }
     }
 
     // Disable copy ctor and assignment.
     inline_array_or_vector(const inline_array_or_vector &) = delete;
     inline_array_or_vector &operator=(const inline_array_or_vector &) = delete;
 
     inline_array_or_vector(inline_array_or_vector &&rhs) noexcept {
         if (rhs.is_inline()) {
-            std::memcpy(&iarray, &rhs.iarray, sizeof(iarray));
+            new (&iarray) inline_array(std::move(rhs.iarray));
         } else {
             new (&hvector) heap_vector(std::move(rhs.hvector));
         }
@@ -95,17 +94,16 @@ union inline_array_or_vector {
             return *this;
         }
 
-        if (rhs.is_inline()) {
-            if (!is_inline()) {
-                hvector.~heap_vector();
-            }
-            std::memcpy(&iarray, &rhs.iarray, sizeof(iarray));
-        } else {
-            if (is_inline()) {
-                new (&hvector) heap_vector(std::move(rhs.hvector));
-            } else {
-                hvector = std::move(rhs.hvector);
-            }
-        }
+        if (is_inline()) {
+            iarray.~inline_array();
+        } else {
+            hvector.~heap_vector();
+        }
+
+        if (rhs.is_inline()) {
+            new (&iarray) inline_array(std::move(rhs.iarray));
+        } else {
+            new (&hvector) heap_vector(std::move(rhs.hvector));
+        }
         return *this;
     }
@@ -126,18 +124,16 @@ union inline_array_or_vector {
     }
 };
 
-// small_vector-like container to avoid heap allocation for N or fewer
-// arguments.
-template <std::size_t N>
-struct argument_vector {
+template <typename T, std::size_t InlineSize>
+struct small_vector {
 public:
-    argument_vector() = default;
+    small_vector() = default;
 
     // Disable copy ctor and assignment.
-    argument_vector(const argument_vector &) = delete;
-    argument_vector &operator=(const argument_vector &) = delete;
-    argument_vector(argument_vector &&) noexcept = default;
-    argument_vector &operator=(argument_vector &&) noexcept = default;
+    small_vector(const small_vector &) = delete;
+    small_vector &operator=(const small_vector &) = delete;
+    small_vector(small_vector &&) noexcept = default;
+    small_vector &operator=(small_vector &&) noexcept = default;
 
     std::size_t size() const {
         if (is_inline()) {
@@ -146,7 +142,14 @@ public:
         return m_repr.hvector.vec.size();
     }
 
-    handle &operator[](std::size_t idx) {
+    T const *data() const {
+        if (is_inline()) {
+            return m_repr.iarray.arr.data();
+        }
+        return m_repr.hvector.vec.data();
+    }
+
+    T &operator[](std::size_t idx) {
         assert(idx < size());
         if (is_inline()) {
             return m_repr.iarray.arr[idx];
@@ -154,7 +157,7 @@ public:
         return m_repr.hvector.vec[idx];
     }
 
-    handle operator[](std::size_t idx) const {
+    T const &operator[](std::size_t idx) const {
         assert(idx < size());
         if (is_inline()) {
             return m_repr.iarray.arr[idx];
@@ -162,28 +165,28 @@ public:
         return m_repr.hvector.vec[idx];
     }
 
-    void push_back(handle x) {
+    void push_back(const T &x) { emplace_back(x); }
+
+    void push_back(T &&x) { emplace_back(std::move(x)); }
+
+    template <typename... Args>
+    void emplace_back(Args &&...x) {
         if (is_inline()) {
             auto &ha = m_repr.iarray;
-            if (ha.size == N) {
-                move_to_heap_vector_with_reserved_size(N + 1);
-                push_back_slow_path(x);
+            if (ha.size == InlineSize) {
+                move_to_heap_vector_with_reserved_size(InlineSize + 1);
+                m_repr.hvector.vec.emplace_back(std::forward<Args>(x)...);
             } else {
-                ha.arr[ha.size++] = x;
+                ha.arr[ha.size++] = T(std::forward<Args>(x)...);
             }
         } else {
-            push_back_slow_path(x);
+            m_repr.hvector.vec.emplace_back(std::forward<Args>(x)...);
         }
     }
 
-    template <typename Arg>
-    void emplace_back(Arg &&x) {
-        push_back(handle(x));
-    }
-
     void reserve(std::size_t sz) {
         if (is_inline()) {
-            if (sz > N) {
+            if (sz > InlineSize) {
                 move_to_heap_vector_with_reserved_size(sz);
             }
         } else {
@@ -192,7 +195,7 @@ public:
     }
 
 private:
-    using repr_type = inline_array_or_vector<handle, N>;
+    using repr_type = inline_array_or_vector<T, InlineSize>;
    repr_type m_repr;
 
    PYBIND11_NOINLINE void move_to_heap_vector_with_reserved_size(std::size_t reserved_size) {
@@ -201,32 +204,33 @@ private:
         using heap_vector = typename repr_type::heap_vector;
         heap_vector hv;
         hv.vec.reserve(reserved_size);
-        std::copy(ha.arr.begin(), ha.arr.begin() + ha.size, std::back_inserter(hv.vec));
+        static_assert(std::is_nothrow_move_constructible<T>::value,
+                      "this conversion is not exception safe");
+        static_assert(std::is_nothrow_move_constructible<heap_vector>::value,
+                      "this conversion is not exception safe");
+        std::move(ha.arr.begin(), ha.arr.begin() + ha.size, std::back_inserter(hv.vec));
         new (&m_repr.hvector) heap_vector(std::move(hv));
     }
 
-    PYBIND11_NOINLINE void push_back_slow_path(handle x) { m_repr.hvector.vec.push_back(x); }
-
     PYBIND11_NOINLINE void reserve_slow_path(std::size_t sz) { m_repr.hvector.vec.reserve(sz); }
 
     bool is_inline() const { return m_repr.is_inline(); }
 };
 
-// small_vector-like container to avoid heap allocation for N or fewer
-// arguments.
+// Container to avoid heap allocation for kRequestedInlineSize or fewer booleans.
 template <std::size_t kRequestedInlineSize>
-struct args_convert_vector {
-private:
-    args_convert_vector() = default;
+struct small_vector<bool, kRequestedInlineSize> {
+public:
+    small_vector() = default;
 
     // Disable copy ctor and assignment.
-    args_convert_vector(const args_convert_vector &) = delete;
-    args_convert_vector &operator=(const args_convert_vector &) = delete;
-    args_convert_vector(args_convert_vector &&) noexcept = default;
-    args_convert_vector &operator=(args_convert_vector &&) noexcept = default;
+    small_vector(const small_vector &) = delete;
+    small_vector &operator=(const small_vector &) = delete;
+    small_vector(small_vector &&) noexcept = default;
+    small_vector &operator=(small_vector &&) noexcept = default;
 
-    args_convert_vector(std::size_t count, bool value) {
+    small_vector(std::size_t count, bool value) {
         if (count > kInlineSize) {
             new (&m_repr.hvector) typename repr_type::heap_vector(count, value);
         } else {
@@ -284,7 +288,24 @@ public:
         }
     }
 
-    void swap(args_convert_vector &rhs) noexcept { std::swap(m_repr, rhs.m_repr); }
+    void set(std::size_t idx, bool value = true) {
+        if (is_inline()) {
+            auto &ha = m_repr.iarray;
+            assert(ha.size < kInlineSize);
+            const auto wbi = word_and_bit_index(idx);
+            assert(wbi.word < kWords);
+            assert(wbi.bit < kBitsPerWord);
+            if (value) {
+                ha.arr[wbi.word] |= (static_cast<std::size_t>(1) << wbi.bit);
+            } else {
+                ha.arr[wbi.word] &= ~(static_cast<std::size_t>(1) << wbi.bit);
+            }
+        } else {
+            m_repr.hvector.vec[idx] = value;
+        }
+    }
+
+    void swap(small_vector &rhs) noexcept { std::swap(m_repr, rhs.m_repr); }
 
 private:
     struct WordAndBitIndex {
@@ -326,5 +347,71 @@ private:
     bool is_inline() const { return m_repr.is_inline(); }
 };
 
+// Container to avoid heap allocation for N or fewer arguments.
+template <size_t N>
+using argument_vector = small_vector<handle, N>;
+
+// Container to avoid heap allocation for N or fewer booleans.
+template <size_t N>
+using args_convert_vector = small_vector<bool, N>;
+
+/// A small_vector of PyObject* that holds references and releases them on destruction.
+/// This provides explicit ownership semantics without relying on py::object's
+/// destructor, and avoids the need for reinterpret_cast when passing to vectorcall.
+template <std::size_t InlineSize>
+class ref_small_vector {
+public:
+    ref_small_vector() = default;
+
+    ~ref_small_vector() {
+        for (std::size_t i = 0; i < m_ptrs.size(); ++i) {
+            Py_XDECREF(m_ptrs[i]);
+        }
+    }
+
+    // Disable copy (prevent accidental double-decref)
+    ref_small_vector(const ref_small_vector &) = delete;
+    ref_small_vector &operator=(const ref_small_vector &) = delete;
+
+    // Move is allowed
+    ref_small_vector(ref_small_vector &&other) noexcept : m_ptrs(std::move(other.m_ptrs)) {
+        // other.m_ptrs is now empty, so its destructor won't decref anything
+    }
+
+    ref_small_vector &operator=(ref_small_vector &&other) noexcept {
+        if (this != &other) {
+            // Decref our current contents
+            for (std::size_t i = 0; i < m_ptrs.size(); ++i) {
+                Py_XDECREF(m_ptrs[i]);
+            }
+            m_ptrs = std::move(other.m_ptrs);
+        }
+        return *this;
+    }
+
+    /// Add a pointer, taking ownership (no incref, will decref on destruction)
+    void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }
+
+    /// Add a pointer, borrowing (increfs now, will decref on destruction)
+    void push_back_borrow(PyObject *p) {
+        Py_XINCREF(p);
+        m_ptrs.push_back(p);
+    }
+
+    /// Add a null pointer (for PY_VECTORCALL_ARGUMENTS_OFFSET slot)
+    void push_back_null() { m_ptrs.push_back(nullptr); }
+
+    void reserve(std::size_t sz) { m_ptrs.reserve(sz); }
+
+    std::size_t size() const { return m_ptrs.size(); }
+
+    PyObject *operator[](std::size_t idx) const { return m_ptrs[idx]; }
+
+    PyObject *const *data() const { return m_ptrs.data(); }
+
+private:
+    small_vector<PyObject *, InlineSize> m_ptrs;
+};
 
 PYBIND11_NAMESPACE_END(detail)
 PYBIND11_NAMESPACE_END(PYBIND11_NAMESPACE)
 
@@ -684,7 +684,7 @@ protected:
         rec->def->ml_name = rec->name;
         rec->def->ml_meth
             = reinterpret_cast<PyCFunction>(reinterpret_cast<void (*)()>(dispatcher));
-        rec->def->ml_flags = METH_VARARGS | METH_KEYWORDS;
+        rec->def->ml_flags = METH_FASTCALL | METH_KEYWORDS;
 
         object py_func_rec = detail::function_record_PyObject_New();
         (reinterpret_cast<detail::function_record_PyObject *>(py_func_rec.ptr()))->cpp_func_rec
@@ -847,7 +847,8 @@ protected:
     }
 
     /// Main dispatch logic for calls to functions bound using pybind11
-    static PyObject *dispatcher(PyObject *self, PyObject *args_in, PyObject *kwargs_in) {
+    static PyObject *
+    dispatcher(PyObject *self, PyObject *const *args_in_arr, size_t nargsf, PyObject *kwnames_in) {
         using namespace detail;
         const function_record *overloads = function_record_ptr_from_PyObject(self);
         assert(overloads != nullptr);
@@ -857,9 +858,9 @@ protected:
 
         /* Need to know how many arguments + keyword arguments there are to pick the right
            overload */
-        const auto n_args_in = static_cast<size_t>(PyTuple_GET_SIZE(args_in));
+        const auto n_args_in = static_cast<size_t>(PyVectorcall_NARGS(nargsf));
 
-        handle parent = n_args_in > 0 ? PyTuple_GET_ITEM(args_in, 0) : nullptr,
+        handle parent = n_args_in > 0 ? args_in_arr[0] : nullptr,
                result = PYBIND11_TRY_NEXT_OVERLOAD;
 
         auto self_value_and_holder = value_and_holder();
@@ -948,7 +949,7 @@ protected:
                     self_value_and_holder.type->dealloc(self_value_and_holder);
                 }
 
-                call.init_self = PyTuple_GET_ITEM(args_in, 0);
+                call.init_self = args_in_arr[0];
                 call.args.emplace_back(reinterpret_cast<PyObject *>(&self_value_and_holder));
                 call.args_convert.push_back(false);
                 ++args_copied;
@@ -959,17 +960,24 @@ protected:
             for (; args_copied < args_to_copy; ++args_copied) {
                 const argument_record *arg_rec
                     = args_copied < func.args.size() ? &func.args[args_copied] : nullptr;
-                if (kwargs_in && arg_rec && arg_rec->name
-                    && dict_getitemstring(kwargs_in, arg_rec->name)) {
+
+                /* if the argument is listed in the call site's kwargs, but the argument is
+                   also fulfilled positionally, then the call can't match this overload. for
+                   example, the call site is: foo(0, key=1) but our overload is foo(key:int) then
+                   this call can't be for us, because it would be invalid.
+                */
+                if (kwnames_in && arg_rec && arg_rec->name
+                    && keyword_index(kwnames_in, arg_rec->name) >= 0) {
                     bad_arg = true;
                     break;
                 }
 
-                handle arg(PyTuple_GET_ITEM(args_in, args_copied));
+                handle arg(args_in_arr[args_copied]);
                 if (arg_rec && !arg_rec->none && arg.is_none()) {
                     bad_arg = true;
                     break;
                 }
 
                 call.args.push_back(arg);
                 call.args_convert.push_back(arg_rec ? arg_rec->convert : true);
             }
@@ -981,20 +989,12 @@ protected:
             // to copy the rest into a py::args argument.
             size_t positional_args_copied = args_copied;

-            // We'll need to copy this if we steal some kwargs for defaults
-            dict kwargs = reinterpret_borrow<dict>(kwargs_in);
-
             // 1.5. Fill in any missing pos_only args from defaults if they exist
             if (args_copied < func.nargs_pos_only) {
                 for (; args_copied < func.nargs_pos_only; ++args_copied) {
                     const auto &arg_rec = func.args[args_copied];
-                    handle value;
-
                     if (arg_rec.value) {
-                        value = arg_rec.value;
-                    }
-                    if (value) {
-                        call.args.push_back(value);
+                        call.args.push_back(arg_rec.value);
                         call.args_convert.push_back(arg_rec.convert);
                     } else {
                         break;
@@ -1007,46 +1007,42 @@ protected:
             }

             // 2. Check kwargs and, failing that, defaults that may help complete the list
+            small_vector<bool, arg_vector_small_size> used_kwargs(
+                kwnames_in ? static_cast<size_t>(PyTuple_GET_SIZE(kwnames_in)) : 0, false);
+            size_t used_kwargs_count = 0;
             if (args_copied < num_args) {
-                bool copied_kwargs = false;
-
                 for (; args_copied < num_args; ++args_copied) {
                     const auto &arg_rec = func.args[args_copied];

                     handle value;
-                    if (kwargs_in && arg_rec.name) {
-                        value = dict_getitemstring(kwargs.ptr(), arg_rec.name);
+                    if (kwnames_in && arg_rec.name) {
+                        ssize_t i = keyword_index(kwnames_in, arg_rec.name);
+                        if (i >= 0) {
+                            value = args_in_arr[n_args_in + static_cast<size_t>(i)];
+                            used_kwargs.set(static_cast<size_t>(i), true);
+                            used_kwargs_count++;
+                        }
                     }

-                    if (value) {
-                        // Consume a kwargs value
-                        if (!copied_kwargs) {
-                            kwargs = reinterpret_steal<dict>(PyDict_Copy(kwargs.ptr()));
-                            copied_kwargs = true;
-                        }
-                        if (PyDict_DelItemString(kwargs.ptr(), arg_rec.name) == -1) {
-                            throw error_already_set();
-                        }
-                    } else if (arg_rec.value) {
+                    if (!value) {
                         value = arg_rec.value;
+                        if (!value) {
+                            break;
+                        }
                     }

                     if (!arg_rec.none && value.is_none()) {
                         break;
                     }

-                    if (value) {
-                        // If we're at the py::args index then first insert a stub for it to be
-                        // replaced later
-                        if (func.has_args && call.args.size() == func.nargs_pos) {
-                            call.args.push_back(none());
-                        }
-
-                        call.args.push_back(value);
-                        call.args_convert.push_back(arg_rec.convert);
-                    } else {
-                        break;
+                    // If we're at the py::args index then first insert a stub for it to be
+                    // replaced later
+                    if (func.has_args && call.args.size() == func.nargs_pos) {
+                        call.args.push_back(none());
                     }
+
+                    call.args.push_back(value);
+                    call.args_convert.push_back(arg_rec.convert);
                 }

                 if (args_copied < num_args) {
@@ -1056,47 +1052,46 @@ protected:
             }

             // 3. Check everything was consumed (unless we have a kwargs arg)
-            if (kwargs && !kwargs.empty() && !func.has_kwargs) {
+            if (!func.has_kwargs && used_kwargs_count < used_kwargs.size()) {
                 continue; // Unconsumed kwargs, but no py::kwargs argument to accept them
             }

             // 4a. If we have a py::args argument, create a new tuple with leftovers
             if (func.has_args) {
-                tuple extra_args;
-                if (args_to_copy == 0) {
-                    // We didn't copy out any position arguments from the args_in tuple, so we
-                    // can reuse it directly without copying:
-                    extra_args = reinterpret_borrow<tuple>(args_in);
-                } else if (positional_args_copied >= n_args_in) {
-                    extra_args = tuple(0);
+                if (positional_args_copied >= n_args_in) {
+                    call.args_ref = tuple(0);
                 } else {
                     size_t args_size = n_args_in - positional_args_copied;
-                    extra_args = tuple(args_size);
+                    tuple extra_args(args_size);
                     for (size_t i = 0; i < args_size; ++i) {
-                        extra_args[i] = PyTuple_GET_ITEM(args_in, positional_args_copied + i);
+                        extra_args[i] = args_in_arr[positional_args_copied + i];
                     }
+                    call.args_ref = std::move(extra_args);
                 }
                 if (call.args.size() <= func.nargs_pos) {
-                    call.args.push_back(extra_args);
+                    call.args.push_back(call.args_ref);
                 } else {
-                    call.args[func.nargs_pos] = extra_args;
+                    call.args[func.nargs_pos] = call.args_ref;
                 }
                 call.args_convert.push_back(false);
-                call.args_ref = std::move(extra_args);
             }

             // 4b. If we have a py::kwargs, pass on any remaining kwargs
             if (func.has_kwargs) {
-                if (!kwargs.ptr()) {
-                    kwargs = dict(); // If we didn't get one, send an empty one
+                dict kwargs;
+                for (size_t i = 0; i < used_kwargs.size(); ++i) {
+                    if (!used_kwargs[i]) {
+                        kwargs[PyTuple_GET_ITEM(kwnames_in, i)] = args_in_arr[n_args_in + i];
+                    }
                 }
                 call.args.push_back(kwargs);
                 call.args_convert.push_back(false);
+                call.kwargs_ref = std::move(kwargs);
             }

             // 5. Put everything in a vector. Not technically step 5, we've been building it
             // in `call.args` all along.

#if defined(PYBIND11_DETAILED_ERROR_MESSAGES)
             if (call.args.size() != func.nargs || call.args_convert.size() != func.nargs) {
                 pybind11_fail("Internal error: function call dispatcher inserted wrong number "
@@ -1227,40 +1222,37 @@ protected:
                 msg += '\n';
             }
             msg += "\nInvoked with: ";
-            auto args_ = reinterpret_borrow<tuple>(args_in);
             bool some_args = false;
-            for (size_t ti = overloads->is_constructor ? 1 : 0; ti < args_.size(); ++ti) {
+            for (size_t ti = overloads->is_constructor ? 1 : 0; ti < n_args_in; ++ti) {
                 if (!some_args) {
                     some_args = true;
                 } else {
                     msg += ", ";
                 }
                 try {
-                    msg += pybind11::repr(args_[ti]);
+                    msg += pybind11::repr(args_in_arr[ti]);
                 } catch (const error_already_set &) {
                     msg += "<repr raised Error>";
                 }
             }
-            if (kwargs_in) {
-                auto kwargs = reinterpret_borrow<dict>(kwargs_in);
-                if (!kwargs.empty()) {
-                    if (some_args) {
-                        msg += "; ";
-                    }
-                    msg += "kwargs: ";
-                    bool first = true;
-                    for (const auto &kwarg : kwargs) {
-                        if (first) {
-                            first = false;
-                        } else {
-                            msg += ", ";
-                        }
-                        msg += pybind11::str("{}=").format(kwarg.first);
-                        try {
-                            msg += pybind11::repr(kwarg.second);
-                        } catch (const error_already_set &) {
-                            msg += "<repr raised Error>";
-                        }
-                    }
-                }
-            }
+            if (kwnames_in && PyTuple_GET_SIZE(kwnames_in) > 0) {
+                if (some_args) {
+                    msg += "; ";
+                }
+                msg += "kwargs: ";
+                bool first = true;
+                for (size_t i = 0; i < static_cast<size_t>(PyTuple_GET_SIZE(kwnames_in)); ++i) {
+                    if (first) {
+                        first = false;
+                    } else {
+                        msg += ", ";
+                    }
+                    msg += reinterpret_borrow<pybind11::str>(PyTuple_GET_ITEM(kwnames_in, i));
+                    msg += '=';
+                    try {
+                        msg += pybind11::repr(args_in_arr[n_args_in + i]);
+                    } catch (const error_already_set &) {
+                        msg += "<repr raised Error>";
+                    }
+                }
+            }
@@ -1295,6 +1287,28 @@ protected:
         }
         return result.ptr();
     }

+    static ssize_t keyword_index(PyObject *haystack, char const *needle) {
+        /* kwargs is usually very small (<= 5 entries), and the arg strings are typically
+         * interned. CPython itself implements the search this way: first compare all
+         * pointers, which is cheap and works if the strings are interned; if that fails,
+         * fall back to a second lexicographic check. This is wildly expensive for huge
+         * argument lists, but those are incredibly rare, so we optimize for the vastly
+         * common case of just a couple of args. */
+        auto n = PyTuple_GET_SIZE(haystack);
+        auto s = reinterpret_steal<pybind11::str>(PyUnicode_InternFromString(needle));
+        for (ssize_t i = 0; i < n; ++i) {
+            if (PyTuple_GET_ITEM(haystack, i) == s.ptr()) {
+                return i;
+            }
+        }
+        for (ssize_t i = 0; i < n; ++i) {
+            if (PyUnicode_Compare(PyTuple_GET_ITEM(haystack, i), s.ptr()) == 0) {
+                return i;
+            }
+        }
+        return -1;
+    }
};

PYBIND11_NAMESPACE_BEGIN(detail)
@@ -458,7 +458,8 @@ def test_args_refcount():
     assert refcount(myval) == expected

     exp3 = refcount(myval, myval, myval)
-    assert m.args_refcount(myval, myval, myval) == (exp3, exp3, exp3)
+    # If we have to create a new tuple internally, it will hold an extra reference for each item in it.
+    assert m.args_refcount(myval, myval, myval) == (exp3 + 3, exp3 + 3, exp3 + 3)
     assert refcount(myval) == expected

     # This function takes the first arg as a `py::object` and the rest as a `py::args`. Unlike the