@swolchok
Contributor

Description

This improves the performance of enum_ operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to calling int().
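
As a rough illustration of the idea only, here is a pure-Python analogy. It is not pybind11's C++ implementation, and the class names `FunnelledEnum` and `DirectEnum` are hypothetical. The first class routes every operator through one shared helper that converts both operands with `int()`. The second gives each operator its own direct implementation and skips the conversion.

```python
import operator


class FunnelledEnum:
    """Hypothetical: all operators share one generic path that calls int()."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def _generic(self, other, op):
        # Shared helper: both operands are converted via int() on every call.
        return op(int(self), int(other))

    def __ne__(self, other):
        return self._generic(other, operator.ne)

    def __lt__(self, other):
        return self._generic(other, operator.lt)


class DirectEnum:
    """Hypothetical: each operator compares the stored value directly."""

    def __init__(self, value):
        self._value = value

    def __ne__(self, other):
        return self._value != other._value

    def __lt__(self, other):
        return self._value < other._value
```

Both classes give the same answers. The difference is only in how much work each operator call does, which is the kind of overhead this PR targets.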

Benchmark results

using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the current tip of the benchmark-updates branch):

Enum equality comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'
Times are nsec/loop

M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5

Enum ordering comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'

Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3

(i.e., no difference between != and <)

Compare to performance of calling a method of a simple pybinded class:
Command: python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'

Mac: 54.6, 54.6, 54.9, 55.3, 55.3

Also compare to performance using a py::native_enum:
Command: python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'

Mac: 9.12, 9.13, 9.2, 9.21, 9.34

(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
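
To put the numbers above side by side, here is a small sketch that takes the median of each set of five runs and computes the ratios. The dictionary keys are labels chosen for this example; the timings are copied from the results above.

```python
from statistics import median

# nsec/loop, copied from the benchmark results above.
timings_ns = {
    "enum != before": [165, 167, 166, 164, 167],
    "enum != after": [78.9, 78.9, 79.7, 79.9, 80.5],
    "enum < before": [170, 168, 168, 171, 168],
    "enum < after": [79.5, 78.8, 80.8, 81.3, 82.3],
    "MyInt.get()": [54.6, 54.6, 54.9, 55.3, 55.3],
    "native_enum <": [9.12, 9.13, 9.2, 9.21, 9.34],
}

medians = {name: median(runs) for name, runs in timings_ns.items()}

# Speedup of this PR for each operator (before / after).
ne_speedup = medians["enum != before"] / medians["enum != after"]
lt_speedup = medians["enum < before"] / medians["enum < after"]

# How much slower enum_ still is than py::native_enum after this PR.
native_gap = medians["enum < after"] / medians["native_enum <"]

for name, value in medians.items():
    print(f"{name:>16}: {value:6.1f} ns")
print(f"!= speedup: {ne_speedup:.2f}x, < speedup: {lt_speedup:.2f}x")
print(f"enum_ vs native_enum after this PR: {native_gap:.1f}x slower")
```

By medians, both operators get roughly 2x faster, and `py::native_enum` remains several times faster than `enum_` even after the change.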

Code size:

  • The marginal code cost of one py::arithmetic enum_ before this PR, as measured on my Mac by adding an extra enum to pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47), was a little over 8 KiB of __text, plus about 1000 bytes of __gcc_except_tab and negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of __text, almost 2000 bytes of __gcc_except_tab, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.
  • Interestingly, the baseline size of pybind11_benchmark at that commit decreased: __text fell by about 12500 bytes and __gcc_except_tab fell by a little over 2000 bytes, though there were negligible size increases in other sections.
  • The second commit on this branch, entitled "outline call_impl to save on code size", is specifically a code size mitigation. It is not necessary for correctness and can be dropped if we don't feel it is worthwhile.
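
For a rough sense of scale, the figures above can be combined as follows. This is back-of-the-envelope arithmetic only. It rounds "a little over", "about", and "almost" to the stated values, ignores the other sections, and assumes the baseline decrease is a fixed one-time saving, which the measurements above do not confirm.

```python
# Approximate sizes in bytes, taken from the bullets above.
KIB = 1024
before_per_enum = 8 * KIB + 1000  # __text + __gcc_except_tab per enum, before
after_per_enum = 17000 + 2000     # __text + __gcc_except_tab per enum, after
baseline_saving = 12500 + 2000    # decrease in the benchmark's baseline size

# Extra bytes each additional py::arithmetic enum_ costs after this PR.
extra_per_enum = after_per_enum - before_per_enum

# Relative growth of the per-enum marginal cost (roughly 2x).
growth_ratio = after_per_enum / before_per_enum

# Additional enums (beyond those already in the benchmark) at which the extra
# per-enum cost would use up the baseline saving, under the assumption above.
break_even_enums = baseline_saving / extra_per_enum

print(f"extra per enum: {extra_per_enum} bytes")
print(f"marginal cost growth: {growth_ratio:.2f}x")
print(f"break-even after about {break_even_enums:.1f} additional enums")
```

Under these assumptions, the per-enum marginal cost roughly doubles, and the baseline saving is offset after one or two additional enums.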

Suggested changelog entry:

  • Improve performance of operators for py::enum_s, though py::native_enum is still much faster.

…ementation

test_enum needs a patch because ops are now overloaded and this affects their docstrings.
This does cause more move constructions, as shown by the needed update to test_copy_move. Up to reviewers whether they want more code size or more moves.
@swolchok swolchok requested a review from henryiii as a code owner October 31, 2025 21:33
@swolchok
Contributor Author

swolchok commented Nov 3, 2025

Test failures look like they're caused by disagreement on how many move operations we're performing, and they come from the "outline call_impl to save on code size" commit specifically. I am unclear about how important it is to minimize the number of move operations we perform, so I've tentatively added another commit that should make the tests work for C++17, and we can talk about what to do from here.
