Improve performance of enum_ operators by going back to specific implementation #5887

swolchok · 2025-10-31T20:12:00Z

Description

This improves the performance of enum_ operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to calling int().
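As a rough pure-Python analogy for the change described above (the real code is C++ inside pybind11, and the class names here are hypothetical), the sketch below contrasts operators that all funnel through one generic helper calling int() with operators implemented directly for the type. It only illustrates where the extra conversion work comes from; it does not reproduce pybind11's implementation.

```python
import operator


class GenericOps:
    """Hypothetical: every operator goes through one helper that calls int()."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def _apply(self, other, op):
        # Both operands are converted with int() before the comparison runs.
        return op(int(self), int(other))

    def __ne__(self, other):
        return self._apply(other, operator.ne)

    def __lt__(self, other):
        return self._apply(other, operator.lt)


class SpecificOps:
    """Hypothetical: each operator compares the stored values directly."""

    def __init__(self, value):
        self._value = value

    def __ne__(self, other):
        return self._value != other._value

    def __lt__(self, other):
        return self._value < other._value
```

Both classes give the same answers; the second simply skips the conversion step on every call.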

Benchmark results

using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the current tip of the benchmark-updates branch):

Enum equality comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'
Times are nsec/loop

M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5

Enum ordering comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'

Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3

(i.e., no difference between != and <)
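For anyone who wants to collect the same kind of numbers from a script rather than the command line, here is a minimal sketch using the standard-library timeit module. The StandInEnum class and the nsec_per_loop helper are assumptions for illustration; to measure the real thing, replace the stand-in with MyEnum imported from pybind11_benchmark.

```python
import enum
import timeit


class StandInEnum(enum.IntEnum):
    """Hypothetical stand-in so the sketch runs without pybind11_benchmark."""

    ONE = 1


def nsec_per_loop(stmt, namespace, repeat=5, number=100_000):
    """Time stmt `repeat` times and return each run as nanoseconds per loop."""
    totals = timeit.repeat(stmt, globals=namespace, repeat=repeat, number=number)
    return [total / number * 1e9 for total in totals]


namespace = {"x": StandInEnum.ONE}
for stmt in ("x != x", "x < x"):
    runs = nsec_per_loop(stmt, namespace)
    print(stmt, [round(run, 1) for run in runs])
```

The absolute numbers from the stand-in are not comparable to the figures above; only the measurement procedure carries over.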

Compare to the performance of calling a method of a simple pybind11-bound class:
Command: python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'

Mac: 54.6, 54.6, 54.9, 55.3, 55.3

Also compare to performance using a py::native_enum:
Command: python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'

Mac: 9.12, 9.13, 9.2, 9.21, 9.34

(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
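To put the measurements above into ratios, the following sketch averages the five reported runs for each case and divides. The numbers are copied from the results above; using the mean as the summary statistic is my own choice, not something the benchmark prescribes.

```python
from statistics import mean

# nsec/loop figures copied from the benchmark results above.
timings = {
    "x != x": {
        "before": [165, 167, 166, 164, 167],
        "after": [78.9, 78.9, 79.7, 79.9, 80.5],
    },
    "x < x": {
        "before": [170, 168, 168, 171, 168],
        "after": [79.5, 78.8, 80.8, 81.3, 82.3],
    },
}
native_enum_lt = [9.12, 9.13, 9.2, 9.21, 9.34]


def ratio(slower, faster):
    """Return how many times larger the mean of `slower` is than `faster`."""
    return mean(slower) / mean(faster)


for stmt, runs in timings.items():
    print(f"{stmt}: {ratio(runs['before'], runs['after']):.2f}x faster after")

gap = ratio(timings["x < x"]["after"], native_enum_lt)
print(f"enum_ after vs native_enum: {gap:.1f}x slower")
```

On these figures both operators come out a little over 2x faster, while enum_ after the change is still roughly 8.8x slower than py::native_enum for `x < x`.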

Code size:

The marginal code cost of one py::arithmetic enum_ before this PR, as measured on my Mac by adding an extra enum to pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47), was a little over 8 KiB of __text, plus about 1000 bytes of __gcc_except_tab and negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of __text, almost 2000 bytes of __gcc_except_tab, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.
Interestingly, the baseline size of pybind11_benchmark at that commit decreased: __text fell by about 12500 bytes and __gcc_except_tab fell by a little over 2000 bytes, though there were negligible size increases in other sections.
The second commit on this branch, entitled "outline call_impl to save on code size", is specifically a code size mitigation. It is not necessary for correctness and can be dropped if we don't feel it is worthwhile.

Suggested changelog entry:

Improve performance of operators for py::enum_s, though py::native_enum is still much faster.


swolchok · 2025-11-03T23:17:31Z

The test failures look like a disagreement about how many move operations we perform, and they are caused by the "outline call_impl to save on code size" commit specifically. I am unclear on how important it is to minimize the number of move operations, so I've tentatively added another commit that should make the tests pass for C++17, and we can talk about what to do from here.

swolchok added 3 commits October 31, 2025 12:49

Improve performance of enum_ operators by going back to specific implementation

f34a039

test_enum needs a patch because ops are now overloaded and this affects their docstrings.

outline call_impl to save on code size

a65e79d

This does cause more move constructions, as shown by the needed update to test_copy_move. Up to reviewers whether they want more code size or more moves.

add function_ref.h to PYBIND11_HEADERS.

ffb981c

swolchok requested a review from henryiii as a code owner October 31, 2025 21:33

swolchok added 2 commits November 3, 2025 15:16

Update test_copy_move tests with C++17 passing values just so we can see mostly-not-red tests

65e5866

Remove stray TODO

729e9f8

fix clang-tidy

f24ded5
