JonAnCla
Contributor

@JonAnCla JonAnCla commented Oct 15, 2025

Description of changes

This PR makes some performance improvements to the Annotable class, which is the base class for many ibis internals, particularly Node. It therefore speeds up the building of all expressions.

While investigating the points made in #11641 further, I found that Signature.bind is a major bottleneck when instantiating Annotable objects. Signature.bind is a Python standard library method (in the inspect module) that implements, for a given Python function/class, an "interpreter" for mapping passed args/kwargs to the actual named args/kwargs. However, because it is implemented in pure Python, it is roughly an order of magnitude slower (~10-20x) than the CPython code that implements the same process for a real call.
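For readers unfamiliar with it, here is a minimal sketch of what Signature.bind does, using only the standard library. The function `make_point` and its parameters are made up for illustration and are not part of ibis.

```python
import inspect


def make_point(x, y=0, *, label="p"):
    """Hypothetical function, used only to obtain a signature."""
    return (x, y, label)


sig = inspect.signature(make_point)

# bind() maps the passed args/kwargs onto parameter names,
# following the same rules as a real call would.
bound = sig.bind(1, label="a")
bound.apply_defaults()  # fill in defaults for parameters that were not passed
bound_args = dict(bound.arguments)

# An invalid call raises TypeError, just like calling the function would.
try:
    sig.bind()  # missing the required argument `x`
    bind_error = None
except TypeError as exc:
    bind_error = str(exc)
```

All of this matching logic runs as Python code inside inspect, which is where the overhead described above comes from.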

In this PR I've therefore replaced the call to Signature.bind with the following:

  • when an Annotable class is created, also create a "proxy dataclass" using standard python dataclasses, with exactly the same signature
  • when an Annotable is instantiated, use the "proxy dataclass" to bind passed args/kwargs to annotated/named args/kwargs and extract the resulting __dict__ from that dataclass instance
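The two steps above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the helper names `make_proxy` and `bind_with_proxy` are hypothetical, every field is typed as `object`, and only plain positional-or-keyword parameters are handled.

```python
import dataclasses
import inspect


def make_proxy(name, signature):
    """Build a dataclass whose generated __init__ mirrors `signature`."""
    fields = []
    for param in signature.parameters.values():
        if param.default is inspect.Parameter.empty:
            fields.append((param.name, object))
        else:
            default = dataclasses.field(default=param.default)
            fields.append((param.name, object, default))
    return dataclasses.make_dataclass(name, fields)


def bind_with_proxy(proxy, args, kwargs):
    """Let the dataclass __init__ do the binding, then read the result back."""
    instance = proxy(*args, **kwargs)
    return dict(vars(instance))


def _template(foo, bar, baz=3):
    """Hypothetical stand-in for the signature of an Annotable class."""


Proxy = make_proxy("Proxy", inspect.signature(_template))
bound = bind_with_proxy(Proxy, (42,), {"bar": "hello"})

# Invalid calls still raise TypeError, but the message comes from the
# generated __init__ rather than from Signature.bind.
try:
    bind_with_proxy(Proxy, (), {})
    proxy_error = None
except TypeError as exc:
    proxy_error = str(exc)
```

The proxy is built once per class, so its construction cost is paid at class-creation time and not on every instantiation.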

The changes are not very intrusive, though they are a little bit "funky". Performance improves by 10-50% across all expression-building benchmarks (larger expressions benefit more).

This is a proof of concept (POC): a few tests that check that Annotable raises when incorrect args/kwargs are passed currently fail, because the exceptions raised have slightly different text than before. I think these are all solvable, but I wanted to check that the approach is acceptable before continuing.

I've attached some profiles. The IPython code used to generate them is below.

from ibis.common.grounds import Annotable

class MyAnnotable(Annotable):
    foo: int
    bar: str

import line_profiler
%load_ext line_profiler

%lprun -s -m ibis.common.grounds -m ibis.common.bases [MyAnnotable(foo=42, bar="hello") for _ in range(100)]

profile-after.txt
profile-before.txt

@kszucs and @cpcloud, if you could take a look as time allows and let me know your thoughts, that would be much appreciated. Thanks!

@kszucs
Member

kszucs commented Oct 16, 2025

Python dataclasses render the source of the __init__ method as text and then call exec() to turn it into an actual function; that is why it is faster. We can implement that ourselves based on the signature object to speed up instantiation. I would rather not create an intermediate dataclass, since we can generate the __init__ method ourselves with tighter control.
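A minimal sketch of that idea, assuming a hypothetical helper `make_binder` (not an existing ibis or koerce function) and handling only plain positional-or-keyword parameters: the function source is rendered from the signature and compiled once with exec(), so argument binding is then done by the interpreter's normal call machinery.

```python
import inspect


def make_binder(signature):
    """Generate a function with `signature` that returns its arguments as a dict."""
    namespace = {}
    params = []
    for i, param in enumerate(signature.parameters.values()):
        if param.kind is not inspect.Parameter.POSITIONAL_OR_KEYWORD:
            raise NotImplementedError("this sketch handles plain parameters only")
        if param.default is inspect.Parameter.empty:
            params.append(param.name)
        else:
            # Pass defaults through the exec namespace instead of using repr().
            default_name = f"_default_{i}"
            namespace[default_name] = param.default
            params.append(f"{param.name}={default_name}")
    body = ", ".join(f"{name!r}: {name}" for name in signature.parameters)
    source = f"def bind({', '.join(params)}):\n    return {{{body}}}\n"
    exec(source, namespace)  # compile the rendered source once
    return namespace["bind"]


def _template(foo, bar, baz=3):
    """Hypothetical stand-in for the signature of an Annotable class."""


bind = make_binder(inspect.signature(_template))
result = bind(42, bar="hello")

# Missing arguments raise TypeError from the generated function itself.
try:
    bind()
    binder_error = None
except TypeError as exc:
    binder_error = str(exc)
```

Compared with a proxy dataclass, generating the source directly leaves room to control the function name and error messages, at the cost of maintaining the code generation.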

@JonAnCla
Contributor Author

Thanks, my one reservation would be that using exec feels a bit icky :)

At least if we delegate that job to dataclasses (which, as you say, uses exec to build the dataclass __init__ method), we're getting a well-known and well-tested piece of code to do that piece of dirty work.

Another thing to consider is that, with these changes, initialising objects via the internal dataclass is no longer a bottleneck (checking types using Pattern etc. is), so we may not need to speed up the Signature.bind part much further.

Having said the above, I don't have a strong opinion and am happy to re-implement as preferred, so let me know whether you still have the same preference. Thanks for taking a look!

@kszucs
Member

kszucs commented Oct 16, 2025

I actually have a port of GitHub.com/kszucs/koerce built with mypyc that has better runtime performance than the current Cython implementation of koerce. I also managed to speed up signature binding significantly. If you are interested, I can share it in the coming days or next week. We could offload additional performance-critical paths, with continuous benchmarking configured to ensure good performance.

@JonAnCla
Contributor Author

Sounds great, please do share when you have time :)
Would this be something you'd hope to get into ibis itself, or would it sit outside as an add-on?
