Replies: 2 comments 5 replies
-
I think the groupby has to keep everything in memory, hence the performance hit. The filtering step reduces the search space so that the multi-column groupby is manageable.
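To make the "search space" point concrete, here is a rough sketch in plain Python. It uses hypothetical toy data and says nothing about how vaex works internally. It only shows that the number of distinct key combinations a multi-column groupby has to track can approach the product of the per-column cardinalities, and that filtering first shrinks it.

```python
# Hypothetical toy data: two key columns with 100 distinct values each.
n_rows = 10_000
col_a = [i % 100 for i in range(n_rows)]
col_b = [i // 100 for i in range(n_rows)]


def count_groups(*columns):
    """Number of distinct key tuples a group-by over these columns must track."""
    return len(set(zip(*columns)))


groups_a = count_groups(col_a)          # grouping on column a alone
groups_b = count_groups(col_b)          # grouping on column b alone
groups_ab = count_groups(col_a, col_b)  # grouping on both columns together

# Filter first (keep rows where a < 10), then group on both columns.
kept = [(a, b) for a, b in zip(col_a, col_b) if a < 10]
groups_ab_filtered = len(set(kept))

print(groups_a, groups_b, groups_ab, groups_ab_filtered)
```

In this toy case every (a, b) pair is distinct, so the combined groupby tracks far more groups than either single-column one, and the filter cuts that by a factor of ten.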
-
I am trying to pre-calculate the hash, so that groupby only needs to work on one column rather than ten, but am having challenges vectorising the following:
I get
I've looked into the vaex.hash code but couldn't see an easy way to use it for this purpose.
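For anyone trying the same single-key idea, one vectorised way to collapse several key columns into one integer column is sketched below with plain NumPy. This is not vaex's API and not a hash: it is an exact composite key built from per-column integer codes, so there are no collisions. The function name `composite_key` and the sample arrays are made up for illustration, and it assumes each key column fits in memory as a NumPy array.

```python
import numpy as np


def composite_key(*columns):
    """Combine several key columns into one integer key per row, without a Python loop over rows."""
    codes = []
    sizes = []
    for col in columns:
        # Replace each value by its position among the column's sorted unique values.
        uniques, inverse = np.unique(np.asarray(col), return_inverse=True)
        codes.append(inverse.ravel())
        sizes.append(len(uniques))
    # Map each tuple of codes to a single flat integer (mixed-radix encoding).
    # NumPy raises ValueError if the product of `sizes` is too large for an index.
    return np.ravel_multi_index(tuple(codes), tuple(sizes))


# Hypothetical sample data: two key columns of different dtypes.
a = np.array(["x", "y", "x", "y", "x"])
b = np.array([1, 1, 2, 1, 1])

key = composite_key(a, b)
group_keys, counts = np.unique(key, return_counts=True)
print(key, group_keys, counts)
```

The resulting `key` array could then be added as a single column to group on. With ten columns the product of cardinalities may be too large for this encoding, in which case a real hash (accepting a small collision risk) would be the alternative.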
-
Hi, I have a number of large datasets.
Grouping on two columns together appears to be much slower than grouping on each column individually.
I am doing it as in the Pandas example of
I saw the groupby code was added later, and perhaps I am pushing it further than is typically the case - as I didn't see any examples of this in the documentation.
Is there a better approach to tackling this problem?
Would converting to categorical improve performance?
Thanks!