aggregate_arrow_all(...) >four times slower in version 1.0.2 compared to 1.0.1 with fields objects #169

Open
sibbiii opened this issue Sep 12, 2023 · 5 comments

Comments

@sibbiii
Contributor

sibbiii commented Sep 12, 2023

Hi,

Thanks again for fixing the bugs in Version 1.0.2.
Unfortunately, it seems that the new version loads data approximately four or more times slower when the schema contains nested fields.
(without nested fields there seems to be no speed difference)

Are you aware of any issue already?
We will post a unit test to reproduce the error here soon.

Sebastian

@blink1073
Member

Hi @sibbiii, this is captured in https://jira.mongodb.org/browse/ARROW-179.

@sibbiii
Contributor Author

sibbiii commented Sep 13, 2023

Hi @blink1073,

Thanks for this info. The issue is that version 1.0.2 is now so slow that it is unusable for loading large datasets. Maybe we should mention this in the release notes (version 1.0.1 is fine), as MongoDB Arrow's primary purpose is to be fast.

If we can help here please let me know,
Sebastian

@blink1073
Member

Hi @sibbiii, we are thinking of reverting to the 1.0.1 behavior and documenting the limitation. I just wanted to verify that the 1.0.1 behavior you described in #163 was not a blocker, but more of a desired feature (which we're tracking in ARROW-179).

@sibbiii
Contributor Author

sibbiii commented Sep 20, 2023

Note, there were two issues fixed in 1.0.2.

I agree: reverting #136 and documenting the issue is much better than leaving it as slow as it is now. People can add code afterwards to convert the column type, since otherwise the type differs depending on whether the ObjectID is at root level or in a nested field.
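
A rough sketch of such an after-the-fact conversion, using only the standard library. It assumes the loaded table has already been turned into a list of row dicts (for example with pyarrow's `Table.to_pylist()`), and that ObjectIDs may show up either as 12 raw bytes, as objects exposing a `.binary` attribute (as `bson.ObjectId` does), or as hex strings. Which representation appears at which nesting level is not stated in this thread, so the helper accepts all three. The function names `object_id_to_hex` and `normalize_object_ids` and the field names in the usage note are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def object_id_to_hex(value: Any) -> Any:
    """Return a hex string for an ObjectId-like value; None passes through."""
    if value is None:
        return None
    # bson.ObjectId exposes its 12 raw bytes as `.binary`; plain bytes are used as-is.
    raw = getattr(value, "binary", value)
    if isinstance(raw, (bytes, bytearray)) and len(raw) == 12:
        return bytes(raw).hex()
    if isinstance(raw, str):
        return raw  # assumed to be a hex string already
    raise TypeError(f"unsupported ObjectId representation: {type(value).__name__}")


def normalize_object_ids(rows: List[Dict[str, Any]], paths: Iterable[str]) -> List[Dict[str, Any]]:
    """Rewrite the value at each dotted path to a hex string, in place."""
    paths = list(paths)
    for row in rows:
        for path in paths:
            *parents, leaf = path.split(".")
            doc: Any = row
            # Walk down to the sub-document that holds the leaf field.
            for key in parents:
                doc = doc.get(key) if isinstance(doc, dict) else None
                if doc is None:
                    break
            if isinstance(doc, dict) and leaf in doc:
                doc[leaf] = object_id_to_hex(doc[leaf])
    return rows
```

Calling `normalize_object_ids(rows, ["_id", "owner.id"])` would then give the root-level and the nested ID the same string type. This runs in pure Python after loading, so it adds its own per-row cost.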

By the way, the perfect solution would be if one could choose the data type depending on what is defined in the schema, e.g. string or ...

Thanks a lot for your support,
Sebastian

@blink1073
Member

Thanks, I filed https://jira.mongodb.org/browse/ARROW-181.
