aggregate_arrow_all(...) >four times slower in version 1.0.2 compared to 1.0.1 with fields objects #169

sibbiii · 2023-09-12T16:11:19Z

Hi,

Thanks again for fixing the bugs in Version 1.0.2.
Unfortunately, it seems that the new version loads data more than four times slower when there are nested fields in the schema.
(without nested fields there seems to be no speed difference)

Are you already aware of this issue?
We will post a unit test to reproduce the error here soon.

Sebastian
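
A reproduction of this kind could be timed with a small harness. The following is only a minimal sketch, not the promised unit test: `best_time` and `slowdown` are hypothetical helper names, and the two callables passed in are assumed to wrap the same `aggregate_arrow_all(...)` call, once with a flat schema and once with nested fields.

```python
import time


def best_time(fn, repeats=5):
    """Run fn() `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def slowdown(baseline, candidate, repeats=5):
    """Return how many times slower `candidate` is than `baseline`."""
    base = best_time(baseline, repeats)
    cand = best_time(candidate, repeats)
    # Guard against a zero baseline on very coarse timers.
    return float("inf") if base == 0 else cand / base
```

Taking the minimum over several runs reduces noise from caching and other processes, which matters when comparing two library versions on the same data.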

blink1073 · 2023-09-12T17:22:35Z

Hi @sibbiii, this is captured in https://jira.mongodb.org/browse/ARROW-179.

sibbiii · 2023-09-13T08:09:18Z

Hi @blink1073,

Thanks for this info. The issue is that version 1.0.2 is now so slow that it is unusable for loading large datasets. Maybe we should mention this in the release notes (version 1.0.1 is fine), as MongoDB Arrow's primary purpose is to be fast.

If we can help here, please let me know.
Sebastian

blink1073 · 2023-09-18T22:37:50Z

Hi @sibbiii, we are thinking of reverting to the 1.0.1 behavior and documenting the limitation. I just wanted to verify that the 1.0.1 behavior you described in #163 was not a blocker, but more of a desired feature (which we're tracking in ARROW-179).

sibbiii · 2023-09-20T18:02:43Z

Note, there were two issues fixed in 1.0.2.

I agree: reverting #136 and documenting the issue is much better than leaving it as slow as it is now. People can add some code afterwards to convert the column type, since otherwise the type differs depending on whether the ObjectID is at the root level or in a nested field.

By the way, the perfect solution would be if one could choose the data type depending on what is defined in the schema, e.g. string or ...

Thanks a lot for your support,
Sebastian
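
The post-load conversion mentioned above could look roughly like this. It is a sketch under stated assumptions, not the library's behaviour: it assumes the loaded rows have already been turned into plain Python dicts, and that the nested ObjectID arrives as its raw 12-byte value (the actual type returned for nested fields is not stated in this thread). `normalize_object_id` and `convert_nested` are hypothetical helper names.

```python
def normalize_object_id(value):
    """Return the 24-character hex form of a 12-byte ObjectID payload.

    Assumption: the nested value is raw bytes. Anything else is returned unchanged.
    """
    if isinstance(value, (bytes, bytearray)) and len(value) == 12:
        return bytes(value).hex()
    return value


def convert_nested(row, path):
    """Return a copy of `row` with the field at `path` normalized.

    `path` is a list of keys, e.g. ["ref", "id"] for row["ref"]["id"].
    Rows that lack the path are returned as-is.
    """
    head, *rest = path
    if not isinstance(row, dict) or head not in row:
        return row
    out = dict(row)  # shallow copy so the input row is not mutated
    if rest:
        out[head] = convert_nested(row[head], rest)
    else:
        out[head] = normalize_object_id(row[head])
    return out
```

Applying the same conversion to the root-level field (with a one-element path) would give both columns the same string type.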

blink1073 · 2023-09-20T21:20:03Z

Thanks, I filed https://jira.mongodb.org/browse/ARROW-181.

blink1073 mentioned this issue Oct 16, 2023

ARROW-181 Revert nested document behavior and document limitation #173

Merged

keanamo added the linked-to-jira label Mar 8, 2024
