Converting String to ObjectId and then writing to MongoDb using PyMongoArrow #253

Open
xahram opened this issue Dec 9, 2024 · 2 comments
@xahram
xahram commented Dec 9, 2024

Hi, I hope you're all having a wonderful day.

I have a Redshift table with 4 columns; two of them hold ObjectIds stored as strings.

I load the data into Polars and then apply the following code:

assignment_fwks = assignment_fwks.with_columns(
    pl.col("profile_id").map_elements(ObjectId, return_dtype=pl.Object).alias("profile_id"),
    pl.col("framework_id").map_elements(ObjectId, return_dtype=pl.Object).alias("framework_id"),
)

However, when I do

pymongoarrow.api.write(my_collection, assignment_fwks)

I get the error,

Exception has occurred: PanicException
called Option::unwrap() on a None value
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 49, in upsert_profile_assignment
    result = write(coll, insertion_fwk_assignments)
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 105, in client_profile_assignments
    upsert_profile_assignment(
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 136, in main
    client_error = client_profile_assignments(region, cli_region_df, credentials)
  File "/home/ubuntu/projects/profile_assigner/src/consumption_assignments/app.py", line 149, in <module>
    main()
pyo3_runtime.PanicException: called Option::unwrap()

If I don't convert these columns to ObjectId and keep them as strings, the write works fine and inserts the data correctly into the Mongo collection.

So is there a way to convert these string columns to ObjectIds and insert them into the Mongo collection without explicitly converting to another data structure, such as a pandas DataFrame or a list?

As long as I can keep using the Arrow format, that would be great, since it is very memory- and cost-efficient.

@aclark4life
Contributor

Thank you for the report, @xahram! We are tracking this issue here: https://jira.mongodb.org/browse/INTPYTHON-462

@blink1073
Member

Hi @xahram, I did a bit of digging. Unfortunately, until Polars supports extension types, we need to do some conversion to get where you want to go.

Here's a sketch of what that looks like:

import pymongoarrow.api
from pymongoarrow.pandas_types import PandasObjectId

# Convert the Polars DataFrame to pandas, keeping Arrow-backed extension arrays
assignment_fwks_pd = assignment_fwks.to_pandas(use_pyarrow_extension_array=True)
# Cast the ObjectId columns to the extension type that PyMongoArrow supports
assignment_fwks_pd = assignment_fwks_pd.astype(dict(profile_id=PandasObjectId(), ...))
# Write to the collection
pymongoarrow.api.write(my_collection, assignment_fwks_pd)
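
Whichever conversion route is used, it can help to confirm beforehand that every string in the ID columns really is an ObjectId in hex form (24 hexadecimal characters encoding 12 bytes). Below is a minimal, standard-library-only sketch of such a pre-check. The helper names `object_id_hex_to_bytes` and `find_invalid_object_ids` and the sample values are hypothetical and not part of PyMongoArrow or Polars. It is not a diagnosis of the panic reported above, only a way to rule out bad or missing input values.

```python
import re

# 24 hexadecimal characters encode the 12 bytes of an ObjectId.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def object_id_hex_to_bytes(value):
    """Return the 12 raw bytes for a 24-character hex ObjectId string."""
    if not isinstance(value, str) or _OBJECT_ID_HEX.fullmatch(value) is None:
        raise ValueError(f"not a 24-character hex ObjectId string: {value!r}")
    return bytes.fromhex(value)


def find_invalid_object_ids(values):
    """Return (index, value) pairs for entries that are not valid ObjectId hex strings."""
    invalid = []
    for index, value in enumerate(values):
        try:
            object_id_hex_to_bytes(value)
        except ValueError:
            invalid.append((index, value))
    return invalid


# Hypothetical sample values standing in for the profile_id column.
sample_profile_ids = ["507f1f77bcf86cd799439011", "not-an-object-id", None]
invalid_entries = find_invalid_object_ids(sample_profile_ids)
```

In practice the values could come from something like `assignment_fwks["profile_id"].to_list()`, run once per ID column before the cast; rows reported as invalid (including nulls) can then be fixed or dropped first.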
