
SNOW-2019088: Extend write_pandas by a parameter for schema inference #2246

Open
Argon- opened this issue Apr 1, 2025 · 3 comments · May be fixed by #2250
Labels: feature, status-triage_done (Initial triage done, will be further handled by the driver team)

Comments

Argon- commented Apr 1, 2025

What is the current behavior?

This feature request is born out of an issue I've had with write_pandas() from pandas_tools: https://github.com/snowflakedb/snowflake-connector-python/blob/main/src/snowflake/connector/pandas_tools.py#L250

I tried to upload a wide dataframe into an existing table using write_pandas and struggled with various type-related problems, most prominently this one:

snowflake.connector.errors.ProgrammingError: 002023 (22000): SQL compilation error:
Expression type does not match column data type, expecting BINARY(20) but got VARIANT for column BOM_SYS_HD

I tried every method under the sun to force and convert the data types of these columns to exactly what the target table expects, but nothing worked. Note that the data in the dataframe is correct and correctly typed. Essentially, I read from a Snowflake table into a dataframe, perform some calculations, and write it back.
When I was close to throwing in the towel, I took a look at the code of write_pandas and noticed that when auto_create_table=True is set, a schema inference step is performed. And lo and behold, that made it work. The data arrived correctly in the already existing target table.
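For illustration, the workaround described above might look roughly like the sketch below. `write_pandas` and its `auto_create_table` argument come from the connector; the wrapper name `upload_with_inference` and the injectable `writer` argument are assumptions added here so the sketch can be exercised without a live connection.

```python
def upload_with_inference(conn, df, table_name, writer=None):
    """Write df into an existing table, relying on the schema inference
    that write_pandas performs when auto_create_table=True is set."""
    if writer is None:
        # Imported lazily so the sketch can be read without the connector installed.
        from snowflake.connector.pandas_tools import write_pandas as writer

    # auto_create_table=True is used only for its side effect of inferring the
    # schema; per the issue, the generated statement is CREATE ... IF NOT EXISTS,
    # so an existing table is left in place.
    success, _nchunks, nrows, _output = writer(
        conn, df, table_name, auto_create_table=True
    )
    return success, nrows
```

With a real connection this would be called as `upload_with_inference(conn, df, "MY_TABLE")`, where the table name is a placeholder.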

What is the desired behavior?

Given that auto_create_table generates a CREATE ... IF NOT EXISTS statement, it’s relatively safe to use even if you already have an existing table. Nonetheless, I’d appreciate a more explicit (and even safer) way to achieve what I need: performing only the schema inference step.
This could be implemented with a new parameter for write_pandas, e.g., infer_schema, defaulting to False. I’ve implemented this in a local copy of this package, and it works well for me. I can submit a pull request if you’re interested.
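To make the proposal concrete, here is a minimal sketch of how the two flags could interact. It is not taken from my local copy or from any pull request: `WritePlan` and `plan_write` are hypothetical names, and the only assumption is the behavior described above, namely that `infer_schema=True` runs the inference step without issuing any CREATE statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePlan:
    infer_schema: bool  # run the schema inference step
    create_table: bool  # issue CREATE ... IF NOT EXISTS


def plan_write(auto_create_table: bool = False, infer_schema: bool = False) -> WritePlan:
    # Hypothetical decision logic for the proposed parameter:
    # auto_create_table keeps implying inference (current behavior per the issue),
    # while infer_schema alone triggers inference without creating anything.
    return WritePlan(
        infer_schema=auto_create_table or infer_schema,
        create_table=auto_create_table,
    )
```

Under this sketch, a call such as `write_pandas(conn, df, "MY_TABLE", infer_schema=True)` would correspond to `plan_write(infer_schema=True)`: inference on, table creation off.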

How would this improve snowflake-connector-python?

It might solve other issues like mine with a relatively low-friction parameter.

References and other background

No response

@github-actions github-actions bot changed the title Extend write_pandas by a parameter for schema inference SNOW-2019088: Extend write_pandas by a parameter for schema inference Apr 1, 2025
@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed needs triage labels Apr 2, 2025
sfc-gh-dszmolka (Contributor) commented

Hi, thanks for raising this with us. If you are able to, please submit a PR and the team will review it. Otherwise, we'll consider this enhancement request for later planning.

Argon- (Author) commented Apr 8, 2025

@sfc-gh-dszmolka I created #2250.

One more reason why I think this is useful: a user might want or need the schema inference but lack the privileges required for CREATE TABLE, and therefore can't use auto_create_table as a workaround.

sfc-gh-dszmolka (Contributor) commented

Thank you for your contribution! The team will review it.
