SNOW-2019088: Extend write_pandas by a parameter for schema inference #2250

Open · wants to merge 1 commit into main
1 change: 1 addition & 0 deletions DESCRIPTION.md
@@ -18,6 +18,7 @@ Source code is also available at: https://github.com/snowflakedb/snowflake-conne
   - Added `gcs_use_virtual_endpoints` connection property that forces the use of virtual GCS endpoints. This makes it possible to set up a private DNS entry for the GCS endpoint. See more: https://cloud.google.com/storage/docs/request-endpoints#xml-api
   - Fixed a bug that caused the driver to fail silently during `TO_DATE` Arrow-to-Python conversion when an invalid date was followed by a valid one.
   - Added `check_arrow_conversion_error_on_every_column` connection property that can be set to `False` to restore the previous behaviour, in which the driver ignored conversion errors unless they occurred in the last column. This flag exists to unblock workflows impacted by the bugfix and will be removed in a later release.
+  - Added `infer_schema` parameter to `write_pandas` to perform schema inference on the passed data.

 - v3.14.0 (March 03, 2025)
   - Bumped pyOpenSSL dependency upper boundary from <25.0.0 to <26.0.0.
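For reference, a minimal usage sketch of the new flag. The connection parameters, DataFrame contents, and table name below are hypothetical; the behaviour assumes the `infer_schema` parameter as described in this PR.

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Hypothetical connection parameters, for illustration only.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    database="MY_DB",
    schema="PUBLIC",
)

df = pd.DataFrame({"ID": [1, 2, 3], "AMOUNT": [9.99, 14.50, 3.25]})

# infer_schema=True asks the connector to infer column types from the
# staged data and cast each column explicitly during COPY INTO, without
# requiring auto_create_table or overwrite.
success, nchunks, nrows, _ = write_pandas(
    conn,
    df,
    table_name="MY_TABLE",
    infer_schema=True,
)
```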
49 changes: 27 additions & 22 deletions src/snowflake/connector/pandas_tools.py
@@ -258,6 +258,7 @@ def write_pandas(
     on_error: str = "abort_statement",
     parallel: int = 4,
     quote_identifiers: bool = True,
+    infer_schema: bool = False,
     auto_create_table: bool = False,
     create_temp_table: bool = False,
     overwrite: bool = False,
@@ -316,6 +317,8 @@
         quote_identifiers: By default, identifiers, specifically database, schema, table and column names
             (from df.columns) will be quoted. If set to False, identifiers are passed on to Snowflake without quoting.
             I.e. identifiers will be coerced to uppercase by Snowflake. (Default value = True)
+        infer_schema: Perform explicit schema inference on the data in the DataFrame and use the inferred data types
+            when selecting columns from the DataFrame. (Default value = False)
         auto_create_table: When true, will automatically create a table with corresponding columns for each column in
             the passed in DataFrame. The table will not be created if it already exists
         create_temp_table: (Deprecated) Will make the auto-created table as a temporary table
@@ -481,7 +484,7 @@ def drop_object(name: str, object_type: str) -> None:
             num_statements=1,
         )

-    if auto_create_table or overwrite:
+    if auto_create_table or overwrite or infer_schema:
         file_format_location = _create_temp_file_format(
             cursor,
             database,
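The behavioural intent of the widened condition, restated as a small sketch (plain Python, not connector code):

```python
def needs_temp_file_format(
    auto_create_table: bool, overwrite: bool, infer_schema: bool
) -> bool:
    # Before this change, only auto_create_table/overwrite triggered the
    # temporary file format used for schema inference; infer_schema now
    # triggers it as well.
    return auto_create_table or overwrite or infer_schema


# infer_schema alone is now enough to enter the inference path.
assert needs_temp_file_format(False, False, True)
```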
@@ -520,27 +523,29 @@ def drop_object(name: str, object_type: str) -> None:
             quote_identifiers,
         )

-        iceberg = "ICEBERG " if iceberg_config else ""
-        iceberg_config_statement = _iceberg_config_statement_helper(
-            iceberg_config or {}
-        )
+        if auto_create_table or overwrite:
+            iceberg = "ICEBERG " if iceberg_config else ""
+            iceberg_config_statement = _iceberg_config_statement_helper(
+                iceberg_config or {}
+            )

-        create_table_sql = (
-            f"CREATE {table_type.upper()} {iceberg}TABLE IF NOT EXISTS identifier(?) "
-            f"({create_table_columns}) {iceberg_config_statement}"
-            f" /* Python:snowflake.connector.pandas_tools.write_pandas() */ "
-        )
-        params = (target_table_location,)
-        logger.debug(
-            f"auto creating table with '{create_table_sql}'. params: %s", params
-        )
-        cursor.execute(
-            create_table_sql,
-            _is_internal=True,
-            _force_qmark_paramstyle=True,
-            params=params,
-            num_statements=1,
-        )
+            create_table_sql = (
+                f"CREATE {table_type.upper()} {iceberg}TABLE IF NOT EXISTS identifier(?) "
+                f"({create_table_columns}) {iceberg_config_statement}"
+                f" /* Python:snowflake.connector.pandas_tools.write_pandas() */ "
+            )
+            params = (target_table_location,)
+            logger.debug(
+                f"auto creating table with '{create_table_sql}'. params: %s", params
+            )
+            cursor.execute(
+                create_table_sql,
+                _is_internal=True,
+                _force_qmark_paramstyle=True,
+                params=params,
+                num_statements=1,
+            )
+
         # need explicit casting when the underlying table schema is inferred
         parquet_columns = "$1:" + ",$1:".join(
             f"{quote}{snowflake_col}{quote}::{column_type_mapping[col]}"
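With the creation block nested as above, `infer_schema=True` alone skips `CREATE TABLE` but still builds the cast list. A small sketch of what that join produces; the column names, types, and the `snowflake_cols` mapping here are illustrative stand-ins for the connector's internals:

```python
# Hypothetical inputs standing in for the connector's internal state.
quote = '"'
column_type_mapping = {"ID": "NUMBER(38,0)", "NAME": "TEXT"}
snowflake_cols = {"ID": "ID", "NAME": "NAME"}  # df column -> Snowflake identifier

# Same join as in the diff: one "$1:<col>::<type>" entry per column.
parquet_columns = "$1:" + ",$1:".join(
    f"{quote}{snowflake_col}{quote}::{column_type_mapping[col]}"
    for col, snowflake_col in snowflake_cols.items()
)

print(parquet_columns)
# $1:"ID"::NUMBER(38,0),$1:"NAME"::TEXT
```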
@@ -577,7 +582,7 @@ def drop_object(name: str, object_type: str) -> None:
         f"FILE_FORMAT=("
         f"TYPE=PARQUET "
         f"COMPRESSION={compression_map[compression]}"
-        f"{' BINARY_AS_TEXT=FALSE' if auto_create_table or overwrite else ''}"
+        f"{' BINARY_AS_TEXT=FALSE' if auto_create_table or overwrite or infer_schema else ''}"
         f"{sql_use_logical_type}"
         f") "
         f"PURGE=TRUE ON_ERROR=?"
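For reference, roughly the statement shape this produces when the inference path is active. This is a hand-assembled sketch, not connector output; the stage name, identifiers, and column types are hypothetical:

```python
# Hand-assembled illustration of the COPY statement shape with
# infer_schema=True (hypothetical stage and column names).
copy_into_sql = (
    "COPY INTO identifier(?) "
    '("ID","NAME") '
    'FROM (SELECT $1:"ID"::NUMBER(38,0),$1:"NAME"::TEXT FROM @"tmp_stage") '
    "FILE_FORMAT=("
    "TYPE=PARQUET "
    "COMPRESSION=auto"
    " BINARY_AS_TEXT=FALSE"  # now emitted when only infer_schema is set
    ") "
    "PURGE=TRUE ON_ERROR=?"
)
print(copy_into_sql)
```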