Skip to content

SNOW-2057867 refactor and fixes to make stage related bind and pandas… #2300

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sfc-gh-zyao
Copy link
Contributor

… write work for Python sprocs

  • relax the condition of using stage solution for array bind (so we will use bind_size >= threshold instead of bind_size > threshold)
  • use _upload_stream to implement BindUploadAgent
  • use _upload to implement write_pandas
  • correctly use scoped keyword and temp naming pattern for write_pandas
  • fix a minor bug where file_transfer_agent returns meta.dst_file_name as file name in download (dst_file_name may contain sub-directories, whereas name will be the base file name without any sub-directory prefixes. We should keep it consistent with upload, where we use meta.name as file name

Tests

  • the new test_direct_file_operation_utils.py to validate parse_file_operation for _upload and _upload_stream
  • existing test test_bindings.py to make sure the change does not break existing bind upload logic
  • existing test_pandas_tools.py to make sure the change does not break existing write_pandas logic
  • existing test_put_get.py to make sure the change does not break file_transfer_agent logic

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #NNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am modifying authorization mechanisms
    • I am adding new credentials
    • I am modifying OCSP code
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

  4. (Optional) PR for stored-proc connector:

… write work for Python sprocs

- relax the condition of using stage solution for array bind (so we will use bind_size >= threshold instead of bind_size > threshold)
- use _upload_stream to implement BindUploadAgent
- use _upload to implement write_pandas
- correctly use scoped keyword and temp naming pattern for write_pandas
- fix a minor bug where file_transfer_agent returns meta.dst_file_name as file name in download (dst_file_name may contain sub-directories, whereas name will be the base file name without any sub-directory prefixes. We should keep it consistent with upload, where we use meta.name as file name

### Tests
- the new test_direct_file_operation_utils.py to validate parse_file_operation for _upload and _upload_stream
- existing test test_bindings.py to make sure the change does not break existing bind upload logic
- existing test_pandas_tools.py to make sure the change does not break existing write_pandas logic
@sfc-gh-zyao sfc-gh-zyao added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md DO_NOT_PORT_CHANGES_TO_SP Add this label when changes in this PR do not need to be port to SP connector labels Apr 28, 2025
@sfc-gh-sfan
Copy link
Contributor

This should be broken down into two PRs at least

  • Stage bind: including the first two items
  • write_pandas: including the 3rd and 4th items
  • the minor bug fix can be a standalone PR, but it might be okay to piggyback on one of the above PRs.

"""Parses a file operation by constructing SQL and getting the SQL parsing result from server."""
options_in_sql, option_bind_values = self._process_options_for_upload(options)

if command_type == CMD_TYPE_UPLOAD:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look like a pure refactor. We do an assertion, a file name split and some escaping. These do not exist in the original code. Why are they required?

Comment on lines +81 to +84
self.cursor._upload_stream(
input_stream=f,
stage_location=os.path.join(self.stage_path, f"{row_idx}.csv"),
options={},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? The original function use PUT which is somewhat equivalent to _upload?

@@ -557,6 +561,7 @@ def mocked_execute(*args, **kwargs):
)


# @pytest.mark.skipolddriver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unintended change

@sfc-gh-zyao
Copy link
Contributor Author

Closing this PR, because we split it into 2 sub PRs

  1. SNOW-2057867 refactor BindUploadAgent to make it work for Python sprocs #2303
  2. SNOW-2057867 refactor and fixes to make pandas write work for Python … #2304

I will copy the comments over and address them in the split PRs instead

@github-actions github-actions bot locked and limited conversation to collaborators Apr 28, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
DO_NOT_PORT_CHANGES_TO_SP Add this label when changes in this PR do not need to be port to SP connector NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants