DOC: to_parquet state that path can start with s3:// #44976

Open
@mdavis-xyz

Description

Pandas version checks

  • I have checked that the issue still exists on the latest version of the docs on master

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html

Documentation problem

It's not clear from the documentation of the first argument for to_parquet that the path can be a remote path such as s3://mybucket/something/.

The only mention of s3:// is in storage_options.

The existing documentation for path uses the exact words "Root Directory path". I'm not sure why that's capitalised. Is that phrasing supposed to indicate that s3:// can be used? I interpreted it to mean a local file system path that must start with a slash (or a drive letter on Windows), although off the top of my head I think relative local paths work too. So I find this quite misleading.

I think the same is true of other functions such as to_csv.

Suggested fix for documentation

If "Root Directory path" is supposed to suggest that it can start with s3://, I think that's not obvious and should be changed to something more explicit.

Can we change the docs to enumerate the possibilities? e.g. /path/to/file and relative/path and s3://mybucket/prefix/ and gcs://... and others?

Or provide an example of saving to S3? (e.g. if partitioning by a column, do I add a trailing slash or not?)

Labels

  • Docs
  • IO Network: Local or Cloud (AWS, GCS, etc.) IO Issues
  • IO Parquet: parquet, feather
