Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on master here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html
Documentation problem
It's not clear from the documentation of the first argument for to_parquet that the path can be a remote path such as s3://mybucket/something/. The only mention of s3:// is in storage_options.
The existing documentation for path uses the exact words "Root Directory path". I'm not sure why that's capitalised. Is that exact phrasing supposed to indicate that s3:// can be used? I interpreted it to mean a local file system path that must start with a slash (or a drive letter on Windows), although off the top of my head I think relative local paths work too. So I find this quite misleading.
I think the same is true of other functions such as to_csv.
Suggested fix for documentation
If "Root Directory path" is supposed to suggest that it can start with s3://, I think that's not obvious and should be changed to something more explicit.
Can we change the docs to enumerate the possibilities? e.g. /path/to/file, relative/path, s3://mybucket/prefix/, gcs://..., and others?
Or provide an example of saving to S3? (e.g. if partitioning by a column, do I add a trailing slash or not?)