BUG: read_csv in non utf-8 filesystem #6770

Closed
@hshimizu77

Description

When I called 'read_csv' on a file whose name is written in the local language on Windows (the filesystem encoding is 'cp932'), it failed with the error 'File does not exist'.
I found that line 538 of 'pandas / parser.pyx' encodes the filename as utf-8 regardless of the environment:

    if isinstance(source, basestring):
        if not isinstance(source, bytes):
            source = source.encode('utf-8')

I suggest changing the last line to:

            source = source.encode(sys.getfilesystemencoding())

In my environment, Windows 8 (x64) with Japanese as the system language, this modification works.
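
To illustrate why the choice of encoding matters here, below is a minimal standalone sketch and not the actual pandas code. The helper name `encode_filename` and the sample filename are hypothetical, and the sketch is written for Python 3 rather than the Python 2 style of the snippet above. It only shows that the same filename yields different bytes under utf-8 and cp932, so a lookup using the utf-8 bytes would presumably not match a name stored by a cp932 filesystem.

```python
import sys


def encode_filename(source, encoding=None):
    """Hypothetical helper: turn a text filename into bytes.

    Defaults to the filesystem encoding instead of a hard-coded 'utf-8'.
    """
    if isinstance(source, bytes):
        # Already bytes: leave untouched.
        return source
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return source.encode(encoding)


name = "データ.csv"  # sample Japanese filename (an assumption for illustration)
utf8_bytes = encode_filename(name, "utf-8")
cp932_bytes = encode_filename(name, "cp932")

# The two byte strings differ, so they cannot refer to the same on-disk name.
print(utf8_bytes != cp932_bytes)
print(len(utf8_bytes), len(cp932_bytes))
```

On Python 3, `os.fsencode` performs a similar conversion using the filesystem encoding and its error handler, so a helper like this is mainly useful as an illustration.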

Labels: IO CSV, Unicode, Windows
