Encoding/decoding error in Python3 for non-ASCII binary data #32

@GoogleCodeExporter

Given the following code:

  #! /usr/bin/env python3

  import fake_filesystem as ff
  fs = ff.FakeFilesystem()

  text_fractions =  '⅓ ⅔ ⅕ ⅖'
  byte_fractions = b'\xe2\x85\x93 \xe2\x85\x94 \xe2\x85\x95 \xe2\x85\x96'

  fake_open = ff.FakeFileOpen(fs)

  if False:
      fractions = text_fractions.encode('utf-8')

      with fake_open('jim', 'wb') as fd:
          fd.write(fractions)
  else:
      with fake_open('jim', 'wb') as fd:
          fd.write(byte_fractions)

  with fake_open('jim', 'rb') as fd:
      jim = fd.read()

  print(type(jim))

with Python 3.4, we get a traceback:

  $ python test.py
  Traceback (most recent call last):
    File "test.py", line 20, in <module>
      with fake_open('jim', 'rb') as fd:
    File "/Users/tibbs/env/py3/lib/python3.4/site-packages/fake_filesystem.py", line 1869, in __call__
      return self.Call(*args, **kwargs)
    File "/Users/tibbs/env/py3/lib/python3.4/site-packages/fake_filesystem.py", line 2180, in Call
      closefd=closefd)
    File "/Users/tibbs/env/py3/lib/python3.4/site-packages/fake_filesystem.py", line 1971, in __init__
      contents = bytes(contents, 'ascii')
  UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Note that replacing "if False:" with "if True:" gives the same result, as one
would expect, since both branches write the same bytes.

The problem appears to be that, in fake_filesystem.py, in SetContents,
at lines 225-227 it says:

    # convert a byte array to a string
    if sys.version_info >= (3, 0) and isinstance(contents, bytes):
      contents = ''.join(chr(i) for i in contents)

which turns "contents" into a string with one character per byte (code points
0-255). Then in FakeFileWrapper at lines 1967-1971 it has:

        if sys.version_info >= (3, 0) and binary:
          io_class = io.BytesIO
          if contents and isinstance(contents, str):
            contents = bytes(contents, 'ascii')

Given that "contents" is now a string, it tries to encode it as "ascii", which
of course fails for any character with a code point above 127.
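
The two conversions quoted above can be reproduced without fake_filesystem. The following is only a minimal sketch: `store_contents` and `open_binary` are hypothetical stand-ins for the quoted snippets from SetContents and FakeFileWrapper, not the library's actual functions.

```python
import io


def store_contents(contents):
    """Hypothetical stand-in for the quoted SetContents conversion."""
    # bytes -> str, one character per byte (code points 0-255)
    if isinstance(contents, bytes):
        contents = ''.join(chr(i) for i in contents)
    return contents


def open_binary(contents):
    """Hypothetical stand-in for the quoted FakeFileWrapper conversion."""
    # str -> bytes using the 'ascii' codec, as in the quoted code
    if contents and isinstance(contents, str):
        contents = bytes(contents, 'ascii')
    return io.BytesIO(contents)


byte_fractions = b'\xe2\x85\x93 \xe2\x85\x94 \xe2\x85\x95 \xe2\x85\x96'
stored = store_contents(byte_fractions)

try:
    open_binary(stored)
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(repr(stored[:3]))  # the first three bytes, each mapped to one character
print(error)
```

The first three characters of the stored string all have code points above 127, which is consistent with the "position 0-2" in the traceback.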

A possible solution is to enforce utf-8 encoding. Thus in the first piece of
code, use:

    if sys.version_info >= (3, 0) and isinstance(contents, bytes):
      contents = contents.decode('utf-8')

and in the second:

    if contents and isinstance(contents, str):
      contents = bytes(contents, 'utf-8')
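
As a rough check of the two replacements proposed above, here is a standalone sketch. The helper names `store_contents` and `read_binary` and the `encoding` parameter are assumptions made for illustration; they are not part of fake_filesystem. The sketch also tries a byte string that is not valid UTF-8, since binary files can hold arbitrary bytes. Latin-1 is shown only as one possible alternative, because it maps each byte to exactly one character, the same mapping as the original chr() join.

```python
import io


def store_contents(contents, encoding='utf-8'):
    """Hypothetical stand-in for the proposed SetContents change."""
    if isinstance(contents, bytes):
        contents = contents.decode(encoding)
    return contents


def read_binary(contents, encoding='utf-8'):
    """Hypothetical stand-in for the proposed FakeFileWrapper change."""
    if isinstance(contents, str):
        contents = bytes(contents, encoding)
    return io.BytesIO(contents).read()


# The data from the report round-trips with UTF-8 on both sides.
byte_fractions = b'\xe2\x85\x93 \xe2\x85\x94 \xe2\x85\x95 \xe2\x85\x96'
stored = store_contents(byte_fractions)
round_trip = read_binary(stored)

# Bytes that are not valid UTF-8 cannot be decoded at all.
raw = b'\xff\xfe\x00\x01'
try:
    store_contents(raw)
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Latin-1 accepts every byte value, so the same data survives the round trip.
latin1_round_trip = read_binary(store_contents(raw, 'latin-1'), 'latin-1')

print(stored)
print(round_trip == byte_fractions)
print(type(utf8_error).__name__)
print(latin1_round_trip == raw)
```

So the UTF-8 pair handles the example in this report, but under these assumptions it would still raise for binary contents that are not valid UTF-8.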

The information above is copyrighted Alcatel-Lucent, 2015, All Rights
Reserved. The information is provided AS-IS. NO PATENT LICENSE, EXPRESS
OR IMPLIED, IS GRANTED OR INTENDED.

Original issue reported on code.google.com by [email protected] on 20 Feb 2015 at 2:09
