
BLD,BUG: Add charset-normalizer to improve compatibility with non-ascii environments. #266


Open: wants to merge 1 commit into develop


BeiyanYunyi

When the LANG environment variable selects a non-English locale, gfortran localizes parts of its output, which can then contain non-ASCII characters. My working environment is Linux with LANG=zh_CN.UTF-8. In this environment,

gfortran -E ompgen.F90 -o omp.f90 -cpp

will output:

# 1 "ompgen.F90"
# 1 "<built-in>"
# 1 "<命令行>"
# 1 "ompgen.F90"
!... other code

instead of:

# 1 "ompgen.F90"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "ompgen.F90"

The Chinese characters on line 3 cause the build to fail:

      Traceback (most recent call last):
        File "/home/BeiyanYunyi/.cache/uv/builds-v0/.tmpP5ioKB/lib/python3.11/site-packages/numpy/f2py/crackfortran.py", line 391, in readfortrancode
          l = fin.readline()
              ^^^^^^^^^^^^^^
        File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/fileinput.py", line 292, in readline
          line = self._readline()
                 ^^^^^^^^^^^^^^^^
        File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/fileinput.py", line 372, in _readline
          return self._readline()
                 ^^^^^^^^^^^^^^^^
        File "/home/BeiyanYunyi/.local/share/uv/python/cpython-3.11.12-linux-x86_64-gnu/lib/python3.11/encodings/ascii.py", line 26, in decode
          return codecs.ascii_decode(input, self.errors)[0]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 71: ordinal not in range(128)
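The decode failure can be reproduced without gfortran or f2py. Encoded as UTF-8, the localized marker line contains the byte 0xe5, which is the first byte of 命 (U+547D, encoded as E5 91 BD). The ASCII codec rejects that byte. The sketch below is a standalone illustration, and the helper name `try_decode` is hypothetical:

```python
# Third marker line of the preprocessor output under LANG=zh_CN.UTF-8.
raw = '# 1 "<命令行>"\n'.encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning the text or the codec's error message."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# Fails the same way as the traceback: byte 0xe5 is outside range(128).
print(try_decode(raw, "ascii"))
# The same bytes decode cleanly once the correct encoding is known.
print(try_decode(raw, "utf-8"), end="")
```

The position differs from the traceback (6 here, 71 there) only because f2py decodes a larger chunk of the file, not a single line.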

To reproduce the bug, run this command from the repository root in a POSIX shell:

LANG=zh_CN.UTF-8 pip install

numpy.f2py itself suggests the fix in its error hint: "It is likely that installing charset_normalizer package will help f2py determine the input file encoding correctly." Adding charset-normalizer to build-system.requires in pyproject.toml makes it available during the build, so f2py infers the encoding correctly. With that change, I built this package successfully.
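The extra build dependency matters because of how the encoding is chosen. The sketch below is a simplified model based on a reading of f2py's crackfortran, not its actual code. When charset_normalizer is importable, it is used to guess the file's encoding. Otherwise, only UTF byte-order marks are recognized and everything else is read as ASCII, the codec that fails in the traceback. The function `guess_encoding` and its `use_detector` flag are hypothetical; `charset_normalizer.from_bytes(...).best()` is that package's detection entry point.

```python
import codecs


def guess_encoding(raw: bytes, use_detector: bool = True) -> str:
    """Hypothetical helper modelling how f2py might pick a source encoding."""
    if use_detector:
        try:
            # Optional dependency: only present if installed in the build env.
            from charset_normalizer import from_bytes
        except ImportError:
            pass
        else:
            best = from_bytes(raw).best()
            if best is not None:
                return best.encoding
    # Without a detector, only byte-order marks identify a UTF encoding.
    # Check UTF-32 before UTF-16: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Fallback: plain ASCII, which rejects the UTF-8 bytes of "<命令行>".
    return "ascii"


preprocessed = '# 1 "ompgen.F90"\n# 1 "<命令行>"\n'.encode("utf-8")
print(guess_encoding(preprocessed, use_detector=False))  # ascii
```

gfortran writes UTF-8 without a byte-order mark, so only the detector branch can identify it. pip installs build-system.requires into the isolated build environment, which makes that branch reachable during the build.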

kafitzgerald (Collaborator)

Thanks for the PR!

I'll take a look at this tomorrow.
