
Support non-ASCII characters in PyGMT arguments and text in Figure.text #2204

@seisman

Description


Problems

Due to limitations of the PostScript language, GMT can only work with ASCII characters and a small set of non-ASCII characters. See https://docs.generic-mapping-tools.org/latest/cookbook/octal-codes.html for the full list of characters that PostScript/GMT/PyGMT can accept.

These non-ASCII characters must be specified using their octal codes or character escape sequences. A few non-ASCII characters (e.g., ü, Î) can be used directly, because GMT substitutes them with the correct PostScript octal codes.
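A quick way to see which characters in an argument fall outside ASCII is Python's built-in `str.isascii()` (Python 3.7+). The helper below is a hypothetical illustration, not an existing PyGMT function:

```python
from __future__ import annotations


def find_non_ascii(text: str) -> list[str]:
    """Return the non-ASCII characters in *text*, in order of first appearance.

    Hypothetical helper for illustration only; not part of PyGMT.
    """
    found = []
    for char in text:
        # str.isascii() is True only for code points 0-127
        if not char.isascii() and char not in found:
            found.append(char)
    return found


print(find_non_ascii("WSen+tTime (s) vs Distance (°)"))  # ['°']
print(find_non_ascii("Zürich, Île-de-France"))  # ['ü', 'Î']
```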

Users who don't know the limitations may pass non-ASCII characters directly in the arguments. For example:

```python
import pygmt
fig = pygmt.Figure()
fig.basemap(region=[0, 10, 0, 5], projection="x1c", frame="WSen+tTime (s) vs Distance (°)")
fig.show()
```

The above script produces this "surprising" figure:

[Image: non-ascii]

So, to add a non-ASCII character to a plot, users must know these limitations, go to https://docs.generic-mapping-tools.org/latest/cookbook/octal-codes.html, look up the character in the four tables, and work out the corresponding octal code (\260 for the symbol °), which is tedious.

After finding the octal code, users may expect that replacing ° with \260 works:

```python
import pygmt
fig = pygmt.Figure()
fig.basemap(region=[0, 10, 0, 5], projection="x1c", frame="WSen+tTime (s) vs Distance (\260)")
fig.show()
```

but it still produces the same "surprising" figure, because Python interprets \260 as an octal escape sequence and converts it to ° before the string is passed to the GMT API. So users have to use double backslashes or raw strings:

```python
frame="WSen+tTime (s) vs Distance (\\260)"
```

or

```python
frame=r"WSen+tTime (s) vs Distance (\260)"
```
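The difference between the three spellings can be checked directly in Python. This small illustration relies only on the standard string-literal rules:

```python
# In a normal string literal, Python treats \260 as an octal escape
# and turns it into the single character "°" (U+00B0).
plain = "Distance (\260)"

# Doubling the backslash, or using a raw string, keeps the four
# characters backslash, "2", "6", "0" that GMT expects to receive.
escaped = "Distance (\\260)"
raw = r"Distance (\260)"

print(plain)                 # Distance (°)
print(escaped == raw)        # True
print(len(plain), len(raw))  # 12 15
```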

Solutions

Since Python works well with non-ASCII characters (actually, it works with any Unicode character), users could pass ° directly in Python, and PyGMT could substitute the non-ASCII characters with the corresponding octal codes.

Here are some tests in Python:

```python
# Python supports non-ASCII characters
>>> "WSen+tTime (s) vs Distance (°)"
'WSen+tTime (s) vs Distance (°)'

# Python knows how to convert \260 to °
>>> "WSen+tTime (s) vs Distance (\260)"
'WSen+tTime (s) vs Distance (°)'

# replace ° with \\260
>>> "WSen+tTime (s) vs Distance (°)".replace("°", "\\260")
'WSen+tTime (s) vs Distance (\\260)'

# how to convert ° to \\260; this should also work for other Latin-1 characters (code points 160-255)
>>> oct(ord("°")).replace("0o", '\\')
'\\260'
```
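The one-character conversion above can be extended to a whole argument string. This is a minimal sketch, assuming (not verified against every entry of the GMT tables) that for code points 160 to 255 the Unicode code point equals the character's position in GMT's default ISOLatin1+ encoding. The function name `to_octal_codes` is hypothetical. Characters outside that range raise an error instead of silently producing a wrong code (for example, `oct(ord("α"))` gives a four-digit value that is not a valid GMT octal code):

```python
def to_octal_codes(text: str) -> str:
    """Replace each non-ASCII character in *text* with a GMT-style octal code.

    Hypothetical sketch, not part of PyGMT. Assumes the Unicode code point
    matches the ISOLatin1+ position for code points 160-255 only.
    """
    pieces = []
    for char in text:
        code = ord(char)
        if code < 128:
            pieces.append(char)  # plain ASCII passes through unchanged
        elif 160 <= code <= 255:
            # e.g. "°" is code point 176, which is 260 in octal
            pieces.append("\\" + format(code, "o"))
        else:
            # control characters 128-159 and anything beyond Latin-1
            raise ValueError(f"no Latin-1 octal code for {char!r}")
    return "".join(pieces)


print(to_octal_codes("WSen+tTime (s) vs Distance (°)"))
# WSen+tTime (s) vs Distance (\260)
```

With this approach the user-facing string stays readable, and the escaping problem from the raw-string discussion never reaches the user.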

So, if PyGMT does these substitutions internally, it can support non-ASCII characters much better. The simplest solution is to define a big dictionary that maps non-ASCII characters (e.g., °) to their octal codes (e.g., \260). Cleverer solutions may also be possible.
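The dictionary approach maps naturally onto `str.maketrans` and `str.translate` from the Python standard library. Below is a minimal sketch; `OCTAL_CODES` and `substitute_octal` are hypothetical names, and only three entries are filled in. The \260 entry comes from the text above; the other two assume the Latin-1 layout and should be checked against the GMT tables:

```python
# Hypothetical mapping table; a real one would cover every character in
# the GMT octal-code tables. The \374 and \316 values assume Latin-1 layout.
OCTAL_CODES = {
    "°": "\\260",
    "ü": "\\374",
    "Î": "\\316",
}

# str.maketrans accepts a dict mapping single characters to replacement
# strings; str.translate then applies every replacement in one pass.
_TRANSLATION = str.maketrans(OCTAL_CODES)


def substitute_octal(text: str) -> str:
    """Replace mapped non-ASCII characters; leave everything else untouched."""
    return text.translate(_TRANSLATION)


print(substitute_octal("WSen+tTime (s) vs Distance (°)"))
# WSen+tTime (s) vs Distance (\260)
```

Unlike the `ord()` trick, a table can also hold characters whose GMT code differs from their Unicode code point; the cost is that the table has to be built and maintained by hand. Characters missing from the table pass through unchanged here, so a real implementation might want to warn about them.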

Notes about the possible limitations of the solutions

Non-ASCII characters can be used in many cases:

  1. PyGMT arguments, e.g., frame="WSen+tTime (s) vs Distance (°)"
  2. Text strings as input data, e.g., fig.text(x=0, y=0, text="Distance (°)")
  3. Text strings in a plaintext file, e.g., a plaintext file with a record like 0 0 Distance (°)

The above solution should work well for case 1, may or may not work for case 2 (depending on the implementation), and likely won't work for case 3.
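One way to picture the difference between cases 2 and 3 is the sketch below. It is hypothetical: the function names and the one-entry table are illustrative, not PyGMT code. It assumes that text given as input data (case 2) reaches PyGMT as Python strings, whereas for a plaintext file (case 3) PyGMT presumably only sees the file name, so the contents would have to be read, converted, and written to a temporary copy. The input file is assumed to be UTF-8 encoded:

```python
import os
import tempfile

# Minimal stand-in for whichever converter PyGMT might adopt (hypothetical).
OCTAL_CODES = {"°": "\\260"}
_TABLE = str.maketrans(OCTAL_CODES)


def convert_text_column(texts):
    """Case 2: convert a string, or each string in a sequence, passed as data."""
    if isinstance(texts, str):
        return texts.translate(_TABLE)
    return [t.translate(_TABLE) for t in texts]


def convert_text_file(path, encoding="utf-8"):
    """Case 3: write a converted copy of a plaintext file and return its path.

    The file must be read and rewritten, which costs extra I/O and requires
    knowing the file's encoding. The caller must delete the returned file.
    """
    with open(path, encoding=encoding) as infile:
        content = infile.read()
    fd, new_path = tempfile.mkstemp(suffix=".txt")
    # Writing as ASCII raises UnicodeEncodeError if any non-ASCII
    # character was not covered by the table.
    with os.fdopen(fd, "w", encoding="ascii") as outfile:
        outfile.write(content.translate(_TABLE))
    return new_path
```

The extra read/write step, and the need to guess the file's encoding, is why case 3 is harder to support than the other two.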

Are you willing to help implement and maintain this feature?

Yes, but more discussions are needed.
