_process_number_stream does not support NaN #772

@zarch

Description

The function _process_number_stream takes a str as input and returns a list[float].
The current implementation is:

def _process_number_stream(data_vals: str) -> list[float]:
    number_stream = re.sub(r"[;,]", " ", data_vals)
    number_stream = re.sub(r"\\[|\\]", " ", number_stream)
    number_stream = re.sub(r"^\\s+|\\s+$", "", number_stream)
    number_stream = [val for val in number_stream.split()]
    number_stream = [re.sub(r"[\\(\\)a-dA-Df-zF-Z]", "", val) for val in number_stream]
    number_stream = [float(val) for val in number_stream]
    return number_stream

The issue is that the regexp r"[\\(\\)a-dA-Df-zF-Z]" is too aggressive: it strips every letter except e and E, so valid values such as NaN are reduced to an empty string, even though in Python float("NaN") returns a float.
I think it would be more correct to remove that line.

Reproducible example

>>> from great_tables._formats import _process_number_stream
>>> _process_number_stream("1 2 3 NaN NaN NaN 7 8 9 10 11 12")
Traceback (most recent call last):
  Cell In[159], line 1
    _process_number_stream("1 2 3 NaN NaN NaN 7 8 9 10 11 12")
  File /src/.venv/lib/python3.13/site-packages/great_tables/_formats.py:5184 in _process_number_stream
    number_stream = [float(val) for val in number_stream]
ValueError: could not convert string to float: ''

However, Python's float() accepts these values (nan, NaN, etc.):

>>> [float(v) for v in "1 2 3 NaN NaN NaN 7 8 9 10 11 12".split()]
[1.0, 2.0, 3.0, nan, nan, nan, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

The failure comes from the letter-stripping step, which turns each NaN into an empty string:

>>> import re
>>> number_stream = "1 2 3 NaN NaN NaN 7 8 9 10 11 12".split()
>>> [re.sub(r"[\\(\\)a-dA-Df-zF-Z]", "", val) for val in number_stream]
['1', '2', '3', '', '', '', '7', '8', '9', '10', '11', '12']
>>> float("")
Traceback (most recent call last):
  Cell In[165], line 1
    float("")
ValueError: could not convert string to float: ''
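
To make explicit which tokens survive that step, here is a small check using the pattern copied verbatim from the function. The character class keeps e and E (presumably so that scientific notation survives) and removes every other ASCII letter plus parentheses; the token list is only an illustration:

```python
import re

# Pattern copied verbatim from the function quoted in this issue.
LETTER_STRIP = r"[\\(\\)a-dA-Df-zF-Z]"

# Illustrative tokens: two that survive and three that float() would accept
# but that are emptied by the substitution.
tokens = ["1e5", "(3)", "NaN", "nan", "inf"]
stripped = {tok: re.sub(LETTER_STRIP, "", tok) for tok in tokens}

for tok, result in stripped.items():
    print(f"{tok!r} -> {result!r}")
```

So the same problem affects inf as well as NaN: both are accepted by float() on their own, but are emptied before float() ever sees them.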

Expected result

I would expect every string that Python's float() accepts to be supported.

Development environment

  • Operating System: not relevant
  • great_tables Version: latest

Additional context

Why is it important to support NaN?
For instance, if I have monthly values and some of them are missing, I cannot get the nanoplots aligned so that every row covers the same number of months.

Image

In this example, the first months of 2021 (Jan-Jul) are missing, while in 2025 the last months (Sep-Dec) are missing, but this is not clear from the nanoplot.
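
A minimal sketch of the kind of input this use case leads to, assuming one slot per month with NaN in the gaps. The values are made up, to_stream is a hypothetical helper, and the parsing below uses plain float() rather than anything from great_tables:

```python
import math

# Made-up data: month number -> value, only for months that have data.
observed_2021 = {8: 4.1, 9: 3.7, 10: 5.0, 11: 4.4, 12: 4.9}  # Jan-Jul missing


def to_stream(observed: dict[int, float]) -> str:
    # Hypothetical helper: one slot per month, "NaN" where a month is missing.
    return " ".join(str(observed.get(month, "NaN")) for month in range(1, 13))


stream_2021 = to_stream(observed_2021)
parsed = [float(tok) for tok in stream_2021.split()]

print(stream_2021)
print(len(parsed), "slots,", sum(math.isnan(v) for v in parsed), "missing")
```

Every row built this way has 12 slots, which is what would be needed for the plots to line up month by month.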
