Prework
- Read and agree to the code of conduct and contributing guidelines.
- If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
Description
The function `_process_number_stream` takes a `str` as input and returns a `list[float]`.
The current implementation is:

```python
import re


def _process_number_stream(data_vals: str) -> list[float]:
    number_stream = re.sub(r"[;,]", " ", data_vals)
    number_stream = re.sub(r"\[|\]", " ", number_stream)
    number_stream = re.sub(r"^\s+|\s+$", "", number_stream)
    number_stream = [val for val in number_stream.split()]
    number_stream = [re.sub(r"[\(\)a-dA-Df-zF-Z]", "", val) for val in number_stream]
    number_stream = [float(val) for val in number_stream]
    return number_stream
```

The issue is that the regex `r"[\(\)a-dA-Df-zF-Z]"` is too aggressive: it strips valid values such as `NaN`, even though Python can parse them (`float("NaN")` returns `nan`).
I think it would be more correct to remove that line.
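To see exactly which tokens the letter-stripping step damages, here is a small standalone sketch that applies the same character class to a few sample tokens. `strip_token` and the token list are illustrative only, not part of great_tables. The class deletes `(`, `)` and every ASCII letter except `e`/`E` (presumably kept so scientific notation like `2.5e3` survives), which is why any spelled-out special value is destroyed.

```python
import re

# Same character class as the per-token substitution in _process_number_stream:
# removes "(", ")" and all ASCII letters except e/E.
STRIP_PATTERN = r"[\(\)a-dA-Df-zF-Z]"


def strip_token(token: str) -> str:
    """Hypothetical helper: apply the per-token substitution to one token."""
    return re.sub(STRIP_PATTERN, "", token)


tokens = ["(1)", "2.5e3", "NaN", "nan", "inf", "-Infinity"]
for tok in tokens:
    print(f"{tok!r:12} -> {strip_token(tok)!r}")
```

Only `"(1)"` and `"2.5e3"` come out as parseable numbers; `"NaN"`, `"nan"` and `"inf"` collapse to `""`, and `"-Infinity"` becomes `"-"`, so the following `float()` call fails on all of them.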
Reproducible example
```pycon
>>> from great_tables._formats import _process_number_stream
>>> _process_number_stream("1 2 3 NaN NaN NaN 7 8 9 10 11 12")
Traceback (most recent call last):
  Cell In[159], line 1
    _process_number_stream("1 2 3 NaN NaN NaN 7 8 9 10 11 12")
  File /src/.venv/lib/python3.13/site-packages/great_tables/_formats.py:5184 in _process_number_stream
    number_stream = [float(val) for val in number_stream]
ValueError: could not convert string to float: ''
```

However, in Python these values are fine (`nan`, `NaN`, etc.):

```pycon
>>> [float(v) for v in "1 2 3 NaN NaN NaN 7 8 9 10 11 12".split()]
[1.0, 2.0, 3.0, nan, nan, nan, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
```

The issue is the letter-stripping step:
```pycon
>>> import re
>>> number_stream = "1 2 3 NaN NaN NaN 7 8 9 10 11 12".split()
>>> [re.sub(r"[\(\)a-dA-Df-zF-Z]", "", val) for val in number_stream]
['1', '2', '3', '', '', '', '7', '8', '9', '10', '11', '12']
>>> float("")
Traceback (most recent call last):
  Cell In[165], line 1
    float("")
ValueError: could not convert string to float: ''
```

Expected result
I would expect every string that Python's `float()` accepts (e.g. `"NaN"` and `"nan"`) to be supported.
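A minimal sketch of the expected behaviour, using a hypothetical function `process_number_stream_tolerant` (not part of great_tables). It keeps the original separator handling but passes each token to `float()` unchanged, and only falls back to the letter-stripping substitution for tokens `float()` rejects, such as `"(1)"`. The explicit leading/trailing whitespace strip is omitted because `str.split()` already ignores it.

```python
import math
import re


def process_number_stream_tolerant(data_vals: str) -> list[float]:
    """Hypothetical variant of _process_number_stream that keeps NaN/inf."""
    # Treat ';' and ',' as separators and drop '[' / ']', as the original does.
    number_stream = re.sub(r"[;,]", " ", data_vals)
    number_stream = re.sub(r"\[|\]", " ", number_stream)
    values: list[float] = []
    for val in number_stream.split():
        try:
            # float() already accepts "NaN", "nan", "inf", "1e5", ...
            values.append(float(val))
        except ValueError:
            # Fallback to the original per-token cleanup (parentheses and
            # letters other than e/E are removed) for tokens like "(1)".
            values.append(float(re.sub(r"[\(\)a-dA-Df-zF-Z]", "", val)))
    return values


result = process_number_stream_tolerant("1 2 3 NaN NaN NaN 7 8 9 10 11 12")
print(len(result), sum(math.isnan(v) for v in result))
```

Compared with simply deleting the substitution line, the fallback keeps parenthesised tokens such as `"(2)"` working while letting `float()` handle every spelling it already understands.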
Development environment
- Operating System: not relevant
- great_tables Version: latest
Additional context
Why is it important to support NaN?
For instance, if I have monthly values and some of them are missing, I cannot get all the nanoplots aligned on the same number of months.
In my example, the first months of 2021 (Jan-Jul) are missing, while in 2025 the last months (Sep-Dec) are missing, but the nanoplots do not make this clear.
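One way to keep the years aligned, sketched under assumptions: a hypothetical helper `monthly_stream` (not a great_tables function) writes a fixed 12-value string per year, with `"NaN"` in every month that has no data. The sample numbers below are made up for illustration. Strings like these only work if `_process_number_stream` accepts `NaN`; how the nanoplot would then draw those positions is not assumed here.

```python
import math

MONTHS = 12


def monthly_stream(values_by_month: dict[int, float]) -> str:
    """Hypothetical helper: render one year as 12 space-separated values,
    writing "NaN" for missing months so every year has the same length."""
    parts = []
    for month in range(1, MONTHS + 1):
        value = values_by_month.get(month)
        parts.append("NaN" if value is None or math.isnan(value) else str(value))
    return " ".join(parts)


# Illustrative data only: 2021 lacks Jan-Jul, 2025 lacks Sep-Dec.
year_2021 = {m: float(m) for m in range(8, 13)}
year_2025 = {m: float(m) for m in range(1, 9)}
print(monthly_stream(year_2021))
print(monthly_stream(year_2025))
```

Both strings have exactly twelve tokens, so month positions line up across years instead of the shorter series being stretched or shifted.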