Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zn(NO3)2.6H2O fails to parse in 0.8.3, regression? #223

Closed
ibressler opened this issue Jul 21, 2023 · 9 comments
Closed

Zn(NO3)2.6H2O fails to parse in 0.8.3, regression? #223

ibressler opened this issue Jul 21, 2023 · 9 comments

Comments

@ibressler
Copy link

With version 0.8.2, the formula Zn(NO3)2.6H2O was parsed just fine:

>>> chempy.Substance.from_formula('Zn(NO3)2.6H2O')
<Substance(name=Zn(NO3)2.6H2O, ...)>

With version 0.8.3, I get an error:

>>> chempy.Substance.from_formula('Zn(NO3)2.6H2O')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.pyvenv/lib/python3.10/site-packages/chempy/chemistry.py", line 190, in from_formula
    unicode_name=formula_to_unicode(formula),
  File "~/.pyvenv/lib/python3.10/site-packages/chempy/util/parsing.py", line 649, in formula_to_unicode
    return _formula_to_format(
  File "~/.pyvenv/lib/python3.10/site-packages/chempy/util/parsing.py", line 545, in _formula_to_format
    string += re.sub(r"([0-9]+\.[0-9]+|[0-9]+)", lambda m: sub(m.group(1)), stoich)
  File "/usr/local/Cellar/[email protected]/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/re.py", line 209, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "~/.pyvenv/lib/python3.10/site-packages/chempy/util/parsing.py", line 545, in <lambda>
    string += re.sub(r"([0-9]+\.[0-9]+|[0-9]+)", lambda m: sub(m.group(1)), stoich)
  File "~/.pyvenv/lib/python3.10/site-packages/chempy/util/parsing.py", line 650, in <lambda>
    lambda x: "".join(_unicode_sub[str(_)] for _ in x),
  File "~/.pyvenv/lib/python3.10/site-packages/chempy/util/parsing.py", line 650, in <genexpr>
    lambda x: "".join(_unicode_sub[str(_)] for _ in x),
KeyError: '.'

Is this intended?
Thanks for more info on that!

Best wishes
Ingo

@jeremyagray
Copy link
Collaborator

I think this is the known issue noted in #207, #211, and here. I assume just the to unicode parsing is broken but the rest are working?

I haven't done further research on the unicode alternatives, so I don't know if they have added subscript/superscript decimal separators or if a better option has arisen. I know I have a bad solution ready in a branch somewhere, but I would really like unicode to add the decimal separators.

@bjodah
Copy link
Owner

bjodah commented Jul 24, 2023

@ibressler , thank you for reporting this. To me this looks like a regression. And I think we need to take regressions seriously (i.e. avoid breaking existing code whenever feasible).

@jeremyagray For a subscript dot in unicode this might work (it looks alright in my terminal but I know unicode support is iffy on other platforms):

>>> print("H₂\u0323")
H₂̣₃

it doesn't look that great in my webbrowser though. But I guess most solutions are better than the current state. Would you agree?

The superscript dot I tried doesn't look great, but I think it might be acceptable?

>>> print("Ca²\u02d9³⁺")
Ca²˙³⁺

This is what it looks like on a chromebook (my travelling laptop):
image

@jeremyagray
Copy link
Collaborator

The aesthetics are horrible, but that was the original problem with the fix. Let me find that original fix and I'll use these characters as stand-in decimal separators, hopefully in a day or two.

@bjodah
Copy link
Owner

bjodah commented Jul 24, 2023

Thank you @jeremyagray ! I'm on vacation at the moment, so no need to stress since I won't be able to upload to PyPI for another couple of days either way.

bjodah added a commit that referenced this issue Jul 24, 2023
@bjodah
Copy link
Owner

bjodah commented Jul 24, 2023

Even after we fix the unicode printing issue we will be facing another problem, i.e. the ambiguity between floating point stoichiometries and crystal water. I do know that we discussed this. But it escaped my mind that the change is now between 0.8.2 and 0.8.3:

$ git describe; python3 -c "import chempy; print(chempy.Substance.from_formula('Zn(NO3)2.6H2O').composition)"
v0.8.2
{30: 1, 7: 2, 8: 12, 1: 12}
$ git describe; python3 -c "import chempy; print(chempy.Substance.from_formula('Zn(NO3)2.6H2O').composition)"
v0.8.3-1-ge32e4e5
{30: 1, 7: 2.6, 8: 8.8, 1: 2}

I'm leaning towards creating a 0.8.x branch, reverting the parser changes there, bumping version on master to be upcoming 0.9.x and start thinking of a migration path for "crystal water dot". Perhaps requiring interpunct to get "crystal water behavior"? Any thoughts regarding this @ibressler?

@ibressler
Copy link
Author

ibressler commented Jul 26, 2023

Thanks for your feedback and fixes for this, it's very much appreciated!
Regarding your question @bjodah, since I am not a chemist, I asked my colleagues for their opinion on this:

A middle dot would be typographically more correct but harder to enter by users using the standard available character set. However, fractional occupancies are also possible, and they’d have to be denoted with a floating point number, meaning the standard period (.) cannot serve both functions.
So it probably has to be middle dot for crystalline water so we can have float occupancies of other elements.

So, the interpunct should be fine.

@bjodah
Copy link
Owner

bjodah commented Jul 26, 2023

Thank you @ibressler. I'm leaning toward accepting asterisk (*) as an alternative to interpunct too. I envision a version 0.8.4 of chempy where this is supported, and any . in the parsed string will issue a deprecation warning informing users to use e.g. * to indicate crystal water (or any adduct really). Then in chempy 0.9+ we interpret the dot as floating point number. Any thoughts on this @jeremyagray ?

@jeremyagray
Copy link
Collaborator

I can't remember if this is a regression. We (I?) addressed the hydrate/decimal subscript issue in the parser by using . for decimal subscripts and .. for hydrates. It's been a while since the parser was updated but it appears the update may be in some of my branches and not in others. I've pushed up a branch with this fix for the hydrate problem and the decimal problem, if anyone is interested in further testing.

@bjodah
Copy link
Owner

bjodah commented Apr 24, 2024

So I have released both:

  • chempy-0.9.0 which supports fractional stoichiometries (and crystal water using ..)
  • chempy-0.8.4 which does not support fractional stoichiometries, but supports the new crystal water syntax ...

Let me know if there are any further issues.

@bjodah bjodah closed this as completed Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants