Question about comparing formulae #235
Comments
You could use one of the parsing functions, such as `formula_to_composition`,
to get the composition dict. Its keys are atomic numbers, so you could
use the hydrogen key (1) to compare the number of hydrogens.
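
To make the suggestion concrete, here is a minimal sketch of taking the difference of two composition dicts keyed by atomic number and checking whether they differ only in hydrogen count. Everything in it is an assumption for illustration: `parse_flat_formula`, `composition_difference`, `differs_only_in_hydrogen`, and the tiny `ATOMIC_NUMBERS` table are hypothetical helpers, not functions from the package. The stand-in parser only handles flat formulae without parentheses or charges. Dicts returned by `formula_to_composition` should be usable in its place, though that has not been checked here, and ionized formulae may carry extra entries (such as a charge) that would need separate handling.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny symbol table; a real one covers all elements.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "P": 15, "S": 16}
HYDROGEN = ATOMIC_NUMBERS["H"]

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_flat_formula(formula):
    """Parse a flat formula such as 'C19H37NO5' into {atomic_number: count}.

    Stand-in for a real parser: no parentheses, charges, or hydrates.
    """
    composition = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        symbol, count = match.groups()
        composition[ATOMIC_NUMBERS[symbol]] += int(count or 1)
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse {formula!r} at position {pos}")
    return dict(composition)


def composition_difference(a, b):
    """Return a - b per element, omitting elements whose counts are equal."""
    diff = {k: a.get(k, 0) - b.get(k, 0) for k in set(a) | set(b)}
    return {k: v for k, v in diff.items() if v != 0}


def differs_only_in_hydrogen(a, b, max_h=1):
    """True if a and b are identical apart from at most max_h hydrogens."""
    diff = composition_difference(a, b)
    only_h = all(k == HYDROGEN for k in diff)
    return only_h and abs(diff.get(HYDROGEN, 0)) <= max_h


observed = parse_flat_formula("C19H37NO5")
stored = parse_flat_formula("C19H35NO5")
print(composition_difference(observed, stored))  # {1: 2}, i.e. H2
```

With these helpers, a candidate list could be built by keeping every stored compound for which `differs_only_in_hydrogen(observed, stored, max_h=...)` is true, with `max_h` set to whatever hydrogen tolerance suits the ionization modes in use.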
On Thu, Aug 22, 2024 at 11:52 AM Robert Leach wrote:
I was wondering if you might be able to provide any insight into the
following problem.
We receive chemical compound formulae from some Mass Spec software (Maven
or El Maven). There's always a compound name that accompanies the
formula, but that isn't always consistent (since compounds can have many
synonyms). When we don't have a matching compound name/synonym, we have to
determine if the name provided is a synonym of an existing compound in our
database or if we need to add a new compound to our database.
There are of course multiple ways to accomplish this, but one helper
method I added recently was to present the researcher with a list of
possible compound matches. I naïvely did this by matching all existing
compounds in our database with the same formula. And we immediately
encountered the fact that this can miss existing entries because the
formula from the mass spec data can represent the ionized version of the
compound (missing or with an extra H).
My subsequent (naïve) thought was to expand the search to include matches
that differ by some threshold of hydrogens. You might be able to provide
better suggestions for this strategy, but if that DOES sound reasonable, is
there an existing method in your package that can compare formulas or take
the difference of 2 formulas, e.g. C19H37NO5 - C19H35NO5 = H2?
--
Jeremy A. Gray