iupac-names

This project collects IUPAC names that are used in the literature into a CCZero database. Every contributor guarantees that their contribution is CCZero and free of any legal claim to the contrary. Autogenerated IUPAC names are forbidden: each name must have been found in the literature. That includes IUPAC names that appear as part of a larger name, provided the part is a valid IUPAC name by itself. No metadata on the origin of a name is recorded; the mere fact that the name exists is the copyright-free fact we are recording here.

Our ambition is to have 1M IUPAC names within the first year.

This repository is very simple: it consists of a single, sorted list of IUPAC names in the iupac-names.txt file. Each line in that file is a valid IUPAC name, where valid means that OPSIN is able to generate a SMILES string from it.

Adding new names

The list is sorted and contains only unique names, with case ignored for both the sorting and the uniqueness test. On GNU/Linux, the reference algorithm for this process is:

sort -f iupac-names.txt | uniq -i | tee tmp.txt | wc -l ; mv tmp.txt iupac-names.txt
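
Before committing, it can be useful to check whether the list already satisfies both properties without rewriting it. The following is a minimal sketch, not part of the repository: the function name check_names is hypothetical, and it assumes a uniq that supports -i and -d (GNU and BSD both do). It uses the same case-insensitive comparison as the reference command.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): report whether a name
# list is free of case-insensitive duplicates and sorted, without changing it.
check_names() {
    file="$1"
    # uniq -d prints one copy of each repeated line; -i ignores case.
    dups=$(sort -f "$file" | uniq -i -d)
    if [ -n "$dups" ]; then
        echo "duplicates found:"
        echo "$dups"
        return 1
    fi
    # sort -c only checks the order and exits non-zero at the first
    # out-of-order line; -f folds lower case to upper case when comparing.
    if ! sort -f -c "$file" 2>/dev/null; then
        echo "not sorted"
        return 1
    fi
    echo "ok"
}
```

Calling check_names iupac-names.txt prints ok and returns 0 when nothing needs fixing. The sort order depends on the active locale, so the check should run under the same locale as the reference command.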

There are small name variants, such as 1,1 Dimethylhydrazine and 1,1-dimethylhydrazine. It is not clear whether these are typos in the articles or artifacts of the text mining, but we do know that they parse into a SMILES. By removing all spaces and all hyphens and converting to lower case, we can count the number of unique names with:

cat iupac-names.txt | sed 's/[-‐]//g' | sed 's/ //g' | tr '[:upper:]' '[:lower:]' | sort | uniq | tee iupac-names-flat.txt | wc -l
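
To illustrate what the flattening does, here is a small sketch applied to the two variants mentioned above. The function name flatten_names is hypothetical and not part of the repository. The Unicode hyphen (U+2010) is removed with its own expression so that the command does not depend on how the locale treats multi-byte characters inside a bracket expression.

```shell
#!/bin/sh
# Hypothetical helper: read names on stdin, strip ASCII hyphens, spaces and
# the Unicode hyphen (U+2010), lower-case the result, and keep unique lines.
flatten_names() {
    sed -e 's/[- ]//g' -e 's/‐//g' | tr '[:upper:]' '[:lower:]' | sort -u
}

# Both spelling variants collapse to the same flattened name.
printf '1,1 Dimethylhydrazine\n1,1-dimethylhydrazine\n' | flatten_names
# prints: 1,1dimethylhydrazine
```

The flattened form is only a counting aid: it is no longer a valid name, so it goes into a separate file and never back into iupac-names.txt.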

Calculating unique InChIKeys

To get an idea of the chemical space covered, we can count the number of unique InChIKeys (mind the tautomerism normalization):

groovy extractInChIKeys.groovy | sort | uniq | tee inchikeys.txt | wc -l
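
A coarser view of the same numbers is possible because the first 14-character block of an InChIKey hashes only the connectivity, so keys that share it differ only in layers such as stereochemistry or isotopes. The sketch below counts those first blocks. It assumes that inchikeys.txt holds one bare InChIKey per line, which is a guess about the output of extractInChIKeys.groovy, and the function name count_skeletons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count distinct first blocks (connectivity hashes)
# in a file that is assumed to contain one InChIKey per line.
count_skeletons() {
    # cut keeps the text before the first hyphen; tr strips the padding
    # that some wc implementations put in front of the count.
    cut -d- -f1 "$1" | sort -u | wc -l | tr -d ' '
}
```

Comparing count_skeletons inchikeys.txt with the full count shows how many entries are stereoisomers or isotopologues of a structure that is already covered.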

BlueObelisk/iupac-names

Folders and files

Latest commit

History

Repository files navigation

iupac-names

Adding new names

Calculating unique InChIKeys

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages