Duplicate node ids within the prime dataset #28

@cedricjansen

Hi there,

I'm currently working with the prime benchmark and have a few questions.
When parsing the nodes from the nodes_info.pkl file, I noticed that many ids appear more than once. This confuses me, because I thought the node ids are also what retrieval is evaluated against when checking whether the right information node was found.

I've added an example:

E ID '4899' appears 2 times AFTER parsing:
E dict_key=3, keys=['id', 'type', 'name', 'source', 'details'], raw_id=4899
E dict_key=66899, keys=['id', 'type', 'name', 'source'], raw_id=4899

STaRKNodeInfo(id=4899, type=anatomy, name=right lung cranial lobe lobar bronchus mesenchyme, source=UBERON, details=None)
STaRKNodeInfo(id=4899, type=gene/protein, name=NRF1, source=NCBI, details=Details(id=4899, query=NRF1, name=nuclear respiratory factor 1, summary=This gene encodes a protein that homodimerizes and functions as a transcription factor which activates the expression of some key metabolic genes regulating cellular growth and nuclear genes required for respiration, heme biosynthesis, and mitochondrial DNA transcription and replication. The protein has also been associated with the regulation of neurite outgrowth. Alternative splicing results in multiple transcript variants. Confusion has occurred in bibliographic databases due to the shared symbol of NRF1 for this gene and for 'nuclear factor (erythroid-derived 2)-like 1' which has an official symbol of NFE2L1. [provided by RefSeq, May 2014]., genomicpos=GenomicPos(chr=7, start=129611720, end=129757082, ensemblgene=ENSG00000106459, strand=1)))
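
For reference, here is a minimal sketch of the kind of check that surfaces these collisions. It assumes that nodes_info.pkl unpickles to a dict mapping an integer key to a per-node dict with an 'id' field, as the output above suggests. The function name `find_duplicate_ids` and the toy data are hypothetical. The toy data only mirrors the two entries shown, and pairing dict_key=3 with the gene/protein entry is a guess based on which entry carries 'details'.

```python
from collections import defaultdict


def find_duplicate_ids(node_info):
    """Group the outer dict keys by each node's inner 'id' field.

    Returns only the ids that occur under more than one outer key.
    """
    keys_by_id = defaultdict(list)
    for dict_key, node in node_info.items():
        keys_by_id[node["id"]].append(dict_key)
    return {raw_id: keys for raw_id, keys in keys_by_id.items() if len(keys) > 1}


# Toy stand-in for the unpickled file (assumed layout, not the real data).
node_info = {
    3: {"id": 4899, "type": "gene/protein", "name": "NRF1",
        "source": "NCBI", "details": {"query": "NRF1"}},
    66899: {"id": 4899, "type": "anatomy",
            "name": "right lung cranial lobe lobar bronchus mesenchyme",
            "source": "UBERON"},
    # Hypothetical entry whose inner id does not collide with any other.
    7: {"id": 7, "type": "gene/protein", "name": "EXAMPLE", "source": "NCBI"},
}

duplicates = find_duplicate_ids(node_info)
for raw_id, keys in duplicates.items():
    print(f"ID {raw_id!r} appears {len(keys)} times under dict keys {keys}")
```

In this layout the outer dict key is unique by construction, while the inner 'id' is not; which of the two the Q/A set refers to is the open question.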

Could you please clarify whether duplicate node ids are expected, which combination of fields uniquely identifies an entity, and what the IDs in the Q/A set refer to?

Best regards
