Examples for CNV representation #1

Open
mbaudis opened this issue May 28, 2020 · 3 comments

@mbaudis
Member

mbaudis commented May 28, 2020

As discussed in the 2020-05-28 h-CNV call, we will use this thread to collect & discuss real-world use case examples, for generating a standard CNV object model.

There is now the "doodle" of a schema document and its rendered representation.

@mbaudis
Member Author

mbaudis commented May 28, 2020

The Progenetix database stores CNVs predominantly from cancer samples, mostly derived from array-based experiments:

{
	"callset_id" : "pgxcs::GSE10092::GSM253289",
	"digest" : "3:60791849-60792199:DEL",
	"reference_name" : "3",
	"variant_type" : "DEL",
	"info" : {
		"cnv_value" : -1.3735,
		"cnv_length" : 350
	},
	"variantset_id" : "AM_VS_GRCH38",
	"biosample_id" : "PGX_AM_BS_GSM253289",
	"start_min" : 60791849,
	"start_max" : 60791850,
	"end_min" : 60792198,
	"end_max" : 60792199
}

Future additions (i.e., very soon ...) will include, e.g., the inferred CN count and proper use of the [ start_min, start_max ] ... intervals (these currently look like precise base annotations, although they are mostly derived from imprecise experiments).
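To make the interval fields a bit more concrete, here is a minimal Python sketch of how `cnv_length` and `digest` appear to relate to the start/end fields. The derivation rules are inferred from this single record and are an assumption, not a confirmed Progenetix convention; the helper names (`outer_length`, `inner_length`, `make_digest`, `has_imprecise_ends`) are made up for illustration.

```python
import json

# Subset of the record shown above (values copied unchanged).
record = json.loads("""
{
  "digest": "3:60791849-60792199:DEL",
  "reference_name": "3",
  "variant_type": "DEL",
  "info": {"cnv_value": -1.3735, "cnv_length": 350},
  "start_min": 60791849,
  "start_max": 60791850,
  "end_min": 60792198,
  "end_max": 60792199
}
""")


def outer_length(rec):
    """Largest extent compatible with the intervals (assumed rule)."""
    return rec["end_max"] - rec["start_min"]


def inner_length(rec):
    """Smallest extent compatible with the intervals (assumed rule)."""
    return rec["end_min"] - rec["start_max"]


def make_digest(rec):
    """Rebuild a 'chrom:start-end:type' string from the outer bounds."""
    return "{}:{}-{}:{}".format(
        rec["reference_name"], rec["start_min"], rec["end_max"], rec["variant_type"]
    )


def has_imprecise_ends(rec, tolerance=1):
    """True if either breakpoint interval is wider than `tolerance` bases."""
    return (
        rec["start_max"] - rec["start_min"] > tolerance
        or rec["end_max"] - rec["end_min"] > tolerance
    )


print(outer_length(record), inner_length(record))
print(make_digest(record) == record["digest"])
print(has_imprecise_ends(record))
```

For this record the outer length matches the stored `cnv_length`, and both breakpoint intervals are only one base wide, which is the "looks precise" situation described above.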

In this record:

  • the reference genome and experimental details are provided in the linked callset
  • biological characteristics (diagnosis etc.) are stored in the linked biosample
  • additional data (e.g. genomic sex) may be represented at the individual (aka subject) level
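As a rough illustration of this linking, the sketch below resolves a variant's `callset_id` and `biosample_id` against two in-memory lookups. Only the two ids come from the record above; the store layout, the `note` fields and the `resolve_links` helper are hypothetical placeholders and do not reflect the actual Progenetix collections.

```python
# Ids copied from the example variant; everything else is a placeholder.
variant = {
    "digest": "3:60791849-60792199:DEL",
    "callset_id": "pgxcs::GSE10092::GSM253289",
    "biosample_id": "PGX_AM_BS_GSM253289",
}

# Hypothetical stores keyed by id (stand-ins for database collections).
callsets = {
    "pgxcs::GSE10092::GSM253289": {
        "id": "pgxcs::GSE10092::GSM253289",
        "note": "placeholder: reference genome and experimental details",
    }
}
biosamples = {
    "PGX_AM_BS_GSM253289": {
        "id": "PGX_AM_BS_GSM253289",
        "note": "placeholder: biological characteristics",
    }
}


def resolve_links(variant, callsets, biosamples):
    """Return the variant together with its linked callset and biosample.

    Missing links resolve to None instead of raising.
    """
    return {
        "variant": variant,
        "callset": callsets.get(variant["callset_id"]),
        "biosample": biosamples.get(variant["biosample_id"]),
    }


resolved = resolve_links(variant, callsets, biosamples)
print(resolved["callset"]["id"], resolved["biosample"]["id"])
```

Keeping the variant itself small and pushing context into linked objects is the design choice here: the same callset or biosample document can then serve many variants.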

Overall, the representation here is closely aligned with the GA4GH object models and the GKS work, and can be adjusted flexibly if new paradigms emerge there.

@mbaudis mbaudis pinned this issue May 28, 2020
@d-salgado
Collaborator

Hi,
Minor comment here. I believe we should rename "reference_name" to something else, because it can be confusing: "reference" is usually used to describe a reference sequence, as in VCF ...ref alt.
I don't like chromosome_name either, but I cannot find a better term.

@mbaudis
Member Author

mbaudis commented May 28, 2020

@d-salgado This actually mirrors current Beacon usage, which has to change (everyone knows that the hard-coded chromosome ENUM isn't good). VRS uses SequenceLocation with a CURIE as value.
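For comparison, a hedged sketch of what moving from a bare chromosome name to a CURIE-identified sequence could look like. The output shape is loosely modelled on VRS 1.x (`SequenceLocation` with `sequence_id` and an `interval`) and should be checked against the VRS specification. The `CHROMOSOME_CURIES` table and the `to_sequence_location` helper are hypothetical, and the sketch assumes the record is on GRCh38, for which NC_000003.12 is the RefSeq accession of chromosome 3. No coordinate-system conversion is attempted, although VRS uses inter-residue coordinates.

```python
# Hypothetical lookup from chromosome name to a sequence CURIE.
# Assumes GRCh38; extend per assembly as needed.
CHROMOSOME_CURIES = {
    "3": "refseq:NC_000003.12",
}


def to_sequence_location(rec, curies=CHROMOSOME_CURIES):
    """Map a Progenetix-style record to a VRS-like location dict.

    Uses the outer bounds (start_min, end_max) and copies the
    coordinates as-is (no 0-based / 1-based adjustment).
    """
    return {
        "type": "SequenceLocation",
        "sequence_id": curies[rec["reference_name"]],
        "interval": {
            "type": "SimpleInterval",
            "start": rec["start_min"],
            "end": rec["end_max"],
        },
    }


example = {
    "reference_name": "3",
    "start_min": 60791849,
    "end_max": 60792199,
}
print(to_sequence_location(example))
```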

So, to emphasise again: this is an example C&P from a current database, not the way it should be :-)
