This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

Extend the query to support region queries #235

Open
juhtornr opened this issue Nov 19, 2018 · 3 comments

Comments

@juhtornr
Collaborator

@jrambla, can you add a description based on your presentation to the ELIXIR Beacon strategic group?

@juhtornr juhtornr added this to the 1.2 milestone Nov 19, 2018
@mbaudis
Member

mbaudis commented Dec 11, 2018

@juhtornr @jrambla I've added some notes & an example.

Region queries are used to determine the existence of any/all variants in a genomic range. A typical example would be the determination of variants in the CDR (coding region) of a gene of interest. In this example, all variants with single nucleotide alternateBases in the CDR of the EIF4A1 gene in the DIPG childhood brain tumor dataset are retrieved:

https://beacon.progenetix.org/beaconplus-server/beaconresponse.cgi?datasetIds=dipg&referenceName=17&assemblyId=GRCh38&startMin=7572826&endMax=7579005&referenceBases=*&alternateBases=N
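
As a rough sketch, the example URL above could be assembled from its parameters in Python. The endpoint and parameter names are taken from the example; the helper name `build_region_query` and its default values are illustrative assumptions, not part of the Beacon API.

```python
from urllib.parse import urlencode

# Endpoint copied from the example query in this issue.
BASE_URL = "https://beacon.progenetix.org/beaconplus-server/beaconresponse.cgi"


def build_region_query(dataset_id, reference_name, assembly_id,
                       start_min, end_max,
                       reference_bases="*", alternate_bases="N"):
    """Build a region query URL (hypothetical helper, for illustration only)."""
    params = {
        "datasetIds": dataset_id,
        "referenceName": reference_name,
        "assemblyId": assembly_id,
        "startMin": start_min,   # lower bound of the queried region
        "endMax": end_max,       # upper bound of the queried region
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
    }
    # safe="*" keeps the wildcard literal instead of encoding it as %2A.
    return BASE_URL + "?" + urlencode(params, safe="*")


if __name__ == "__main__":
    print(build_region_query("dipg", "17", "GRCh38", 7572826, 7579005))
```

Without `safe="*"`, the wildcard would be percent-encoded as `%2A`, which a server should decode to the same value.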

Special parameters:

  • datasetIds=dipg
    • limit to the DIPG dataset
  • startMin=7572826
  • endMax=7579005
    • In this proposed form, the startMin and endMax parameters are used to indicate the extent of the queried region. This is in contrast to using start and end, which should be considered parameters for precise positions (e.g. a variant from start to end, not in the corresponding range). However, this will need agreement & documentation.
  • referenceBases=*
    • any reference base (wildcard query)
    • also specific replacements could be queried for
  • alternateBases=N
    • The current API does not allow wildcards here, only a specific number of undefined bases. The query would therefore match e.g. A>G, but not A>GA (one would have to use NN for this).
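
The matching rules described in the list above might be sketched as follows. This is only an illustration under stated assumptions: the `Variant` class and function names are hypothetical, a variant is assumed to match only when it lies completely inside the startMin..endMax region (containment versus overlap is not settled in this issue), and the `end` coordinates in the usage note are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal variant record, for illustration only.
    start: int
    end: int
    reference_bases: str
    alternate_bases: str


def n_pattern_matches(pattern, bases):
    """Each N stands for exactly one undefined base, so lengths must agree."""
    if len(pattern) != len(bases):
        return False
    return all(p == "N" or p == b for p, b in zip(pattern, bases))


def matches_region_query(variant, start_min, end_max,
                         reference_bases="*", alternate_bases="N"):
    # Assumption: the variant must be fully contained in the queried region.
    in_region = start_min <= variant.start and variant.end <= end_max
    # "*" is the proposed wildcard for referenceBases (any reference base).
    ref_ok = reference_bases == "*" or n_pattern_matches(
        reference_bases, variant.reference_bases)
    # alternateBases has no wildcard: N counts bases, one per letter.
    alt_ok = n_pattern_matches(alternate_bases, variant.alternate_bases)
    return in_region and ref_ok and alt_ok
```

For instance, `matches_region_query(Variant(7578406, 7578407, "A", "G"), 7572826, 7579005)` is true, while the same call with alternate bases `"GA"` is false unless the query uses `alternate_bases="NN"`.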

TODO

  • agree on position parameters used for ranges
  • think about wildcard options for alternateBases (in the sense of "*" or "/N+?/").
  • decide about a variant disambiguation strategy for the response
    • Variants in handover & post-processing?
    • Variants in response with variant specific counts?
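
To make the wildcard item in the TODO list more concrete, one possible approach is to translate an alternateBases pattern into a regular expression. The syntax below is purely an assumption for discussion (N = exactly one base, * = any number of bases including none, + = one or more of the preceding token, optional ? after +); nothing here is agreed or part of the current API, and the function names are hypothetical.

```python
import re


def alt_pattern_to_regex(pattern):
    """Translate a hypothetical alternateBases wildcard pattern to a regex."""
    parts = []
    prev = ""
    for ch in pattern.upper():
        if ch in "ACGT":
            parts.append(ch)            # literal base
        elif ch == "N":
            parts.append("[ACGT]")      # exactly one undefined base
        elif ch == "*":
            parts.append("[ACGT]*")     # any number of bases (assumed to allow none)
        elif ch == "+" and prev != "" and prev in "ACGTN":
            parts.append("+")           # one or more of the preceding token
        elif ch == "?" and prev == "+":
            parts.append("?")           # lazy modifier; no effect on full matches
        else:
            raise ValueError(f"unsupported pattern character {ch!r} in {pattern!r}")
        prev = ch
    return re.compile("".join(parts))


def alt_matches(pattern, bases):
    """True if the whole alternate allele matches the pattern."""
    return alt_pattern_to_regex(pattern).fullmatch(bases.upper()) is not None
```

With this sketch, `alt_matches("N", "GA")` is false as in the current behaviour, while `alt_matches("N+", "GA")` and `alt_matches("*", "GA")` are both true.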

@jrambla
Collaborator

jrambla commented Dec 20, 2018 via email

@mbaudis
Member

mbaudis commented Dec 20, 2018

@jrambla @juhtornr Great - I'll be happy to help develop this further. For now, the query example still works ...
@teemukataja This also reports the short-form "digest" of the matched variants in the variantResponses (e.g. "DIPG_V_MAF_17_7578406_C_T"), since we've set the limit to "20" distinct variants for full reports. If you limit the target region you'll get a full report from the variants (slowish...): https://beacon.progenetix.org/beaconplus-server/beaconresponse.cgi?datasetIds=dipg&referenceName=17&assemblyId=GRCh38&startMin=7576826&endMax=7577200&referenceBases=*&alternateBases=N

@sdelatorrep sdelatorrep modified the milestones: spec 1.2, spec 2.0 Jun 11, 2019