This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

Add counts and observations as needed #105

Open
mcupak opened this issue Jul 4, 2017 · 5 comments
Labels
proposal, stalled (No plans to implement it in the short term), theme:Response (Beacon Response Format), type:feature

Comments

@mcupak
Contributor

mcupak commented Jul 4, 2017

Currently, we have the following at the dataset response level. Please review whether this is enough and add information as needed:

  // Frequency of this allele in the dataset. Between 0 and 1, inclusive.
  double frequency = 4;

  // Number of variants matching the allele request in the dataset.
  int64 variant_count = 5;

  // Number of calls matching the allele request in the dataset.
  int64 call_count = 6;

  // Number of samples matching the allele request in the dataset.
  int64 sample_count = 7;
@mcupak mcupak added this to the 0.5 milestone Jul 4, 2017
@mcupak
Contributor Author

mcupak commented Jul 4, 2017

@mfiume WDYT?

@mbaudis
Member

mbaudis commented Jul 4, 2017

@mcupak I would prefer to use "biosample_count", which would go along with the GA4GH (and general) "biosample" concept. This corresponds to the most relevant question: does this biological sample (tumor tissue, germline DNA, environmental sample) contain "DNA sequence nnn"?

"sample" is less well defined; it could, for example, refer to a technical replicate. This is covered by "call_count" (though it may actually be better, or additionally, expressed as "callset_count").

An extended representation would be:

  • biosample_count: number of biological material preparations showing a variant
  • callset_count: number of experiments with a variant
  • call_count: number of calls with the allele
  • variant_count: number of variants with one or more calls matching the allele request

Is this, conceptually, correct? Not sure if we should cover all, but this should be declared & documented.
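
To make the four definitions above concrete, here is a minimal sketch of how they could be derived from a flat list of call records. The `Call` record, its field names, the simplified allele key, and the `matching_counts` helper are all assumptions made for illustration; none of them come from the Beacon schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified call record (not part of the Beacon schema)."""
    biosample_id: str  # biological material preparation
    callset_id: str    # experiment / analysis run
    variant_id: str    # variant record the call belongs to
    allele: str        # simplified allele key, e.g. "1:100:A>G"


def matching_counts(calls, allele):
    """Count the distinct entities with at least one call matching `allele`."""
    matches = [c for c in calls if c.allele == allele]
    return {
        "biosample_count": len({c.biosample_id for c in matches}),
        "callset_count": len({c.callset_id for c in matches}),
        "call_count": len(matches),
        "variant_count": len({c.variant_id for c in matches}),
    }


calls = [
    Call("bs1", "cs1", "v1", "1:100:A>G"),
    Call("bs1", "cs2", "v1", "1:100:A>G"),  # technical replicate of bs1
    Call("bs2", "cs3", "v1", "1:100:A>G"),
    Call("bs2", "cs3", "v2", "1:200:C>T"),  # does not match the request
]

counts = matching_counts(calls, "1:100:A>G")
print(counts)
```

In this toy data the technical replicate raises the callset and call counts but not the biosample count, which is the distinction the "biosample" naming is meant to capture.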

@juhtornr
Collaborator

I don't understand what you mean by observations. @mbaudis or @mcupak, can you clarify?

@mbaudis
Member

mbaudis commented Mar 15, 2018

@juhtornr So my use of "_count" would be incorrect under the counts <-> observations concept, in which:

  • count => all records (biosamples, variants, callsets ...)
  • observations => matches

However: In the schema, "count" is used for both types :-(

BeaconDataset.callCount
  integer($int64)
  minimum: 0
  Total number of calls in the dataset.
BeaconDatasetAlleleResponse.callCount
  integer($int64)
  minimum: 0
  Number of calls matching the allele request in the dataset.
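
One way to picture the counts <-> observations distinction is the sketch below, which reports the total number of records and the number of matches under separate names. The function name, the output keys, and the toy allele strings are assumptions for illustration only and do not reflect the actual schema.

```python
def count_and_observations(records, matches_request):
    """Return the total number of records and how many match the request."""
    total = len(records)
    observed = sum(1 for r in records if matches_request(r))
    # "count" = all records; "observations" = matching records only.
    return {"count": total, "observations": observed}


# Toy allele calls for one dataset (hypothetical values).
alleles = ["A>G", "A>G", "C>T", "A>G", "G>A"]

result = count_and_observations(alleles, lambda a: a == "A>G")
print(result)
```

Under this naming, the dataset-level callCount would correspond to "count" and the response-level callCount to "observations".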

@juhtornr juhtornr added the proposal and theme:Response (Beacon Response Format) labels Mar 15, 2018
@jrambla
Collaborator

jrambla commented Aug 5, 2018

@antbro I guess the original was yours, could you bring your views here, please?

@jrambla jrambla removed this from the 1.0.2 milestone Sep 18, 2018
@jrambla jrambla added the stalled (No plans to implement it in the short term) label Sep 18, 2018