Handling wildcards #223

teemukataja · 2018-10-29T11:13:59Z

This is more a feature request / question.

Now that #221 has been preliminarily accepted, we have implemented it in our Beacon API. This has led us to a new complication, described in CSCfi/beacon-python#24.

In some cases a reference base may have multiple alternate alleles, so a wildcard query can match several different alleles. The specification doesn't offer a direct way to report which allele was matched, and the quickest solution we came up with is to add referenceBases and alternateBases fields to the info key of the datasetAlleleResponses object.

Should we carry on with this solution, or should these fields be added to the response object itself, as they are quite important in all wildcard queries?

Choices

Feature request: update the specification to support differentiation of wildcard responses, namely the datasetAlleleResponses should contain the two values (referenceBases and alternateBases) that reflect the wildcard results (RECOMMENDED);
Keep the specification as it is and handle wildcard differentiation in the info key.
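To make the two choices concrete, here is a minimal sketch of how a server could emit each shape and how a client would read the matched allele back. The field names referenceBases and alternateBases come from this thread; the helper functions are hypothetical and not part of any Beacon implementation.

```python
# Hypothetical sketch of the two choices; helper names are illustrative only.

def response_via_info(dataset_id, ref, alt, exists=True):
    """Choice 2: keep the spec as-is and tuck ref/alt into the info key,
    using the list-of-key/value style from the Beacon 1.0.0 examples."""
    return {
        "datasetId": dataset_id,
        "exists": exists,
        "info": [
            {"key": "referenceBases", "value": ref},
            {"key": "alternateBases", "value": alt},
        ],
    }


def response_via_fields(dataset_id, ref, alt, exists=True):
    """Choice 1 (recommended in this issue): dedicated fields."""
    return {
        "datasetId": dataset_id,
        "referenceBases": ref,
        "alternateBases": alt,
        "exists": exists,
    }


def alt_from_info(resp):
    # Under choice 2 a client has to scan the info list for the right key...
    for item in resp.get("info", []):
        if item["key"] == "alternateBases":
            return item["value"]
    return None


def alt_from_fields(resp):
    # ...whereas under choice 1 it is a plain key lookup.
    return resp.get("alternateBases")


if __name__ == "__main__":
    a = response_via_info("ds1", "A", "T")
    b = response_via_fields("ds1", "A", "T")
    print(alt_from_info(a), alt_from_fields(b))
```

Both carry the same information; the difference is whether clients can rely on a documented field or must agree on an informal info convention.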


mbaudis · 2018-10-29T11:16:53Z

+1 for a specific response (not overloading info).

teemukataja · 2018-10-29T11:29:57Z

Proposed update to the specification: add two new fields (referenceBases and alternateBases) to the BeaconDatasetAlleleResponse objects listed under the datasetAlleleResponses key of the response.

The response to a query currently looks like this:

[
  {
    "beaconId": "string",
    "apiVersion": "string",
    "exists": true,
    "alleleRequest": {
      "referenceName": "1",
      "start": 0,
      "end": 0,
      "startMin": 0,
      "startMax": 0,
      "endMin": 0,
      "endMax": 0,
      "referenceBases": "string",
      "alternateBases": "string",
      "variantType": "string",
      "assemblyId": "GRCh38",
      "datasetIds": [
        "string"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": [
      {
        "datasetId": "string",
        "exists": true,
        "error": {
          "errorCode": 0,
          "errorMessage": "string"
        },
        "frequency": 0,
        "variantCount": 0,
        "callCount": 0,
        "sampleCount": 0,
        "note": "string",
        "externalUrl": "string",
        "info": [
          {
            "key": "string",
            "value": "string"
          }
        ]
      }
    ],
    "error": {
      "errorCode": 0,
      "errorMessage": "string"
    }
  }
]

Proposed format:

[
  {
    "beaconId": "string",
    "apiVersion": "string",
    "exists": true,
    "alleleRequest": {
      "referenceName": "1",
      "start": 0,
      "end": 0,
      "startMin": 0,
      "startMax": 0,
      "endMin": 0,
      "endMax": 0,
      "referenceBases": "string",
      "alternateBases": "string",
      "variantType": "string",
      "assemblyId": "GRCh38",
      "datasetIds": [
        "string"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": [
      {
        "datasetId": "string",
        "referenceBases": "string",
        "alternateBases": "string",
        "exists": true,
        "error": {
          "errorCode": 0,
          "errorMessage": "string"
        },
        "frequency": 0,
        "variantCount": 0,
        "callCount": 0,
        "sampleCount": 0,
        "note": "string",
        "externalUrl": "string",
        "info": {}
      }
    ],
    "error": {
      "errorCode": 0,
      "errorMessage": "string"
    }
  }
]
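A rough sketch of how a server might fill the proposed fields: each distinct (referenceBases, alternateBases) pair matched by a wildcard query becomes its own datasetAlleleResponses entry. The toy data is made up, and the rule that "N" stands for exactly one base is an assumption made for illustration.

```python
import re
from collections import Counter

# Made-up toy data: (referenceName, start, referenceBases, alternateBases).
VARIANTS = [
    ("1", 100, "A", "T"),
    ("1", 100, "A", "G"),
    ("1", 100, "A", "G"),
    ("1", 100, "A", "TT"),
    ("1", 200, "C", "T"),
]


def wildcard_to_regex(pattern):
    # Assumption: each "N" in the query stands for exactly one base A/C/G/T.
    return re.compile("".join("[ACGT]" if ch == "N" else re.escape(ch) for ch in pattern))


def dataset_allele_responses(dataset_id, variants, chrom, start, ref, alt_pattern):
    """Return one datasetAlleleResponse per distinct (ref, alt) pair matched."""
    regex = wildcard_to_regex(alt_pattern)
    counts = Counter(
        (r, a)
        for (c, s, r, a) in variants
        if c == chrom and s == start and r == ref and regex.fullmatch(a)
    )
    return [
        {
            "datasetId": dataset_id,
            "referenceBases": r,
            "alternateBases": a,
            "exists": True,
            "variantCount": n,  # number of matching records in the toy data
        }
        for (r, a), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    for resp in dataset_allele_responses("ds1", VARIANTS, "1", 100, "A", "N"):
        print(resp)
```

With alternateBases "N" at position 100, the two single-base alleles G and T come back as separate entries, while "TT" is only matched by "NN".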

mbaudis · 2018-10-29T14:59:58Z

Parsing this, I see two differences:

the added "referenceBases": "string", "alternateBases": "string",
the removal of the placeholder key/value texts from info

So the ref/alt values in the response would each correspond to one match, i.e. different matches to a wildcard would lead to multiple datasetAlleleResponses, each with its own values, right? Seems sensible. But this would then have to be extended to other attributes.
Examples:

a (proposed) "BRK" variantType could match specified "BRK" values, but also the edges of other structural events (e.g. start and end of "DUP" or "DEL")
positional fuzziness would lead to different start, end values of the matched variants
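The two examples above can be sketched as follows: a range query where each hit reports its own start/end, and one possible reading of the proposed "BRK" type that also hits the edges of DEL/DUP events. The SV record and both functions are hypothetical illustrations, not a proposed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    # Hypothetical structural-variant record; names mirror the query attributes.
    variant_type: str  # e.g. "DEL", "DUP", "BRK"
    start: int
    end: int


def in_range(pos, lo, hi):
    return lo <= pos <= hi


def range_matches(svs, variant_type, start_min, start_max, end_min, end_max):
    """Positional fuzziness: every hit carries its own start/end, which a
    single echoed alleleRequest cannot express."""
    return [
        {"variantType": sv.variant_type, "start": sv.start, "end": sv.end}
        for sv in svs
        if sv.variant_type == variant_type
        and in_range(sv.start, start_min, start_max)
        and in_range(sv.end, end_min, end_max)
    ]


def breakpoint_matches(svs, pos_min, pos_max):
    """Illustrative "BRK" query: matches BRK records, plus the start or end
    edge of DEL/DUP events falling inside the window."""
    hits = []
    for sv in svs:
        if sv.variant_type == "BRK" and in_range(sv.start, pos_min, pos_max):
            hits.append({"variantType": "BRK", "position": sv.start, "source": "BRK"})
        elif sv.variant_type in ("DEL", "DUP"):
            for edge in (sv.start, sv.end):
                if in_range(edge, pos_min, pos_max):
                    hits.append({"variantType": "BRK", "position": edge, "source": sv.variant_type})
    return hits


if __name__ == "__main__":
    svs = [SV("DEL", 1000, 5000), SV("DEL", 1010, 5200), SV("DUP", 4990, 9000), SV("BRK", 5005, 5005)]
    print(range_matches(svs, "DEL", 990, 1020, 4900, 5300))
    print(breakpoint_matches(svs, 4980, 5010))
```

Each returned item differs in exactly the attributes (start, end, source type) that would need their own fields in a per-match response.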

There is an argument to be made for using a handoff scenario here (we have this e.g. in the Beacon+ handoff, where one just loads all the data of the matched variants). But this then requires a specified handover response format, too.

I don't get the change in the info field: these are just placeholders telling users to stick to some kind of "key" : "value" format for additional data.

blankdots · 2018-10-30T06:35:45Z

Building on what @teemukataja mentioned, we made this suggestion based on our experience loading and analysing the data from the 1000 Genomes Project.
We are tackling one issue at a time, and wildcards happened to be one of the first.

We would like the user to be able to differentiate between wildcard results in the UI (we have an example here: CSCfi/beacon-python#24); however, the API specification does not provide any fields we can use for this purpose.

So we suggested how this could be implemented, and we rely on the people defining the specification for the final solution.

Regarding "BRK", we have not fully tackled variantType yet, but it is useful to know that we might encounter such a use case.

Regarding the info key, the current Beacon 1.0.0 specification recommends that we implement something like this:

"info": [
          {"key": "accessType",
           "value": "PUBLIC"},
          {"key": "filterAlgorithm",
           "value": "CUSTOM"},
          {"key": "other",
           "value": null}
        ]

However, that is equivalent to:

"info": {"accessType": "PUBLIC",
           "filterAlgorithm": "CUSTOM",
           "other": null}

This second option is easier to parse and work with, and we have not encountered any use cases for the first option.
We are aware that this is being tackled in issue #168, which is awaiting resolution.
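For the case described here, converting the list-style info from the Beacon 1.0.0 example into a plain object is a one-line comprehension. This is a minimal sketch; the function name is hypothetical, and it assumes every key is unique.

```python
import json

# The list-style info example from the Beacon 1.0.0 specification.
LIST_STYLE = json.loads("""
[{"key": "accessType", "value": "PUBLIC"},
 {"key": "filterAlgorithm", "value": "CUSTOM"},
 {"key": "other", "value": null}]
""")


def info_list_to_dict(info):
    """Collapse key/value pairs into a plain object.
    Caveat: if a key appears more than once, the last value wins."""
    return {item["key"]: item["value"] for item in info}


if __name__ == "__main__":
    print(json.dumps(info_list_to_dict(LIST_STYLE)))
```

The caveat in the docstring is the one point where the two forms can diverge: the conversion is only lossless while keys do not repeat.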

mbaudis · 2018-10-30T07:48:12Z

@blankdots @teemukataja As said, I think the concept of returning the different matched alleles looks good to me, but needs some work/discussion about the implementation:

There is a difference between the returned variant and the query representation, so this is better solved with a proper variant model instead of filling the query attributes with the variant values. We have started on a variant schema for this kind of representation, which could be a good starting point (we use it behind the Beacon+ test implementation).
Some of the nesting of response objects has to be figured out (e.g. getting responses from multiple datasets, and then multiple alleles in datasetAlleleResponses - does one just have each different variant/allele represented there w/ the originating dataset as a value, or do we represent each dataset's responses w/ all embedded alleles?).
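The two nesting options can be compared with a small sketch: a minimal variant model kept separate from the query, rendered either as one entry per matched allele (tagged with its dataset) or as one entry per dataset with alleles embedded. The MatchedVariant class and the "alleles" key are hypothetical, not taken from any existing schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedVariant:
    """Hypothetical minimal variant model, separate from the query object."""
    dataset_id: str
    reference_name: str
    start: int
    reference_bases: str
    alternate_bases: str

    def allele_fields(self):
        return {
            "referenceName": self.reference_name,
            "start": self.start,
            "referenceBases": self.reference_bases,
            "alternateBases": self.alternate_bases,
        }


def flat_by_variant(matches):
    """Option A: one entry per matched allele, with the originating dataset as a value."""
    return [{"datasetId": m.dataset_id, **m.allele_fields()} for m in matches]


def nested_by_dataset(matches):
    """Option B: one entry per dataset, alleles embedded under a hypothetical "alleles" key."""
    grouped = defaultdict(list)  # preserves first-seen dataset order
    for m in matches:
        grouped[m.dataset_id].append(m.allele_fields())
    return [{"datasetId": ds, "exists": True, "alleles": alleles} for ds, alleles in grouped.items()]


if __name__ == "__main__":
    matches = [
        MatchedVariant("ds1", "1", 100, "A", "G"),
        MatchedVariant("ds2", "1", 100, "A", "T"),
        MatchedVariant("ds1", "1", 100, "A", "T"),
    ]
    print(flat_by_variant(matches))
    print(nested_by_dataset(matches))
```

Option A keeps each entry self-describing; option B keeps the existing one-entry-per-dataset semantics of datasetAlleleResponses and groups the alleles beneath it.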

I'm highly in favour of doing some rapid development here - so suggestions, discussions, PRs welcome (IMO)!

mbaudis · 2018-11-06T15:52:48Z

@blankdots Btw., those (list vs. object) are not equivalent since only a list allows repeated keys (making it harder to parse, but better as a wrapper).
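The repeated-key point can be shown directly: turning a list with a repeated key into a plain object drops a value, while grouping values per key keeps them all. The info list below is hypothetical (the "QC" value is made up for illustration).

```python
from collections import defaultdict

# Hypothetical info list with a repeated key.
INFO = [
    {"key": "filterAlgorithm", "value": "CUSTOM"},
    {"key": "filterAlgorithm", "value": "QC"},
    {"key": "accessType", "value": "PUBLIC"},
]


def naive_object(info):
    # Plain object: the second filterAlgorithm silently overwrites the first.
    return {item["key"]: item["value"] for item in info}


def multivalue_object(info):
    # Lossless alternative: each key maps to the list of all its values.
    grouped = defaultdict(list)
    for item in info:
        grouped[item["key"]].append(item["value"])
    return dict(grouped)


if __name__ == "__main__":
    print(naive_object(INFO))
    print(multivalue_object(INFO))
```

An object of lists preserves repeats, at the cost of every value becoming a list even when a key occurs only once.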

blankdots · 2018-11-12T05:50:19Z

@mbaudis I probably should have mentioned that I was focusing on the content (and how it could be used), not the structure.

mbaudis · 2018-11-16T14:02:02Z

I have written up a page about proposed range matches and wildcards, which also demonstrates a handover [H->O] variant object, which in turn can be analysed for its variant flavours.

This should be considered a (working) prototype, which may look different in the implementation brought forward by the dev team (@sdelatorrep?).
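As a rough illustration of analysing a matched variant "for its variant flavours", the sketch below classifies an allele from its ref/alt strings alone. The categories and rules are simple assumptions chosen for the example, not definitions from the Beacon specification or from the handover object described above.

```python
def variant_flavour(ref, alt):
    """Illustrative classification by ref/alt strings (assumed rules):
    SNV, MNV, INS, DEL, otherwise COMPLEX."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"  # same length, more than one base
    if len(ref) < len(alt) and alt.startswith(ref):
        return "INS"  # alt extends the reference
    if len(ref) > len(alt) and ref.startswith(alt):
        return "DEL"  # alt is a truncated reference
    return "COMPLEX"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("AC", "GT"), ("A", "ATT"), ("ATT", "A"), ("AT", "G")]:
        print(ref, alt, variant_flavour(ref, alt))
```

A real implementation would also need to handle normalisation and symbolic alleles, which this sketch ignores.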

teemukataja mentioned this issue Oct 29, 2018

Implement wildcard differentiation and remove array from info key CSCfi/beacon-python#26

Merged

This was referenced Nov 1, 2018

Fix info key #224

Closed

Fix info key #226

Merged

teemukataja added a commit that referenced this issue Nov 1, 2018

introduces a solution for #223

c2b2528

teemukataja mentioned this issue Nov 1, 2018

Wildcard differentiation in response #227

Closed

teemukataja mentioned this issue Dec 11, 2018

Extend the ELIXIR Beacon Network UI to support multiple Beacon #239

Open
