
Allele search web component #595

Open
That-Thing wants to merge 5 commits into main from allele-search-tool


@That-Thing
Collaborator

Web component version of the SoyBase allele search tool requested by @StevenCannon-USDA.

@StevenCannon-USDA

StevenCannon-USDA commented Feb 26, 2026

Testing locally on 2/26 (npm run serve), I see only <lis-allele-search-element>.

@matthewwiese

@StevenCannon-USDA FWIW from a fresh clone on the branch I see all existing components plus the allele search one.

@That-Thing Regarding the TODO in the element code (the glyma.Wm82.gnm4.Gm01 etc. placeholders), could this be made customizable, e.g. as part of the "collections"? That way we could use this for something other than soybean in the future.


@StevenCannon-USDA StevenCannon-USDA left a comment


Testing now, after (re?)running npm run build, the content comes up fine for me.

This all looks good to me, though I would like two changes:

  • Change the text "Ref / Alt only" under 1.5 to "All strains". The behavior has changed relative to the implementation at https://www.soybase.org/tools/. I think the new behavior is good, but since that first radio button returns all strains, it should be labeled as such.
  • I think we should limit the query region size in both the "Identifier" and "Region" sections. I suggest 1,000,000. That probably calls for a corresponding label: "Flanking Region (1 million max)".
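A minimal sketch of the suggested size cap, written in Python purely for illustration (the real check would live in the web component's own code). It assumes regions use the contig:start-end form seen in the fasta-api request in this thread, 1-based inclusive coordinates, and that the 1,000,000 cap applies to the whole span that would be fetched, i.e. the region plus the flank on each side. MAX_REGION_SIZE, parse_region, region_size and validate_query are hypothetical names.

```python
# Hypothetical cap, using the 1,000,000 value suggested in review.
MAX_REGION_SIZE = 1_000_000


def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'contig:start-end' into (contig, start, end).

    rpartition splits on the LAST ':' so dotted contig names such as
    glyma.Wm82.gnm4.Gm16 stay intact.
    """
    contig, sep, span = region.rpartition(":")
    if not sep or not contig:
        raise ValueError(f"expected 'contig:start-end', got {region!r}")
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"missing '-' in {region!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    return contig, start, end


def region_size(region: str) -> int:
    # Assumes 1-based, inclusive coordinates.
    _, start, end = parse_region(region)
    return end - start + 1


def validate_query(region: str, flank: int = 0) -> int:
    """Return the total span (region plus both flanks) or raise ValueError."""
    if flank < 0:
        raise ValueError("flanking region cannot be negative")
    total = region_size(region) + 2 * flank
    if total > MAX_REGION_SIZE:
        raise ValueError(
            f"query spans {total:,} positions; the maximum is {MAX_REGION_SIZE:,}"
        )
    return total
```

If the intent is instead to cap only the flank value, as the "Flanking Region (1 million max)" label could also be read, the total check reduces to a single flank <= MAX_REGION_SIZE comparison.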

@That-Thing
Collaborator Author

@matthewwiese
Getting a weird error from fasta-api:

curl 'http://dev.lis.ncgr.org:50043/vcf/alleles/glyma.Wm82.gnm4.Gm16:3727736-3749207/https%253A%252F%252Fdata.legumeinfo.org%252FGlycine%252Fmax%252Fdiversity%252FWm82.gnm4.div.Song_Hyten_2015%252Fglyma.Wm82.gnm4.div.Song_Hyten_2015.vcf.gz?encoding=hap'

response:

{"error": "Unable to open file: [Errno 0] Closing failed: Success: 'https://data.legumeinfo.org/Glycine/max/diversity/Wm82.gnm4.div.Song_Hyten_2015/glyma.Wm82.gnm4.div.Song_Hyten_2015.vcf.gz'", "status": 400}
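The request above passes the VCF location as a single path segment that is percent-encoded twice: %253A decodes to %3A, which in turn decodes to :. A sketch that reproduces that URL, assuming fasta-api expects exactly this double encoding (inferred from the curl command only); alleles_url is a hypothetical helper, and the region is left unencoded, as in the original request.

```python
from urllib.parse import quote, urlencode


def alleles_url(base: str, region: str, vcf_url: str, encoding: str = "hap") -> str:
    """Build a fasta-api /vcf/alleles request URL (hypothetical helper)."""
    # safe="" makes quote() encode '/' as well, so the whole VCF URL
    # becomes one path segment: ':' -> %3A, '/' -> %2F.
    once = quote(vcf_url, safe="")
    # Encoding again turns each '%' into %25, giving %253A and %252F.
    twice = quote(once, safe="")
    return f"{base}/vcf/alleles/{region}/{twice}?{urlencode({'encoding': encoding})}"
```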

@matthewwiese

@That-Thing That's interesting; I think the .tbi index may be wrong. On my side the errors are:

[E::hts_open_format] Failed to open file "https://data.legumeinfo.org/Glycine/max/diversity/Wm82.gnm4.div.Song_Hyten_2015/glyma.Wm82.gnm4.div.Song_Hyten_2015.vcf.gz" : Destination address required
[E::hts_open_format] Failed to open file "https://data.legumeinfo.org/Glycine/max/diversity/Wm82.gnm4.div.Song_Hyten_2015/glyma.Wm82.gnm4.div.Song_Hyten_2015.vcf.gz" : Destination address required
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116069636
[E::bgzf_read_block] Invalid BGZF header at offset 116069636
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116069636
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::hts_open_format] Failed to open file "https://data.legumeinfo.org/Glycine/max/diversity/Wm82.gnm4.div.Song_Hyten_2015/glyma.Wm82.gnm4.div.Song_Hyten_2015.vcf.gz" : Destination address required
[E::bgzf_read_block] Invalid BGZF header at offset 116078328
[E::bgzf_read_block] Invalid BGZF header at offset 116078328

Pysam's fetch() automatically looks for a .tbi at the same URL as the VCF, but the file at that location is only ~70 bytes: https://data.legumeinfo.org/Glycine/max/diversity/Wm82.gnm4.div.Song_Hyten_2015/glyma.Wm82.gnm4.div.Song_Hyten_2015.vcf.gz.tbi

By comparison, https://data.soybase.org/Glycine/max/diversity/Wm82.gnm2.div.Valliyodan_Brown_2021/USB481-25Kshared50Kpos.vcf.gz.tbi is ~170 KB.

My guess is that the data is corrupt somehow? @adf-ncgr @StevenCannon-USDA any ideas?
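One way to go beyond comparing file sizes is to read the index header itself. The sketch below decodes the fixed header of a .tbi file following the layout in the tabix format specification (magic TBI\1, then eight little-endian int32 fields, then NUL-terminated sequence names). Because BGZF is a series of gzip members, Python's gzip.decompress can inflate it. read_tabix_header is a hypothetical helper. An index whose n_ref is 0, or whose names omit the contig being queried, would be consistent with retrieval returning only the VCF header, but that is speculation and was not checked for the ~70-byte file.

```python
import gzip
import struct

TBI_MAGIC = b"TBI\x01"


def read_tabix_header(index_bytes: bytes) -> dict:
    """Decode the fixed header of a tabix (.tbi) index (hypothetical helper).

    Layout after the 4-byte magic: n_ref, format, col_seq, col_beg,
    col_end, meta, skip, l_nm (all int32, little-endian), followed by
    l_nm bytes of NUL-terminated reference sequence names.
    """
    data = gzip.decompress(index_bytes)  # BGZF is valid multi-member gzip
    if data[:4] != TBI_MAGIC:
        raise ValueError("not a tabix index (bad magic)")
    n_ref, _fmt, _col_seq, _col_beg, _col_end, _meta, _skip, l_nm = struct.unpack_from(
        "<8i", data, 4
    )
    raw_names = data[36:36 + l_nm]  # 4-byte magic + 8 * 4-byte ints = 36
    names = [name.decode() for name in raw_names.split(b"\x00") if name]
    return {"n_ref": n_ref, "names": names}
```

Against a hosted file, usage would be something like read_tabix_header(urllib.request.urlopen(tbi_url).read()).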

@adf-ncgr
Contributor

adf-ncgr commented Mar 4, 2026

Sounds plausible. When I run bcftools view I don't get an error, but indexed retrieval seems to yield nothing but the headers. I'll see if I can fix the version in the datastore, but @StevenCannon-USDA will then have to sync it to the web-hosting server.

@adf-ncgr
Contributor

adf-ncgr commented Mar 4, 2026

OK, there were several files that seemed to have a similar problem (different genome versions of the same Song_Hyten_2015 genotype set). I've fixed them on ceres, but I think the problem @That-Thing reported will remain until @StevenCannon-USDA can sync the new files to the web-hosting server.

@StevenCannon-USDA

I have schlepped these over to c2s2. This doesn't seem to have fixed the problem for me, though the explanation makes sense.

Adding @weihuang12, who has been working on these files.

Is the problem that some of the indexes are out of date with the bgzipped VCFs?

@adf-ncgr
Contributor

adf-ncgr commented Mar 4, 2026

I'm not really sure what happened. As @matthewwiese said, the .tbi files that were there before I re-ran the indexing seemed to be only ~70 bytes in size, although they appeared to have been updated recently. Now if I run bcftools view locally on ceres, the indexing seems to work OK, but I agree that the web-hosted copy is still misbehaving; I'm not sure whether some sort of caching is at play. Can you verify that the .tbi files on c2s2 are appropriately sized? (They are hidden in the h5ai interface, so I can't tell.)

@adf-ncgr
Contributor

adf-ncgr commented Mar 4, 2026

Actually, never mind about validating the file size; I just curled it and it seems OK. I'm puzzled as to what may be happening.

@adf-ncgr
Contributor

adf-ncgr commented Mar 4, 2026

Actually, I think it was a caching issue, at least in the way I was testing it. Running something like:
bcftools view https://data.legumeinfo.org/Glycine/max/diversity/Wm82.gnm4.div.Song_Hyten_2015/glyma.Wm82.gnm4.div.Song_Hyten_2015.vcf.gz glyma.Wm82.gnm4.Gm01:1-100000
for the first time downloads the .tbi file to the directory where you ran the command. After I deleted the bad one left over from a previous attempt, it started working properly.
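Since a leftover local copy of the index can mask a fixed file on the server, a small cleanup step before retrying may help. This is a sketch only: it assumes, as described above for bcftools, that the copy lands in the working directory under the basename of the remote .tbi. local_index_path and remove_stale_index are hypothetical helpers, not part of bcftools, htslib or pysam.

```python
import os
from urllib.parse import urlparse


def local_index_path(vcf_url: str, workdir: str = ".") -> str:
    """Where a downloaded copy of the VCF's .tbi is assumed to be cached."""
    # Assumption: the cached file is named <basename of the VCF>.tbi.
    name = os.path.basename(urlparse(vcf_url).path) + ".tbi"
    return os.path.join(workdir, name)


def remove_stale_index(vcf_url: str, workdir: str = ".") -> bool:
    """Delete the cached .tbi for vcf_url; return True if a file was removed."""
    path = local_index_path(vcf_url, workdir)
    try:
        os.remove(path)
    except FileNotFoundError:
        return False  # nothing cached, nothing to do
    return True
```

Running remove_stale_index(vcf_url) in the directory where bcftools was previously invoked, then repeating the query, mirrors the manual fix described above.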



4 participants