Fix short transcripts by AnnaLazarEBI · Pull Request #23 · Ensembl/ensembl-anno

AnnaLazarEBI · 2025-11-25T16:10:11Z

Filter out transcripts with 1-codon translations (3 bp CDS).

Tested.

Original gtf: /hps/nobackup/flicek/ensembl/genebuild/lazar/bbr/zymoseptoria_tritici_pangenome/GCA_017766825.1/annotation_output/initial_region_gtfs/4.rs1.re2952612.busco_copy.gtf

Before filtering:
/hps/nobackup/flicek/ensembl/genebuild/lazar/dev/validation/old/output.gtf

After filtering:
/hps/nobackup/flicek/ensembl/genebuild/lazar/dev/validation/output.gtf

ens-ftricomi · 2025-11-28T14:30:44Z

support_scripts_perl/select_best_transcripts.pl

+
+      my $translation = $transcript->translation;
+
+      if (!$translation) {


We compute the translation earlier, so I think we want to exclude these?

AnnaLazarEBI · 2026-01-20T14:31:10Z

Issue: a 3 bp CDS survived the original filter.
For the 3 bp BUSCO transcript to survive it, one of two things must have happened.

Either $transcript->translation was undef at that time, so the transcript went through the unless ($translation) branch and was kept. This happens if:

▫ it was the “final transcript in file” case, where compute_translation was commented out when the transcript was first built, or

▫ it is a transcript produced later (e.g. by joining / UTR processing) that has not had translation set yet.

Or it had a translation with length > 1 aa at that moment, and only later (after compute_translation is called in the "Computing translations" loop) did the CDS collapse to 3 bp.

We now filter out 1‑codon transcripts immediately after grouping and optional joining, instead of only relying on later CDS‑length criteria. The change walks over ‎joined_transcripts, inspects any existing ‎translation, and removes transcripts whose translation length is ≤1 amino acid, logging what was dropped. This prevents obviously spurious 3bp coding models (like BUSCO artefacts) from propagating into downstream steps and the final GTF, while leaving non‑coding or untranslated models untouched for later processing.
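
For illustration, the filtering step described above could look roughly like the sketch below. This is not the code from the PR: MockTranslation and MockTranscript are hypothetical stand-ins for the Ensembl objects so the example runs without the Ensembl API, filter_short_translations is a made-up helper name, and the transcripts tr_coding and tr_noncoding are invented test data. It assumes only that a translation exposes its length in amino acids.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the Ensembl transcript/translation objects,
# so the sketch runs without the Ensembl API installed.
package MockTranslation;
sub new    { my ($class, %args) = @_; my $self = {%args}; return bless $self, $class; }
sub length { my ($self) = @_; return $self->{length}; }    # length in amino acids

package MockTranscript;
sub new         { my ($class, %args) = @_; my $self = {%args}; return bless $self, $class; }
sub stable_id   { my ($self) = @_; return $self->{stable_id}; }
sub translation { my ($self) = @_; return $self->{translation}; }    # undef if untranslated

package main;

# Split transcripts into kept and dropped lists.
# A transcript is dropped only if it HAS a translation of <= $max_aa amino acids;
# transcripts without a translation are kept for later processing.
sub filter_short_translations {
    my ($transcripts, $max_aa) = @_;
    $max_aa //= 1;
    my (@kept, @dropped);
    foreach my $transcript (@$transcripts) {
        my $translation = $transcript->translation;
        if ($translation && $translation->length <= $max_aa) {
            push @dropped, $transcript;
            next;
        }
        push @kept, $transcript;
    }
    return (\@kept, \@dropped);
}

# Invented example data: one 1-aa model, one normal coding model, one untranslated model.
my $joined_transcripts = [
    MockTranscript->new(stable_id => 'anno_11407',   translation => MockTranslation->new(length => 1)),
    MockTranscript->new(stable_id => 'tr_coding',    translation => MockTranslation->new(length => 120)),
    MockTranscript->new(stable_id => 'tr_noncoding', translation => undef),
];

my ($kept, $dropped) = filter_short_translations($joined_transcripts);
print STDERR "Dropped ", $_->stable_id, " (translation <= 1 aa)\n" for @$dropped;
```

Returning the dropped transcripts separately, rather than deleting them silently, keeps the logging in the caller and makes it easy to check which models were removed.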

AnnaLazarEBI · 2026-01-20T14:33:14Z

These are filtered:
DS995906.1 anno exon 540775 543237 . - . gene_id "anno_4596"; transcript_id "anno_4596"; exon_number "1";
DS995906.1 anno transcript 1549067 1549069 . - . gene_id "anno_11407"; transcript_id "anno_11407"; biotype "busco"; translation_coords "1549067:1549069:1:1549067:1549069:3";
DS995906.1 anno exon 1549067 1549069 . - . gene_id "anno_11407"; transcript_id "anno_11407"; exon_number "1";

The first line is not expected.
original slice: /hps/nobackup/flicek/ensembl/genebuild/lazar/bbr/reannotation/batch2/GCA_000001985.1/annotation_output/slect_script_test/DS995906.1.rs1.re1818470.busco.gtf
filtered slice: /hps/nobackup/flicek/ensembl/genebuild/lazar/bbr/reannotation/batch2/GCA_000001985.1/annotation_output/slect_script_test/DS995906.1.rs1.re1818470.busco_sel.gtf
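
A quick way to check a GTF slice for models like anno_11407 might be a scan such as the one below. It is only a rough sketch: short_transcript_ids is a hypothetical helper, it assumes standard tab-separated GTF columns, and it measures the genomic span of the transcript feature rather than the spliced CDS length, so it only catches single-exon cases like the one above. The two input records are the anno_11407 lines from this comment, with the translation_coords attribute left out for brevity.

```perl
use strict;
use warnings;

# Return the transcript_id of every "transcript" feature whose genomic span
# is at most $max_bp base pairs. Assumes 9 tab-separated GTF columns.
sub short_transcript_ids {
    my ($lines, $max_bp) = @_;
    my @ids;
    foreach my $line (@$lines) {
        next if $line =~ /^#/;                 # skip header/comment lines
        chomp(my $copy = $line);
        my @cols = split /\t/, $copy;
        next unless @cols >= 9 && $cols[2] eq 'transcript';
        my ($start, $end) = @cols[3, 4];
        next if $end - $start + 1 > $max_bp;   # GTF coordinates are 1-based, inclusive
        push @ids, $1 if $cols[8] =~ /transcript_id "([^"]+)"/;
    }
    return @ids;
}

# Records rebuilt with tabs from the lines quoted in this comment.
my @gtf = (
    join("\t", 'DS995906.1', 'anno', 'transcript', 1549067, 1549069, '.', '-', '.',
         'gene_id "anno_11407"; transcript_id "anno_11407"; biotype "busco";'),
    join("\t", 'DS995906.1', 'anno', 'exon', 1549067, 1549069, '.', '-', '.',
         'gene_id "anno_11407"; transcript_id "anno_11407"; exon_number "1";'),
);

my @short = short_transcript_ids(\@gtf, 3);
print "$_\n" for @short;
```

Running the same scan over the filtered slice should print nothing if the 3 bp model has been removed.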

AnnaLazarEBI added 2 commits November 25, 2025 14:44

Added filtering for 3bp transcripts so the validation doesn't fail

d8a74af

Restore anno to original

ef33bbc

AnnaLazarEBI requested a review from ens-ftricomi November 25, 2025 16:10

ens-ftricomi reviewed Nov 28, 2025

AnnaLazarEBI added 2 commits December 4, 2025 13:06

Changed script so it removes the transcript from the list

bd8f6fe

Remove file. Accidental commit

a7206d4

AnnaLazarEBI requested a review from ens-ftricomi January 13, 2026 11:46

AnnaLazarEBI marked this pull request as draft January 20, 2026 13:15

Added filter post computing transcripts

06f96a6

AnnaLazarEBI wants to merge 5 commits into main from fix/short_transcript
