my $translation = $transcript->translation;
if (!$translation) {
We compute the translation earlier, so I think we want to exclude these?
Issue: a 3bp CDS survived the original filter.
→ it goes through the ▫ you were in the “final transcript in file” case where ▫ it’s a transcript produced later (e.g. by joining / UTR processing) that hasn’t had
We now filter out 1‑codon transcripts immediately after grouping and optional joining, instead of only relying on later CDS‑length criteria. The change walks over joined_transcripts, inspects any existing translation, and removes transcripts whose translation length is ≤1 amino acid, logging what was dropped. This prevents obviously spurious 3bp coding models (like BUSCO artefacts) from propagating into downstream steps and the final GTF, while leaving non‑coding or untranslated models untouched for later processing.
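The filter described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MockTranscript` and `filter_single_codon_transcripts` are hypothetical names, and the translation is modelled as a plain peptide string rather than an Ensembl API object, so the length check would need adapting to however the real script obtains the peptide sequence.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a transcript object. The real script works on
# Ensembl API objects, which are not loaded in this sketch.
package MockTranscript;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub stable_id { return $_[0]->{stable_id} }

# Assumption: returns the peptide as a string, or undef when the model
# is non-coding or has not been translated yet.
sub translation { return $_[0]->{translation} }

package main;

# Split transcripts into kept and dropped. A transcript is dropped only
# when it already has a translation of at most 1 amino acid; transcripts
# with no translation are kept for later processing.
sub filter_single_codon_transcripts {
    my ($joined_transcripts) = @_;
    my (@kept, @dropped);

    foreach my $transcript (@$joined_transcripts) {
        my $translation = $transcript->translation;

        if (defined $translation && length($translation) <= 1) {
            # Log what is being removed (goes to STDERR).
            warn sprintf(
                "Dropping %s: translation length %d\n",
                $transcript->stable_id, length($translation)
            );
            push @dropped, $transcript;
            next;
        }

        push @kept, $transcript;
    }

    return (\@kept, \@dropped);
}

# Toy input: a 1-codon model, a normal coding model, a model with no translation.
my @joined_transcripts = (
    MockTranscript->new(stable_id => 'T1', translation => 'M'),
    MockTranscript->new(stable_id => 'T2', translation => 'MKVLAA'),
    MockTranscript->new(stable_id => 'T3'),
);

my ($kept, $dropped) = filter_single_codon_transcripts(\@joined_transcripts);
print join(',', map { $_->stable_id } @$kept), "\n";
```

With the toy input, only `T1` is dropped; `T2` and the untranslated `T3` pass through. Returning the dropped list alongside the kept list is a choice made for this sketch so that the removals can be logged or inspected by the caller.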
These are filtered: The first line is not expected.
Filter for 1-codon translations (3bp).
Tested.
Original GTF: /hps/nobackup/flicek/ensembl/genebuild/lazar/bbr/zymoseptoria_tritici_pangenome/GCA_017766825.1/annotation_output/initial_region_gtfs/4.rs1.re2952612.busco_copy.gtf
Before filtering:
/hps/nobackup/flicek/ensembl/genebuild/lazar/dev/validation/old/output.gtf
After filtering:
/hps/nobackup/flicek/ensembl/genebuild/lazar/dev/validation/output.gtf