Does enhanced RP gene score drop peaks in exons/promoter regions of multiple genes? #107
Comments
@mengxiao According to MAESTRO/MAESTRO/scATAC_Genescore.py Lines 233 to 237 in 28e2927
All peaks are classified as either inside some 'gene body' (= exon + promoter) or outside every gene body. If peak_i is in the first class, it is excluded from the later calculation (L238 and so on), where the decay function is applied. The definition of a peak_i 'located in the promoter or exon regions of any nearby genes' doesn't exclude a peak that overlaps the exon/promoter of the given gene_j itself. So my understanding of the current implementation is that if peak_i overlaps the exon of the given gene, it gets a weight of 1 regardless of whether it also overlaps other genes; if peak_i does not overlap the given gene but overlaps the exons of any nearby gene, it gets a weight of 0. For me, the confusion is about how the isoforms of a gene model are handled. In the current code, each isoform is treated independently when calculating the RPs, so they will inevitably influence each other. In brief, we are still looking into this issue.
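To make the weighting described in this comment concrete, here is a minimal sketch. It is not MAESTRO's code: `peak_weight`, the dictionary layout of a gene, the use of a single peak position, and the 10 kb `decay` default are all assumptions made for illustration, as is the half-life form of the decay.

```python
def in_gene_body(peak_pos, gene):
    """True if the peak position falls in any exon/promoter interval of the gene."""
    return any(start <= peak_pos < end for start, end in gene["body"])


def peak_weight(peak_pos, gene, all_genes, decay=10_000):
    """Hypothetical weight of one peak for one gene, following the comment above.

    - 1 if the peak overlaps the gene's own exon/promoter intervals,
      even when it also overlaps another gene;
    - 0 if it overlaps only the exon/promoter intervals of other genes;
    - otherwise a distance-based decay from the gene's TSS (assumed form).
    """
    if in_gene_body(peak_pos, gene):
        return 1.0
    if any(in_gene_body(peak_pos, g) for g in all_genes if g is not gene):
        return 0.0
    distance = abs(peak_pos - gene["tss"])
    return 2.0 ** (-distance / decay)


# Two toy genes with overlapping exon/promoter intervals.
gene_a = {"name": "A", "tss": 1000, "body": [(1000, 2000)]}
gene_b = {"name": "B", "tss": 2500, "body": [(1500, 2500)]}
genes = [gene_a, gene_b]

for pos in (1700, 2200, 11000):
    print(pos, peak_weight(pos, gene_a, genes), peak_weight(pos, gene_b, genes))
```

In this toy setup the peak at 1700 lies in both gene bodies and receives a weight of 1 for both genes, which is the behaviour the issue asks about.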
Ah, interesting. May I ask about the rationale behind that approach? Based on the supplement, it sounded like the enhanced model tries to avoid misattributing peaks to unexpressed neighboring genes. If that's the goal, wouldn't it make sense to exclude any peak that lies in the 'gene bodies' of multiple genes? The issue of isoforms might be addressed here: MAESTRO/MAESTRO/scATAC_Genescore.py Lines 370 to 383 in 28e2927
MAESTRO/MAESTRO/scATAC_Genescore.py Line 87 in 28e2927
I was thinking a bit more about this. Is the reason for handling overlaps this way that one doesn't expect many peaks to fall in the promoters/exons of multiple genes? And is dropping the promoter/exon regions done explicitly for the exponential decay component of the model because that component is much more likely to involve overlap between genes?
Based on the MAESTRO paper, my understanding is that a peak that is in the exons or promoter region of more than one gene is scored as 0 for all genes:
If I've understood correctly, Wij is calculated by RP_AddExonRemovePromoter, which keeps track of which peaks are in exons/promoter regions. However, it doesn't seem to track which ones are in more than one gene. Is it possible that an additional step to filter such peaks has been left out?
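For illustration, the kind of additional filtering step I have in mind could look like the sketch below. It is hypothetical and does not use MAESTRO's data structures: `shared_body_peaks`, the `gene_bodies` mapping, and the single-position representation of a peak are assumptions made only to show the idea of counting how many genes' exon/promoter regions each peak falls in.

```python
from collections import Counter


def shared_body_peaks(peak_positions, gene_bodies):
    """Return the peaks that fall in the exon/promoter intervals of more than one gene.

    gene_bodies maps a gene name to a list of (start, end) intervals (hypothetical layout).
    """
    counts = Counter()
    for intervals in gene_bodies.values():
        for pos in peak_positions:
            # Count each gene at most once per peak.
            if any(start <= pos < end for start, end in intervals):
                counts[pos] += 1
    return {pos for pos, n in counts.items() if n > 1}


gene_bodies = {"A": [(1000, 2000)], "B": [(1500, 2500)]}
peaks = [1700, 2200, 11000]

# Peaks in this set would be scored as 0 for all genes under the reading above.
print(shared_body_peaks(peaks, gene_bodies))
```

With such a set available, any peak in it could be given a weight of 0 before the per-gene weights are summed.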