Continued work to complete the combined processed/pa-ncRNA gtf and counts matrix, among other things
Kris Alavattam committed May 24, 2023
1 parent 570fd3d commit 9f5f8d8
Showing 7 changed files with 3,751 additions and 48 deletions.
3,179 changes: 3,179 additions & 0 deletions results/2023-0215/work_assess-process_R64-1-1_gff3_part-1.5.nb.html

Large diffs are not rendered by default.

5 changes: 3 additions & 2 deletions results/2023-0215/work_assess-process_R64-1-1_gff3_part-1.Rmd
@@ -901,6 +901,7 @@ z_sense %>%
<br />

## Next step
Go to [`work_assess-process_R64-1-1_gff3_part-2.Rmd`](./work_assess-process_R64-1-1_gff3_part-2.Rmd)
*(This is the overlap/classification work for the Q and G1 nascent transcriptomes.)*
- Go to [`work_combine-gtfs_processed-pa-ncRNA_part-0.Rmd`](./work_combine-gtfs_processed-pa-ncRNA_part-0.Rmd)
- Go to [`work_assess-process_R64-1-1_gff3_part-2.Rmd`](./work_assess-process_R64-1-1_gff3_part-2.Rmd) *(This is the overlap/classification work for the Q and G1 nascent transcriptomes.)* `#TODO` *Rename this script to better reflect its function.*
- Go to [`work_count-features_assessed-processed-R64-1-1-gff3s.md`](./work_count-features_assessed-processed-R64-1-1-gff3s.md) *(This is the notebook for submitting `htseq-count` jobs to the cluster.)*
<br />
13 changes: 10 additions & 3 deletions results/2023-0215/work_assess-process_R64-1-1_gff3_part-1.nb.html

Large diffs are not rendered by default.

46 changes: 12 additions & 34 deletions results/2023-0215/work_assessment-processing_gtfs_part-2_Trinity.md
@@ -8,19 +8,19 @@
<!-- MarkdownTOC -->

1. [Get situated](#get-situated)
1. [Code](#code)
1. [Code](#code)
1. [Run `htseq-count` on bams in `bams_renamed/` with `.gtf`s in `outfiles_gtf-gff3/Trinity-GG`](#run-htseq-count-on-bams-in-bams_renamed-with-gtfs-in-outfiles_gtf-gff3trinity-gg)
1. [Set up outfile directories](#set-up-outfile-directories)
1. [Code](#code-1)
1. [Set up arrays of bams](#set-up-arrays-of-bams)
1. [Code](#code-2)
1. [Index bams](#index-bams)
1. [Code](#code-3)
1. [Run `htseq-count` with `.gtf`s in `outfiles_gtf-gff3/representation`](#run-htseq-count-with-gtfs-in-outfiles_gtf-gff3representation)
1. [Set up necessary arrays, variables](#set-up-necessary-arrays-variables)
1. [Code](#code-4)
1. [Set up and submit `htseq-count` jobs](#set-up-and-submit-htseq-count-jobs)
1. [Code](#code-5)
1. [Set up outfile directories](#set-up-outfile-directories)
1. [Code](#code-1)
1. [Set up arrays of bams](#set-up-arrays-of-bams)
1. [Code](#code-2)
1. [Index bams](#index-bams)
1. [Code](#code-3)
1. [Run `htseq-count` with `.gtf`s in `outfiles_gtf-gff3/representation`](#run-htseq-count-with-gtfs-in-outfiles_gtf-gff3representation)
1. [Set up necessary arrays, variables](#set-up-necessary-arrays-variables)
1. [Code](#code-4)
1. [Set up and submit `htseq-count` jobs](#set-up-and-submit-htseq-count-jobs)
1. [Code](#code-5)

<!-- /MarkdownTOC -->
</details>
@@ -328,28 +328,6 @@ for i in "strd-eq"; do
sleep 0.5
done
done




# sbatch \
# --job-name=${job_name} \
# --nodes=1 \
# --cpus-per-task=${threads} \
# --error=${err_out}.%A.stderr.txt \
# --output=${err_out}.%A.stdout.txt \
# htseq-count \
# --order "pos" \
# --stranded "${hc_strd}" \
# --nonunique "none" \
# --type "feature" \
# --idattr "gene_id" \
# --nprocesses ${threads} \
# --counts_output "${out}" \
# --with-header \
# ${UT_prim_UMI[*]} \
# "${count_against}"

```
</details>
<br />
185 changes: 185 additions & 0 deletions results/2023-0215/work_combine-gtfs_processed-pa-ncRNA_part-0.Rmd
@@ -0,0 +1,185 @@
---
title: "work_combine-gtfs_processed-pa-ncRNA_part-0.Rmd"
author: "KA"
email: "[email protected]"
output:
  html_notebook:
    toc: yes
    toc_float: true
---

## Get situated
### Code
<details>
<summary><i>Code: Get situated</i></summary>

```{r Get situated, results='hide', message=FALSE, warning=FALSE}
#!/usr/bin/env Rscript
library(GenomicRanges)
library(IRanges)
library(plyr)
library(readxl)
library(rtracklayer)
library(tidyverse)
options(scipen = 999)
options(ggrepel.max.overlaps = Inf)
if(base::isTRUE(stringr::str_detect(getwd(), "kalavattam"))) {
    p_local <- "/Users/kalavattam/Dropbox/FHCC"
} else {
    p_local <- "/Users/kalavatt/projects-etc"
}
p_wd <- "2022_transcriptome-construction/results/2023-0215"
setwd(paste(p_local, p_wd, sep = "/"))
getwd()
rm(p_local, p_wd)
```
</details>
<br />
<br />

## Load and combine `gtf`s
### Load `gtf` files [derived/"processed" from R64-1-1](./work_assess-process_R64-1-1_gff3_part-1.Rmd)
#### Code
<details>
<summary><i>Code: Load `gtf` files derived/"processed" from R64-1-1</i></summary>

```{r Load gtf files derived/processed from R64-1-1, results='hide', message=FALSE, warning=FALSE}
#!/usr/bin/env Rscript
read_processed_gtf <- function(file) {
    tbl <- file %>%
        rtracklayer::import() %>%
        tibble::as_tibble() %>%
        dplyr::select(-c(width, score, phase)) %>%
        dplyr::rename(feature = type.1) %>%
        dplyr::arrange(seqnames, start)
    tbl[tbl == "NA"] <- NA_character_
    return(tbl)
}
p_processed <- "./outfiles_gtf-gff3/comprehensive/S288C_reference_genome_R64-1-1_20110203"
f_gene <- "processed_gene_sense.gtf"
f_PG <- "processed_PG_sense.gtf"
f_snRNA <- "processed_snRNA_sense.gtf"
f_snoRNA <- "processed_snoRNA_sense.gtf"
f_TE <- "processed_TE_sense.gtf"
t_gene <- paste(p_processed, f_gene, sep = "/") %>% read_processed_gtf()
t_PG <- paste(p_processed, f_PG, sep = "/") %>% read_processed_gtf()
t_snRNA <- paste(p_processed, f_snRNA, sep = "/") %>% read_processed_gtf()
t_snoRNA <- paste(p_processed, f_snoRNA, sep = "/") %>% read_processed_gtf()
t_TE <- paste(p_processed, f_TE, sep = "/") %>% read_processed_gtf()
t_processed <- dplyr::bind_rows(t_gene, t_PG, t_snRNA, t_snoRNA, t_TE) %>%
    dplyr::arrange(seqnames, start)
rm(p_processed, f_gene, f_PG, f_snRNA, f_snoRNA, f_TE)
rm(t_gene, t_PG, t_snRNA, t_snoRNA, t_TE)
```
</details>
<br />
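
The five "processed" file names above follow a single pattern, `processed_<category>_sense.gtf`, so the paths can also be built in one step. The following is a minimal sketch, not the notebook's method; the variable names `categories` and `paths` are introduced here for illustration only.

```r
# Directory taken from the chunk above
p_processed <- "./outfiles_gtf-gff3/comprehensive/S288C_reference_genome_R64-1-1_20110203"

# Feature categories, matching the f_* file names above
categories <- c("gene", "PG", "snRNA", "snoRNA", "TE")

# Build one path per category and name each path by its category
paths <- file.path(p_processed, paste0("processed_", categories, "_sense.gtf"))
names(paths) <- categories

print(basename(paths))

# With read_processed_gtf() defined as above, the tables could then be read
# and row-bound in one call (left commented out because it needs the files):
# t_processed <- dplyr::bind_rows(lapply(paths, read_processed_gtf))
```

Naming the paths by category keeps it clear which file each table came from when the list is inspected before binding.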

### Load [pa-ncRNA `gtf` file](./work_representative-non-coding-transcriptome_part-4.Rmd)
#### Code
<details>
<summary><i>Code: Load pa-ncRNA `gtf` file</i></summary>

```{r}
#!/usr/bin/env Rscript
read_pa_ncRNA_gtf <- function(file) {
    tbl <- file %>%
        rtracklayer::import() %>%
        tibble::as_tibble() %>%
        dplyr::select(-c(
            width, score, phase, details_type_alpha, details_type, details_id,
            details_all, n_types, n_features, n_types_features, length
        )) %>%
        dplyr::mutate(
            feature = "pancRNA",
            orf_classification = NA_character_,
            source_id = NA_character_
        ) %>%
        dplyr::arrange(seqnames, start)
    return(tbl)
}
p_pa_ncRNA <- "./outfiles_gtf-gff3/representation"
f_pa_ncRNA <- "Greenlaw-et-al_representative-non-coding-transcriptome.gtf"
t_pa_ncRNA <- paste(p_pa_ncRNA, f_pa_ncRNA, sep = "/") %>% read_pa_ncRNA_gtf()
rm(p_pa_ncRNA, f_pa_ncRNA)
```
</details>
<br />

### Row-bind the "processed" and pa-ncRNA `gtf`s
#### Code
<details>
<summary><i>Code: Row-bind the "processed" and pa-ncRNA `gtf`s</i></summary>

```{r}
#!/usr/bin/env Rscript
t_bound <- dplyr::bind_rows(t_processed, t_pa_ncRNA) %>%
    dplyr::arrange(seqnames, start)
```
</details>
<br />
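
`dplyr::bind_rows()` matches columns by name and fills any column absent from one input with `NA`. That is why the pa-ncRNA reader sets `orf_classification` and `source_id` to `NA_character_` explicitly: it fixes their type as character before the bind. The toy example below illustrates the fill behavior; the tables and identifiers in it are invented, not taken from the actual `gtf` files.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for t_processed: carries an orf_classification column
t_a <- tibble::tibble(
    seqnames = c("I", "II"),
    start = c(100L, 50L),
    gene_id = c("YAL001C", "YBL001C"),
    orf_classification = c("Verified", "Dubious")
)

# Hypothetical stand-in for t_pa_ncRNA: lacks orf_classification
t_b <- tibble::tibble(
    seqnames = "I",
    start = 10L,
    gene_id = "pa-ncRNA-0001"
)

# Row-bind, then sort by chromosome and start position
t_ab <- dplyr::bind_rows(t_a, t_b) %>%
    dplyr::arrange(seqnames, start)

print(t_ab)
```

In the real tables, `seqnames` arrives from `rtracklayer::import()` as a factor, so `dplyr::arrange()` orders rows by factor level rather than alphabetically.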
<br />

## Write out combined "processed" and pa-ncRNA `gtf`
### Code
<details>
<summary><i>Code: Write out combined "processed" and pa-ncRNA `gtf`</i></summary>

```{r}
#!/usr/bin/env Rscript
write_gtf <- function(x, y) {
    # Write a tibble to a tab-separated file with no header row and no quoting
    # :param x: tibble
    # :param y: outfile
    # :return: x, invisibly (the return value of readr::write_tsv)
    readr::write_tsv(
        x,
        y,
        col_names = FALSE,
        quote = "none",
        escape = "none"
    )
}
p_gtf <- "./outfiles_gtf-gff3/representation"
f_gtf <- "Greenlaw-et-al_representative-coding-non-coding-etc-transcriptome.gtf"
t_gtf <- t_bound %>%
    dplyr::mutate(score = ".", frame = ".") %>%
    dplyr::relocate(c("seqnames", "source", "type"), .before = "start") %>%
    dplyr::relocate(c("score", "strand", "frame"), .after = "end") %>%
    dplyr::mutate(
        attribute = paste(
            paste0("gene_id \"", gene_id, "\""),
            paste0("transcript_id \"", transcript_id, "\""),
            paste0("type \"", feature, "\""),
            paste0("orf_classification \"", orf_classification, "\""),
            paste0("source_id \"", source_id, "\""),
            sep = "; "
        )
    ) %>%
    dplyr::select(
        -c(gene_id, transcript_id, feature, orf_classification, source_id)
    )
write_gtf(t_gtf, paste(p_gtf, f_gtf, sep = "/"))
```
</details>
<br />
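
The attribute column built above writes missing values as the literal string `"NA"` and leaves the last attribute without a closing semicolon. A possible variation, sketched here in base R, skips missing values and ends every attribute with a semicolon. The helper `format_gtf_attributes()` and the example identifiers are hypothetical and are not part of the notebook.

```r
# Hypothetical helper: build one GTF attribute string per row of a data frame
# whose column names are attribute keys
format_gtf_attributes <- function(df) {
    # One 'key "value";' piece per column; NA where the value is missing
    pieces <- lapply(names(df), function(key) {
        value <- df[[key]]
        ifelse(is.na(value), NA_character_, paste0(key, " \"", value, "\";"))
    })
    # Join the non-missing pieces row by row with single spaces
    apply(do.call(cbind, pieces), 1, function(row) {
        paste(row[!is.na(row)], collapse = " ")
    })
}

# Invented example rows: the second has no orf_classification
attrs <- data.frame(
    gene_id = c("YAL001C", "pa-ncRNA-0001"),
    transcript_id = c("YAL001C", "pa-ncRNA-0001"),
    orf_classification = c("Verified", NA),
    stringsAsFactors = FALSE
)

out <- format_gtf_attributes(attrs)
print(out)
```

Whether to omit missing attributes or keep them as `"NA"` depends on downstream use: `read_processed_gtf()` above converts `"NA"` strings back to missing values on import, so the notebook's approach round-trips within this workflow.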