🧬 Abstract

RNA molecules localize to specific subcellular compartments to perform their functions. While most mRNAs are translated at the endoplasmic reticulum, many lncRNAs act within organelles. To better understand RNA localization mechanisms, we developed SRLE-seq (Screening RNA Localization Elements by Sequencing), a high-throughput method to identify functional localization elements.

💻 Installation

Requires Python ≥ 3.8 and the following libraries:

pip install biopython numpy scipy tqdm

⚙️ Functional

📚 Kmer-based library diversity validation

1.Introduction

This section validates whether the constructed k-mer-based plasmid library is suitable for downstream experiments. Plasmids carrying each k-mer should be present at roughly uniform abundance. No single k-mer should dominate the library, as this would indicate amplification bias or cloning artifacts.
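The uniformity criterion can also be checked numerically. The following is a minimal sketch and not part of the SRLE-seq scripts: `dominance_report` is a hypothetical helper that assumes the k-mer counts are already available as a dictionary, and it compares the most abundant k-mer against the 1/4^k share expected from a perfectly uniform library.

```python
def dominance_report(kmer_counts, k):
    """Summarise how strongly the most abundant k-mer dominates a library.

    kmer_counts: dict mapping k-mer string -> raw read count (assumed input).
    k: k-mer length, used to compute the uniform expectation 1 / 4**k.
    """
    total = sum(kmer_counts.values())
    if total == 0:
        raise ValueError("no k-mer counts supplied")

    expected_fraction = 1.0 / (4 ** k)
    top_kmer, top_count = max(kmer_counts.items(), key=lambda kv: kv[1])
    top_fraction = top_count / total

    return {
        "top_kmer": top_kmer,
        "top_fraction": top_fraction,
        # Values far above 1 suggest amplification bias or cloning artifacts.
        "fold_over_uniform": top_fraction / expected_fraction,
        "observed_kmers": sum(1 for c in kmer_counts.values() if c > 0),
        "possible_kmers": 4 ** k,
    }


if __name__ == "__main__":
    example = {"AA": 50, "AC": 25, "AG": 25}
    print(dominance_report(example, k=2))
```

What counts as an acceptable `fold_over_uniform` value is a project-specific decision and is not defined by this README.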

2.Input Requirements

The script accepts the following required command-line arguments:

Argument	Type	Description
`-fq1`	`str`	Path to the R1 (forward reads) FASTQ file
`-fq2`	`str`	Path to the R2 (reverse reads) FASTQ file (default: `None`)
`-s`	`str`	Output directory for saving result figures and tables
`-up_flanking`	`str`	Sequence of the upstream (5') anchor primer
`-down_flanking`	`str`	Sequence of the downstream (3') anchor primer
`-kmer_length`	`int`	Length of k-mers to count

⚠️ For paired-end data, at least one of the primer-binding regions should be longer than 150 bp. Otherwise, merge the read pairs with PEAR before further processing.
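To illustrate how the two flanking sequences delimit the insert, here is a minimal sketch. It is an assumption about the general approach, not the scripts' actual implementation: `extract_insert` is a hypothetical helper that uses exact string matching on a single read and ignores mismatches, reverse complements, and quality scores.

```python
def extract_insert(read, up_flanking, down_flanking):
    """Return the sequence between the two anchors, or None if either is missing.

    Exact matching only; a real pipeline may tolerate mismatches or also
    search the reverse-complement strand (not handled in this sketch).
    """
    up_pos = read.find(up_flanking)
    if up_pos == -1:
        return None
    insert_start = up_pos + len(up_flanking)

    # Search for the downstream anchor only after the upstream one.
    down_pos = read.find(down_flanking, insert_start)
    if down_pos == -1:
        return None
    return read[insert_start:down_pos]


if __name__ == "__main__":
    read = "GGATTATGATACGTACGCTTAGTGCC"
    print(extract_insert(read, "ATTATGAT", "GCTTAGTG"))
```

Reads in which either anchor cannot be found return `None` and would simply be skipped when counting.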

3.Quick Start

Follow these steps to quickly analyse library diversity:

python kmer_diversity_evaluation.py \
  -fq1 R1.fastq \
  -fq2 R2.fastq \
  -s ./ \
  -up_flanking ATTATGAT \
  -down_flanking GCTTAGTG \
  -kmer_length 6

4.Check the Output

After completion, the output directory will contain:

kmer_diversity.validation.tsv – records the raw count and CPM (counts per million) for each k-mer.
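For reference, CPM rescales each raw count by the total number of counted reads. The snippet below is a minimal sketch of that standard normalisation; `counts_to_cpm` is a hypothetical helper and the in-memory dictionary is an assumption, since the column layout of the TSV is not described here.

```python
def counts_to_cpm(counts):
    """Convert raw counts to counts per million (CPM).

    counts: dict mapping k-mer -> raw read count (assumed input format).
    CPM = count * 1e6 / total counted reads.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty library.
        return {kmer: 0.0 for kmer in counts}
    return {kmer: count * 1_000_000 / total for kmer, count in counts.items()}


if __name__ == "__main__":
    print(counts_to_cpm({"AAAAAA": 3, "CCCCCC": 1}))
```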

📚 Fragment-based library diversity validation

1.Introduction

This section is designed to validate whether the constructed fragment-based plasmid library is suitable for downstream experiments. The inserted fragments should collectively achieve broad and uniform coverage across the full-length source sequence. Ideally, all regions of the original sequence should be represented within the library, ensuring comprehensive functional screening.

2.Input Requirements

The script accepts the following required command-line arguments:

Argument	Type	Description
`-fq1`	`str`	Path to the R1 (forward reads) FASTQ file
`-fq2`	`str`	Path to the R2 (reverse reads) FASTQ file
`-s`	`str`	Output directory for saving result figures and tables
`-up_flanking`	`str`	Sequence of the upstream (5') anchor primer
`-down_flanking`	`str`	Sequence of the downstream (3') anchor primer
`-gene_chr`	`str`	Chromosome name of the gene locus (e.g., `chr11`)
`-gene_region_start`	`int`	Start coordinate of the target gene region (1-based)
`-gene_region_end`	`int`	End coordinate of the target gene region (1-based)
`-config`	`str`	Path to a configuration file containing parameter presets
`-thread`	`int`	Number of threads to use for parallel processing

⚠️ For paired-end data, at least one of the primer-binding regions should be longer than 150 bp. Otherwise, merge the read pairs with PEAR before further processing.

3.Quick Start

Follow these steps to quickly analyse library diversity:

python randomfrag_diversity_evaluation.py \
  -fq1 R1.fastq \
  -fq2 R2.fastq \
  -s ./ \
  -up_flanking ATTATGAT \
  -down_flanking GCTTAGTG \
  -gene_chr chr11 \
  -gene_region_start 65497688 \
  -gene_region_end 65506516 \
  -config ./config.yaml \
  -thread 64

4.Check the Output

After completion, the output directory will contain:

dna_fragment.coverage.bed - BED-format file showing per-base coverage of extracted fragments across the specified gene region.

dna_fragment.coverage.png - Coverage plot visualizing the distribution of extracted fragments.
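Per-base coverage can be derived from the mapped fragment intervals. The following is a minimal sketch under stated assumptions: `per_base_coverage` is a hypothetical helper, fragments are given as 1-based inclusive `(start, end)` pairs on the target chromosome, and alignment itself is not shown. It does not reproduce the script's BED output.

```python
def per_base_coverage(fragments, region_start, region_end):
    """Count how many fragments cover each base of a region.

    fragments: iterable of (start, end) tuples, 1-based inclusive (assumed).
    Returns a list whose i-th element is the depth at region_start + i.
    Uses a difference array, so the cost is O(fragments + region length).
    """
    length = region_end - region_start + 1
    diff = [0] * (length + 1)

    for frag_start, frag_end in fragments:
        # Clip each fragment to the region of interest.
        start = max(frag_start, region_start)
        end = min(frag_end, region_end)
        if start > end:
            continue  # fragment lies entirely outside the region
        diff[start - region_start] += 1
        diff[end - region_start + 1] -= 1

    coverage = []
    depth = 0
    for delta in diff[:length]:
        depth += delta
        coverage.append(depth)
    return coverage


if __name__ == "__main__":
    cov = per_base_coverage([(1, 3), (2, 5), (10, 12)], 1, 5)
    print(cov)
    print("fraction covered:", sum(1 for d in cov if d > 0) / len(cov))
```

A region with positions at zero depth indicates parts of the source sequence that are not represented in the library.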

📍 Kmer location analysis

1.Introduction

This section investigates the subcellular localization preferences of individual k-mers after plasmid transfection into cells. As a prerequisite, quality control (QC) is performed on the post-transfection k-mer library. A k-mer passes QC if its counts-per-million (CPM) value is greater than 1 in at least one of the three compartments: Total, Nuclear (Nuc), or Cytoplasmic (Cyto). This criterion ensures that the k-mer has been successfully transfected and is expressed at a sufficient level for downstream analysis.
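The QC rule can be expressed in a few lines. This is a minimal sketch rather than the script's own code: `passes_qc` and `filter_kmers` are hypothetical helpers, and the nested-dictionary input (k-mer to per-compartment CPM) is an assumed data layout.

```python
def passes_qc(cpm_by_compartment, threshold=1.0):
    """True if CPM exceeds the threshold in at least one compartment."""
    return any(cpm > threshold for cpm in cpm_by_compartment.values())


def filter_kmers(cpm_table, threshold=1.0):
    """Keep only k-mers that pass QC.

    cpm_table: dict mapping k-mer -> {"Total": cpm, "Nuc": cpm, "Cyto": cpm}
    (assumed layout; compartment names follow the text above).
    """
    return {
        kmer: cpms
        for kmer, cpms in cpm_table.items()
        if passes_qc(cpms, threshold)
    }


if __name__ == "__main__":
    table = {
        "AAAAAA": {"Total": 0.2, "Nuc": 3.5, "Cyto": 0.1},
        "CCCCCC": {"Total": 0.5, "Nuc": 1.0, "Cyto": 0.9},
    }
    print(sorted(filter_kmers(table)))
```

Note that the comparison is strict, so a k-mer with a CPM of exactly 1 in every compartment does not pass.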

2.Input Requirements

The script accepts the following required command-line arguments:

Argument	Type	Description
`-nuc_fq1_file`	`str`	Path to the nuclear sample R1 (forward reads) FASTQ file
`-nuc_fq2_file`	`str`	Path to the nuclear sample R2 (reverse reads) FASTQ file
`-cyto_fq1_file`	`str`	Path to the cytoplasm sample R1 (forward reads) FASTQ file
`-cyto_fq2_file`	`str`	Path to the cytoplasm sample R2 (reverse reads) FASTQ file
`-s`	`str`	Output directory for saving result figures and tables
`-up_flanking`	`str`	Sequence of the upstream (5') anchor primer
`-down_flanking`	`str`	Sequence of the downstream (3') anchor primer
`-mode`	`str`	Processing mode: `kmer_complexity` or `fragment_complexity`
`-kmer_length`	`int`	Length of k-mers to count

⚠️ For paired-end data, at least one of the primer-binding regions should be longer than 150 bp. Otherwise, merge the read pairs with PEAR before further processing.

3.Quick Start

Follow these steps to quickly run the k-mer location analysis:

python kmer_location_analysis.py \
  -nuc_fq1_file nuc.R1.fastq \
  -nuc_fq2_file nuc.R2.fastq \
  -cyto_fq1_file cyto.R1.fastq \
  -cyto_fq2_file cyto.R2.fastq \
  -s ./ \
  -up_flanking ATTATGAT \
  -down_flanking GCTTAGTG \
  -mode kmer_complexity \
  -kmer_length 6

4.Check the Output

After completion, the output directory will contain:

kmer_complexity.kmer.count.png – A scatter plot comparing the abundance (CPM) of each k-mer between the cytoplasmic (x-axis) and nuclear (y-axis) compartments. This plot reveals subcellular localization patterns of individual k-mers.

kmer_complexity.info.csv – A summary table listing each k-mer's cytoplasmic and nuclear counts, log₂ fold change (cytoplasm vs. nucleus), and Nuclear Enrichment Score (NES), providing quantitative metrics for assessing k-mer localization bias.
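As an illustration of the fold-change column, the sketch below computes a log₂ ratio of cytoplasmic to nuclear CPM. It rests on assumptions: `log2_fold_change` is a hypothetical helper, the direction (cytoplasm over nucleus) is read from the description above, and the pseudocount of 1 is an illustrative choice to avoid division by zero. The NES formula is not defined in this README, so it is not reproduced here.

```python
import math


def log2_fold_change(cyto_cpm, nuc_cpm, pseudocount=1.0):
    """log2 of (cytoplasmic CPM / nuclear CPM) with a pseudocount.

    Positive values indicate cytoplasmic bias, negative values nuclear bias.
    The pseudocount is an assumption of this sketch, not a documented setting.
    """
    return math.log2((cyto_cpm + pseudocount) / (nuc_cpm + pseudocount))


if __name__ == "__main__":
    print(log2_fold_change(7.0, 1.0))   # cytoplasm-enriched k-mer
    print(log2_fold_change(1.0, 7.0))   # nucleus-enriched k-mer
```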

📍 Randomfrag location analysis

1.Introduction

This section focuses on the subcellular localization patterns of full-length or partial sequence fragments after plasmid transfection. Quality control of the fragment library requires each fragment to have a minimum sequencing coverage of 100×, ensuring reliable detection and quantification. Only fragments meeting this coverage threshold are included in subsequent analyses to evaluate their abundance, CPM, and nuclear enrichment score (NES), which together provide insights into their subcellular distribution characteristics.

2.Input Requirements

The script accepts the following required command-line arguments:

Argument	Type	Description
`-nuc_fq1`	`str`	Path to the nuclear sample R1 (forward reads) FASTQ file
`-nuc_fq2`	`str`	Path to the nuclear sample R2 (reverse reads) FASTQ file
`-cyto_fq1`	`str`	Path to the cytoplasm sample R1 (forward reads) FASTQ file
`-cyto_fq2`	`str`	Path to the cytoplasm sample R2 (forward reads) FASTQ file
`-s`	`str`	Output directory for saving result figures and tables
`-up_flanking`	`str`	Sequence of the upstream (5') anchor primer
`-down_flanking`	`str`	Sequence of the downstream (3') anchor primer
`-gene_chr`	`str`	Chromosome name of the gene locus (e.g., `chr11`)
`-gene_region_start`	`int`	Start coordinate of the target gene region (1-based)
`-gene_region_end`	`int`	End coordinate of the target gene region (1-based)
`-config`	`str`	Path to a configuration file containing parameter presets
`-thread`	`int`	Number of threads to use for parallel processing

⚠️ For paired-end data, at least one of the primer-binding regions should be longer than 150 bp. Otherwise, merge the read pairs with PEAR before further processing.

3.Quick Start

Follow these steps to quickly run the fragment location analysis:

python randomfrag_location_analysis.py \
  -nuc_fq1_file nuc.R1.fastq \
  -nuc_fq2_file nuc.R2.fastq \
  -cyto_fq1_file cyto.R1.fastq \
  -cyto_fq2_file cyto.R2.fastq \
  -s ./ \
  -up_flanking ATTATGAT \
  -down_flanking GCTTAGTG \
  -mode kmer_complexity \
  -gene_chr chr11 \
  -gene_region_start 65497688 \
  -gene_region_end 65506516 \
  -config ./config.yaml \
  -thread 64

4.Check the Output

After completion, the output directory will contain:

randomfrag_diversity.tsv – The abundance (read count and CPM) of each random fragment derived from the target gene.

coverage.bed – Base-level coverage information for each specified gene region, indicating how many reads cover each genomic position.

dna_fragment.coverage.png – A visualization of base-level sequencing coverage across each specified gene region, illustrating the distribution and depth of coverage.

Fragment.length.distribution.png – A plot of the length distribution of the extracted fragments.

📝 Please Cite

If you use this script or parts of it in your research or project, please cite the repository or acknowledge the author appropriately. A suggested citation format:

Zeng, X. (2025). SRLE-seq k-mer comparison and visualization pipeline.

👨‍💻 Maintainer

Lin Yusen
Email: [email protected]

Feel free to open an issue or pull request for improvements or bug fixes.

🤝 Contributors

Current contributors:

Yusen Lin

Xingquan Zeng

Jiajian Zhou

📄 License

This project is licensed under the MIT License.
You are free to use, modify, and distribute it with attribution.
