Fastp Plugin for Geneious Prime - Installation and Usage Guide

Overview

This plugin integrates fastp (short-read quality control) and fastplong (long-read quality control) into Geneious Prime, allowing you to perform comprehensive quality control, filtering, and preprocessing of FASTQ sequencing data directly within Geneious.

Features

✓ Automatic detection of paired-end vs single-end reads
✓ Automatic detection of short-read vs long-read data
✓ Quality filtering and trimming
✓ Adapter detection and removal
✓ Length filtering
✓ Per-read quality cutting
✓ PolyG/PolyX tail trimming (for NextSeq/NovaSeq)
✓ Base correction in overlapped regions (paired-end)
✓ UMI processing
✓ Complexity filtering and deduplication
✓ Comprehensive statistics and HTML reports
✓ Integration with Geneious document metadata

Version Information

Plugin Version: 1.0.0
Fastp Version: 1.0.1
Fastplong Version: 0.4.1
Supported Platforms: macOS (Intel and Apple Silicon)
Geneious Compatibility: Geneious Prime 2025.2.2+

Installation

Step 1: Download the Plugin

The plugin file is located at:

/Users/dho/Documents/geneious-plugin-fastp/build/FastpPlugin.gplugin

Step 2: Install in Geneious

Method 1: Via Plugins Directory

cp /Users/dho/Documents/geneious-plugin-fastp/build/FastpPlugin.gplugin \
   ~/Library/Application\ Support/Geneious\ Prime/plugins/

Method 2: Via Geneious UI

Open Geneious Prime
Go to Tools → Plugins → Install Plugin from File...
Navigate to and select FastpPlugin.gplugin
Click Install

Step 3: Restart Geneious

Close and restart Geneious Prime to activate the plugin.

Step 4: Verify Installation

After restart:

Go to Tools → Plugins
Verify "Fastp Quality Control Plugin" appears in the list
Check the toolbar - you should see "Run Fastp QC" button
Right-click on any sequence list - "Run Fastp QC" should appear in the context menu

Usage

Basic Workflow

Import FASTQ Data
- Import your FASTQ files into Geneious
- Files will be recognized as sequence lists
Select Sequence Lists
- Select one or more sequence lists in the Document Table
- For paired-end data, you can select individual files or both R1/R2
Launch Fastp
- Click "Run Fastp QC" in the toolbar, or
- Right-click and select "Run Fastp QC", or
- Go to Tools → Run Fastp QC
Configure Options
- The options panel will appear with all available parameters
- Configure as needed or use default presets
Run
- Click OK to start processing
- Progress will be shown in the progress dialog
View Results
- Filtered sequence lists will appear in the Document Table
- Click on output sequences to view metadata in the Info tab
- Statistics include read counts, quality metrics, GC content, etc.

Tool Selection

The plugin can automatically detect the appropriate tool based on your data:

Auto-Detect Mode (Default)

The plugin analyzes the first 10 sequences to determine average read length:

Short reads (< 1000 bp) → Uses fastp
Long reads (≥ 1000 bp) → Uses fastplong

Manual Selection

You can override auto-detection:

Tool Selection dropdown:
- Auto-detect (recommended)
- Fastp (force short-read mode)
- Fastplong (force long-read mode)

Paired-End Detection

The plugin automatically detects paired-end data using naming conventions:

Supported R1 Indicators

_R1, _1, .R1, .1
_forward, _fwd

Supported R2 Indicators

_R2, _2, .R2, .2
_reverse, _rev

Examples

sample_R1.fastq and sample_R2.fastq → Detected as paired
sample_1.fq.gz and sample_2.fq.gz → Detected as paired
sample.fastq → Processed as single-end

Manual Pairing

If your files don't follow standard naming:

Select both R1 and R2 files together
The plugin will process them as single-end unless names match
Consider renaming files to use standard conventions

Options Guide

Basic Options

Number of Threads (1-32, default: 4)

More threads = faster processing
Recommended: 4-8 for most systems

Compression Level (1-9, default: 4)

Higher = smaller output files, slower processing
Recommended: 4 for balance

Quality Filtering

Disable Quality Filtering

Check to skip all quality-based filtering

Qualified Quality Phred (0-40, default: 15)

Minimum quality for a base to be considered "qualified"
Phred 15 = 96.8% accuracy, Phred 20 = 99% accuracy

Unqualified Percent Limit (0-100%, default: 40)

Maximum percentage of unqualified bases allowed per read
Reads exceeding this are discarded

N Base Limit (0-100, default: 5)

Maximum N (unknown) bases allowed per read

Average Quality (0-40, default: 0/disabled)

Minimum average quality required for entire read
0 = disabled

Length Filtering

Disable Length Filtering

Check to skip length-based filtering

Minimum Length (0-10000, default: 15)

Reads shorter than this are discarded
Prevents very short reads from downstream analysis

Maximum Length (0-100000, default: 0/unlimited)

Reads longer than this are discarded
0 = no maximum

Adapter Trimming

Disable Adapter Trimming

Check to skip adapter removal

Auto-detect Adapters (default: enabled)

Automatically detects and removes common Illumina adapters
Recommended for most uses

Custom Adapter R1 / R2

Specify custom adapter sequences if auto-detect doesn't work
Enter sequences in ATGCN format
R2 field only active for paired-end data

Per-Read Quality Cutting

Cut Front (5' end cutting)

Removes low-quality bases from beginning of reads

Cut Tail (3' end cutting)

Removes low-quality bases from end of reads

Cut Right (alternative 3' algorithm)

Alternative per-read cutting algorithm

Cut Window Size (1-100, default: 4)

Sliding window size for quality cutting

Cut Mean Quality (0-40, default: 20)

Mean quality threshold within window
Windows below this quality are trimmed

Base Trimming (Fixed Positions)

Trim Front Bases (R1) (0-100)

Remove this many bases from 5' end of R1 reads

Trim Tail Bases (R1) (0-100)

Remove this many bases from 3' end of R1 reads

Trim Front Bases (R2) (0-100)

Remove this many bases from 5' end of R2 reads (paired-end only)

Trim Tail Bases (R2) (0-100)

Remove this many bases from 3' end of R2 reads (paired-end only)

PolyG/PolyX Trimming

Useful for Illumina NextSeq/NovaSeq (two-color chemistry).

Enable PolyG Tail Trimming

Removes polyG tails caused by dark cycles
Recommended for NextSeq/NovaSeq data

PolyG Minimum Length (1-100, default: 10)

Minimum length of polyG tail to trigger trimming

Enable PolyX Tail Trimming

Removes tails of any repeated base (AAAA, TTTT, etc.)

PolyX Minimum Length (1-100, default: 10)

Minimum length of polyX tail to trigger trimming

Paired-End Options

Only displayed when paired-end reads are detected.

Base Correction in Overlapped Regions

Uses read overlap to correct sequencing errors
Highly recommended for paired-end data with overlap

Overlap Length Requirement (1-1000, default: 30)

Minimum overlap length required for correction

Overlap Difference Limit (0-100, default: 5)

Maximum mismatches allowed in overlap region

UMI Processing

For data with Unique Molecular Identifiers.

Enable UMI Processing

Check to extract and process UMIs

UMI Location

None / Index1 / Index2 / Read1 / Read2 / Per Index / Per Read

UMI Length (0-100)

Length of UMI sequence in bases

UMI Prefix

Prefix to add before UMI in read name

UMI Skip Bases (0-100)

Bases to skip after UMI before sequence starts

Advanced Filtering

Enable Complexity Filter

Filters out low-complexity reads (e.g., AAAAA...)
Useful for removing artifacts

Complexity Threshold (0-100%, default: 30)

Percentage of different bases required

Enable Deduplication

Removes exact duplicate reads
Uses sequence-based matching

Output and Results

Output Documents

For each input sequence list, the plugin creates:

Filtered sequence list - Quality-controlled sequences
Processing log - Details of the fastp run (in console output)

Metadata Fields

Each output sequence list includes these custom fields (visible in Info tab):

Reads Before - Total reads in input
Reads After - Total reads after filtering
Bases Before - Total bases in input
Bases After - Total bases after filtering
Q20 Rate - Percentage of bases with quality ≥ 20
Q30 Rate - Percentage of bases with quality ≥ 30
GC Content - Percentage of G+C bases
Duplication Rate - Percentage of duplicate reads
Original Document - Reference to input file
Processing Date - When fastp was run
Tool Used - fastp or fastplong

Sequence List Description

Output sequence lists have updated descriptions noting they've been processed through fastp/fastplong.

HTML Reports

Fastp generates detailed HTML reports with:

Quality score distributions
Base content distributions
Read length distributions
Adapter content
Overrepresented sequences
Filtering statistics charts

Note: HTML reports are currently logged to the console with their file paths. Future versions will import them as Geneious documents.

Common Use Cases

RNA-Seq Quality Control

Recommended settings:

Tool: Fastp (auto-detect)
Quality: Qualified phred ≥ 20
Length: Min 50 bp
Adapter trimming: Auto-detect enabled
PolyG trimming: Enable if using NextSeq/NovaSeq

Whole Genome Sequencing (WGS)

Recommended settings:

Tool: Fastp (auto-detect)
Quality: Qualified phred ≥ 15
Length: Min 30 bp
Paired-end correction: Enable
Deduplication: Enable

PacBio/Nanopore Long Reads

Recommended settings:

Tool: Fastplong (or auto-detect)
Quality: Lower thresholds (reads are lower quality)
Length: Min 500 bp, adjust based on your application
Skip most short-read specific options

Amplicon Sequencing

Recommended settings:

Tool: Fastp
Quality: Qualified phred ≥ 20
Length: Match expected amplicon size ± tolerance
Complexity filter: Enable
Deduplication: Enable

Troubleshooting

Plugin doesn't appear in Geneious

Solution:

Verify plugin file is in correct directory:

ls ~/Library/Application\ Support/Geneious\ Prime/plugins/

Ensure file has .gplugin extension
Restart Geneious completely
Check Tools → Plugins for error messages

"Binary not found" or "Binary not executable" errors

Solution: The binaries are bundled in the plugin JAR. This error indicates extraction failed.

Check temp directory permissions:
```
ls -la /tmp/
```
Verify you have disk space
Check console for detailed error messages

No output generated

Possible causes:

All reads were filtered out (too strict settings)
Input files are not valid FASTQ format
Process was cancelled

Solution:

Check console output for fastp messages
Try less strict filtering parameters
Verify input files are FASTQ format

Paired-end not detected

Solution:

Verify file naming follows conventions (_R1/_R2 or _1/_2)
Rename files if needed
Process as single-end if pairing can't be determined

Plugin runs slowly

Solution:

Increase thread count (up to number of CPU cores)
Disable unnecessary filtering options
Process smaller batches of files

Performance Tips

Optimize Thread Count

Check your CPU core count: sysctl -n hw.ncpu
Set threads to 50-75% of available cores
Leave some cores for Geneious and OS

Process Files in Batches

Don't select too many sequence lists at once
Process 10-20 files per batch for best performance

Adjust Compression

Lower compression (1-3) = faster, larger files
Higher compression (7-9) = slower, smaller files
Default (4) is good balance

Monitor Resources

Watch Activity Monitor during processing
Fastp is CPU and I/O intensive
Ensure adequate free disk space (2-3x input size)

Known Limitations

Platform Support

Currently macOS only (Intel and Apple Silicon)
Linux and Windows support coming in future versions

HTML Report Import

HTML reports are generated but not yet imported as Geneious documents
Reports are saved to temp directory (path logged to console)
Future versions will display reports in Geneious

Quality Score Handling

Plugin generates default quality scores (Phred 40) if input lacks them
Some Geneious API quality methods may not be fully supported

Support and Contact

Getting Help

Check this documentation first
Review console output for error messages
Check Geneious log file: ~/Library/Application Support/Geneious Prime/geneious.log

Reporting Issues

When reporting issues, please include:

Geneious Prime version
Plugin version
macOS version and architecture (Intel/Apple Silicon)
Input file format and size
Error messages from console
Steps to reproduce

Developer

David Ho Email: [Your email] GitHub: [Your repository URL]

Version History

Version 1.0.0 (Current)

Initial release
Fastp 1.0.1 / Fastplong 0.4.1
macOS support (universal binary)
Full parameter coverage
Automatic paired-end detection
Automatic read type detection
Comprehensive metadata integration

Planned Features

Linux and Windows support
HTML report import and display
Parameter presets (RNA-seq, WGS, etc.)
Batch processing improvements
Advanced filtering options

License

This plugin integrates:

Fastp: MIT License (https://github.com/OpenGene/fastp)
Fastplong: MIT License (https://github.com/OpenGene/fastplong)

Plugin code: [Your license]

Acknowledgments

Shifu Chen and team for fastp/fastplong
Biomatters for Geneious Prime platform
Geneious Plugin Developer Kit

Quick Reference

Installation

cp build/FastpPlugin.gplugin ~/Library/Application\ Support/Geneious\ Prime/plugins/

Basic Usage

Select FASTQ sequence list(s)
Click "Run Fastp QC"
Configure options (or use defaults)
Click OK

Key Settings for Common Tasks

Quick QC (minimal filtering)

All filters: Default values
Threads: 4-8

Strict QC (maximum quality)

Qualified phred: 25
Unqualified %: 30
Min length: 50
Enable deduplication

Paired-end overlapping reads

Enable base correction
Overlap length: 30
Overlap diff limit: 5

End of Documentation

FilesExpand file tree

INSTALLATION_USAGE.md

Latest commit

History

INSTALLATION_USAGE.md

File metadata and controls

Fastp Plugin for Geneious Prime - Installation and Usage Guide

Overview

Features

Version Information

Installation

Step 1: Download the Plugin

Step 2: Install in Geneious

Step 3: Restart Geneious

Step 4: Verify Installation

Usage

Basic Workflow

Tool Selection

Auto-Detect Mode (Default)

Manual Selection

Paired-End Detection

Supported R1 Indicators

Supported R2 Indicators

Examples

Manual Pairing

Options Guide

Basic Options

Quality Filtering

Length Filtering

Adapter Trimming

Per-Read Quality Cutting

Base Trimming (Fixed Positions)

PolyG/PolyX Trimming

Paired-End Options

UMI Processing

Advanced Filtering

Output and Results

Output Documents

Metadata Fields

Sequence List Description

HTML Reports

Common Use Cases

RNA-Seq Quality Control

Whole Genome Sequencing (WGS)

PacBio/Nanopore Long Reads

Amplicon Sequencing

Troubleshooting

Plugin doesn't appear in Geneious

"Binary not found" or "Binary not executable" errors

No output generated

Paired-end not detected

Plugin runs slowly

Performance Tips

Optimize Thread Count

Process Files in Batches

Adjust Compression

Monitor Resources

Known Limitations

Platform Support

HTML Report Import

Quality Score Handling

Support and Contact

Getting Help

Reporting Issues

Developer

Version History

Version 1.0.0 (Current)

Planned Features

License

Acknowledgments

Quick Reference

Installation

Basic Usage

Key Settings for Common Tasks