PowerPlant

PowerPlant is a Python package that leverages deep learning to forecast the success of DNA extraction from herbarium samples. This tool is designed to assist botanical researchers in optimizing their selection of herbarium specimens for genomic studies.

Overview

PowerPlant employs a deep learning algorithm that integrates multiple data sources to predict ancient DNA extraction success:

- Morphological features from scanned herbarium images
- Sample color information
- Metadata, including sample age and locality
- DNA quantity metrics from previously processed samples

PowerPlant was trained on a dataset of approximately 2,000 herbarium specimens from the PAFTOL project, whose collection dates span nearly two centuries (1832 to present). It aims to help researchers identify the specimens most likely to yield usable DNA.

Requirements

- Linux or macOS operating system.
- Python 3.11 (later versions may not be fully supported by some dependencies).
- A GPU is recommended for better performance.
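To confirm that the interpreter inside your virtual environment is the supported one, a small check like the following can help. It is a sketch that treats anything other than 3.11 as unsupported, which may be a stricter reading of the requirement above than the project itself enforces.

```python
import sys

# Version targeted by PowerPlant according to the requirements above.
REQUIRED = (3, 11)


def python_version_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor version equals REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_version_ok():
        print(f"Python {major}.{minor}: OK")
    else:
        print(f"Python {major}.{minor}: PowerPlant expects 3.11")
```

Run it with the virtual environment's interpreter (for example `.venv/bin/python check_python.py`; the file name is arbitrary).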

Installation

PowerPlant integrates several deep learning tools. While most are distributed as Python packages and can be installed via pip or conda, the image segmentation component relies on the PaddleSeg framework, which requires specific installation steps.

Clone the PowerPlant repository:

git clone https://github.com/sales-lab/powerplant.git

Create and activate a virtual environment:

cd powerplant
python3 -m venv .venv
source .venv/bin/activate

Install PowerPlant:

pip install ./

Install PaddleSeg:

cd vendor
sh install-paddleseg.sh

Note: The default installation is CPU-only. For GPU support, modify the script to install the appropriate PaddlePaddle and PaddleSeg variants for your hardware.
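After these steps, you can check that the powerplant-segment and powerplant-train commands are on your PATH and that PaddlePaddle and PaddleSeg are importable. The sketch below assumes the import names `paddle` and `paddleseg` (the usual names for these packages) and, when PaddlePaddle is present, calls `paddle.device.is_compiled_with_cuda()` to tell whether the installed build supports GPUs.

```python
import importlib.util
import shutil
from typing import Dict, Optional

# Console commands documented in this README.
COMMANDS = ["powerplant-segment", "powerplant-train"]
# Assumed import names of PaddlePaddle and PaddleSeg.
MODULES = ["paddle", "paddleseg"]


def check_installation() -> Dict[str, bool]:
    """Map each command/module name to whether it is available."""
    status = {cmd: shutil.which(cmd) is not None for cmd in COMMANDS}
    status.update({mod: importlib.util.find_spec(mod) is not None for mod in MODULES})
    return status


def paddle_has_cuda() -> Optional[bool]:
    """Return True/False for a CUDA-enabled PaddlePaddle build, None if not installed."""
    if importlib.util.find_spec("paddle") is None:
        return None
    import paddle  # imported lazily: it is a heavy dependency

    return bool(paddle.device.is_compiled_with_cuda())


if __name__ == "__main__":
    for name, ok in check_installation().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
    print("PaddlePaddle CUDA build:", paddle_has_cuda())
```

A `False` from `paddle_has_cuda()` on a GPU machine suggests the CPU-only default was installed.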

Download trained weights for segmentation:

cd ../checkpoints
curl --location --output segmentation-checkpoint.zip 'https://figshare.com/ndownloader/files/52146800'
unzip segmentation-checkpoint.zip
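If curl or unzip is not available, the same download can be done with the Python standard library. This is a sketch: it streams the file from the figshare URL in the curl command above to disk and extracts it with `zipfile`; it does not verify a checksum, since none is published here.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# Figshare URL from the curl command above.
CHECKPOINT_URL = "https://figshare.com/ndownloader/files/52146800"


def download(url: str, dest: Path) -> Path:
    """Stream url to dest; urllib follows HTTP redirects by default."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def extract(archive: Path, target_dir: Path) -> List[str]:
    """Unzip archive into target_dir and return the names of its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

From inside the checkpoints directory, `extract(download(CHECKPOINT_URL, Path("segmentation-checkpoint.zip")), Path("."))` mirrors the curl and unzip commands.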

Usage

Image Segmentation

PowerPlant processes herbarium sheet images in JPEG format; the files must use the .jpg extension.
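Because only the .jpg extension is mentioned, files saved as .jpeg or with an upper-case .JPG may be skipped, depending on how PowerPlant matches file names (the README does not say whether the match is case-sensitive). The hypothetical helpers below list the files that match exactly and flag likely JPEGs that would need renaming.

```python
from pathlib import Path
from typing import List

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def find_input_images(directory="images/original") -> List[Path]:
    """Return files whose extension is exactly '.jpg', sorted by name."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix == ".jpg")


def find_misnamed_jpegs(directory="images/original") -> List[Path]:
    """Return JPEG-like files (.jpeg, .JPG, ...) whose extension is not exactly '.jpg'."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file()
                  and p.suffix.lower() in JPEG_SUFFIXES
                  and p.suffix != ".jpg")
```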

The package automatically performs two key operations on your images:
- Segmentation, to isolate the plant material and remove extraneous elements such as annotations, labels, stamps, and envelopes.
- Resizing, so that the longest side of each image is at most 1024 pixels.
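The resizing rule can be written as a small function. This sketch assumes the aspect ratio is preserved, that images already within the limit are not enlarged, and that dimensions are rounded down; PowerPlant's own rounding may differ by a pixel.

```python
from typing import Tuple

MAX_SIDE = 1024  # longest-side limit stated above


def target_size(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Return (width, height) scaled so that max(width, height) <= max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough: leave unchanged
    # Integer arithmetic keeps the longest side at exactly max_side.
    new_w = max(1, width * max_side // longest)
    new_h = max(1, height * max_side // longest)
    return new_w, new_h
```

For example, a 4000 x 3000 scan becomes 1024 x 768. With Pillow, `Image.thumbnail((1024, 1024))` applies an equivalent rule in place and never enlarges an image.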

Copy your original herbarium sheet images to the images/original directory, then run the following command:

powerplant-segment

The processed images (segmented and resized) will be stored in the images/masked directory.

Prediction of DNA Yield

PowerPlant predicts DNA yield from herbarium specimens with a convolutional neural network (CNN) combined with metadata analysis. This dual-input model takes both the segmented images and the associated specimen data as input to estimate the DNA yield of each specimen.
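One common way to build a dual-input model like this is late fusion: the CNN turns the image into a feature vector, which is concatenated with numeric metadata and passed to a regression head. The toy sketch below shows only that data flow in plain Python, with hand-picked numbers standing in for learned weights; the README does not describe PowerPlant's actual layers, feature sizes, or metadata encoding.

```python
from typing import List, Sequence


def fuse(image_features: Sequence[float], metadata: Sequence[float]) -> List[float]:
    """Late fusion: concatenate CNN image features with metadata features."""
    return list(image_features) + list(metadata)


def predict_yield(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear regression head: dot(fused, weights) + bias.

    Stands in for the trained fully connected layers of a real model.
    """
    if len(fused) != len(weights):
        raise ValueError("weights must have one entry per fused feature")
    return sum(x * w for x, w in zip(fused, weights)) + bias
```

For example, `predict_yield(fuse([1.0, 2.0], [3.0]), [0.5, 0.25, 2.0], 1.0)` evaluates to 8.0.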

To use this feature:

1. Copy the segmented herbarium images from the images/masked directory (produced by the step described under Image Segmentation) into the dataset directory, dividing them into training and test sets by placing them in the dataset/train and dataset/test subdirectories, respectively.
2. Prepare your metadata in a CSV file named metadata.csv and place it in the dataset directory. For each specimen, this file should contain relevant information, including:
   - Specimen age;
   - Location of sample collection;
   - Taxonomic information.
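The train/test split described above can be scripted. The sketch below copies the masked images into dataset/train and dataset/test at random; the 80/20 ratio, the fixed seed, and the assumption that masked images keep the .jpg extension are illustrative choices, not PowerPlant requirements.

```python
import random
import shutil
from pathlib import Path
from typing import Dict, List


def split_images(src="images/masked", dst="dataset", test_fraction=0.2,
                 seed=42, pattern="*.jpg") -> Dict[str, List[str]]:
    """Randomly copy images from src into dst/train and dst/test.

    Returns the file names assigned to each split.
    """
    images = sorted(Path(src).glob(pattern))  # sort first so the seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(images)
    n_test = round(len(images) * test_fraction)
    splits = {"test": images[:n_test], "train": images[n_test:]}
    for split, files in splits.items():
        out_dir = Path(dst) / split
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, out_dir / f.name)
    return {split: sorted(f.name for f in files) for split, files in splits.items()}
```

A purely random split does not guarantee that both sets cover similar specimen ages or taxa; stratifying by those fields is an option if that matters for your study.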

An example file metadata/samples.csv is included in this repository to guide you in formatting your metadata correctly.
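Before training, it can be worth checking that metadata.csv has the columns you expect and no blank required fields. In the sketch below the column names are hypothetical placeholders; replace them with the headers used in metadata/samples.csv.

```python
import csv
from typing import Iterable, List

# Hypothetical column names: adapt them to metadata/samples.csv.
REQUIRED_COLUMNS = ["sample_id", "collection_year", "locality", "family"]


def missing_columns(path="dataset/metadata.csv",
                    required: Iterable[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return required columns absent from the CSV header."""
    with open(path, newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])
    return sorted(set(required) - set(header))


def rows_with_blanks(path="dataset/metadata.csv",
                     required: Iterable[str] = REQUIRED_COLUMNS) -> List[int]:
    """Return file line numbers of rows with an empty or missing required field.

    Assumes no field spans multiple lines (the header is line 1).
    """
    required = list(required)
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        return [line for line, row in enumerate(reader, start=2)
                if any(not (row.get(col) or "").strip() for col in required)]
```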

To train the prediction model, run the following command:

powerplant-train

This script processes the images and metadata from the dataset directory, trains the machine learning model, and saves the trained model in the checkpoints/prediction directory.

License

GNU Affero General Public License, version 3.

Contact

For questions and support, please open an issue on our GitHub repository.
