Semantic segmentation of satellite imagery is a core task for mapping land cover, estimating agricultural yield, monitoring floods, or analysing wildfire scars. This repository demonstrates how to preprocess multi‑spectral Sentinel/Harmonized Landsat–Sentinel (HLS) data, train segmentation models with the Prithvi‑EO foundation models using TerraTorch, evaluate model performance, and generate georeferenced land‑cover footprints.

Prithvi‑EO models are transformer‑based geospatial foundation models trained on millions of spatio‑temporal satellite samples. They extend the Vision Transformer (ViT) architecture in two ways: the 2‑D patch embeddings and positional embeddings are replaced with 3‑D counterparts so the model can process sequences of images over time, and learnable temporal and location embeddings are added. Prithvi‑EO‑2.0 models are trained on a global HLS dataset; the largest 600M‑parameter variant with temporal and location embeddings outperforms earlier Prithvi versions and other geospatial foundation models by roughly 8 percentage points across a range of remote‑sensing tasks.
This project is organised into several notebooks and utilities:
| Component | Purpose | Notes |
|---|---|---|
| 1. `Pre_process_data.ipynb` | Download raw Sentinel/HLS images and masks, split them into manageable patches, encode them as NPZ or TFRecord files and divide the dataset into training/validation/test splits | Also creates directory structure for storing patches and exports data in TensorFlow‑friendly TFRecord format |
| 2. `Training_and_Segmentation.ipynb` | Build a PyTorch dataset for NPZ/TFRecord patches, apply data augmentation with Albumentations, define a semantic segmentation model using the TerraTorch library with a Prithvi‑EO backbone, and run training | Includes a table describing different Prithvi‑EO model sizes (see below) and utility functions for making predictions and visualising results |
| 3. `model_evaluation.ipynb` | Load a trained model and compute precision, recall, F1 score and Intersection‑over‑Union (IoU) on a validation set | Uses helper functions to run inference and aggregate metrics per class |
| 4. `Create_footprint_prithvi.ipynb` | Demonstrate large‑scale inference by downloading big satellite scenes, cropping them into tiles, running the segmentation model on each tile and merging predictions back into a single GeoTIFF | Provides functions for padding/cropping, writing GeoTIFFs and merging predicted masks into a georeferenced footprint |
| `image_utils.py` | Helper routines for reading and displaying images, padding images, converting between colour spaces and computing simple metrics | Used by the notebooks to visualise data and predictions |
The TerraTorch‑based model definitions in this project support several pretrained Prithvi‑EO backbones. The table below summarises the main variants; see the training notebook for detailed descriptions.
| Model | Parameters | Key features |
|---|---|---|
| Prithvi‑EO‑1.0‑100M | 100 M | Original 2‑D ViT backbone trained on US‑only HLS data |
| Prithvi‑EO‑2.0‑100M | 100 M | Same size as 1.0 but pretrained on a global dataset |
| Prithvi‑EO‑2.0‑300M | 300 M | Larger backbone trained on global HLS data (no temporal/location embeddings) |
| Prithvi‑EO‑2.0‑300M‑TL | 300 M | Adds temporal & location embeddings to the 300 M model |
| Prithvi‑EO‑2.0‑600M | 600 M | Very large model without temporal/location embeddings |
| Prithvi‑EO‑2.0‑600M‑TL | 600 M | Largest model with both temporal and location embeddings; this variant outperforms earlier Prithvi versions across diverse tasks |
1. Clone this repository and change into the project directory:

   ```bash
   git clone https://github.com/easare377/Prithvi-EO-Segmentation.git
   cd Prithvi-EO-Segmentation
   ```

2. Create a Python environment (Python ≥ 3.8 is recommended) and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   ```

   A GPU with CUDA support is strongly recommended for training large models. The `requirements.txt` file lists PyTorch, segmentation models, GDAL/GeoPandas for geospatial operations and the TerraTorch library for working with Prithvi‑EO backbones.

3. Download satellite data and masks. The notebooks assume access to Sentinel‑2 or Harmonized Landsat–Sentinel imagery and corresponding segmentation masks. You can obtain labelled datasets from public sources such as the Multi‑temporal Crop Classification dataset for the United States (hosted on Hugging Face), or use your own annotated data. See the preprocessing notebook for examples of downloading and organising data.
The Pre_process_data.ipynb notebook walks through the steps required
to prepare remote‑sensing data for training:
- Download and organise images and masks into a structured folder.
- Patch and encode large tiles into manageable patches using the provided functions, saving them as NPZ arrays or TFRecords.
- Split the dataset into training, validation and test sets, and export them into dedicated folders.
- Convert to TFRecord (optional) for efficient streaming during training with TensorFlow.
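
The patching and splitting steps above can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the notebook's actual code: the function names `split_into_patches` and `split_indices`, the `image`/`mask` array keys, the 128‑pixel patch size and the 70/15/15 split are all assumptions made for the example, and a synthetic 6‑band array stands in for a real scene.

```python
import tempfile
from pathlib import Path

import numpy as np


def split_into_patches(image, mask, patch_size):
    """Cut an (H, W, C) image and its (H, W) mask into non-overlapping patches.

    Edge regions that do not fill a whole patch are dropped in this sketch.
    """
    height, width = mask.shape
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            img_patch = image[top:top + patch_size, left:left + patch_size]
            msk_patch = mask[top:top + patch_size, left:left + patch_size]
            patches.append((img_patch, msk_patch))
    return patches


def split_indices(n_items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle patch indices and divide them into train/val/test groups."""
    order = np.random.default_rng(seed).permutation(n_items)
    n_val = int(n_items * val_fraction)
    n_test = int(n_items * test_fraction)
    return {
        "val": order[:n_val],
        "test": order[n_val:n_val + n_test],
        "train": order[n_val + n_test:],
    }


# Synthetic stand-in for a 6-band scene and its 4-class label mask.
rng = np.random.default_rng(0)
scene = rng.random((512, 512, 6), dtype=np.float32)
labels = rng.integers(0, 4, size=(512, 512)).astype(np.uint8)

patches = split_into_patches(scene, labels, patch_size=128)
splits = split_indices(len(patches))

# Write one compressed NPZ file per patch into train/val/test folders.
out_root = Path(tempfile.mkdtemp())
for split_name, indices in splits.items():
    split_dir = out_root / split_name
    split_dir.mkdir(parents=True, exist_ok=True)
    for i in indices:
        image_patch, mask_patch = patches[int(i)]
        np.savez_compressed(
            split_dir / f"patch_{int(i):04d}.npz",
            image=image_patch,
            mask=mask_patch,
        )

print({name: len(idx) for name, idx in splits.items()})
```

Splitting at the patch level is the simplest option; if neighbouring patches from the same scene are strongly correlated, splitting by scene or by region may give a more honest test set.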
The Training_and_Segmentation.ipynb notebook shows how to train a
semantic segmentation model using PyTorch and TerraTorch:
- Load patches via a custom `Dataset` class that normalises spectral bands and optionally pads images to square shapes.
- Augment training data with Albumentations (random flips, rotations and colour jitter).
- Select a backbone from the table above and instantiate a segmentation model. The model uses the Prithvi‑EO encoder and a lightweight decoder (e.g., UperNet, FCN or U‑Net‑style). Pretrained weights can be loaded for transfer learning, or the model can be trained from scratch.
- Train the model using your favourite optimiser. The notebook includes a simple training loop with checkpointing and utilities for printing the number of parameters and running a forward pass.
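
To make the data‑preparation steps above concrete, here is a small sketch of per‑band normalisation, padding to a square, and joint image/mask augmentation. It deliberately uses plain NumPy rather than the notebook's PyTorch `Dataset` and Albumentations pipeline, so it runs without those dependencies. The function names, the ignore label of 255 for padded mask pixels and the use of per‑image statistics are assumptions for illustration; colour jitter is omitted.

```python
import numpy as np


def normalise_bands(image, means, stds):
    """Standardise each spectral band of an (H, W, C) image."""
    means = np.asarray(means, dtype=np.float32)
    stds = np.asarray(stds, dtype=np.float32)
    return (image.astype(np.float32) - means) / stds


def pad_to_square(image, mask, fill_value=0, ignore_label=255):
    """Pad the bottom/right edges so height equals width.

    Padded mask pixels get `ignore_label` so a loss function can skip them.
    """
    height, width = mask.shape
    side = max(height, width)
    pad_h, pad_w = side - height, side - width
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)),
                   constant_values=fill_value)
    mask = np.pad(mask, ((0, pad_h), (0, pad_w)),
                  constant_values=ignore_label)
    return image, mask


def random_flip_rotate(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to image and mask."""
    if rng.random() < 0.5:  # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1], mask[::-1]
    k = int(rng.integers(0, 4))  # number of 90-degree turns
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


# Synthetic 6-band patch that is not square.
rng = np.random.default_rng(42)
image = rng.random((100, 128, 6)).astype(np.float32)
mask = rng.integers(0, 4, size=(100, 128)).astype(np.uint8)

# Per-image statistics are used here; dataset-wide statistics are more usual.
means = image.mean(axis=(0, 1))
stds = image.std(axis=(0, 1))

image_n = normalise_bands(image, means, stds)
image_p, mask_p = pad_to_square(image_n, mask)
image_a, mask_a = random_flip_rotate(image_p, mask_p, rng)
print(image_a.shape, mask_a.shape)
```

The key design point is that geometric transforms must be applied identically to the image and its mask; normalisation and colour changes apply to the image only.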
Use model_evaluation.ipynb to assess how well your model performs on
unseen data. The notebook demonstrates:
- Loading the trained model and preparing a validation DataLoader.
- Computing metrics such as precision, recall, F1 score and IoU per class.
- Visualising predictions using the helper functions provided in
image_utils.py.
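
For reference, the per‑class metrics listed above can all be derived from a single confusion matrix. The following is a minimal NumPy sketch, not the notebook's helper code; the function names are hypothetical, and classes with no pixels are assigned a score of 0 by convention.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Return a (num_classes, num_classes) matrix: rows = truth, cols = prediction."""
    y_true = np.asarray(y_true, dtype=np.int64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.int64).ravel()
    index = y_true * num_classes + y_pred
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def safe_divide(numerator, denominator):
    """Element-wise division that yields 0 where the denominator is 0."""
    numerator = numerator.astype(np.float64)
    denominator = denominator.astype(np.float64)
    return np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)


def per_class_metrics(cm):
    """Compute precision, recall, F1 and IoU for each class."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belongs to the class, but missed
    return {
        "precision": safe_divide(tp, tp + fp),
        "recall": safe_divide(tp, tp + fn),
        "f1": safe_divide(2 * tp, 2 * tp + fp + fn),
        "iou": safe_divide(tp, tp + fp + fn),
    }


# Tiny 2x3 example with three classes.
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
metrics = per_class_metrics(cm)
for name, values in metrics.items():
    print(name, np.round(values, 3))
```

When evaluating over a whole DataLoader, summing the confusion matrices of all batches before computing the metrics avoids the bias of averaging per‑batch scores.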
Large‑scale inference on full‑resolution scenes is handled in
Create_footprint_prithvi.ipynb. This notebook explains how to:
- Download big satellite scenes, e.g., from AWS or the NASA HLS repository.
- Crop and pad each scene into tiles, run the segmentation model on each tile and write GeoTIFFs for individual predictions.
- Merge predictions into a single georeferenced raster or vector footprint using GDAL and GeoPandas. This allows you to create continuous land‑cover maps or footprints for a region of interest.
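
The crop, predict and merge loop described above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the notebook's implementation: `predict_scene` and `dummy_predict_tile` are hypothetical names, the tiles do not overlap, and a simple threshold stands in for the trained model. Writing the merged mask as a GeoTIFF additionally needs the scene's transform and CRS (via GDAL or a similar library), which is not shown here.

```python
import numpy as np


def predict_scene(scene, predict_tile, tile_size):
    """Run `predict_tile` on each tile of an (H, W, C) scene and merge the results.

    The scene is zero-padded on the bottom/right so its size is a multiple of
    `tile_size`; the padding is cropped off the merged mask afterwards.
    """
    height, width = scene.shape[:2]
    pad_h = (-height) % tile_size
    pad_w = (-width) % tile_size
    padded = np.pad(scene, ((0, pad_h), (0, pad_w), (0, 0)))

    merged = np.zeros(padded.shape[:2], dtype=np.uint8)
    for top in range(0, padded.shape[0], tile_size):
        for left in range(0, padded.shape[1], tile_size):
            tile = padded[top:top + tile_size, left:left + tile_size]
            merged[top:top + tile_size, left:left + tile_size] = predict_tile(tile)

    return merged[:height, :width]


def dummy_predict_tile(tile):
    """Placeholder for the model: label a pixel 1 where band 0 exceeds 0.5."""
    return (tile[..., 0] > 0.5).astype(np.uint8)


# Synthetic 6-band scene whose size is not a multiple of the tile size.
scene = np.random.default_rng(0).random((300, 450, 6)).astype(np.float32)
footprint = predict_scene(scene, dummy_predict_tile, tile_size=128)
print(footprint.shape, footprint.dtype)
```

With a real model, predictions near tile borders tend to be less reliable, so overlapping tiles and keeping only the centre of each prediction is a common refinement.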
Below are example figures generated using this pipeline, including real-world environmental impacts and model outputs.
Aerial view of an illegal mining (galamsey) site showing deforestation, exposed soil, and contaminated water pools. These are typical signatures the segmentation model learns to detect.
Left: Sentinel-2 RGB composite. Right: Predicted multiclass segmentation mask highlighting extracted features (e.g., drainage patterns, disturbed land, or mining traces).
If you use the Prithvi‑EO models in your work, please cite the official Prithvi‑EO‑2.0 paper and associated resources. The architecture uses masked autoencoding with 3‑D patch embeddings and includes time and location encodings. The largest model variant exceeds the performance of earlier geospatial foundation models by approximately 8 percentage points across benchmarks.
Contributions are welcome! If you find a bug, have a feature request or want to add a new notebook, please open an issue or submit a pull request. When contributing code, follow the existing coding style and provide clear docstrings and comments.
This project is distributed under the MIT License. See the upstream Prithvi‑EO repositories and TerraTorch for additional licensing details.

