
coasts

R-CMD-check · pkgdown · Lifecycle: experimental · License: GPL v3

The coasts package provides a comprehensive data pipeline for processing and analyzing coastal fisheries data from the Western Indian Ocean (WIO) region. It integrates GPS tracking data from Pelagic Data Systems (PDS), regional fisheries metadata, and geospatial information to support fisheries research and management.

Overview

This package is designed to handle the complete workflow for coastal fisheries data processing:

  1. Data Ingestion: Automated retrieval of GPS boat tracking data from Pelagic Data Systems
  2. Data Preprocessing: Spatial gridding and summarization of fishing activity patterns
  3. Data Export: Integration with MongoDB for data storage and geospatial analysis
  4. Metadata Management: Handling of device information and regional boundaries

The package supports data from Kenya and Zanzibar fisheries, with built-in currency conversion and regional harmonization capabilities.

Key Features

  • GPS Track Processing: Ingest and preprocess boat GPS tracks from PDS API
  • Spatial Analysis: Grid-based summarization of fishing activity at multiple scales (100m-1km)
  • Cloud Storage Integration: Seamless upload/download from Google Cloud Storage
  • MongoDB Integration: Geospatial data storage with 2dsphere indexing
  • Parallel Processing: Efficient handling of large datasets using parallel computation
  • Automated Pipeline: GitHub Actions workflow for continuous data processing

Installation

You can install the development version of coasts from GitHub with:

# install.packages("pak")
pak::pak("WorldFishCenter/peskas.coasts")
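If you prefer not to use pak, installing with the remotes package should work as well (an alternative sketch, not the documented method):

# install.packages("remotes")
remotes::install_github("WorldFishCenter/peskas.coasts")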

Configuration

Local Development Setup

For local development, the package uses environment variables managed through a .env file:

  1. Copy the .env.example file to .env:

    cp .env.example .env
  2. Fill in your credentials in the .env file. Required environment variables include:

    • PDS_TOKEN, PDS_SECRET, PDS_USERNAME, PDS_PASSWORD, PDS_CUSTOMER_ID: Pelagic Data Systems API credentials
    • GCP_SA_KEY: Google Cloud service account key (JSON format)
    • MONGODB_CONNECTION_STRING_COASTS, MONGODB_CONNECTION_STRING_TRACKS: MongoDB connection strings
    • GOOGLE_SHEET_ID: Google Sheets ID for metadata
    • AIRTABLE_TOKEN, AIRTABLE_BASE_ID_FRAME, AIRTABLE_BASE_ID_TRACKS_APP: Airtable credentials

The package automatically loads these environment variables when running locally.
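If you need to load the file manually in an interactive session, a minimal sketch using the dotenv package (assumed here, not necessarily a coasts dependency) looks like this:

# Load a .env file into the session environment
# (dotenv is an assumption; coasts normally handles this itself)
dotenv::load_dot_env(file = ".env")

# Confirm a credential is now visible
Sys.getenv("PDS_TOKEN")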

Production Deployment

For production environments, set these environment variables directly in your deployment configuration (e.g., GitHub Secrets, Docker environment, etc.). The package will use them automatically without requiring a .env file.
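A quick way to verify that a deployment exposes everything the pipeline needs is a startup check like the following sketch; the variable list mirrors the one above:

# Fail fast if any required credential is missing from the environment
required_vars <- c(
  "PDS_TOKEN", "PDS_SECRET", "PDS_USERNAME", "PDS_PASSWORD",
  "PDS_CUSTOMER_ID", "GCP_SA_KEY",
  "MONGODB_CONNECTION_STRING_COASTS", "MONGODB_CONNECTION_STRING_TRACKS",
  "GOOGLE_SHEET_ID", "AIRTABLE_TOKEN",
  "AIRTABLE_BASE_ID_FRAME", "AIRTABLE_BASE_ID_TRACKS_APP"
)
missing <- required_vars[Sys.getenv(required_vars) == ""]
if (length(missing) > 0) {
  stop("Missing environment variables: ", paste(missing, collapse = ", "))
}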

Main Functions

Data Ingestion

library(coasts)

# Ingest GPS trip data from PDS
ingest_pds_trips()

# Ingest detailed GPS track data
ingest_pds_tracks()

Data Preprocessing

# Preprocess tracks into spatial grids
preprocess_pds_tracks(grid_size = 500)  # 500m grid cells

# Available grid sizes: 100, 250, 500, 1000 meters
preprocess_pds_tracks(grid_size = 1000)  # 1km grid cells

Data Export

# Export processed data to MongoDB
export_geos()

Metadata Management

# Get device metadata from Google Sheets
devices <- get_metadata(table = "devices")

# Get all metadata tables
all_metadata <- get_metadata()

Data Pipeline Workflow

The package implements an automated data pipeline that runs every 2 days via GitHub Actions (a local equivalent is sketched after this list):

  1. Build Container: Creates a Docker container with R and all dependencies
  2. Ingest PDS Trips: Downloads trip metadata from PDS API
  3. Ingest PDS Tracks: Downloads detailed GPS tracks for each trip
  4. Preprocess Tracks: Creates spatial grid summaries of fishing activity
  5. Export Data: Uploads processed data to MongoDB with geospatial indexing
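Run locally, the scheduled steps correspond roughly to the following sequence of package calls; the grid size shown is illustrative:

library(coasts)

# Approximate local equivalent of the scheduled pipeline
ingest_pds_trips()                      # trip metadata from the PDS API
ingest_pds_tracks()                     # detailed GPS tracks per trip
preprocess_pds_tracks(grid_size = 500)  # spatial grid summaries
export_geos()                           # push results to MongoDB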

Data Products

The pipeline produces several key data products (an example of reading one follows the list):

  • Trip Data: Basic trip information (start/end times, vessel info)
  • Track Data: Detailed GPS points with speed, heading, and temporal information
  • Grid Summaries: Spatial aggregations showing:
    • Time spent fishing in each grid cell
    • Average speed and vessel metrics
    • Temporal patterns of activity
  • Regional Metrics: Time series data with currency-converted economic indicators
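For example, a trip-level product can be pulled from cloud storage and read into R; the file and bucket names are placeholders, and the arrow package is assumed for reading parquet:

# Download a data product from cloud storage (names are illustrative)
download_cloud_file(
  name = "pds_trips_v1.0.0.parquet",
  provider = "google",
  options = list(bucket = "your-bucket")
)

# Read it with the arrow package (assumed, not a coasts function)
trips <- arrow::read_parquet("pds_trips_v1.0.0.parquet")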

Spatial Analysis Capabilities

The package supports multi-scale spatial analysis:

# Fine-scale analysis (100m grids)
preprocess_pds_tracks(grid_size = 100)

# Broad-scale patterns (1km grids)  
preprocess_pds_tracks(grid_size = 1000)

Grid summaries include:

  • time_spent_mins: Total fishing time per grid cell
  • mean_speed: Average vessel speed
  • n_points: Number of GPS observations
  • first_seen / last_seen: Temporal extent of activity
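Given a grid summary loaded as a data frame (here called grid_summary, a hypothetical object), flagging heavily used cells is straightforward with dplyr; the 60-minute threshold is arbitrary:

library(dplyr)

# Flag grid cells with substantial fishing effort (threshold is illustrative)
hotspots <- grid_summary %>%
  filter(time_spent_mins > 60) %>%
  arrange(desc(time_spent_mins))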

Cloud Storage Integration

The package seamlessly integrates with Google Cloud Storage:

# Upload processed data
upload_cloud_file(
  file = "processed_data.parquet",
  provider = "google",
  options = list(bucket = "your-bucket")
)

# Download data for analysis
download_cloud_file(
  name = "pds_trips_v1.0.0.parquet",
  provider = "google", 
  options = list(bucket = "your-bucket")
)

MongoDB Integration

Geospatial data is stored in MongoDB with appropriate indexing:

# Push data with geospatial indexing
mdb_collection_push(
  data = spatial_data,
  connection_string = "mongodb://...",
  collection_name = "fishing_areas",
  geo = TRUE  # Creates 2dsphere index
)

Contributing

This package is part of the WorldFish Center’s Peskas initiative for small-scale fisheries monitoring. Contributions are welcome via GitHub issues and pull requests.

License

GPL (>= 3)
