The coasts package provides a comprehensive data pipeline for
processing and analyzing coastal fisheries data from the Western Indian
Ocean (WIO) region. It integrates GPS tracking data from Pelagic Data
Systems (PDS), regional fisheries metadata, and geospatial information
to support fisheries research and management.
This package is designed to handle the complete workflow for coastal fisheries data processing:
- Data Ingestion: Automated retrieval of GPS boat tracking data from Pelagic Data Systems
- Data Preprocessing: Spatial gridding and summarization of fishing activity patterns
- Data Export: Integration with MongoDB for data storage and geospatial analysis
- Metadata Management: Handling of device information and regional boundaries
The package supports data from Kenya and Zanzibar fisheries, with built-in currency conversion and regional harmonization capabilities.
- GPS Track Processing: Ingest and preprocess boat GPS tracks from PDS API
- Spatial Analysis: Grid-based summarization of fishing activity at multiple scales (100m-1km)
- Cloud Storage Integration: Seamless upload/download from Google Cloud Storage
- MongoDB Integration: Geospatial data storage with 2dsphere indexing
- Parallel Processing: Efficient handling of large datasets using parallel computation (sketched below)
- Automated Pipeline: GitHub Actions workflow for continuous data processing
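The package's internal parallelism is not shown in this README; as a rough illustration of the idea only, a batch of track files can be summarised across CPU cores with base R's parallel package. The file path, pattern, and column names below are hypothetical, not part of the coasts API:

```r
library(parallel)

# Hypothetical raw track exports; adjust the path and pattern to your data
track_files <- list.files("data/raw-tracks", pattern = "\\.csv$", full.names = TRUE)

# Summarise one track file: point count and mean speed
summarise_track <- function(path) {
  trk <- read.csv(path)
  data.frame(
    file       = basename(path),
    n_points   = nrow(trk),
    mean_speed = mean(trk$speed, na.rm = TRUE)
  )
}

# Spread the work over all but one core
cl <- makeCluster(max(1L, detectCores() - 1L))
results <- do.call(rbind, parLapply(cl, track_files, summarise_track))
stopCluster(cl)
```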
You can install the development version of coasts from GitHub with:
```r
# install.packages("pak")
pak::pak("WorldFishCenter/peskas.coasts")
```

For local development, the package uses environment variables managed through a `.env` file:
- Copy the `.env.example` file to `.env`: `cp .env.example .env`
- Fill in your credentials in the `.env` file. Required environment variables include:
  - `PDS_TOKEN`, `PDS_SECRET`, `PDS_USERNAME`, `PDS_PASSWORD`, `PDS_CUSTOMER_ID`: Pelagic Data Systems API credentials
  - `GCP_SA_KEY`: Google Cloud service account key (JSON format)
  - `MONGODB_CONNECTION_STRING_COASTS`, `MONGODB_CONNECTION_STRING_TRACKS`: MongoDB connection strings
  - `GOOGLE_SHEET_ID`: Google Sheets ID for metadata
  - `AIRTABLE_TOKEN`, `AIRTABLE_BASE_ID_FRAME`, `AIRTABLE_BASE_ID_TRACKS_APP`: Airtable credentials
The package automatically loads these environment variables when running locally.
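A quick sanity check (not part of the package API) to confirm those variables are visible to your R session before running the pipeline:

```r
# Names taken from the list above; stop early if any are missing
required_vars <- c(
  "PDS_TOKEN", "PDS_SECRET", "PDS_USERNAME", "PDS_PASSWORD", "PDS_CUSTOMER_ID",
  "GCP_SA_KEY", "MONGODB_CONNECTION_STRING_COASTS", "MONGODB_CONNECTION_STRING_TRACKS",
  "GOOGLE_SHEET_ID", "AIRTABLE_TOKEN", "AIRTABLE_BASE_ID_FRAME", "AIRTABLE_BASE_ID_TRACKS_APP"
)
missing <- required_vars[Sys.getenv(required_vars) == ""]
if (length(missing) > 0) {
  stop("Missing environment variables: ", paste(missing, collapse = ", "))
}
```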
For production environments, set these environment variables directly in
your deployment configuration (e.g., GitHub Secrets or the Docker
environment). The package will use them automatically without requiring a
`.env` file.
```r
library(coasts)

# Ingest GPS trip data from PDS
ingest_pds_trips()

# Ingest detailed GPS track data
ingest_pds_tracks()
```

```r
# Preprocess tracks into spatial grids
preprocess_pds_tracks(grid_size = 500) # 500m grid cells

# Available grid sizes: 100, 250, 500, 1000 meters
preprocess_pds_tracks(grid_size = 1000) # 1km grid cells
```

```r
# Export processed data to MongoDB
export_geos()
```

```r
# Get device metadata from Google Sheets
devices <- get_metadata(table = "devices")

# Get all metadata tables
all_metadata <- get_metadata()
```

The package implements an automated data pipeline that runs every 2 days via GitHub Actions:
- Build Container: Creates a Docker container with R and all dependencies
- Ingest PDS Trips: Downloads trip metadata from PDS API
- Ingest PDS Tracks: Downloads detailed GPS tracks for each trip
- Preprocess Tracks: Creates spatial grid summaries of fishing activity
- Export Data: Uploads processed data to MongoDB with geospatial indexing
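As a rough sketch, steps 2 to 5 amount to chaining the exported functions shown in the quick start above; whether a given scheduled run processes every grid size is an assumption, and the actual container entrypoint may differ:

```r
library(coasts)

# Step 2: trip metadata from the PDS API
ingest_pds_trips()

# Step 3: detailed GPS tracks for each trip
ingest_pds_tracks()

# Step 4: spatial grid summaries (running all supported resolutions is illustrative)
for (size in c(100, 250, 500, 1000)) {
  preprocess_pds_tracks(grid_size = size)
}

# Step 5: upload processed layers to MongoDB
export_geos()
```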
The pipeline produces several key data products:
- Trip Data: Basic trip information (start/end times, vessel info)
- Track Data: Detailed GPS points with speed, heading, and temporal information
- Grid Summaries: Spatial aggregations showing:
  - Time spent fishing in each grid cell
  - Average speed and vessel metrics
  - Temporal patterns of activity
- Regional Metrics: Time series data with currency-converted economic indicators
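The currency conversion is handled inside the package; purely to illustrate the idea behind the regional metrics, local-currency revenue (KES for Kenya, TZS for Zanzibar) can be harmonised to USD with a rate table. The column names and rates below are made up:

```r
library(dplyr)

metrics <- tibble::tribble(
  ~region,     ~month,    ~revenue, ~currency,
  "Kenya",    "2024-01",    120000, "KES",
  "Zanzibar", "2024-01",   2500000, "TZS"
)

# Illustrative exchange rates to USD, not real values
rates <- tibble::tibble(
  currency = c("KES", "TZS"),
  usd_rate = c(0.0065, 0.0004)
)

metrics |>
  left_join(rates, by = "currency") |>
  mutate(revenue_usd = revenue * usd_rate)
```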
The package supports multi-scale spatial analysis:
```r
# Fine-scale analysis (100m grids)
preprocess_pds_tracks(grid_size = 100)

# Broad-scale patterns (1km grids)
preprocess_pds_tracks(grid_size = 1000)
```

Grid summaries include:

- `time_spent_mins`: Total fishing time per grid cell
- `mean_speed`: Average vessel speed
- `n_points`: Number of GPS observations
- `first_seen`/`last_seen`: Temporal extent of activity
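Assuming the grid summaries are stored as Parquet files like the other processed artifacts shown below, a downloaded summary can be inspected directly with arrow and dplyr; the file name here is a placeholder:

```r
library(arrow)
library(dplyr)

# Placeholder file name; use the artifact you actually downloaded
grids <- read_parquet("pds_track_grids_500m.parquet")

# Headline numbers from the fields listed above
grids |>
  summarise(
    n_cells            = n(),
    total_hours        = sum(time_spent_mins, na.rm = TRUE) / 60,
    overall_mean_speed = mean(mean_speed, na.rm = TRUE)
  )
```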
The package seamlessly integrates with Google Cloud Storage:
```r
# Upload processed data
upload_cloud_file(
  file = "processed_data.parquet",
  provider = "google",
  options = list(bucket = "your-bucket")
)

# Download data for analysis
download_cloud_file(
  name = "pds_trips_v1.0.0.parquet",
  provider = "google",
  options = list(bucket = "your-bucket")
)
```

Geospatial data is stored in MongoDB with appropriate indexing:
```r
# Push data with geospatial indexing
mdb_collection_push(
  data = spatial_data,
  connection_string = "mongodb://...",
  collection_name = "fishing_areas",
  geo = TRUE # Creates 2dsphere index
)
```

This package is part of the WorldFish Center’s Peskas initiative for small-scale fisheries monitoring. Contributions are welcome via GitHub issues and pull requests.
GPL (>= 3)