---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# peskas.kenya.data.pipeline <img src="man/figures/logo.png" align="right" />
<!-- badges: start -->
[R-CMD-check](https://github.com/WorldFishCenter/peskas.kenya.data.pipeline/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/WorldFishCenter/peskas.kenya.data.pipeline)
<!-- badges: end -->
The goal of peskas.kenya.data.pipeline is to implement, deploy, and execute the data and modelling pipelines that underpin Peskas in Kenya, a partnership between [WorldFish](https://worldfishcenter.org/) and [Wildlife Conservation Society](https://www.wcs.org/).
## The pipeline is an R package
peskas.kenya.data.pipeline is structured as an R package because it makes it easier to write production-grade software. Specifically, structuring the code as an R package:
- lets us better handle system and package dependencies,
- forces us to split the code into functions,
- makes it easier to document the code, and
- makes it easier to test the code.
We make heavy use of [tidyverse style conventions](https://style.tidyverse.org) and the [usethis](https://usethis.r-lib.org) package to automate tasks during project setup and deployment.
For more information about the rationale for structuring the pipeline as a package, see [Chapter 3](https://engineering-shiny.org/structuring-project.html#structuring-your-app_) of [*Engineering Production-Grade Shiny Apps*](https://engineering-shiny.org). The book focuses on Shiny applications, but the rationale also applies to data pipelines and production-ready code in general.
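As a small illustration of the benefits listed above, the sketch below shows a function documented with roxygen2-style comments together with a check of the kind that could become a unit test. The helper `grams_to_kg()` is hypothetical and is not part of this package.

```r
#' Convert catch weights from grams to kilograms
#'
#' Hypothetical helper used only for illustration; not part of the package.
#'
#' @param weight_g Numeric vector of weights in grams.
#' @return Numeric vector of weights in kilograms.
grams_to_kg <- function(weight_g) {
  # Fail early on non-numeric input instead of returning nonsense
  stopifnot(is.numeric(weight_g))
  weight_g / 1000
}

# A check like this could live in tests/testthat/ as a unit test
stopifnot(isTRUE(all.equal(grams_to_kg(c(500, 1250)), c(0.5, 1.25))))
```

Because the logic lives in a small, named function, the documentation and the test sit right next to the behaviour they describe.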
## How the pipeline works
The pipeline is composed of different modules:
1. Data collection: On-site fishing landing surveys and continuous, solar-powered GPS vessel trackers collect and send data in near real time, alongside fishery metadata, for a thorough data-gathering process.
2. Pre-processing: Data formatting, shaping, and standardisation to prepare the raw data for analysis.
3. Validation: Outlier detection and error identification, plus an alert system to maintain data quality.
4. Analytics: Modelling fisheries indicators, nutritional characterization, and data mining to extract valuable insights.
5. Data export: Automated dissemination of processed and analysed fisheries data to ensure accessibility and comprehension. This involves restructuring data for dashboard integration and open publication.
6. Visualisation: Tools for data reporting and sharing of insights through a comprehensive dedicated web app dashboard (not hosted in this repository).
See [Peskas: Automated analytics for small-scale, data-deficient fisheries](https://www.researchsquare.com/article/rs-4386336/v1) for further details.
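To make the pre-processing and validation modules more concrete, here is a minimal base R sketch on toy data. The column names, the site-name clean-up, and the outlier rule based on the median absolute deviation are all assumptions made for illustration; the methods actually used by the pipeline are described in the paper and in the functions under `R/`.

```r
# Toy landing records; column names and values are invented for illustration
landings <- data.frame(
  landing_id = 1:6,
  landing_site = c(" Kilifi", "kilifi", "MOMBASA ", "Mombasa", "Lamu", "lamu"),
  catch_kg = c(12.5, 10.0, 11.2, 250.0, 9.8, 13.1),
  stringsAsFactors = FALSE
)

# Pre-processing: standardise free-text site names (trim, then capitalise)
standardise_site <- function(x) {
  x <- tolower(trimws(x))
  paste0(toupper(substr(x, 1, 1)), substring(x, 2))
}

# Validation: flag values more than k scaled MADs away from the median
flag_outliers <- function(x, k = 3) {
  abs(x - median(x, na.rm = TRUE)) > k * mad(x, na.rm = TRUE)
}

landings$landing_site <- standardise_site(landings$landing_site)
landings$catch_alert <- flag_outliers(landings$catch_kg)

# Rows that would be passed on to an alert or review step
landings[landings$catch_alert, ]
```

In this toy data set only the 250 kg record is flagged. A rule based on the median is used here because a single extreme value barely shifts it, whereas a mean-based rule could be distorted by the very outlier it is meant to catch.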
## Getting Started
This package uses a configuration file [`config.yml`](inst/config.yml) to manage environment-specific settings and connections. To get started, familiarize yourself with the package structure, particularly the [`R`](R) directory where the main functions are located.
Each function typically reads the configuration using `read_config()` to access necessary parameters. To work on this package locally, you'll need to set up environment variables using a `.env` file. Copy `.env.example` to `.env` and fill in your actual credentials. The package will automatically load these environment variables when reading the configuration. Remember to run `devtools::load_all()` when testing changes locally. If you're new to R package development, consider reviewing the [*R Packages*](https://r-pkgs.org) book by Hadley Wickham and Jenny Bryan.
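For readers unfamiliar with `.env` files, the sketch below shows the general idea in base R: `KEY=VALUE` lines are read from a file and exported as environment variables, which code can then retrieve with `Sys.getenv()`. This is not the package's own loading code; `load_env_file()` and the variable name `EXAMPLE_API_TOKEN` are hypothetical, and a temporary file stands in for a real `.env`.

```r
# Write a throwaway .env-style file; the variable name and value are made up
env_file <- tempfile(fileext = ".env")
writeLines(
  c("# credentials for local development", "EXAMPLE_API_TOKEN=abc123"),
  env_file
)

# Parse KEY=VALUE lines, skipping blanks and comments, and export them
load_env_file <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  keys <- trimws(sub("=.*$", "", lines))
  values <- trimws(sub("^[^=]*=", "", lines))
  if (length(keys) > 0) {
    # Sys.setenv() expects named arguments, so build them from the parsed pairs
    do.call(Sys.setenv, as.list(setNames(values, keys)))
  }
  invisible(keys)
}

load_env_file(env_file)

# Code that needs the credential reads it from the environment
token <- Sys.getenv("EXAMPLE_API_TOKEN")
```

Keeping credentials in environment variables rather than in source files means they never need to be committed to the repository.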
## Quick Guide for Contributors
To keep our repository clean and efficient, please keep these guidelines in mind:
- Always work on a new branch, not directly on main.
- Write clear, concise commit messages.
- Avoid storing intermediate and garbage files, especially in the root folder.
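- Strive for soft-coded solutions: pass paths, thresholds, and other settings in as parameters or configuration values rather than hard-coding them.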
- Maintain consistent code style throughout the project.
- Document your code well - future you (and others) will thank you.
- Test your changes thoroughly before submitting a pull request.
- Keep your fork synced with the main repository.
These practices help us maintain a clean, efficient codebase that's easier for everyone to work with. For more detailed guidelines, check out our [CONTRIBUTING.md](.github/CONTRIBUTING.md) file.
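As an illustration of the soft-coding guideline, the following sketch contrasts a function with values buried in its body against one that receives them as arguments. The function names, the column name, and the threshold are hypothetical and chosen only for the example.

```r
# Hard-coded: the column name and threshold are buried in the function body
flag_large_catches_hard <- function(df) {
  df[df$catch_kg > 100, , drop = FALSE]
}

# Soft-coded: the same values arrive as arguments, e.g. from a settings list
flag_large_catches <- function(df, column, threshold) {
  df[df[[column]] > threshold, , drop = FALSE]
}

# Hypothetical settings; in practice these would come from the configuration
settings <- list(column = "catch_kg", threshold = 100)

catches <- data.frame(trip = 1:3, catch_kg = c(20, 150, 80))
large <- flag_large_catches(catches, settings$column, settings$threshold)
```

Changing the threshold or pointing the function at a different column then requires editing only the settings, not the function itself.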