GitHub - steno-aarhus/fakeregs: A collection of scripts to synthesize register data and run a data processing pipeline to illustrate the most common tasks encountered when conducting register-based research

Overview

What this repo contains (or aims to contain):

An introduction to using Arrow and DuckDB in R
A guide covering some good practices to consider when working with Danish register data
A collection of references to official documentation for commonly used registers
A loosely defined list of handy tips for junior researchers doing data analysis in R for the first time
A walkthrough of a workflow to illustrate common tasks encountered when processing Danish register data
Functions to generate fake register data mimicking commonly used Danish registers

What this repo does NOT contain:

A general beginner's intro to the basics of R.

How to use

To get an intro to the why's and how's of high-performance data analysis with Arrow and DuckDB, explore the vignettes
To get familiar with working with Danish register data in practice, follow along with the common_tasks_dplyr.qmd vignette
To generate fake register data for other purposes:
- Source generate_data.R and run the functions as needed
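Since the generator functions are not listed here, one way to find out what sourcing the script provides is to load it into its own environment and list the functions it defines. This is a minimal sketch: the helper name `list_script_functions` and the path `R/generate_data.R` are assumptions for illustration, not part of the repo's documented interface.

```r
# Hypothetical helper: source a script into a fresh environment and
# return the names of the functions it defines.
list_script_functions <- function(path) {
  env <- new.env()
  source(path, local = env)  # evaluate the script inside `env`, not the global env
  objs <- ls(env)
  is_fun <- vapply(
    objs,
    function(nm) is.function(get(nm, envir = env)),
    logical(1)
  )
  objs[is_fun]
}

# Assumed location of the script inside the repo; adjust as needed.
script_path <- file.path("R", "generate_data.R")
if (file.exists(script_path)) {
  print(list_script_functions(script_path))
}
```

Sourcing into a separate environment keeps the global workspace clean while exploring. Once you know which generators you need, a plain `source()` call makes them available directly.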

Feature roadmap

What is currently covered and planned

Vignette contents

introduction.qmd: background information
- Data storage formats
- Processing engines: getting started with Arrow and DuckDB in R via dplyr
good_practice.qmd: tips for working with register data
- Thinking ahead: planning analyses to increase efficiency
- Good habits
tips_tricks.qmd: tips & tricks for quality of life
- Tips and tricks
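
To give a flavour of the Arrow and DuckDB workflow that the introduction vignette covers, here is a minimal sketch, not taken from the vignettes themselves. It writes a toy table to Parquet, opens it lazily with Arrow, hands it to DuckDB and queries it with dplyr. The table and its column names (`pnr`, `year`, `income`) are made up for illustration, and the sketch assumes the arrow, dplyr, dbplyr and duckdb packages are installed.

```r
library(arrow)
library(dplyr)
# to_duckdb() additionally requires the duckdb and dbplyr packages.

# Made-up toy table standing in for a register extract.
toy <- data.frame(
  pnr = sprintf("%03d", 1:6),
  year = c(2019L, 2019L, 2020L, 2020L, 2021L, 2021L),
  income = c(100, 200, 150, 250, 300, 400)
)

# Store the table as a Parquet file on disk.
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(toy, parquet_path)

# open_dataset() does not load the rows into memory; the dplyr verbs
# are translated to a DuckDB query that only runs at collect().
income_by_year <- open_dataset(parquet_path) |>
  to_duckdb() |>
  filter(year >= 2020L) |>
  group_by(year) |>
  summarise(mean_income = mean(income, na.rm = TRUE)) |>
  arrange(year) |>
  collect()

print(income_by_year)
```

Only the small aggregated result is pulled into R. That is the point of the lazy approach when the underlying registers are too large to load in full.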

Fake registers supported

Data management operations in walkthrough

Specific tasks

General tasks
- Read Parquet file from disk into R and convert to DuckDB
- Join register data
- Export results
  - Using dput() to convert objects to R code to facilitate exporting from DST (Statistics Denmark)
Specific tasks
- Clean bef
- Clean udda
- Clean faik
- Clean akm
- Clean lmdb
- Clean sysi/sssy
- Clean lab
- Clean lpr2
  - lpr_adm
  - lpr_diag
- Clean lpr3
  - lpr_a_kontakt
  - lpr_a_diagnose
  - To be decided: include lpr_f
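
The `dput()` export step listed under the general tasks can be sketched as follows. This is an illustration under assumptions rather than the walkthrough's actual code: the `results` table and its numbers are made up. The idea is that `dput()` prints an object as plain R code, so a small aggregated result can be exported as text and rebuilt elsewhere by running that code.

```r
# Made-up aggregated result standing in for an analysis output.
results <- data.frame(
  year = c(2020L, 2021L),
  n = c(1200L, 1350L)
)

# dput() writes the R code that recreates the object; capture it as text.
code_lines <- capture.output(dput(results))
cat(code_lines, sep = "\n")

# Outside the secure environment, evaluating the text rebuilds the object.
rebuilt <- eval(parse(text = code_lines))
```

This works best for small summary tables. Note that `dput()` by default writes doubles with limited precision, so rebuilt non-integer values may differ slightly from the originals.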
