The Faketucky synthetic college-going analysis file contains high school and college outcome data for two graduating cohorts of approximately 40,000 students. There are no real children in the dataset, but it mirrors the relationships between variables present in real data.
Faketucky is a demonstration of using machine learning routines to develop synthetic data based on real datasets. It was developed as an offshoot of the Strategic Data Project's college-going diagnostic for Kentucky, using the R synthpop package. Synthetic datasets like Faketucky can be shared freely for teaching and collaboration, and they can be used to test hypotheses before applying for permission to use confidential data.
This repository contains the following files:
faketucky.dta is a college-going analysis data file in Stata format
faketucky.rda is the same data in R format
faketucky_codebook.txt contains variable names and descriptions
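As a quick way to check that the data file reads correctly outside of Stata or R, the sketch below loads faketucky.dta with the pandas library in Python. This is only an illustration, not part of the original materials: the helper name `load_faketucky` is hypothetical, and the code assumes the file sits in the current working directory and that pandas is installed. No variable names are assumed; consult faketucky_codebook.txt for those.

```python
from pathlib import Path

import pandas as pd


def load_faketucky(path="faketucky.dta"):
    """Read a Stata .dta file into a pandas DataFrame.

    The default path is an assumption: it expects the file in the
    current working directory. The function name is hypothetical.
    """
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Could not find {path}")
    # pandas converts Stata value labels to categoricals by default.
    return pd.read_stata(path)


# Only runs when the data file is actually present.
if Path("faketucky.dta").exists():
    df = load_faketucky()
    print(df.shape)           # (number of rows, number of columns)
    print(df.dtypes.head(10)) # types of the first ten variables
```

Printing the shape and the first few column types is a simple sanity check before matching columns against the descriptions in the codebook.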
These materials were originally authored by the Strategic Data Project.
OpenSDP is an online, public repository of analytic code, tools, and training intended to foster collaboration among education analysts and researchers in order to accelerate the improvement of our school systems. The community is hosted by the Strategic Data Project, an initiative of the Center for Education Policy Research at Harvard University. We welcome contributions and feedback.