This repository contains the scripts belonging to the thesis 'Forecasting Smog Clouds With Deep Learning' and a paper that followed from it.
For questions, please contact [email protected] (with [email protected] in cc, in case access to the former is lost).
As the original codebase was rather disorganised, only the core, most essential parts are bundled here. The GRU and HGRU models are runnable, and once the data is supplied, the results can be reproduced.
The scripts cover a pipeline that runs from publicly available pollution and meteorological data, through preprocessing, to forecasting four constituents of smog clouds at two locations in the Netherlands. The code is divided into two directories:
- pipeline/ contains the pipeline package, which takes pollution data published by an initiative of the Dutch Government (including the National Institute for Public Health and the Environment, RIVM) and meteorological data published by the Royal Netherlands Meteorological Institute (KNMI), tidies it, inspects it through various metrics and visualisations, and eventually preprocesses it into a ready-to-use dataset. More information about the data is given below. The package can be run either from the command line with preprocess.py or from the notebook preprocess.ipynb.
- modelling/ contains the more freely structured modelling package; it defines various classes and functions which come together in the run_models.ipynb notebook to run the models.
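The exact preprocessing steps are defined in the pipeline package itself. As a rough illustration of what turning a raw series into a "ready-to-use dataset" for forecasting can involve, the sketch below fills gaps, rescales the values, and cuts the series into input/target windows. It is not the repository's implementation: the function names (fill_gaps, min_max_scale, make_windows), the window sizes, and the sample readings are all hypothetical, and plain Python is used instead of pandas to keep the example self-contained.

```python
from typing import List, Optional, Tuple


def fill_gaps(series: List[Optional[float]]) -> List[float]:
    """Linearly interpolate None entries; edge gaps take the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no observations")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(series[left] + weight * (series[right] - series[left]))
    return filled


def min_max_scale(series: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def make_windows(
    series: List[float], n_in: int, n_out: int
) -> List[Tuple[List[float], List[float]]]:
    """Split a series into (input window, forecast target) pairs."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


# Made-up hourly concentration readings with two missing hours (hypothetical data).
raw = [10.0, None, 14.0, 18.0, None, 10.0]
clean = fill_gaps(raw)
scaled = min_max_scale(clean)
pairs = make_windows(scaled, n_in=3, n_out=1)
```

Each pair holds three consecutive scaled readings as model input and the following reading as the target, which is the shape of data a recurrent model such as a GRU typically consumes.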
Furthermore, the notebooks/ folder in src/ contains a few experimental notebooks. The scripts are commented fairly extensively to explain the specifics. Lastly, plots.py in both pipeline/ and modelling/ hosts a number of plotting functions used in the thesis/paper.
The source location was chosen to be in Utrecht, near the headquarters of the KNMI, and the target location is in Breukelen, both in the Netherlands. Data was sampled from years between 2017 and 2023 and obtained from:
- https://data.rivm.nl/data/ - for the pollution data; and
- https://dataplatform.knmi.nl/ - for the meteorological data.
The (raw) data is not uploaded to this repository, but can be added to the /data folders for the code to run.
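Before running the code, it can help to confirm that the data folders actually contain files. The following is a minimal sketch of such a check, assuming only that the relevant folders are named data; the helper name data_folder_status is hypothetical and is not part of the repository.

```python
from pathlib import Path
from typing import Dict


def data_folder_status(repo_root: Path) -> Dict[str, int]:
    """Map each directory named 'data' under repo_root to its file count."""
    status: Dict[str, int] = {}
    for folder in sorted(repo_root.rglob("data")):
        if folder.is_dir():
            n_files = sum(1 for p in folder.rglob("*") if p.is_file())
            status[str(folder.relative_to(repo_root))] = n_files
    return status


if __name__ == "__main__":
    # Run from the repository root; prints one line per data folder found.
    for folder, n_files in data_folder_status(Path(".")).items():
        note = "ok" if n_files else "empty: add the raw data here"
        print(f"{folder}: {n_files} file(s) [{note}]")
```

A folder reported as empty still needs the raw data downloaded from the sources listed above.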
The dependencies used in this project are listed in requirements.txt, though only "ordinary" libraries such as numpy, pandas, and PyTorch are utilised.