My Seminar coursework at the University of Science - VNUHCM, 2024.

Topic: Handling Missing Data (Xử lý dữ liệu khuyết)
Language: Vietnamese
Supervisor: Dr. Hoang Van Ha

Overview

In this course, I tried to understand, rephrase, and implement the neural network from the paper "NeuMiss networks: differentiable programming for supervised learning with missing values" (NeurIPS 2020). Although the paper contains a few small errors, it is an interesting work that handles missing data in linear regression problems with a neural network called NeuMiss.
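To make the setting concrete: for a linear model on Gaussian data with entries missing (completely) at random, the best prediction from the observed entries has a closed form. It needs the inverse of the covariance of the observed variables, which differs for every missingness pattern. The sketch below computes that reference predictor directly with NumPy. It is not the NeuMiss network itself, and the function name `bayes_predict`, its arguments, and the toy numbers are my own illustrative assumptions.

```python
import numpy as np

def bayes_predict(x, mask, mu, Sigma, beta0, beta):
    """Reference predictor E[Y | X_obs] for Y = beta0 + <beta, X> + noise,
    with X ~ N(mu, Sigma) and entries missing (completely) at random.
    mask[j] is True where x[j] is missing (x[j] may be NaN there)."""
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    obs = ~mask
    x_hat = np.where(mask, mu, x)  # start from the mean for missing entries
    if mask.any() and obs.any():
        Sigma_obs = Sigma[np.ix_(obs, obs)]
        Sigma_mis_obs = Sigma[np.ix_(mask, obs)]
        # Conditional mean: mu_mis + Sigma_mis,obs Sigma_obs^{-1} (x_obs - mu_obs)
        x_hat[mask] = mu[mask] + Sigma_mis_obs @ np.linalg.solve(
            Sigma_obs, x[obs] - mu[obs]
        )
    return beta0 + beta @ x_hat

# Toy parameters (assumed for illustration only)
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
beta0, beta = 0.5, np.array([1.0, 2.0, -1.0])

x = np.array([1.0, np.nan, 0.5])  # second feature is missing
pred = bayes_predict(x, np.isnan(x), mu, Sigma, beta0, beta)
print(pred)
```

Solving with `Sigma_obs` separately for each of the up to 2^d missingness patterns is what becomes expensive, which is why approximating this inverse is attractive.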

Repo's structure

notebooks/: Contains Jupyter notebooks demonstrating the experiments
- Neumann_series_approximation.ipynb: Numerical experiment for matrix inverse approximation using Neumann series
- NeuMiss_network.ipynb: Reimplementation of the NeuMiss network architecture, plus experiments with different settings
- NeuMiss_sota_network.ipynb: Tests of NeuMiss from the authors' later work, "What’s a good imputation to predict with missing values?" (2021)
- NeuMiss_vs_Others.ipynb: Experiments comparing NeuMiss with impute-then-regress methods
report/: Contains the report's PDF and LaTeX source
slide/: Contains the slides' PDF and LaTeX source
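As a rough idea of what the Neumann series notebook explores, here is a minimal sketch (not the notebook's actual code). It uses the identity inv(A) = sum over k >= 0 of (I - A)^k, which holds when the spectral radius of I - A is below 1. The function name `neumann_inverse`, the toy matrix, and the rescaling by the spectral norm are my own assumptions for illustration.

```python
import numpy as np

def neumann_inverse(A, n_terms):
    """Approximate inv(A) by the truncated series sum_{k=0}^{n_terms} (I - A)^k."""
    I = np.eye(A.shape[0])
    S = I.copy()
    for _ in range(n_terms):
        # S_l = (I - A) S_{l-1} + I adds one more term of the series
        S = (I - A) @ S + I
    return S

# Toy symmetric positive definite matrix (assumed for illustration)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

# Rescale so the eigenvalues lie in (0, 1]; then the spectral radius of I - A is < 1
A = Sigma / np.linalg.norm(Sigma, 2)

exact = np.linalg.inv(A)
errors = [np.linalg.norm(neumann_inverse(A, n) - exact) for n in (1, 5, 25, 100)]
for n, err in zip((1, 5, 25, 100), errors):
    print(f"terms={n:4d}  error={err:.2e}")
```

The error shrinks geometrically with the number of terms, at a rate set by the largest eigenvalue of I - A, so poorly conditioned matrices need many more terms.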

Further work

Due to my limited skills and a shortage of time ⌛, I could not do or learn more in this course 🥲. However, here are some ideas/questions/todos I wish I had had time to work on:

Make the network work on GPU
Study the improved architecture from the authors' later work
- Implement support for classification problems
The assumptions on the data (Gaussian), on the MNAR setting (Gaussian self-masking), and others are still strong/restrictive.
- Integrate with Random Matrix Theory (?) -> remove the Gaussian data assumption.
- Agnostic statistics/Agnostic learning (?)
Compare to more methods:
- Mixture models: Gaussian Mixture Model (GMM)
- Hierarchical models
- Imputations: Optimal Transport, PCA, Matrix Completion, ...
- Neural network models: GAIN, MisGAN, MIWAE, StableMiss, ...
- NeuMiss (Le Morvan et al., 2021): NeuMiss can be used for non-linear models by chaining it with an MLP, ...
- NeuMISE: For missingness shift (which can be viewed as data drift/shift) -> use for real-time applications?
This network is considered a deep neural network. What if only a small amount of data is available? Which method is best then?
Experiment with more real-world datasets for linear regression problems
How can NeuMiss be extended to work on large datasets?
How do outliers affect the model performance?
How do categorical variables and continuous variables affect the network?
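As a starting point for the comparisons listed above, the simplest impute-then-regress baseline is mean imputation followed by ordinary least squares. The sketch below uses NumPy only and synthetic data that I made up for illustration. The helper names `mean_impute` and `fit_linear`, the 20% MCAR missing rate, and the coefficients are all assumptions, not results from the notebooks.

```python
import numpy as np

def mean_impute(X):
    """Replace NaNs in each column with that column's mean over observed values."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic linear-regression data (illustrative only)
rng = np.random.default_rng(0)
n, d = 500, 3
X_full = rng.standard_normal((n, d))
y = 1.0 + X_full @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(n)

# Hide about 20% of the entries completely at random (MCAR)
X = X_full.copy()
X[rng.random((n, d)) < 0.2] = np.nan

X_imp = mean_impute(X)
b0, b = fit_linear(X_imp, y)
mse = np.mean((b0 + X_imp @ b - y) ** 2)
print(f"training MSE after mean imputation: {mse:.3f}")
```

In a real comparison, the column means should be computed on the training split only and reused on the test split, so that the baseline does not see test data.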

Resources:

https://github.com/marineLM/NeuMiss (The architecture I reimplemented is the same as this)
https://github.com/marineLM/NeuMiss_sota
NeuMiss' poster
NeuMiss' slide
Supervised learning with missing values - Julie Josse
http://bigdatasummerinst.sph.umich.edu/wiki2022/images/1/1d/Missing_Data.pdf

ngntrgduc/seminar
