Bewerbung_NLP

General

This project uses machine learning models from the NLP domain to build a binary classifier for a dataset of Chinese blog posts. The goal is to predict, based purely on its content, whether a blog post will be censored. The original dataset can be downloaded from: https://gitlab.com/NLP4IF/nlp4-if-censorship-detection

Notebooks

This project contains notebooks for both BERT and DistilBERT, applied to the dataset in its original form as well as to an English version of it. Each notebook can be configured by adjusting the parameters in the "notebook_parameters" dictionary at its beginning. To save a trained model, use the code in the "save model" section of each notebook. It lets you specify a folder name and saves the model's state dict, along with the notebook_parameters and the train/validation evaluation scores, in two separate files within that folder.
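The README does not list the keys of notebook_parameters or the exact save code, so the following is only a hedged sketch of the workflow it describes: a configuration dictionary plus a helper that writes the weights and the run metadata into two files inside a chosen folder. The dictionary keys, the file names model_state.pkl and run_info.json, and the functions save_run and load_run are all hypothetical. Since the notebooks save a state dict, they most likely use PyTorch, where the weights would typically be written with torch.save; pickle stands in here so the sketch runs without torch.

```python
import json
import pickle
from pathlib import Path

# Hypothetical configuration; the actual keys in the notebooks may differ.
notebook_parameters = {
    "model": "bert",               # or "distilbert"
    "use_english_dataset": False,  # original Chinese data vs. English translation
    "max_length": 256,
    "batch_size": 16,
    "learning_rate": 2e-5,
    "epochs": 3,
}


def save_run(folder, state_dict, parameters, scores):
    """Store the weights and the run metadata as two separate files in `folder`."""
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)
    # With PyTorch this would typically be torch.save(model.state_dict(), path);
    # pickle is only a stand-in so this sketch has no third-party dependencies.
    with open(out / "model_state.pkl", "wb") as f:
        pickle.dump(state_dict, f)
    # Parameters and train/validation scores share one human-readable JSON file.
    with open(out / "run_info.json", "w", encoding="utf-8") as f:
        json.dump({"notebook_parameters": parameters, "scores": scores}, f, indent=2)
    return out


def load_run(folder):
    """Read back the weights, parameters and scores written by save_run."""
    out = Path(folder)
    with open(out / "model_state.pkl", "rb") as f:
        state_dict = pickle.load(f)
    with open(out / "run_info.json", encoding="utf-8") as f:
        info = json.load(f)
    return state_dict, info["notebook_parameters"], info["scores"]
```

Keeping the parameters and scores in plain JSON next to the weights makes it possible to compare runs without loading any model.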

Other files

The EDA toolkit (eda_toolkit.py) is a self-written, object-oriented data exploration tool that wraps the techniques from an EDA course on Kaggle in a quick-to-use API. The data exploration notebook uses it to explore the correlation between syntax and a post's likelihood of being censored. translater.py was used to translate the original dataset from Chinese to English.
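The README does not describe the toolkit's API, so the following is only a minimal sketch of this kind of analysis. It assumes that a few simple surface features stand in for "syntax" and that labels are 1 for censored and 0 otherwise. pearson, surface_features and feature_label_correlations are hypothetical helpers, not functions from eda_toolkit.py. Because the label is binary, the Pearson coefficient computed here is the point-biserial correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 label it equals the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (sx * sy)


def surface_features(post):
    """Hypothetical surface-level features of a post (not the toolkit's own)."""
    return {
        "length": len(post),
        "punctuation": sum(ch in "，。！？、,.!?" for ch in post),
        "exclamations": post.count("！") + post.count("!"),
        "digits": sum(ch.isdigit() for ch in post),
    }


def feature_label_correlations(posts, censored):
    """Correlate each feature with the censored label (1 = censored, 0 = not)."""
    rows = [surface_features(p) for p in posts]
    return {name: pearson([r[name] for r in rows], censored) for name in rows[0]}
```

A positive coefficient means the feature tends to be larger in censored posts; it indicates an association only, not a reason for censorship.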

Future

I am currently working on adding a meta-learner: the best models trained in the different notebooks serve as base models, and a shallow neural network combines their predictions.
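A minimal sketch of such a stacking setup, assuming each base model outputs a censorship probability per post and that the meta-learner is trained on the base models' predictions for held-out posts (not the posts they were trained on), so it does not learn to trust overfitted outputs. ShallowMetaLearner, its hidden size, learning rate and epoch count are hypothetical choices, not taken from the repository. It is written in pure Python so it runs without dependencies; a real implementation would more likely use PyTorch.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ShallowMetaLearner:
    """One tanh hidden layer and a sigmoid output, trained with full-batch
    gradient descent on binary cross-entropy.

    Each input row holds one predicted censorship probability per base model.
    """

    def __init__(self, n_inputs, n_hidden=4, lr=0.5, epochs=3000, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr, self.epochs = lr, epochs

    def _forward(self, x):
        h = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(self.b2 + sum(w * hk for w, hk in zip(self.w2, h)))
        return h, p

    def predict_proba(self, x):
        return self._forward(x)[1]

    def predict(self, x, threshold=0.5):
        return int(self.predict_proba(x) >= threshold)

    def fit(self, X, y):
        n, n_hidden, n_inputs = len(X), len(self.W1), len(X[0])
        for _ in range(self.epochs):
            gW1 = [[0.0] * n_inputs for _ in range(n_hidden)]
            gb1 = [0.0] * n_hidden
            gw2 = [0.0] * n_hidden
            gb2 = 0.0
            for x, t in zip(X, y):
                h, p = self._forward(x)
                d_out = p - t  # dLoss/dz at the output for sigmoid + cross-entropy
                gb2 += d_out
                for k in range(n_hidden):
                    gw2[k] += d_out * h[k]
                    d_hid = d_out * self.w2[k] * (1.0 - h[k] ** 2)  # back through tanh
                    gb1[k] += d_hid
                    for j in range(n_inputs):
                        gW1[k][j] += d_hid * x[j]
            # Apply the batch-averaged gradient step.
            self.b2 -= self.lr * gb2 / n
            for k in range(n_hidden):
                self.w2[k] -= self.lr * gw2[k] / n
                self.b1[k] -= self.lr * gb1[k] / n
                for j in range(n_inputs):
                    self.W1[k][j] -= self.lr * gW1[k][j] / n
        return self


# Toy usage: columns are two base models' held-out probabilities, y the true labels.
X = [[0.9, 0.8], [0.85, 0.7], [0.2, 0.3], [0.1, 0.25], [0.8, 0.9], [0.3, 0.1]]
y = [1, 1, 0, 0, 1, 0]
meta = ShallowMetaLearner(n_inputs=2).fit(X, y)
print(meta.predict([0.95, 0.9]))
```

Feeding probabilities rather than hard labels gives the meta-learner information about how confident each base model is.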

Repository contents

__pycache__
data/english_datasets
notebooks
trainings/trained_models
.gitignore
README.md
eda_toolkit.py
translater.py

Repository: NickDienemann/Bewerbung_NLP