Assignment 1 - POS tagging

Task description

Part-of-Speech (POS) tagging is an NLP task that involves assigning a grammatical category (part of speech) to each word in a sentence. These categories include nouns, verbs, adjectives, adverbs, pronouns, etc.

Workflow

In this task we train a custom model on a labelled dataset: a set of documents in which each word is annotated with its POS tag. We train the model on one subset of the documents and test it on a different subset.
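A minimal sketch of a document-level train/test split, assuming each document is a list of `(word, tag)` pairs. The function name `split_documents`, the 80/20 proportion and the fixed seed are illustrative assumptions; the README does not state how the split is made.

```python
import random
from typing import List, Tuple

# A document is a list of (word, tag) pairs.
Document = List[Tuple[str, str]]


def split_documents(docs: List[Document], test_fraction: float = 0.2, seed: int = 0):
    """Shuffle whole documents and split them into disjoint train/test subsets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(docs)              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


docs = [[(f"word{i}", "NN")] for i in range(10)]
train_docs, test_docs = split_documents(docs)
print(len(train_docs), len(test_docs))
```

Splitting at the document level, rather than at the sentence or token level, keeps text from the same document out of both subsets.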

After downloading the corpus (Penn Treebank), we build a word vocabulary leveraging the GloVe embeddings. We use this vocabulary both to tokenize the dataset (mapping each word to an integer index) and as the lookup table (embedding matrix) of the Embedding layer in our neural model.
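The vocabulary and embedding matrix can be sketched as follows, assuming the GloVe text format (one word per line followed by its space-separated vector components). The reserved `<pad>`/`<unk>` rows, the lowercasing in `tokenize` and the helper names are assumptions made for illustration, not necessarily what the notebook does.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

PAD, UNK = "<pad>", "<unk>"   # hypothetical reserved tokens


def build_vocab_and_matrix(glove_lines: Iterable[str]) -> Tuple[Dict[str, int], List[List[float]]]:
    """Map each GloVe word to a row index and collect its vector as that row."""
    word_to_index: Dict[str, int] = {}
    matrix: List[List[float]] = []
    for line in glove_lines:
        word, *values = line.rstrip().split(" ")
        vector = [float(v) for v in values]
        if not matrix:
            # Rows 0 and 1 are reserved (all zeros) for padding and unknown words.
            dim = len(vector)
            word_to_index[PAD], word_to_index[UNK] = 0, 1
            matrix.extend([[0.0] * dim, [0.0] * dim])
        word_to_index[word] = len(matrix)
        matrix.append(vector)
    return word_to_index, matrix


def tokenize(words: Sequence[str], word_to_index: Dict[str, int]) -> List[int]:
    """Convert words to indices; out-of-vocabulary words map to <unk>."""
    unk = word_to_index[UNK]
    return [word_to_index.get(w.lower(), unk) for w in words]


vocab, matrix = build_vocab_and_matrix(["the 0.1 0.2", "cat 0.3 0.4"])
print(tokenize(["The", "dog"], vocab))
```

Row `i` of `matrix` is the vector for the word with index `i`, so the matrix can serve directly as the initial (frozen) weights of the Embedding layer.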

We try three different models, all based on recurrent neural networks:

Baseline: Embedding layer, Bidirectional LSTM, Dense layer
Model 1: similar to the baseline, but with 2 LSTMs instead of 1
Model 2: similar to the baseline, but with 2 dense layers

All models have a comparable number of parameters (about 20 million).
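The sketch below counts parameters for the three architectures using the standard formulas for Keras-style layers (an LSTM has four gates, each with an input kernel, a recurrent kernel and a bias). The vocabulary size, embedding dimension, hidden size, number of tags and the hidden width of Model 2's extra dense layer are hypothetical values chosen for illustration, not the sizes used in the notebook.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    return vocab_size * dim


def lstm_params(input_dim: int, units: int, bidirectional: bool = False) -> int:
    # 4 gates x (input kernel + recurrent kernel + bias)
    one_direction = 4 * ((input_dim + units) * units + units)
    return 2 * one_direction if bidirectional else one_direction


def dense_params(input_dim: int, units: int) -> int:
    return (input_dim + 1) * units   # weights + bias


# Hypothetical sizes, for illustration only.
VOCAB, DIM, HIDDEN, N_TAGS = 400_000, 50, 128, 45

emb = embedding_params(VOCAB, DIM)
bilstm = lstm_params(DIM, HIDDEN, bidirectional=True)   # outputs 2 * HIDDEN features

baseline = emb + bilstm + dense_params(2 * HIDDEN, N_TAGS)
model_1 = emb + bilstm + lstm_params(2 * HIDDEN, HIDDEN, True) + dense_params(2 * HIDDEN, N_TAGS)
model_2 = emb + bilstm + dense_params(2 * HIDDEN, HIDDEN) + dense_params(HIDDEN, N_TAGS)

for name, total in [("baseline", baseline), ("model 1", model_1), ("model 2", model_2)]:
    # The embedding is frozen, so only the remaining parameters are trainable.
    print(f"{name}: {total:,} parameters ({total - emb:,} trainable)")
```

With these illustrative sizes the frozen embedding matrix dominates the total, so adding a second LSTM or a second dense layer changes the overall count only slightly.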

The embedding layers are pre-initialized with GloVe embeddings and kept frozen during training.

We train our models with the categorical cross-entropy loss, optimized with Adam (learning rate 0.001). During training we also track and plot the macro-averaged F1 score, precision and recall.
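For reference, a minimal sketch of the per-token categorical cross-entropy averaged over tokens, assuming one-hot targets and softmax probabilities for each token. It mirrors what a framework's built-in loss computes but is not the notebook's code; masking of padded positions is left out here as a simplifying assumption.

```python
import math
from typing import Sequence


def categorical_crossentropy(y_true: Sequence[Sequence[float]],
                             y_pred: Sequence[Sequence[float]],
                             eps: float = 1e-7) -> float:
    """Mean over tokens of -sum_c y_true[c] * log(y_pred[c])."""
    total = 0.0
    for target, probs in zip(y_true, y_pred):
        # Clip probabilities from below to avoid log(0).
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)


y_true = [[1, 0, 0], [0, 1, 0]]                # one-hot tags for two tokens
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]    # predicted tag distributions
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))
```

With one-hot targets only the log-probability of the correct tag contributes, so here the loss is the average of `-log(0.7)` and `-log(0.8)`.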

During evaluation, we compute the F1 score and other metrics for each label separately on the test set.
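Per-label metrics can be computed from token-level true positives, false positives and false negatives. The sketch below is one way to do it, with the convention (an assumption, matching a common library default) that a ratio with a zero denominator scores 0.0; the macro F1 is the unweighted mean of the per-label F1 scores. Function names and the toy tags are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Scores = Dict[str, Tuple[float, float, float, int]]


def per_label_scores(gold: Sequence[str], pred: Sequence[str]) -> Scores:
    """Return {label: (precision, recall, f1, support)} from token-level tags."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted p wrongly
            fn[g] += 1   # missed the gold tag g
    scores: Scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted, actual = tp[label] + fp[label], tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1, actual)
    return scores


def macro_f1(scores: Scores) -> float:
    """Unweighted mean of per-label F1: every tag counts equally."""
    return sum(f1 for _, _, f1, _ in scores.values()) / len(scores)


gold = ["DT", "NN", "VBZ", "DT", "NN", "UH"]
pred = ["DT", "NN", "VBZ", "DT", "NN", "NN"]
report = per_label_scores(gold, pred)
for label, (p, r, f, s) in report.items():
    print(f"{label:4} precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
print(f"macro F1 = {macro_f1(report):.2f}")
```

In this toy example the single `UH` token is never predicted correctly, so its precision, recall and F1 are all zero, and because macro averaging weights every tag equally, that one rare tag pulls the macro F1 down noticeably.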

Interestingly, we notice that rare labels (tags that appear very rarely in the training set) score zero on both precision and recall. We conjecture that, since these labels have so few training examples, the model struggles to learn them and consequently performs very poorly on them in the validation set.
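One way to check this conjecture is to list the tags whose training-set frequency falls below some threshold and compare them with the tags that score zero. The threshold of 10 occurrences and the function name `rare_tags` are hypothetical; the README does not define how rare a tag must be.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rare_tags(train_tags: Iterable[str], threshold: int = 10) -> List[Tuple[str, int]]:
    """Return (tag, count) pairs for tags seen fewer than `threshold` times, rarest first."""
    counts = Counter(train_tags)
    rare = [(tag, n) for tag, n in counts.items() if n < threshold]
    return sorted(rare, key=lambda item: (item[1], item[0]))


train_tags = ["NN"] * 50 + ["DT"] * 30 + ["UH"] * 2 + ["SYM"] * 1 + ["LS"] * 3
print(rare_tags(train_tags))
```

If the tags returned here largely coincide with those that get zero precision and recall, that supports the explanation that they simply have too few training examples.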

Licence

MIT license

marcosolime/pos-tagging

Folders and files

Latest commit

History

Repository files navigation

Assignment 1 - POS tagging

Task description

Workflow

Licence

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages