Tweet-Sentiment-Extraction

Authors: Joshua Zwiebel and Teva Zanker

A repository detailing the work involved in the Kaggle competition Tweet Sentiment Extraction.

Purpose and Overview

The purpose of this repository is twofold. First, it is our entry in the Tweet Sentiment Extraction Kaggle competition. Second, it documents the thought process and the mistakes made while following a data science workflow. You will find many cells that throw runtime errors. This is by design: we wanted to keep the cells with errors and show how we fixed the problems, rather than working backwards and leaving the reader with the impression that everything worked on the first attempt. We found a lot of value in seeing where mistakes were made.

The Data

The data we processed is described below (it can be found within the repository for further inspection). A single sample contains up to 4 data points:

the text of the tweet
the sentiment of the tweet (e.g. positive)
a unique identifier for the tweet
the keywords that exemplify the sentiment

The keyword and location of the tweet had the potential to be blank. The training data contained approximately 27,000 samples. More detail on the composition of the data can be found in the exploration folder.

Preprocessing

We took a few different strategies during the preprocessing phase. Cleaning the data was important: this step involved stripping punctuation and suffixes from words. Because the data came from tweets, many of the words were misspelled. We used pyspellchecker to spellcheck all of the tweets in one pass. The computation was lengthy but completed after running overnight. We then turned to feature engineering. Because the selected text is taken directly from the original tweet, we decided to predict the indices of the words that make up the selected text rather than predicting the words themselves.
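The index-based target described above can be sketched as follows. This is a minimal illustration and not the code from the notebooks: the function names `find_word_span` and `span_to_text` are hypothetical, and it assumes simple whitespace tokenization and that the selected text starts and ends on word boundaries.

```python
from typing import Optional, Tuple


def find_word_span(tweet: str, selected_text: str) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) word indices of selected_text in tweet.

    Assumption: both strings are tokenized by whitespace. Returns None when
    the selected words do not appear as a contiguous run in the tweet.
    """
    tweet_words = tweet.split()
    target = selected_text.split()
    n = len(target)
    if n == 0:
        return None
    # Slide a window of length n over the tweet and compare word by word.
    for start in range(len(tweet_words) - n + 1):
        if tweet_words[start:start + n] == target:
            return start, start + n - 1
    return None


def span_to_text(tweet: str, span: Tuple[int, int]) -> str:
    """Rebuild the selected text from a tweet and an inclusive word span."""
    start, end = span
    return " ".join(tweet.split()[start:end + 1])


tweet = "my day was so good thanks to you"
span = find_word_span(tweet, "so good")
print(span)                       # (3, 4)
print(span_to_text(tweet, span))  # so good
```

A model trained this way predicts two integers per tweet, and `span_to_text` turns them back into a string for submission. Samples where the helper returns None (for example, a selection that begins mid-word) would need separate handling.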

Models

Resources and Learning

Results

Future Work and Concessions made

How to use any of the Code

Install the packages listed in requirements.txt and start running the notebooks! If you encounter any issues, please open an issue or email me at [email protected]. The exploration folder is likely the best place to start.
