Viewers comment model
The Viewers Comment Model is a two-stage sentiment classifier designed to predict the overall sentiment of a YouTube video based on its top-liked viewer comments.
- Stage 1 – Comment-level Sentiment Classifier:
  - Uses TF-IDF vectorization on comment text.
  - Trained using XGBoost on hand-labeled comments (multi-class: 0–4).
  - Balanced using SMOTE and RandomUnderSampler in an imbalanced-learn pipeline.
- Stage 2 – Aggregator Classifier:
  - Extracts the top 30 most-liked comments per video.
  - Predicts each comment's sentiment using the Stage 1 model.
  - Counts the number of comments in each sentiment class (0–4).
  - Uses a Random Forest to predict overall sentiment based on this count vector.
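The counting step that links the two stages can be sketched in a few lines. This is a minimal illustration rather than the repository's code: the function name `count_vector` and the sample predictions are hypothetical, and only the five classes (0–4) and the 30-comment window come from the description above.

```python
from collections import Counter

NUM_CLASSES = 5  # sentiment classes 0-4, as described above


def count_vector(predicted_labels, num_classes=NUM_CLASSES):
    """Turn per-comment predictions into a fixed-length count vector.

    Index i of the result holds how many comments were predicted as class i.
    """
    counts = Counter(predicted_labels)
    return [counts.get(label, 0) for label in range(num_classes)]


# Hypothetical Stage 1 output for the 30 most-liked comments of one video.
stage1_predictions = [2] * 14 + [1] * 9 + [0] * 5 + [4] * 2
features = count_vector(stage1_predictions)
print(features)  # [5, 9, 14, 0, 2]
```

Classes that never occur still get a slot (here class 3 is 0), so every video yields a vector of the same length for the Random Forest.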
- Input data (Stage 1):
  - `allcomments_labled.csv`: 8,000 comments taken from the top 50 comments on different movie clips on YouTube, each hand-labeled from 0 to 4.
  - Columns: `text` (comment), `sentiment` (label from 0–4)
- Preprocessing:
  - TF-IDF vectorization with trigrams, `min_df=5`, `max_df=0.7`, 30,000 features
  - Class balancing with:
    - `SMOTE` for minority oversampling
    - `RandomUnderSampler` for majority class reduction
- Model: `XGBClassifier` with `scale_pos_weight` set from computed class weights
- Output: multi-class sentiment prediction for each comment:
  - 0: Neutral
  - 1: Pleased
  - 2: Funny
  - 3: Fear
  - 4: Sad
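The Stage 1 setup described above could be assembled roughly as follows. This is a sketch under assumptions, not the repository's training script: the helper name `build_stage1_pipeline` and the `random_state` value are hypothetical, and "trigrams" is read here as 1- to 3-grams. The sketch leaves out `scale_pos_weight`, since the XGBoost documentation describes that parameter in terms of a positive and a negative class, and how the repository applies it to five classes is not shown here.

```python
# Assumed TF-IDF settings: the text says "trigrams", read here as 1- to 3-grams.
TFIDF_PARAMS = {
    "ngram_range": (1, 3),
    "min_df": 5,
    "max_df": 0.7,
    "max_features": 30_000,
}


def build_stage1_pipeline(random_state=42):
    """Assemble TF-IDF -> SMOTE -> RandomUnderSampler -> XGBoost."""
    # Imported here so the settings above can be inspected without the
    # third-party packages (scikit-learn, imbalanced-learn, xgboost) installed.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from imblearn.under_sampling import RandomUnderSampler
    from sklearn.feature_extraction.text import TfidfVectorizer
    from xgboost import XGBClassifier

    return Pipeline(
        steps=[
            ("tfidf", TfidfVectorizer(**TFIDF_PARAMS)),
            # With default strategies SMOTE already equalizes every class, which
            # leaves the undersampler nothing to do; real use would pass explicit
            # sampling_strategy dicts to both steps.
            ("smote", SMOTE(random_state=random_state)),
            ("under", RandomUnderSampler(random_state=random_state)),
            ("xgb", XGBClassifier(random_state=random_state)),
        ]
    )
```

Calling `build_stage1_pipeline().fit(texts, labels)` with a list of comment strings and integer labels 0–4 would train the whole chain. The imbalanced-learn pipeline applies the two samplers only during `fit`, so predictions run on unmodified comments.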
- Input data (Stage 2):
  - `trainingimproved.csv`: 100 data points generated from the Stage 1 model. Each row pairs a movie's sentiment with an array of counts showing how many of the top 30 most-liked comments carry each emotion label.
  - Columns:
    - `Count_0` to `Count_4` (number of comments of each sentiment type among the 30 comments)
    - `Actual Sentiment`: final label for each video
- Model: `RandomForestClassifier` with `class_weight='balanced'`
- Output: final predicted sentiment of the video
- Evaluation: performance was measured manually on 120 different movie clips, of which 87% were predicted correctly (https://docs.google.com/spreadsheets/d/17vP7-mdsTEPYhxXXGlTWZKuioEFOPvSmzO1iR6PwhoA/edit?usp=sharing).
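A minimal sketch of the Stage 2 model, assuming scikit-learn is installed. The function name `train_stage2`, the `random_state` value, and the toy rows are hypothetical stand-ins for `trainingimproved.csv`; only `RandomForestClassifier` with `class_weight='balanced'` and the `Count_0` to `Count_4` layout come from the description above.

```python
def train_stage2(count_rows, video_labels, random_state=42):
    """Fit a Random Forest on rows of [Count_0, ..., Count_4]."""
    # Imported lazily so the toy data below can be inspected without
    # scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    model = RandomForestClassifier(
        class_weight="balanced",  # reweights classes inversely to their frequency
        random_state=random_state,
    )
    model.fit(count_rows, video_labels)
    return model


# Hypothetical training rows; each row sums to 30 (one count per class 0-4).
toy_rows = [
    [3, 4, 20, 1, 2],
    [2, 5, 19, 2, 2],
    [4, 18, 5, 1, 2],
    [5, 17, 4, 2, 2],
    [3, 2, 2, 21, 2],
    [4, 3, 1, 20, 2],
]
# Hypothetical video-level labels: 2 = Funny, 1 = Pleased, 3 = Fear.
toy_labels = [2, 2, 1, 1, 3, 3]
```

With the model trained, `train_stage2(toy_rows, toy_labels).predict([[3, 3, 21, 1, 2]])` returns a one-element array holding the predicted class index for a new video's count vector.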
- Use the YouTube API to fetch top comments (via `commentThreads`).
- Filter the top 30 comments by like count.
- Predict each comment's sentiment using the Stage 1 XGBoost model.
- Count the frequency of each class (0–4).
- Feed the counts to the Stage 2 Random Forest model.
- Interpret and return the final sentiment (e.g., `"funny"`, `"fear"`, `"pleased"`).
| Class | Meaning |
|---|---|
| 0 | Neutral |
| 1 | Pleased |
| 2 | Funny |
| 3 | Fear |
| 4 | Sad |
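The table above maps directly onto a lookup for the final "interpret" step. The names `SENTIMENT_LABELS` and `describe_prediction` and the video title are hypothetical, chosen only for this sketch.

```python
# Class index -> label, copied from the table above.
SENTIMENT_LABELS = {0: "Neutral", 1: "Pleased", 2: "Funny", 3: "Fear", 4: "Sad"}


def describe_prediction(video_title, predicted_class):
    """Format a Stage 2 class index as a human-readable result line."""
    label = SENTIMENT_LABELS[predicted_class]
    return f"Predicted sentiment for video '{video_title}': {label}"


print(describe_prediction("Some Movie Clip", 2))
# Predicted sentiment for video 'Some Movie Clip': Funny
```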
🎬 Predicted sentiment for video 'Inside Out – Official Trailer': Funny
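The first two pipeline steps, fetching comment threads and keeping the 30 most liked, might look like the sketch below. It uses only the standard library against the YouTube Data API v3 `commentThreads` endpoint. The function names are hypothetical, the API key must be supplied by the caller, and only a single page of results is requested, so following `nextPageToken` is left out.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_comment_threads(video_id, api_key, max_results=100):
    """Request one page of top-level comment threads for a video."""
    query = urllib.parse.urlencode({
        "part": "snippet",
        "videoId": video_id,
        "order": "relevance",
        "maxResults": max_results,  # the API allows at most 100 per page
        "textFormat": "plainText",
        "key": api_key,
    })
    with urllib.request.urlopen(f"{API_URL}?{query}") as response:
        return json.load(response)["items"]


def top_liked_comments(items, n=30):
    """Return the text of the n most-liked top-level comments."""
    snippets = [item["snippet"]["topLevelComment"]["snippet"] for item in items]
    snippets.sort(key=lambda s: s.get("likeCount", 0), reverse=True)
    return [s["textDisplay"] for s in snippets[:n]]
```

A call such as `top_liked_comments(fetch_comment_threads(video_id, api_key))` would then produce the list of comment strings passed to the Stage 1 classifier.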