
NLP

About the dataset

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, so the dataset has 120,000 training samples and 7,600 testing samples in total.

Task 1

We preprocess the data and calculate the probabilities of N-grams.
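As an illustration of Task 1, here is a minimal sketch of preprocessing followed by maximum-likelihood N-gram probabilities, P(w_n | w_1..w_{n-1}) = C(w_1..w_n) / C(w_1..w_{n-1}). The README does not specify the preprocessing steps, so the lowercasing, the alphanumeric tokenizer, and the `<s>`/`</s>` padding are assumptions, as are the helper names `preprocess`, `ngrams`, and `ngram_probabilities`. No smoothing is applied.

```python
import re
from collections import Counter

def preprocess(text, n):
    """Lowercase, tokenize on alphanumeric runs, and pad with boundary markers.

    The padding scheme (n-1 "<s>" markers, one "</s>") is an assumption.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return ["<s>"] * (n - 1) + tokens + ["</s>"]

def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_probabilities(docs, n=2):
    """Unsmoothed MLE: P(w_n | history) = C(history + w_n) / C(history)."""
    ngram_counts = Counter()
    history_counts = Counter()
    for doc in docs:
        grams = ngrams(preprocess(doc, n), n)
        ngram_counts.update(grams)
        # Count each history once per n-gram it starts, so the probabilities
        # conditioned on the same history sum to 1.
        history_counts.update(g[:-1] for g in grams)
    return {g: c / history_counts[g[:-1]] for g, c in ngram_counts.items()}

probs = ngram_probabilities(["Stocks rise.", "Stocks fall."], n=2)
print(probs[("stocks", "rise")])  # 0.5
```

Because "stocks" is followed once by "rise" and once by "fall", each bigram gets probability 0.5. On a real corpus, unseen N-grams would get probability 0 under this estimate, which is why smoothing (e.g. add-one) is commonly added.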

Task 2

Finally, we complete the project with:

  • Feature extraction (apply each of the 3 algorithms with the classifier and choose the best one according to the model's accuracy)
  • ML classifier (apply any ML classifier, e.g. support vector machine (SVM), Naive Bayes (NB), decision tree (DT), or random forest (RF)) and evaluation metrics (including the model's accuracy and confusion matrix)
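The Task 2 pipeline (feature extraction, then a classifier, then accuracy and a confusion matrix) can be sketched end to end in plain Python. The README does not name its 3 feature-extraction algorithms, so this sketch uses bag-of-words counts as one hypothetical extractor; the classifier is multinomial Naive Bayes with add-alpha smoothing, written from scratch. All function and class names here are illustrative, and the toy sentences are invented, not AG News data. In practice, scikit-learn's `CountVectorizer`/`TfidfVectorizer`, `MultinomialNB`, `accuracy_score`, and `confusion_matrix` cover the same steps.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Hypothetical feature extractor: raw token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, features, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for feats, label in zip(features, labels):
            self.word_counts[label].update(feats)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for word, count in feats.items():
                if word not in self.vocab:
                    continue  # skip words never seen in training
                p = (self.word_counts[c][word] + self.alpha) / (self.totals[c] + self.alpha * v)
                score += count * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, features):
        return [self.predict_one(f) for f in features]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Invented toy data for illustration only.
train_texts = ["the team won the match", "the striker scored a goal",
               "shares fell on the market", "the bank raised interest rates"]
train_labels = ["Sports", "Sports", "Business", "Business"]
test_texts = ["the team scored", "market shares rose"]
test_labels = ["Sports", "Business"]

model = NaiveBayes().fit([bag_of_words(t) for t in train_texts], train_labels)
predictions = model.predict([bag_of_words(t) for t in test_texts])
print(accuracy(test_labels, predictions))
print(confusion_matrix(test_labels, predictions, model.classes))
```

To compare feature extractors as the README describes, one would swap `bag_of_words` for each candidate, retrain, and keep the one with the highest accuracy on held-out data.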