NLP

About the dataset

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, so the total number of training samples is 120,000 and the total number of testing samples is 7,600.
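The split sizes follow directly from the per-class counts, since all 4 classes are balanced:

$$N_{\text{train}} = 4 \times 30{,}000 = 120{,}000, \qquad N_{\text{test}} = 4 \times 1{,}900 = 7{,}600$$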

Task 1

We preprocess the data and calculate the probabilities of n-grams.
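A minimal sketch of Task 1, assuming a simple preprocessing step (lowercase, replace non-alphanumeric characters with spaces, split on whitespace) and maximum-likelihood n-gram estimates P(w_n | w_1 ... w_{n-1}) = C(w_1 ... w_n) / C(w_1 ... w_{n-1}). The function names and the `<s>` / `</s>` padding tokens are hypothetical illustrations, not taken from the project, and no smoothing is applied.

```python
import re
from collections import Counter


def preprocess(text):
    # Assumed preprocessing: lowercase, turn punctuation into spaces, split on whitespace.
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_probabilities(docs, n=2):
    """MLE estimates: P(last word | context) = C(n-gram) / C(context as an n-gram prefix)."""
    ngram_counts = Counter()
    context_counts = Counter()
    for doc in docs:
        # Pad with hypothetical sentence-boundary tokens so the first words get a context.
        tokens = ["<s>"] * (n - 1) + preprocess(doc) + ["</s>"]
        for gram in ngrams(tokens, n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return {gram: count / context_counts[gram[:-1]] for gram, count in ngram_counts.items()}


if __name__ == "__main__":
    # Made-up headlines; replace with the AG News training texts.
    probs = ngram_probabilities(["Stocks rise.", "Stocks fall."], n=2)
    for gram, p in sorted(probs.items()):
        print(gram, p)
```

Counting each context only as the prefix of an observed n-gram keeps the conditional probabilities for every context summing to 1; with `n=1` the context is the empty tuple and the result reduces to plain unigram frequencies.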

Task 2

Finally, we complete the project with:

Feature extraction (apply all 3 feature-extraction algorithms with the classifier and choose the best one according to the model's accuracy)
ML classifier (apply any ML classifier, e.g. SVM, Naive Bayes (NB), Decision Tree (DT), Random Forest (RF)) and evaluation metrics (including the model's accuracy and confusion matrix)
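A minimal, dependency-free sketch of Task 2. The README does not name its 3 feature-extraction algorithms, so this assumes binary bag-of-words, raw term counts, and TF-IDF; it uses multinomial Naive Bayes as the classifier and picks the extractor with the highest test accuracy. All function names are hypothetical. In practice, scikit-learn's `CountVectorizer`, `TfidfVectorizer`, `MultinomialNB`, `accuracy_score`, and `confusion_matrix` cover the same steps.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


# Three hypothetical feature extractors. Each is "fitted" on the training documents
# and returns a transform that maps documents to {term: weight} dicts.
def make_binary(train_docs):
    # Needs no fitting; the argument is kept for a uniform interface.
    return lambda docs: [{t: 1.0 for t in set(tokenize(d))} for d in docs]


def make_counts(train_docs):
    return lambda docs: [dict(Counter(tokenize(d))) for d in docs]


def make_tfidf(train_docs):
    # Smoothed IDF fitted on the training set only: log((1 + N) / (1 + df)) + 1.
    n = len(train_docs)
    df = Counter(t for d in train_docs for t in set(tokenize(d)))

    def transform(docs):
        return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                 for t, c in Counter(tokenize(d)).items()} for d in docs]
    return transform


def train_nb(features, labels, alpha=1.0):
    # Multinomial Naive Bayes over (possibly fractional) term weights, Laplace-smoothed.
    class_counts = Counter(labels)
    weights = defaultdict(Counter)
    vocab = set()
    for feats, y in zip(features, labels):
        weights[y].update(feats)
        vocab.update(feats)
    priors = {y: math.log(c / len(labels)) for y, c in class_counts.items()}
    log_probs = {}
    for y in class_counts:
        denom = sum(weights[y].values()) + alpha * len(vocab)
        log_probs[y] = {t: math.log((weights[y][t] + alpha) / denom) for t in vocab}
    return priors, log_probs


def predict_nb(model, features):
    priors, log_probs = model
    preds = []
    for feats in features:
        # Terms never seen during training are ignored.
        scores = {y: priors[y] + sum(w * log_probs[y][t] for t, w in feats.items() if t in log_probs[y])
                  for y in priors}
        preds.append(max(scores, key=scores.get))
    return preds


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, classes):
    # Rows are true classes, columns are predicted classes.
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, b in zip(y_true, y_pred):
        matrix[index[a]][index[b]] += 1
    return matrix


def select_best(train_docs, train_y, test_docs, test_y):
    # Train the classifier once per extractor and keep the most accurate extractor.
    extractors = {"binary": make_binary, "counts": make_counts, "tfidf": make_tfidf}
    results = {}
    for name, make in extractors.items():
        transform = make(train_docs)
        model = train_nb(transform(train_docs), train_y)
        results[name] = accuracy(test_y, predict_nb(model, transform(test_docs)))
    return max(results, key=results.get), results


if __name__ == "__main__":
    # Tiny made-up example; replace with the AG News train/test splits.
    train_docs = ["stocks rise on earnings", "market shares fall", "team wins the match", "player scores goal"]
    train_y = ["Business", "Business", "Sports", "Sports"]
    best, results = select_best(train_docs, train_y, ["shares rise", "team scores"], ["Business", "Sports"])
    print(best, results)
```

Fitting the IDF weights on the training split only keeps test-set statistics out of the features; ties in accuracy are broken by the extractor order in `extractors`. Swapping in SVM, DT, or RF only requires replacing `train_nb` / `predict_nb`.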
