AyahSoufan/AML_Proj

AML_Proj

File spec

・Balancing data.ipynb: code to balance the data by class. It generates 5 training files, each with the same percentage of toxic and non-toxic comments. train1, train2, train3, train4 and train5 contain the same toxic comments but different non-toxic comments.

・FeatureSelectionbyInformationGain.ipynb: code to calculate word frequencies and the information gain of each word

・TextPreprocessing.ipynb: code to pre-process data

・RandomForest_validation_all(biased).ipynb: code to train a baseline random forest on bag-of-words features and check its confusion matrix and ROC curve.
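As a rough illustration of the balancing step described for Balancing data.ipynb, here is a minimal sketch using only the Python standard library. It is not the notebook's code. The function name `make_balanced_splits`, the `(text, label)` row format, the 50/50 class ratio, and the choice to make the non-toxic samples disjoint are all assumptions made for this example.

```python
import random


def make_balanced_splits(rows, n_splits=5, seed=0):
    """Build n_splits training sets that share every toxic row
    but each receive a different slice of the non-toxic rows.

    rows: list of (comment_text, is_toxic) pairs, is_toxic in {0, 1}.
    (Hypothetical format, chosen for this sketch.)
    """
    toxic = [r for r in rows if r[1] == 1]
    non_toxic = [r for r in rows if r[1] == 0]

    rng = random.Random(seed)
    rng.shuffle(non_toxic)

    # Assumption: aim for a 50/50 split, capped by how many non-toxic
    # rows are available per file when the slices must not overlap.
    chunk = min(len(toxic), len(non_toxic) // n_splits)

    splits = []
    for i in range(n_splits):
        sample = non_toxic[i * chunk:(i + 1) * chunk]
        split = toxic + sample
        rng.shuffle(split)  # avoid all toxic rows sitting at the top
        splits.append(split)
    return splits


# Toy data: 4 toxic and 40 non-toxic comments.
rows = [(f"toxic {i}", 1) for i in range(4)]
rows += [(f"clean {i}", 0) for i in range(40)]
splits = make_balanced_splits(rows)
```

Each element of `splits` could then be written to its own file (train1 to train5). Sampling without overlap keeps every file's non-toxic portion distinct, at the cost of discarding non-toxic rows that do not fit into any slice.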

Run these files in this order:

  • Data Balancing
  • Data Cleaning
  • Models
  • Submit to Kaggle
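For reference, the word frequency and information gain that FeatureSelectionbyInformationGain.ipynb is described as calculating can be sketched as follows. This is an illustrative version, not the notebook's code. It assumes binary toxic/non-toxic labels, whitespace tokenization, and word presence (rather than count) as the feature; the function names are made up for this example.

```python
import math
from collections import Counter


def word_frequencies(docs):
    """Count how often each lowercase, whitespace-separated token occurs."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def entropy(pos, neg):
    """Binary entropy in bits for a group with pos/neg class counts."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:  # skip empty classes, since 0 * log(0) is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, word):
    """IG = H(label) - H(label | word present or absent)."""
    with_pos = with_neg = without_pos = without_neg = 0
    for doc, y in zip(docs, labels):
        present = word in doc.lower().split()
        if present and y == 1:
            with_pos += 1
        elif present:
            with_neg += 1
        elif y == 1:
            without_pos += 1
        else:
            without_neg += 1

    n = len(docs)
    n_with = with_pos + with_neg
    n_without = without_pos + without_neg
    h_before = entropy(with_pos + without_pos, with_neg + without_neg)
    h_after = (n_with / n) * entropy(with_pos, with_neg)
    h_after += (n_without / n) * entropy(without_pos, without_neg)
    return h_before - h_after


# Toy corpus: label 1 = toxic, 0 = non-toxic.
docs = ["you idiot", "nice day", "idiot again", "good work"]
labels = [1, 0, 1, 0]
freq = word_frequencies(docs)
ig_idiot = information_gain(docs, labels, "idiot")
ig_nice = information_gain(docs, labels, "nice")
```

In this toy corpus "idiot" separates the two classes perfectly, so its information gain equals the full label entropy of 1 bit, while "nice" appears in only one comment and scores lower. Ranking words by this score is one way to pick features for the models.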
