AML_Proj

File spec

・BalancingData.ipynb: code to balance the data by class. It generates five training files, each with the same proportion of toxic and non-toxic comments. train1 through train5 contain the same toxic comments but different non-toxic comments.

・FeatureSelectionbyInformationGain.ipynb: code to calculate the frequency of words and information gain

・TextPreprocessing.ipynb: code to pre-process data

・RandomForest_Validation-all(biased).ipynb: code to train a basic random forest on bag-of-words features and check its confusion matrix and ROC curve.
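
The word-frequency and information-gain step listed above can be sketched roughly as follows. This is an illustration only, not the notebook's code: it assumes each comment is already tokenized into a list of words and that labels are 0 (non-toxic) or 1 (toxic). The function names `entropy`, `information_gain`, and `word_frequencies` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs in a comment.

    Assumption: docs is a list of token lists, labels is a parallel list of 0/1.
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional


def word_frequencies(docs):
    """Total number of occurrences of each word across all comments."""
    return Counter(w for d in docs for w in d)


# Tiny made-up example, not taken from the project data.
docs = [["you", "idiot"], ["nice", "day"], ["idiot", "again"], ["have", "nice", "day"]]
labels = [1, 0, 1, 0]
ranked = sorted(word_frequencies(docs), key=lambda w: information_gain(docs, labels, w), reverse=True)
```

Words at the front of `ranked` separate the two classes best and are natural candidates to keep as features.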

Run the notebooks in this order:

1. Data Balancing
2. Data Cleaning
3. Models
4. Submit to Kaggle
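
As an illustration of the Data Balancing step, here is a minimal sketch of how five training sets with shared toxic comments and distinct non-toxic comments might be built. It is not the notebook's actual code. It assumes each row is a dict with a 0/1 `toxic` field and that every set gets at most as many non-toxic rows as toxic rows; the real ratio is not stated in this README. The function name `make_balanced_splits` is hypothetical.

```python
import random


def make_balanced_splits(rows, n_splits=5, label_key="toxic", seed=0):
    """Return n_splits lists: all toxic rows plus a distinct slice of non-toxic rows."""
    toxic = [r for r in rows if r[label_key] == 1]
    non_toxic = [r for r in rows if r[label_key] == 0]

    rng = random.Random(seed)
    rng.shuffle(non_toxic)

    # Assumption: aim for a 50/50 mix, limited by how many non-toxic rows
    # are available once they are divided among the splits.
    per_split = min(len(toxic), len(non_toxic) // n_splits)

    splits = []
    for i in range(n_splits):
        chunk = non_toxic[i * per_split:(i + 1) * per_split]
        split = toxic + chunk  # same toxic rows in every split
        rng.shuffle(split)     # mix the classes before writing out
        splits.append(split)
    return splits
```

Each returned list could then be written to train1.csv through train5.csv, for example with `csv.DictWriter`.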

Folders and files

・.ipynb_checkpoints
・Data_Cleaning_QI
・BalancingData.ipynb
・Data_Cleaning_Fay.ipynb
・Explore_Data.ipynb
・Feature.ipynb
・FeatureExtractionbyInformation_gain.ipynb
・Generate new dataset.ipynb
・Model.ipynb
・README.md
・RandomForest_Validation-all(biased).ipynb
・TextPreprocessing.ipynb
・code of feature selection.ipynb
・train1.csv
・train2.csv
・train3.csv
・train4.csv
・train5.csv

AyahSoufan/AML_Proj
