Explanations of some of these scripts can be found on my weblog. Below is a quick guide to getting them running.
- Download the open bearing dataset.
- Move the `bearing_IMS` directory to the same level as the `bearing_snippets` directory, OR modify the first line of the script to point `basedir` to the `bearing_IMS/1st_test` directory.
- Run `basic_feature_extraction.R`! This writes the basic feature vectors to `b1.csv` through `b4.csv`.
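The exact features live in `basic_feature_extraction.R` itself; purely as an illustration of the idea, here is a minimal Python sketch of extracting simple time-domain features (RMS, peak, crest factor — assumed examples, not necessarily the script's own set) from one window of vibration samples:

```python
import math

def basic_features(window):
    """Compute a few simple time-domain features from one vibration window."""
    n = len(window)
    rms = math.sqrt(sum(x * x for x in window) / n)   # root mean square
    peak = max(abs(x) for x in window)                # peak amplitude
    crest = peak / rms                                # crest factor
    return {"rms": rms, "peak": peak, "crest": crest}

# One toy window of accelerometer samples:
fv = basic_features([0.1, -0.2, 0.3, -0.1])
```

Each window of the raw recordings yields one such feature vector, and the vectors are accumulated into the per-bearing CSV files.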
- For the first time through, run `basic_feature_extraction.R` to generate the features. Thereafter, the features are stored in the files `b1.csv`, `b2.csv`, `b3.csv`, and `b4.csv`, and you can go straight to step 2.
- Run `basic_feature_graphing.R`!
- Perform steps 1 and 2 of `basic_feature_extraction.R`.
- Run `more_features.R`! This writes the full feature vectors to `b1_all.csv` through `b4_all.csv`.
- Run `more_features.R`, so the features are stored in the files `b1_all.csv` through `b4_all.csv`.
- Run `feature_correlation.R` to see features with high correlation.
- Run `feature_correlation.R` to output sets of features with high correlation.
- Run `optimise.rb` to select the minimal set of uncorrelated features.
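To show the underlying idea (not the scripts' actual code, which is R and Ruby), a small Python sketch of flagging highly correlated feature pairs — Pearson correlation over each pair of feature columns, reporting those above an assumed threshold of 0.9:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_pairs(features, threshold=0.9):
    """Return feature-name pairs whose |correlation| meets the threshold."""
    names = list(features)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson(features[a], features[b])) >= threshold]

feats = {
    "rms":  [1.0, 2.0, 3.0, 4.0],
    "peak": [2.1, 3.9, 6.0, 8.2],   # nearly proportional to rms
    "skew": [0.5, -0.2, 0.1, 0.3],
}
pairs = correlated_pairs(feats)   # only (rms, peak) correlate strongly
```

From each such correlated set, one representative feature can be kept and the rest dropped, which is the minimisation `optimise.rb` performs.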
- Run `more_features.R`, so the features are stored in the files `b1_all.csv` through `b4_all.csv`.
- If desired, modify line 25 of `feature_information.R` to include only the features you are interested in (e.g. after running `optimise.rb` and finding a different minimal set).
- Run `feature_information.R` to generate an interesting graph! It also writes the full feature vector plus state labels to `all_bearings.csv`, and the best 14 features plus state labels to `all_bearings_best_fv.csv`.
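The ranking criterion is defined in `feature_information.R`; one common information-theoretic choice (assumed here for illustration only) is the mutual information between a discretised feature and the state label, sketched in Python:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A feature that perfectly tracks the label carries maximal information
# (1 bit for two equiprobable classes):
labels  = ["early", "early", "worn", "worn"]
feature = ["lo", "lo", "hi", "hi"]
mi = mutual_information(feature, labels)
```

Features are then ranked by how informative they are about the bearing state, and the top-ranked ones are kept.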
- Run `feature_information.R`, so the minimised set of features is written to `all_bearings_best_fv.csv`.
- Run `kmeans.R` to select the best k-means model! It also writes it to `kmeans.obj`.
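`kmeans.R` does the real model selection; as a rough sketch of the principle, here is a tiny 1-D k-means in Python with random restarts, keeping the model with the lowest within-cluster sum of squares (one common "best model" criterion — an assumption, not a restatement of the script):

```python
import random

def kmeans_1d(points, k, iters=20):
    """A tiny 1-D k-means: returns (centres, within-cluster sum of squares)."""
    centres = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: (p - centres[i]) ** 2)].append(p)
        # Recompute each centre as its cluster mean (keep old centre if empty):
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    wss = sum(min((p - c) ** 2 for c in centres) for p in points)
    return centres, wss

random.seed(0)
data = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]   # two obvious clusters
best_centres, best_wss = min((kmeans_1d(data, 2) for _ in range(10)),
                             key=lambda cw: cw[1])
```

The best restart is then kept as the model, analogous to writing the winning fit out to `kmeans.obj`.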
- Run `feature_information.R`, so the minimised set of features is written to `all_bearings_best_fv.csv`.
- Run `kmeans.R`, so the best k-means model is written to `kmeans.obj`.
- Visualise the results using the graphs generated by `kmeans.R`. Alter the filename on line 7 to match the best k-means model. If needed, alter the cluster numbers or class labels in `relabel.R` to better match the data.
- Run `relabel.R` to modify the state labels. It also plots a state transition graph and writes the new data to `all_bearings_relabelled.csv`.
- Requires features and labels in `all_bearings_relabelled.csv`, which can be generated by `relabel.R`.
- Run `training_set.R` to randomly pick 70% of the data rows as a training set. The row numbers are written to `train.rows.csv`.
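The split itself is simple; a Python sketch of the same idea (shuffle the row indices, keep the first 70% as training rows — the seed and helper name here are illustrative, not taken from `training_set.R`):

```python
import random

def training_rows(n_rows, fraction=0.7, seed=42):
    """Randomly pick a fraction of row indices as the training set."""
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    cut = int(n_rows * fraction)
    return sorted(rows[:cut])

train = training_rows(10)   # 7 of the 10 row indices
```

Persisting the chosen row numbers (as `train.rows.csv` does) lets every later classifier script train and test on exactly the same split.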
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `ann_mlp.R` to train and test an array of MLP ANNs with varying parameters. Parameters include:
    - Hidden neurons in the range 2 to 30 inclusive
    - Different class weightings to handle uneven counts of class labels
    - Data normalisation, neuron range scaling, or neither, to handle wide disparities in feature ranges
- The table of results is written to `ann.results.csv`, all trained models are written to `ann.models.obj`, and the best (highest accuracy) model is written to `best.ann.obj`.
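The parameter sweep has the shape of a simple grid search. A Python sketch of that harness — the `train_and_score` function below is a hypothetical stand-in (in the repo the training happens in R), with a toy score so the loop runs end to end:

```python
import itertools

def train_and_score(hidden, weighting, preprocessing):
    """Hypothetical stand-in for training one MLP and returning test accuracy.
    Toy scoring only; NOT a real model."""
    return 0.5 + 0.01 * min(hidden, 20) - (0.05 if preprocessing == "none" else 0.0)

grid = itertools.product(
    range(2, 31),                          # hidden neurons, 2..30 inclusive
    ["equal", "inverse-frequency"],        # class weightings
    ["normalise", "scale-to-neuron-range", "none"],
)
results = [(h, w, p, train_and_score(h, w, p)) for h, w, p in grid]
best = max(results, key=lambda r: r[3])    # highest-accuracy configuration
```

Every configuration's score goes into the results table, and only the top scorer is kept as the "best" model, mirroring the three output files above.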
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `rpart.R` to train and test an array of RPART decision trees. Different class weightings are applied to handle uneven counts of class labels.
- The table of results is written to `rpart.results.csv`, all trained models are written to `rpart.models.obj`, and the best (highest accuracy) model is written to `best.rpart.obj`.
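One common recipe for such class weightings (an assumed example — `rpart.R` defines its own) is inverse-frequency weighting, so rare states such as failure count as much as common ones. In Python:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency: a class appearing
    half as often gets twice the weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# 8 "normal" rows vs 2 "failure" rows -> failure is upweighted 4x:
w = inverse_frequency_weights(["normal"] * 8 + ["failure"] * 2)
```

Without such weights, a tree can reach high accuracy by simply predicting the majority state everywhere.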
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `knn.R` to train and test an array of weighted k-nearest-neighbour classifiers with varying parameters. Parameters include:
    - Different kernels on the weightings (all 10 in the `kknn` library)
    - All k values from {1, 3, 5, 10, 15, 20, 35, 50}
- The table of results is written to `knn.results.csv`, all trained models are written to `knn.models.obj`, and the best (highest accuracy) model is written to `best.knn.obj`.
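To illustrate what a weighted k-NN vote looks like, here is a 1-D Python sketch using a single inverse-distance kernel (just one simple choice; `kknn` provides several kernels, which is exactly what the sweep varies):

```python
from collections import defaultdict

def weighted_knn(train, query, k):
    """Classify by the k nearest 1-D training points, weighting each
    vote by inverse distance to the query."""
    nearest = sorted(train, key=lambda t: abs(t[0] - query))[:k]
    votes = defaultdict(float)
    for x, label in nearest:
        votes[label] += 1.0 / (abs(x - query) + 1e-9)  # closer -> stronger vote
    return max(votes, key=votes.get)

train = [(0.1, "early"), (0.2, "early"), (5.0, "worn"), (5.2, "worn")]
pred = weighted_knn(train, 4.8, k=3)   # nearest neighbours are mostly "worn"
```

Sweeping k and the kernel, then keeping the highest-accuracy pair, produces the results table and best model described above.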
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `svm.R` to train and test an array of Support Vector Machine classifiers with varying parameters. Parameters include:
    - Gamma from {10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1}
    - Cost from {10^0, 10^1, 10^2, 10^3}
    - Different class weightings to handle uneven counts of class labels
- These gamma and cost values correspond to a rough grid search. A finer search should be performed in the region of the pair with the highest accuracy.
- The table of results is written to `svm.results.csv`, all trained models are written to `svm.models.obj`, and the best (highest accuracy) model is written to `best.svm.obj`.
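The coarse-then-fine grid search can be sketched in Python. The `accuracy` function here is a hypothetical stand-in with a toy unimodal surface (the real scores come from training SVMs in `svm.R`); the grids and the refinement step mirror the description above:

```python
import math

# Rough log-scale grid, matching the values listed above:
gammas = [10 ** e for e in range(-6, 0)]   # 1e-6 .. 1e-1
costs  = [10 ** e for e in range(0, 4)]    # 1 .. 1000

def accuracy(gamma, cost):
    """Hypothetical stand-in for cross-validated accuracy at (gamma, cost).
    Toy surface peaking near gamma=1e-3, cost=100; NOT real results."""
    return (1.0 - 0.01 * (math.log10(gamma) + 3) ** 2
                - 0.01 * (math.log10(cost) - 2) ** 2)

best_gamma, best_cost = max(((g, c) for g in gammas for c in costs),
                            key=lambda gc: accuracy(*gc))

# Finer search in the neighbourhood of the best coarse pair:
fine_gammas = [best_gamma * f for f in (0.25, 0.5, 1, 2, 4)]
fine_costs  = [best_cost * f for f in (0.25, 0.5, 1, 2, 4)]
```

The coarse pass locates the promising region cheaply; the fine pass then resolves the best (gamma, cost) pair within it.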