Data modeling process for a real-world data set describing different types of plants. Several techniques were applied, including principal component analysis (PCA), partitioning around medoids (PAM), and hierarchical clustering (HC). Weka was also used in this project.
The data set is a slightly modified version of a real-world plant data set. It concerns the classification of plant species from measurements taken from photographs of the plants. Each record consists of several attribute columns (inputs) and one class column (output) that gives the type of plant. The attributes are markers determined by assessing different features of each plant, and the class variable is a provisional labelling of the plant type. The entire data set consists of over 700 instances (plants studied). Some of the variables contain missing values, which are indicated by empty entries.
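The record layout described above (numeric attribute columns, one class column, empty entries for missing values) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, the sample values, and the helper functions `load_records`, `count_missing`, and `impute_median` are all hypothetical, and median imputation is one common way to handle missing entries, not necessarily the approach used in this project.

```python
import csv
import io
from statistics import median

# Hypothetical sample: column names and values are invented for illustration
# and do not come from the real plant data set.
SAMPLE_CSV = """leaf_width,leaf_length,stem_height,plant_class
1.2,3.4,10.0,A
,3.1,12.5,B
1.5,,11.0,A
1.1,2.9,,B
"""


def load_records(text, class_column="plant_class"):
    """Parse CSV text into (attributes, label) pairs.

    Empty attribute entries are treated as missing and stored as None.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        label = raw.pop(class_column)
        attrs = {k: (float(v) if v.strip() else None) for k, v in raw.items()}
        rows.append((attrs, label))
    return rows


def count_missing(rows):
    """Count missing (None) entries per attribute column."""
    counts = {}
    for attrs, _ in rows:
        for name, value in attrs.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


def impute_median(rows):
    """Replace missing entries with the median of the observed values.

    Assumes every column has at least one observed value; otherwise
    statistics.median raises StatisticsError.
    """
    names = list(rows[0][0].keys())
    medians = {
        n: median(a[n] for a, _ in rows if a[n] is not None) for n in names
    }
    return [
        ({n: (a[n] if a[n] is not None else medians[n]) for n in names}, label)
        for a, label in rows
    ]


if __name__ == "__main__":
    records = load_records(SAMPLE_CSV)
    print(count_missing(records))
    print(impute_median(records)[1])
```

Methods such as PCA and distance-based clustering generally need complete numeric inputs, so a step like this (or dropping incomplete records) typically comes before them.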