Text mining for a classification task using unstructured PubMed data
We applied text mining tools to a document classification task. We considered unstructured medical data on two topics, human immunodeficiency virus (HIV) and human papillomavirus (HPV), retrieved from the National Center for Biotechnology Information (NCBI) databases using the R package RISmed. Text mining preprocessing strategies were applied to the documents. We represented the corpus as a document-term matrix and performed dimensionality reduction using information gain. Results show accuracies ranging from 81.3% to 94.6% when predicting the class of documents.
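The abstract does not specify the exact preprocessing or the information gain variant used, so the following is only a minimal Python sketch of the general technique, not the authors' R pipeline. It assumes a simple lowercase alphanumeric tokenizer and scores each term as a binary feature (present vs. absent in a document). The function names and the four toy "abstracts" are hypothetical illustrations, and retrieval from PubMed via RISmed is not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase alphanumeric runs
    return re.findall(r"[a-z0-9]+", text.lower())


def document_term_matrix(docs):
    # Rows are documents, columns are vocabulary terms, cells are raw term counts
    vocab = sorted({t for d in docs for t in tokenize(d)})
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([counts.get(t, 0) for t in vocab])
    return vocab, rows


def entropy(labels):
    # Shannon entropy (base 2) of the class label distribution
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    # Assumption: the term is treated as binary (count > 0 means present)
    present = [y for x, y in zip(column, labels) if x > 0]
    absent = [y for x, y in zip(column, labels) if x == 0]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_top_k(vocab, rows, labels, k):
    # Score every term, then keep the k highest (ties broken alphabetically)
    scores = {}
    for j, term in enumerate(vocab):
        column = [row[j] for row in rows]
        scores[term] = information_gain(column, labels)
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


def reduce_dtm(vocab, rows, selected):
    # Keep only the columns of the selected terms, in the selected order
    idx = [vocab.index(t) for t in selected]
    return [[row[j] for j in idx] for row in rows]


# Toy corpus (invented for illustration, not PubMed data)
docs = [
    "hiv antiretroviral therapy cd4 count",
    "hiv viral load antiretroviral",
    "hpv vaccine cervical cancer",
    "hpv cervical screening vaccine",
]
labels = ["HIV", "HIV", "HPV", "HPV"]

vocab, rows = document_term_matrix(docs)
top, scores = select_top_k(vocab, rows, labels, k=3)
reduced = reduce_dtm(vocab, rows, top)
print(top)      # ['antiretroviral', 'cervical', 'hiv']
print(reduced)  # [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0]]
```

Terms that appear in only one class (such as "hiv" or "cervical" here) receive the maximum gain of 1 bit, while terms confined to a single document score lower. The reduced matrix would then be passed to a classifier, which the abstract does not name.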