This is a machine learning-based web application that automatically predicts the job category of a resume. It uses Natural Language Processing (NLP) techniques and a trained Support Vector Machine (SVM) model to classify resumes into relevant professional domains.
The project is intended to help recruiters and hiring platforms automate resume screening and evaluate candidates more efficiently.
- Upload resumes in PDF, DOCX, or TXT formats
- Automatic text extraction from uploaded files
- Resume text preprocessing and cleaning using NLP techniques
- TF-IDF vectorization for feature extraction
- Machine learning-based classification using an SVM model
- Instant prediction of job category
- Option to view extracted resume text for verification
- Interactive web interface built with Streamlit
- Algorithm: Support Vector Machine (SVM)
- Feature Extraction: TF-IDF Vectorization
- Label Encoding: Used to convert categorical labels into numerical format
- Trained on a labeled resume dataset for multi-class classification
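
The model components listed above could fit together roughly as in the following minimal sketch. The tiny in-line dataset, the category names, the linear kernel, the `stop_words` setting, and the `predict_category` helper are all assumptions made for illustration; the project's actual training data and hyperparameters are not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Toy stand-in for the labeled resume dataset (hypothetical examples).
texts = [
    "python machine learning pandas statistics models",
    "deep learning python data analysis statistics",
    "react javascript html css frontend web",
    "javascript css html responsive web design",
    "recruitment payroll employee onboarding hr policies",
    "employee relations recruitment hr payroll",
]
labels = [
    "Data Science", "Data Science",
    "Web Designing", "Web Designing",
    "HR", "HR",
]

# Label encoding: map category names to integer class ids.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# TF-IDF vectorization: turn each text into a sparse numeric feature vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Multi-class SVM; the linear kernel is an assumption, not a confirmed setting.
model = SVC(kernel="linear")
model.fit(X, y)


def predict_category(text: str) -> str:
    """Vectorize one resume text and return the predicted category name."""
    features = vectorizer.transform([text])
    label_id = model.predict(features)
    return label_encoder.inverse_transform(label_id)[0]
```

Note that the same fitted `vectorizer` and `label_encoder` must be reused at prediction time, which is why applications of this kind typically persist them alongside the trained model.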
- The user uploads a resume file (PDF, DOCX, or TXT)
- The system extracts raw text from the file
- Text is cleaned using preprocessing techniques (removal of URLs, symbols, special characters, etc.)
- The TF-IDF vectorizer converts the cleaned text into numerical features
- The trained SVM model predicts the most relevant job category
- The predicted category is displayed on the interface
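
The cleaning step in the workflow above might look something like the sketch below, using only regular expressions. The function name `clean_resume_text` is hypothetical, and the exact rules (dropping email addresses, keeping only ASCII letters and digits, lowercasing) are assumptions; the project's real cleaning rules may differ.

```python
import re


def clean_resume_text(text: str) -> str:
    """Strip URLs, symbols, and extra whitespace from raw resume text."""
    # Remove URLs (http/https links and bare www. addresses).
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Remove email addresses (an assumed rule, not listed in the source).
    text = re.sub(r"\S+@\S+", " ", text)
    # Replace anything that is not an ASCII letter, digit, or whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()
```

One design caveat: stripping all symbols also alters tokens such as `C++` or `C#`, so a production cleaner may want to protect a short list of skill names before applying the symbol rule.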
- Python
- Streamlit (Web Application Framework)
- Scikit-learn (SVM Classifier, TF-IDF Vectorizer, Label Encoder)
- Natural Language Processing (Text preprocessing and cleaning)
- PyPDF2 (PDF text extraction)
- python-docx (DOCX file parsing)
- Regular Expressions (Text cleaning)
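
As a rough illustration of how the extraction libraries listed above could be combined, the sketch below dispatches on the file extension. The `extract_text` name and its `(filename, data)` signature are hypothetical, and the PDF branch assumes a PyPDF2 release that provides `PdfReader` (2.0 or later); older releases expose a different reader class.

```python
import io
from pathlib import Path


def extract_text(filename: str, data: bytes) -> str:
    """Return the raw text of an uploaded PDF, DOCX, or TXT file."""
    suffix = Path(filename).suffix.lower()

    if suffix == ".txt":
        # Plain text: decode directly, skipping undecodable bytes.
        return data.decode("utf-8", errors="ignore")

    if suffix == ".pdf":
        # Imported lazily so TXT-only use does not require PyPDF2.
        from PyPDF2 import PdfReader

        reader = PdfReader(io.BytesIO(data))
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    if suffix == ".docx":
        # python-docx is imported under the module name "docx".
        from docx import Document

        document = Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)

    raise ValueError(f"Unsupported file type: {suffix or filename}")
```

In a Streamlit app, the uploaded file object's name and bytes could be passed straight into a helper like this. Scanned PDFs without a text layer will yield empty output, since no OCR is involved.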