A Python script that automatically scans Gmail emails for resume PDF attachments and categorizes them based on your search criteria. Unqualified resumes are automatically labeled as "To Be Deleted" for easy cleanup.
- 🔍 Smart Resume Scanning: Searches through Gmail for emails with PDF attachments
- 📄 PDF Text Extraction: Extracts and analyzes text content from resume PDFs
- 🎯 Keyword Matching: Uses your search criteria to determine resume qualification
- 🏷️ Automatic Labeling: Labels unqualified resumes as "To Be Deleted"
- 📊 Detailed Reporting: Shows scanning statistics and results
- 🤖 AI-Powered Job Application Filtering: Pre-filters emails to only process job applications
- 💾 Memory-Efficient Batch Processing: Processes emails in batches to avoid memory issues
- Python 3.7 or higher
- Gmail account
- Google Cloud Project with Gmail API enabled
pip install -r requirements.txt
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Gmail API:
  - Go to "APIs & Services" > "Library"
  - Search for "Gmail API"
  - Click "Enable"
- Go to "APIs & Services" > "Credentials"
- Click "Create Credentials" > "OAuth 2.0 Client IDs"
- Choose "Desktop application" as the application type
- Download the credentials file and rename it to credentials.json
- Place credentials.json in the same directory as the script
- Copy the environment template:
  cp .env.example .env
- Edit .env and add your API keys:
  - GEMINI_API_KEY: Your Google Gemini API key
  - GOOGLE_APP_PASSWORD: Your Google App Password
- Optional: Configure processing settings:
  - MAX_EMAILS_TO_PROCESS: Maximum emails to process (default: 1000)
  - PDF_SEARCH_QUERY: Custom Gmail search query (default: has:attachment filename:pdf)
Security Note: The .env file is automatically ignored by Git to keep your API keys secure.
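As a rough illustration of how the optional settings and their defaults could be read, here is a minimal sketch. The function name `load_settings` and the fallback behavior for invalid values are assumptions for illustration, not necessarily what the script does; it also reads only from the process environment, so loading the .env file itself is left out.

```python
import os

# Defaults taken from the settings described above.
DEFAULT_MAX_EMAILS = 1000
DEFAULT_PDF_QUERY = "has:attachment filename:pdf"


def load_settings(env=None):
    """Read the optional processing settings, falling back to the defaults.

    `env` is any mapping of variable names to strings; it defaults to
    os.environ. (Hypothetical helper, not part of the original script.)
    """
    env = os.environ if env is None else env

    raw_max = env.get("MAX_EMAILS_TO_PROCESS", "").strip()
    try:
        max_emails = int(raw_max) if raw_max else DEFAULT_MAX_EMAILS
    except ValueError:
        # Assumption: a non-numeric value falls back to the default.
        max_emails = DEFAULT_MAX_EMAILS

    query = env.get("PDF_SEARCH_QUERY", "").strip() or DEFAULT_PDF_QUERY
    return {"max_emails": max_emails, "pdf_search_query": query}
```

Calling `load_settings()` with no arguments reads the current environment; passing a plain dict makes the behavior easy to check without touching real variables.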
On first run, the script will:
- Open your browser for OAuth authentication
- Ask you to log in to your Google account
- Ask you to grant permission to access your Gmail
- Save authentication tokens for future use
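The first-run flow above roughly corresponds to the standard installed-app OAuth pattern from the google-auth and google-auth-oauthlib packages. The sketch below assumes those packages are installed and that the script uses the `gmail.modify` scope; the function name `get_credentials` is hypothetical, and the actual script may be organized differently.

```python
import os

# Assumed scope: allows reading messages and changing labels.
SCOPES = ["https://www.googleapis.com/auth/gmail.modify"]


def get_credentials(credentials_path="credentials.json", token_path="token.json"):
    """Return Gmail credentials, opening the browser only when no usable token exists."""
    # Imported lazily so this module loads even without the Google libraries.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        # Reuse the token saved by a previous run.
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if creds and creds.valid:
        return creds

    if creds and creds.expired and creds.refresh_token:
        # Refresh silently instead of asking the user to log in again.
        creds.refresh(Request())
    else:
        # First run: open the browser for login and consent.
        flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
        creds = flow.run_local_server(port=0)

    # Save the token so later runs can skip the browser step.
    with open(token_path, "w") as token_file:
        token_file.write(creds.to_json())
    return creds
```

Deleting token.json forces the browser flow to run again on the next call.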
python gmail_resume_scanner.py
The script will prompt you for search criteria. Examples:
- "Find me a candidate who is skilled in Java"
- "Looking for Python developers with Django experience"
- "Need someone with React and TypeScript skills"
- Email Search: Finds all emails with PDF attachments
- PDF Processing: Extracts text from each PDF attachment
- Keyword Analysis: Compares resume content with your search criteria
- Qualification Check: Determines if resume matches requirements (50% keyword match threshold)
- Labeling: Marks unqualified resumes with "To Be Deleted" label
- Reporting: Shows detailed statistics of the scanning process
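The keyword analysis and the 50% threshold in the steps above could look something like the following sketch. The stop-word list, the tokenizer, and the function names are assumptions made for illustration; the real check_resume_qualification method may extract keywords differently.

```python
import re

# Hypothetical stop-word list used to strip filler words from the criteria.
STOP_WORDS = {
    "a", "an", "and", "candidate", "find", "for", "in", "is", "looking",
    "me", "need", "skilled", "skills", "someone", "the", "who", "with",
}

# Letters, digits, '+' and '#' so that terms like "c++" and "c#" survive.
TOKEN_PATTERN = re.compile(r"[a-z0-9+#]+")


def extract_keywords(criteria):
    """Lowercase the criteria and drop stop words, returning a set of keywords."""
    return {w for w in TOKEN_PATTERN.findall(criteria.lower()) if w not in STOP_WORDS}


def match_ratio(resume_text, criteria):
    """Fraction of criteria keywords that appear as whole words in the resume."""
    keywords = extract_keywords(criteria)
    if not keywords:
        return 0.0
    resume_words = set(TOKEN_PATTERN.findall(resume_text.lower()))
    return len(keywords & resume_words) / len(keywords)


def is_qualified(resume_text, criteria, threshold=0.5):
    """A resume qualifies when at least `threshold` of the keywords match."""
    return match_ratio(resume_text, criteria) >= threshold
```

With the criteria "Need someone with React and TypeScript skills", a resume mentioning only React matches one of two keywords, which sits exactly on the 50% threshold and therefore qualifies in this sketch.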
Gmail Resume Scanner
==================================================
Enter your search criteria: Find me a candidate who is skilled in Java
Scanning emails with criteria: 'Find me a candidate who is skilled in Java'
This may take a few minutes...
==================================================
SCANNING RESULTS
==================================================
Total emails scanned: 15
Emails marked for deletion: 8
Emails that qualify: 7
8 emails have been labeled as 'To Be Deleted'
The script uses a simple keyword matching algorithm. You can modify the check_resume_qualification method in gmail_resume_scanner.py to:
- Adjust the match threshold (currently 50%)
- Add more sophisticated NLP processing
- Implement fuzzy matching
- Add industry-specific keyword dictionaries
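As one possible starting point for the fuzzy-matching idea, the sketch below uses Python's standard `difflib` module to tolerate small misspellings. The function names and the 0.8 similarity cutoff are assumptions chosen for illustration, not part of the existing script.

```python
import difflib
import re


def fuzzy_contains(keyword, resume_words, cutoff=0.8):
    """True if any resume word is at least `cutoff` similar to the keyword."""
    matches = difflib.get_close_matches(keyword.lower(), resume_words, n=1, cutoff=cutoff)
    return bool(matches)


def fuzzy_match_ratio(keywords, resume_text, cutoff=0.8):
    """Fraction of keywords that have a close match somewhere in the resume."""
    if not keywords:
        return 0.0
    resume_words = sorted(set(re.findall(r"[a-z0-9+#]+", resume_text.lower())))
    hits = sum(fuzzy_contains(k, resume_words, cutoff) for k in keywords)
    return hits / len(keywords)
```

A fairly high cutoff keeps a misspelling such as "typscript" matching "typescript" while still treating "java" and "javascript" as different words.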
To change the label name from "To Be Deleted", modify this line in the scan_resumes method:
label_id = self.create_label_if_not_exists("Your Custom Label Name")
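For context, a helper like create_label_if_not_exists can be built from the Gmail API's `users.labels.list` and `users.labels.create` calls. The version below is a standalone sketch that takes the API client as a `service` argument, which is an assumption; in the script it is a method, and its body may differ.

```python
def create_label_if_not_exists(service, label_name):
    """Return the ID of `label_name`, creating the label when it is missing.

    `service` is assumed to be a Gmail API client, for example one built with
    googleapiclient.discovery.build("gmail", "v1", credentials=creds).
    """
    # Look for an existing label with the same name first.
    response = service.users().labels().list(userId="me").execute()
    for label in response.get("labels", []):
        if label["name"] == label_name:
            return label["id"]

    # Not found: create it and make it visible in the label and message lists.
    body = {
        "name": label_name,
        "labelListVisibility": "labelShow",
        "messageListVisibility": "show",
    }
    created = service.users().labels().create(userId="me", body=body).execute()
    return created["id"]
```

Looking up the label before creating it means running the script repeatedly with the same label name reuses a single label.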
"credentials.json not found"
- Download credentials from Google Cloud Console
- Ensure the file is named credentials.json
- Place it in the same directory as the script

Authentication Errors
- Delete token.json and re-run the script
- Check that Gmail API is enabled in Google Cloud Console
- Ensure OAuth consent screen is configured

No PDFs Found
- Check that emails actually have PDF attachments
- Verify Gmail search query is working correctly

PDF Text Extraction Errors
- Some PDFs may be image-based or password-protected
- The script will skip these and continue processing others
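The skip-and-continue behavior for unreadable PDFs can be sketched as a loop that catches extraction failures and records them. Everything here is illustrative: `extract_all`, the logger name, and the `extract_text` callable (standing in for whatever PDF library the script uses) are assumptions.

```python
import logging

logger = logging.getLogger("resume_scanner")


def extract_all(pdf_items, extract_text):
    """Run `extract_text` on each (name, data) pair, skipping PDFs that fail.

    Returns a dict of successfully extracted texts and a list of skipped names.
    """
    texts, skipped = {}, []
    for name, data in pdf_items:
        try:
            text = extract_text(data)
        except Exception as exc:  # e.g. a password-protected or corrupt PDF
            logger.warning("Skipping %s: %s", name, exc)
            skipped.append(name)
            continue
        if not text or not text.strip():
            # Image-based PDFs often produce no text at all.
            logger.warning("Skipping %s: no extractable text", name)
            skipped.append(name)
            continue
        texts[name] = text
    return texts, skipped
```

Returning the skipped names alongside the extracted texts makes it easy to report which attachments need manual review.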
The script provides detailed logging. Check the console output for:
- Authentication status
- Number of emails found
- PDF processing results
- Label creation and application status
- The script requests Gmail modification permissions, which it uses to read PDF attachments and apply labels
- Authentication tokens are stored locally in token.json
- Never share your credentials.json or token.json files
- Consider using environment variables for production use
- Only processes PDF attachments (not other formats)
- Simple keyword matching (not advanced NLP)
- Processes only the first PDF attachment per email
- Requires manual review of labeled emails before deletion
Feel free to enhance the script with:
- Support for other file formats (DOC, DOCX)
- Advanced NLP for better keyword matching
- Machine learning for resume classification
- Batch processing capabilities
- Email notification features
This project is open source. Feel free to modify and distribute as needed.