TJSR — Tracker for Job Search & Reporting
Continuously discover, classify, and match the latest job openings — then notify you via dashboard, Telegram, and email.
TJSR is a full-stack AI-powered job discovery platform that:
Scrapes career pages and public job APIs every 6 hours automatically
Classifies jobs as tech/non-tech using a fine-tuned DistilBERT model + keyword fallback
Matches jobs to your resume using hybrid keyword + semantic (Qdrant) scoring
Notifies you via in-app notifications, Telegram bot, and email digest
Lets you chat with an AI assistant (Ollama/RAG) about the job database
Visualises company–skill relationships in a Neo4j knowledge graph
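The "model + keyword fallback" classification above can be pictured with a small sketch. This is not the project's code: `classify_job`, `TECH_KEYWORDS`, the `model_predict` callable, and the 0.7 confidence cut-off are all hypothetical, and the rule for when the fallback kicks in (model missing, failing, or unsure) is an assumption.

```python
from typing import Callable, Optional, Tuple

# Hypothetical keyword list; a real deployment would likely use a larger one.
TECH_KEYWORDS = {"engineer", "developer", "python", "devops", "backend", "frontend"}


def keyword_is_tech(title: str) -> bool:
    """Return True if any keyword occurs in the lowercased job title."""
    text = title.lower()
    return any(keyword in text for keyword in TECH_KEYWORDS)


def classify_job(
    title: str,
    model_predict: Optional[Callable[[str], Tuple[str, float]]] = None,
    min_confidence: float = 0.7,  # assumed cut-off, not taken from the project
) -> str:
    """Use the model's label when it is confident, else fall back to keywords."""
    if model_predict is not None:
        try:
            label, confidence = model_predict(title)
            if confidence >= min_confidence:
                return label
        except Exception:
            pass  # model unavailable or failed: fall through to keywords
    return "tech" if keyword_is_tech(title) else "non-tech"
```

Passing the model in as a callable keeps the sketch runnable without loading DistilBERT; in practice the callable would wrap the fine-tuned model's inference call.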
| Layer | Technology |
|---|---|
| Frontend | Next.js 16 (App Router), React, Tailwind v4, TanStack Query |
| Backend | FastAPI (async), SQLAlchemy 2.0, Pydantic v2 |
| Primary DB | PostgreSQL 16 |
| Vector DB | Qdrant (384-dim MiniLM embeddings) |
| Graph DB | Neo4j 5 |
| Queue | Celery + Redis |
| LLM | Ollama (local, qwen3) with RAG |
| ML | Fine-tuned DistilBERT (tech/non-tech classifier) |
| Auth | Firebase Authentication |
| Storage | Firebase Storage (resumes) |
10 scraper engines: BS4, Playwright, Selenium, Crawl4AI, Scrapling, Newspaper, Phenom, Google Careers, RSS/Atom, Sitemap Discovery
4 public job APIs: RemoteOK, Arbeitnow, The Muse, Adzuna (no URL needed)
Scheduled scraping every 6 hours via Celery Beat
Fuzzy deduplication using PostgreSQL pg_trgm similarity
Auto-expiry: jobs older than 30 days are archived
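To give a feel for the fuzzy deduplication listed above, here is a pure-Python approximation of how pg_trgm scores two strings: lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, and compare the resulting trigram sets. The `is_duplicate` helper and its 0.8 threshold are hypothetical; the project performs this inside PostgreSQL, and its actual threshold is not stated here.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Collect pg_trgm-style trigrams from every alphanumeric word in text."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the union of trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate(title_a: str, title_b: str, threshold: float = 0.8) -> bool:
    """Hypothetical helper: flag two job titles as the same posting."""
    return similarity(title_a, title_b) >= threshold
```

Because case and whitespace are normalised away before comparison, postings that differ only in capitalisation or spacing score as identical.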
Upload PDF/DOCX/TXT resume → extract 130+ tech skills
Hybrid matching: 60% keyword overlap + 40% Qdrant semantic similarity
Match explanations: matched skills + missing skills (gap analysis)
Per-user job alerts when a new job scores ≥40% skill overlap
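The 60/40 weighting, the gap analysis, and the 40% alert threshold above can be combined in a short sketch. Several details are assumptions: `match_job` is a hypothetical name, "keyword overlap" is taken to mean the fraction of the job's skills found in the resume, and the semantic similarity is assumed to arrive already scaled to the range 0 to 1.

```python
from typing import Any, Dict, Iterable

KEYWORD_WEIGHT = 0.6    # 60% keyword overlap
SEMANTIC_WEIGHT = 0.4   # 40% semantic similarity
ALERT_THRESHOLD = 0.4   # alert when skill overlap is at least 40%


def match_job(
    resume_skills: Iterable[str],
    job_skills: Iterable[str],
    semantic_similarity: float,
) -> Dict[str, Any]:
    """Score one job against one resume and explain the result."""
    resume = {skill.lower() for skill in resume_skills}
    job = {skill.lower() for skill in job_skills}
    matched = sorted(job & resume)
    missing = sorted(job - resume)
    # Assumed definition: share of the job's skills present in the resume.
    overlap = len(matched) / len(job) if job else 0.0
    semantic = min(max(semantic_similarity, 0.0), 1.0)  # clamp to [0, 1]
    return {
        "score": KEYWORD_WEIGHT * overlap + SEMANTIC_WEIGHT * semantic,
        "overlap": overlap,
        "matched_skills": matched,
        "missing_skills": missing,
        "alert": overlap >= ALERT_THRESHOLD,
    }
```

For a resume with Python, FastAPI, and Docker against a job asking for Python, FastAPI, Kubernetes, and Redis, the overlap is 0.5, so a semantic similarity of 0.8 gives 0.6 × 0.5 + 0.4 × 0.8 = 0.62.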
RAG-powered chat with Ollama (local LLM)
Context: top 8 semantically similar jobs from Qdrant + DB fallback
Streaming responses, conversation history (Redis, 7-day TTL)
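A rough sketch of the retrieval step described above: ask the vector store for the top 8 jobs and top up from the database if that comes back short or fails. `retrieve_context`, `vector_search`, and `db_search` are hypothetical names for injected callables, and the exact trigger for the DB fallback is an assumption.

```python
from typing import Callable, List, Sequence

TOP_K = 8                                # jobs passed to the LLM as context
HISTORY_TTL_SECONDS = 7 * 24 * 60 * 60   # 7-day TTL for chat history keys

SearchFn = Callable[[str, int], Sequence[str]]


def retrieve_context(
    query: str,
    vector_search: SearchFn,
    db_search: SearchFn,
    top_k: int = TOP_K,
) -> List[str]:
    """Return up to top_k job snippets, preferring semantic hits."""
    try:
        hits = list(vector_search(query, top_k))
    except Exception:
        hits = []  # vector store unreachable: rely on the database
    if len(hits) < top_k:
        seen = set(hits)
        for job in db_search(query, top_k):
            if job not in seen:
                hits.append(job)
                seen.add(job)
            if len(hits) >= top_k:
                break
    return hits[:top_k]


def build_prompt(query: str, jobs: Sequence[str]) -> str:
    """Assemble a simple RAG prompt from the retrieved job snippets."""
    context = "\n".join(f"- {job}" for job in jobs)
    return f"Relevant jobs:\n{context}\n\nQuestion: {query}"
```

Injecting the two search functions keeps the sketch independent of any particular Qdrant or SQLAlchemy client call.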
Telegram bot: daily digest, instant match alerts, chatbot responses
Email digest: SMTP-based, personalised per subscriber
In-app notifications: real-time bell icon with unread count
Live stats: total jobs, jobs today, matched jobs (week-over-week %)
Activity feed from logs + applications
Latest job matches with apply links
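The week-over-week percentage in the live stats is plain percent change. A minimal sketch, assuming the dashboard compares this week's matched-job count with last week's; `week_over_week` is a hypothetical helper, and returning `None` when last week's count is zero is a design choice of this sketch, not confirmed project behaviour.

```python
from typing import Optional


def week_over_week(current: int, previous: int) -> Optional[float]:
    """Percent change from previous to current; None if previous is zero."""
    if previous == 0:
        return None  # change is undefined without a baseline
    return (current - previous) * 100.0 / previous
```

Returning `None` lets the UI show a placeholder instead of a misleading infinite or zero percentage.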
Docker & Docker Compose
Node.js 18+
Python 3.10+
git clone https://github.com/your-org/Project-TJSR.git
cd Project-TJSR
cp .env.example .env
# Edit .env with your credentials
docker-compose up -d # PostgreSQL, Redis, Neo4j, Qdrant
cd backend
pip install -r requirements.txt
playwright install chromium # for Playwright engine
uvicorn app.main:app --reload --port 8000
4. Celery worker + Beat (optional, for scheduled scraping)
cd backend
celery -A app.workers.celery_app worker --loglevel=info &
celery -A app.workers.celery_app beat --loglevel=info
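For orientation, a 6-hour Beat schedule can be expressed as a plain dictionary like the one below. This is a sketch rather than the project's configuration: the entry name and the task path `app.workers.tasks.scrape_all_sources` are hypothetical, and so is the `celery_app` variable in the final comment.

```python
from datetime import timedelta

# Celery Beat accepts a timedelta (or a number of seconds) as the schedule.
BEAT_SCHEDULE = {
    "scrape-all-sources": {                               # hypothetical entry name
        "task": "app.workers.tasks.scrape_all_sources",   # hypothetical task path
        "schedule": timedelta(hours=6),
    },
}

# Assumed wiring, given a Celery instance named celery_app:
# celery_app.conf.beat_schedule = BEAT_SCHEDULE
```

The worker process executes the task; the Beat process only enqueues it on schedule, which is why both commands above need to be running.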
cd frontend
npm install
npm run dev # http://localhost:3000
| Variable | Description | Required |
|---|---|---|
| DATABASE_URL | PostgreSQL async URL | ✅ |
| SYNC_DATABASE_URL | PostgreSQL sync URL (Celery) | ✅ |
| REDIS_URL | Redis URL | ✅ |
| FIREBASE_SERVICE_ACCOUNT_KEY | Path to Firebase JSON key | ✅ |
| FIREBASE_PROJECT_ID | Firebase project ID | ✅ |
| FIREBASE_STORAGE_BUCKET | Firebase Storage bucket | ✅ |
| TELEGRAM_BOT_TOKEN | Telegram bot token | Optional |
| OLLAMA_BASE_URL | Ollama server URL | Optional |
| OLLAMA_MODEL | Model name (default: qwen3:latest) | Optional |
| QDRANT_HOST | Qdrant host | Optional |
| NEO4J_URI | Neo4j bolt URI | Optional |
| SMTP_HOST | SMTP server for email digests | Optional |
| SMTP_USER | SMTP username | Optional |
| SMTP_PASS | SMTP password | Optional |
| ADZUNA_APP_ID | Adzuna API ID (free tier) | Optional |
| ADZUNA_APP_KEY | Adzuna API key | Optional |
| FRONTEND_URL | Frontend URL for CORS | ✅ |
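Before starting the backend, it can help to confirm that every required variable is set. The check below is a hypothetical helper, not part of the project; it only encodes the variables marked as required above.

```python
import os
from typing import List, Mapping

# Variables marked as required in the backend table.
REQUIRED_VARS = (
    "DATABASE_URL",
    "SYNC_DATABASE_URL",
    "REDIS_URL",
    "FIREBASE_SERVICE_ACCOUNT_KEY",
    "FIREBASE_PROJECT_ID",
    "FIREBASE_STORAGE_BUCKET",
    "FRONTEND_URL",
)


def missing_required(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: print(missing_required()) lists anything still to be filled in.
```

Note that this reads the process environment, so values that exist only in `.env` must be exported or loaded first.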
| Variable | Description |
|---|---|
| NEXT_PUBLIC_BACKEND_URL | Backend API URL |
| NEXT_PUBLIC_FIREBASE_* | Firebase web config |
Project-TJSR/
├── backend/
│   └── app/
│       ├── api/v1/endpoints/   # FastAPI route handlers
│       ├── models/             # SQLAlchemy ORM models
│       ├── schemas/            # Pydantic schemas
│       ├── services/
│       │   ├── scraper/        # 10 scraper engines + manager
│       │   ├── classifier/     # DistilBERT + keyword classifier
│       │   ├── rag/            # Qdrant embeddings + chat engine
│       │   ├── graph/          # Neo4j knowledge graph
│       │   ├── telegram/       # Telegram bot
│       │   └── resume/         # Skill extraction
│       └── workers/            # Celery tasks + Beat schedule
├── frontend/
│   ├── app/dashboard/          # Next.js App Router pages
│   ├── components/dashboard/   # Sidebar, Topbar, JobCard, etc.
│   └── lib/                    # API client, auth, theme context
├── Classifier_Model_training/  # DistilBERT fine-tuning scripts
└── docs/
    ├── MASTER_PLAN.md
    └── CHANGELOG.md
| Engine | Best For |
|---|---|
| auto | Let the system choose (tries bs4 → scrapling → playwright → ...) |
| bs4 | Static HTML, JSON-LD structured data |
| playwright | JavaScript SPAs, stealth scraping |
| selenium | Legacy JS sites |
| crawl4ai | AI-assisted extraction |
| phenom | Phenom People ATS (NVIDIA, Comcast, etc.) |
| google_careers | google.com/about/careers |
| rss | RSS/Atom job feeds |
| sitemap | Auto-discover job URLs from sitemap.xml |
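The `auto` engine's try-in-order behaviour can be sketched as a simple fallback chain. `scrape_auto` and the `engines` mapping are hypothetical, and only the first three engines of the order are listed because the rest of the sequence is not spelled out above.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Engine = Callable[[str], List[dict]]

# Only the documented start of the order; the remainder is not specified.
DEFAULT_ORDER = ("bs4", "scrapling", "playwright")


def scrape_auto(
    url: str,
    engines: Dict[str, Engine],
    order: Sequence[str] = DEFAULT_ORDER,
) -> Tuple[Optional[str], List[dict]]:
    """Try each engine in order; return the first that yields any jobs."""
    for name in order:
        engine = engines.get(name)
        if engine is None:
            continue  # engine not installed or not registered
        try:
            jobs = engine(url)
        except Exception:
            continue  # treat a crash as "try the next engine"
        if jobs:
            return name, jobs
    return None, []
```

Ordering from cheapest (static HTML parsing) to most expensive (a headless browser) means the heavier engines only run when the lighter ones find nothing.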
See docs/CHANGELOG.md for the full version history.
GPL-3.0