A Next.js application that uses advanced AI technology to analyze, interpret, and extract insights from professional documents. Features employee/employer authentication, document upload and management, AI-powered chat, and comprehensive predictive document analysis that identifies missing documents, provides recommendations, and suggests related content.
PDR AI is designed as one connected loop: capture documents → make them searchable → ask questions → spot gaps → act → learn.
1. Authenticate & pick a workspace (Employer / Employee)
   Clerk handles auth and role-based access, so employers can manage documents and employees while employees can view assigned materials.
2. Upload documents (optionally with OCR)
   Documents are uploaded via UploadThing. If a PDF is scanned or image-based, you can enable OCR (Datalab Marker API) to extract clean text.
3. Index & store for retrieval
   The backend chunks the extracted text and generates embeddings, storing everything in PostgreSQL (+ pgvector) so downstream AI features can retrieve the right passages (see the sketch after this list).
4. Interact with documents (RAG chat + viewer)
   Users open a document in the viewer and ask questions. The AI uses RAG over your indexed chunks to answer with document-grounded context, and chat history persists per document/session.
5. Run Predictive Document Analysis (find gaps and next steps)
   When you need completeness and compliance help, the predictive analyzer highlights missing documents, broken references, priority/urgency, and recommended actions (see the deep dive below).
6. Study Agent: StudyBuddy + AI Teacher (learn from your own documents)
   Turn uploaded PDFs into a guided study experience. The Study Agent reuses the same ingestion + indexing pipeline, so both modes can answer questions with RAG grounded in your uploaded documents.
   - StudyBuddy mode: a friendly coach that helps you stay consistent (plan, notes, timer, quick Q&A)
   - AI Teacher mode: a structured instructor with multiple teaching surfaces (view/edit/draw) for lessons
7. Close the loop
   Use insights from chat + predictive analysis + StudyBuddy sessions to upload missing docs, update categories, and keep your organization's knowledge base complete and actionable.
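To make step 3 concrete, here is a minimal sketch of the chunk-and-embed step, assuming LangChain's text splitter and OpenAI embeddings; the module paths, chunk sizes, and returned row shape are illustrative rather than the repo's exact implementation.

```ts
// Minimal sketch of the ingest step (chunk → embed → store for pgvector retrieval).
// Assumes LangChain's splitter + OpenAI embeddings; sizes and row shape are illustrative.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings } from "@langchain/openai";

export async function indexDocument(documentId: number, fullText: string) {
  // 1. Chunk the extracted text into overlapping passages
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitText(fullText);

  // 2. Generate one embedding vector per chunk
  const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  const vectors = await embeddings.embedDocuments(chunks);

  // 3. Rows to store in PostgreSQL (chunk text + pgvector column), e.g. via Drizzle
  return chunks.map((content, i) => ({
    documentId,
    content,
    embedding: vectors[i], // number[] stored in a pgvector column
  }));
}
```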
The Study Agent is the "learn it" layer on top of the same document ingestion + RAG stack.
- Upload or select your study documents (same documents used for document Q&A / analysis)
- Start onboarding at `/employer/studyAgent/onboarding`
- Choose mode: StudyBuddy or AI Teacher
- Create a study session:
  - A new session is created and you're redirected with `?sessionId=...`
  - Your profile (name/grade/gender/field of study) and preferences (selected docs, AI personality) are stored
  - An initial study plan is generated from the documents you selected
- Resume anytime: session data is loaded using `sessionId`, so conversations and study progress persist
StudyBuddy is optimized for momentum and daily studying while staying grounded in your documents.
- Document-grounded help (RAG): ask questions about your selected PDFs, and the agent retrieves relevant chunks to answer.
- Voice chat (see the sketch after this list):
  - Speech-to-text via the browser's Web Speech API
  - Optional text-to-speech via ElevenLabs (if configured)
- Messages are persisted to your session so you can continue later
- Study Plan (Goals):
- Create/edit/delete goals
- Mark goals complete/incomplete and track progress
- Attach "materials" (documents) to each goal and one-click "pull up" the doc in the viewer
- Notes:
- Create/update/delete notes tied to your study session
- Tag notes and keep them organized while you study
- Pomodoro timer:
- Run focus sessions alongside your plan/notes
- Timer state can be synced to your session
- AI Query tab:
- A fast Q&A surface for questions while you keep your call / plan visible
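For reference, here is a hedged sketch of how the browser's Web Speech API can feed dictated text into the chat input; the function name and wiring are assumptions, not the repo's actual component.

```ts
// Hedged sketch: browser speech-to-text feeding a chat input callback.
// SpeechRecognition is a standard browser API (prefixed as webkitSpeechRecognition in Chrome);
// how the transcript reaches the chat UI here is illustrative, not the repo's exact code.
export function startDictation(onTranscript: (text: string) => void): () => void {
  const SpeechRecognitionCtor =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  if (!SpeechRecognitionCtor) {
    console.warn("Web Speech API not supported in this browser");
    return () => undefined;
  }

  const recognition = new SpeechRecognitionCtor();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = false; // only emit finalized phrases

  recognition.onresult = (event: any) => {
    const last = event.results[event.results.length - 1];
    onTranscript(last[0].transcript); // hand the recognized text to the chat UI
  };

  recognition.start();
  return () => recognition.stop();    // caller invokes this to stop listening
}
```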
AI Teacher is optimized for guided instruction and "teaching by doing" across multiple views.
- Voice-led teaching + study plan tracking:
- Voice chat for interactive lessons
- A persistent study plan with material links (click to open the relevant doc)
- Three teaching surfaces (switchable in-session):
- View: document viewer for reading/teaching directly from the selected PDF
- Edit: a collaborative docs editor where you and the AI can build structured notes/explanations and download the result
- Draw: a whiteboard for visual explanations (pen/eraser, undo/redo, clear, export as PNG; see the export sketch after this list)
- AI Query tab:
- Ask targeted questions without interrupting the lesson flow
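Exporting the whiteboard as a PNG can be done with the standard canvas API; the sketch below is illustrative, not the repo's exact code.

```ts
// Hedged sketch: exporting a whiteboard <canvas> as a PNG download.
// The default file name is illustrative.
export function exportWhiteboardAsPng(canvas: HTMLCanvasElement, fileName = "whiteboard.png") {
  const dataUrl = canvas.toDataURL("image/png"); // serialize the current drawing
  const link = document.createElement("a");
  link.href = dataUrl;
  link.download = fileName;
  link.click(); // triggers the browser download
}
```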
Per sessionId, the Study Agent persists:
- messages (StudyBuddy/Teacher conversations)
- study goals (plan items + completion state + attached materials)
- notes (StudyBuddy notes + updates)
- preferences/profile (selected documents and learner context)
Key API surfaces used by the Study Agent:
- `POST /api/study-agent/me/session` (create session)
- `GET /api/study-agent/me?sessionId=...` (load session data)
- `POST /api/study-agent/chat` (RAG chat + optional agentic tools for notes/tasks/timer)
- `POST /api/study-agent/me/messages` (persist chat messages)
- `POST/PUT/DELETE /api/study-agent/me/study-goals` (plan CRUD)
- `POST /api/study-agent/sync/notes` (notes sync)
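As a hedged example, creating and resuming a session from the client might look like this; the request and response field names are assumptions inferred from the endpoint list above.

```ts
// Hedged sketch of the Study Agent session lifecycle from the client.
// Body and response fields are assumptions based on the endpoint list above.
async function createStudySession(selectedDocumentIds: number[]) {
  const res = await fetch("/api/study-agent/me/session", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ mode: "studybuddy", selectedDocumentIds }),
  });
  const { sessionId } = (await res.json()) as { sessionId: string };
  return sessionId; // appended to the URL as ?sessionId=...
}

async function loadStudySession(sessionId: string) {
  // Returns persisted messages, goals, notes, and preferences for this session
  const res = await fetch(`/api/study-agent/me?sessionId=${encodeURIComponent(sessionId)}`);
  return res.json();
}
```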
The Predictive Document Analysis feature is the cornerstone of PDR AI, providing intelligent document management and compliance assistance:
- Document Upload: Upload your professional documents (PDFs, contracts, manuals, etc.)
- AI Analysis: Our advanced AI scans through the document content and structure
- Missing Document Detection: Identifies references to documents that should be present but aren't
- Priority Classification: Automatically categorizes findings by importance and urgency
- Smart Recommendations: Provides specific, actionable recommendations for document management
- Related Content: Suggests relevant external resources and related documents
- Compliance Assurance: Never miss critical documents required for compliance
- Workflow Optimization: Streamline document management with AI-powered insights
- Risk Mitigation: Identify potential gaps in documentation before they become issues
- Time Savings: Automated analysis saves hours of manual document review
- Proactive Management: Stay ahead of document requirements and deadlines
The system provides comprehensive analysis including:
- Missing Documents Count: Total number of missing documents identified
- High Priority Items: Critical documents requiring immediate attention
- Recommendations: Specific actions to improve document organization
- Suggested Related Documents: External resources and related content
- Page References: Exact page numbers where missing documents are mentioned
PDR AI includes optional advanced OCR (Optical Character Recognition) capabilities for processing scanned documents, images, and PDFs with poor text extraction:
- Scanned Documents: Physical documents that have been scanned to PDF
- Image-based PDFs: PDFs that contain images of text rather than actual text
- Poor Quality Documents: Documents with low-quality text that standard extraction can't read
- Handwritten Content: Documents with handwritten notes or forms (with AI assistance)
- Mixed Content: Documents combining text, images, tables, and diagrams
Backend Infrastructure:
- Environment Configuration: Set `DATALAB_API_KEY` in your `.env` file (optional)
- Database Schema: Tracks OCR status with fields (see the sketch after this list):
  - `ocrEnabled`: Boolean flag indicating if OCR was requested
  - `ocrProcessed`: Boolean flag indicating if OCR completed successfully
  - `ocrMetadata`: JSON field storing OCR processing details (page count, processing time, etc.)
- OCR Service Module (`src/app/api/services/ocrService.ts`):
  - Complete Datalab Marker API integration
  - Asynchronous submission and polling architecture
  - Configurable processing options (force_ocr, use_llm, output_format)
  - Comprehensive error handling and retry logic
  - Timeout management (5 minutes default)
- Upload API Enhancement (`src/app/api/uploadDocument/route.ts`):
  - Dual-path processing:
    - OCR Path: Uses Datalab Marker API when `enableOCR=true`
    - Standard Path: Uses traditional PDFLoader for regular PDFs
  - Unified chunking and embedding pipeline
  - Stores OCR metadata with document records
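For orientation, here is a hedged Drizzle sketch of what the OCR-tracking columns could look like; the table name and surrounding columns are assumptions, not the repo's actual schema.

```ts
// Hedged Drizzle sketch of OCR-tracking columns; table/column layout is illustrative.
import { pgTable, serial, boolean, jsonb, text } from "drizzle-orm/pg-core";

export const documents = pgTable("documents", {
  id: serial("id").primaryKey(),
  title: text("title").notNull(),
  ocrEnabled: boolean("ocr_enabled").default(false),     // OCR requested at upload time
  ocrProcessed: boolean("ocr_processed").default(false), // OCR finished successfully
  ocrMetadata: jsonb("ocr_metadata"),                    // page count, processing time, etc.
});
```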
Frontend Integration:
- Upload Form UI: OCR checkbox appears when `DATALAB_API_KEY` is configured
- Form Validation: Schema validates the `enableOCR` field
- User Guidance: Help text explains when to use OCR
- Dark Theme Support: Custom checkbox styling for both light and dark modes
// Standard PDF Upload (enableOCR: false or not set)
1. Download PDF from URL
2. Extract text using PDFLoader
3. Split into chunks
4. Generate embeddings
5. Store in database
// OCR-Enhanced Upload (enableOCR: true)
1. Download PDF from URL
2. Submit to Datalab Marker API
3. Poll for completion (up to 5 minutes)
4. Receive markdown/HTML/JSON output
5. Split into chunks
6. Generate embeddings
7. Store in database with OCR metadata

interface OCROptions {
    force_ocr?: boolean;          // Force OCR even if text exists
    use_llm?: boolean;            // Use AI for better accuracy
    output_format?: 'markdown' | 'json' | 'html'; // Output format
    strip_existing_ocr?: boolean; // Remove existing OCR layer
}
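To make the submit-and-poll architecture concrete, here is a hedged sketch of the polling loop with the 5-minute timeout; the Datalab request/response details are abstracted behind injected functions because the vendor's exact API is not documented here.

```ts
// Hedged sketch of the submit-then-poll pattern used for OCR processing.
// Datalab specifics are injected as functions; only the control flow is shown.
type OCROptions = {
  // Same shape as the OCROptions interface above
  force_ocr?: boolean;
  use_llm?: boolean;
  output_format?: "markdown" | "json" | "html";
  strip_existing_ocr?: boolean;
};

async function runOcrWithPolling(
  submitJob: (opts: OCROptions) => Promise<string>, // submits the PDF, returns a job id
  checkJob: (jobId: string) => Promise<{ done: boolean; text?: string }>,
  opts: OCROptions,
  timeoutMs = 5 * 60 * 1000, // 5-minute limit, matching the service default
  pollIntervalMs = 5_000,
): Promise<string> {
  const jobId = await submitJob(opts);
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const status = await checkJob(jobId);
    if (status.done && status.text) return status.text; // markdown/html/json output
    await new Promise((r) => setTimeout(r, pollIntervalMs)); // wait before next poll
  }
  throw new Error("OCR processing timed out after 5 minutes");
}
```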
1. Configure API Key (one-time setup):
   DATALAB_API_KEY=your_datalab_api_key
2. Upload Document with OCR:
   - Navigate to the employer upload page
   - Select your document
   - Check the "Enable OCR Processing" checkbox
   - Upload the document
   - The system will process the document with OCR and notify you when complete
3. Monitor Processing:
   - OCR processing typically takes 1-3 minutes
   - Progress is tracked in backend logs
   - The document becomes available once processing completes
| Feature | Standard Processing | OCR Processing |
|---|---|---|
| Best For | Digital PDFs with embedded text | Scanned documents, images |
| Processing Time | < 10 seconds | 1-3 minutes |
| Accuracy | High for digital text | High for scanned/image text |
| Cost | Free (OpenAI embeddings only) | Requires Datalab API credits |
| Handwriting Support | No | Yes (with AI assistance) |
| Table Extraction | Basic | Advanced |
| Image Analysis | No | Yes |
The OCR system includes comprehensive error handling:
- API connection failures
- Timeout management (5-minute limit)
- Retry logic for transient errors
- Graceful fallback messages
- Detailed error logging
The predictive analysis feature automatically scans uploaded documents and provides comprehensive insights:
{
"success": true,
"documentId": 123,
"analysisType": "predictive",
"summary": {
"totalMissingDocuments": 5,
"highPriorityItems": 2,
"totalRecommendations": 3,
"totalSuggestedRelated": 4,
"analysisTimestamp": "2024-01-15T10:30:00Z"
},
"analysis": {
"missingDocuments": [
{
"documentName": "Employee Handbook",
"documentType": "Policy Document",
"reason": "Referenced in section 2.1 but not found in uploaded documents",
"page": 15,
"priority": "high",
"suggestedLinks": [
{
"title": "Sample Employee Handbook Template",
"link": "https://example.com/handbook-template",
"snippet": "Comprehensive employee handbook template..."
}
]
}
],
"recommendations": [
"Consider implementing a document version control system",
"Review document retention policies for compliance",
"Establish regular document audit procedures"
],
"suggestedRelatedDocuments": [
{
"title": "Document Management Best Practices",
"link": "https://example.com/best-practices",
"snippet": "Industry standards for document organization..."
}
]
}
}

- Upload Documents: Use the employer dashboard to upload your documents
- Run Analysis: Click the "Predictive Analysis" tab in the document viewer
- Review Results: Examine missing documents, recommendations, and suggestions
- Take Action: Follow the provided recommendations and suggested links
- Track Progress: Re-run analysis to verify improvements
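For API consumers, triggering the analysis from code might look like the hedged example below; the request body shape (a `documentId`) is an assumption mirroring the Q&A call in the next section, and only the response format shown above is documented.

```ts
// Hedged example: calling the predictive analysis endpoint directly.
// The request body shape is an assumption; the response matches the sample above.
const analysisRes = await fetch("/api/predictive-document-analysis", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ documentId: 123 }),
});
const result = await analysisRes.json();
console.log(result.summary.totalMissingDocuments); // e.g. 5
```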
Ask questions about your documents and get AI-powered responses:
// Example API call for document Q&A
const response = await fetch('/api/LangChain', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question: "What are the key compliance requirements mentioned?",
documentId: 123,
style: "professional" // or "casual", "technical", "summary"
})
});

- Contract Management: Identify missing clauses, attachments, and referenced documents
- Regulatory Compliance: Ensure all required documentation is present and up-to-date
- Due Diligence: Comprehensive document review for mergers and acquisitions
- Risk Assessment: Identify potential legal risks from missing documentation
- Employee Documentation: Ensure all required employee documents are collected
- Policy Compliance: Verify policy documents are complete and current
- Onboarding Process: Streamline new employee documentation requirements
- Audit Preparation: Prepare for HR audits with confidence
- Financial Reporting: Ensure all supporting documents are included
- Audit Trail: Maintain complete documentation for financial audits
- Compliance Reporting: Meet regulatory requirements for document retention
- Process Documentation: Streamline financial process documentation
- Patient Records: Ensure complete patient documentation
- Regulatory Compliance: Meet healthcare documentation requirements
- Quality Assurance: Maintain high standards for medical documentation
- Risk Management: Identify potential documentation gaps
- Automated Analysis: Reduce manual document review time by 80%
- Instant Insights: Get immediate feedback on document completeness
- Proactive Management: Address issues before they become problems
- Compliance Assurance: Never miss critical required documents
- Error Prevention: Catch documentation gaps before they cause issues
- Audit Readiness: Always be prepared for regulatory audits
- Standardized Workflows: Establish consistent document management processes
- Quality Control: Maintain high standards for document organization
- Continuous Improvement: Use AI insights to optimize processes
- Document Review Time: 80% reduction in manual review time
- Compliance Risk: 95% reduction in missing document incidents
- Audit Preparation: 90% faster audit preparation time
- Process Efficiency: 70% improvement in document management workflows
- Framework: Next.js 15 with TypeScript
- Authentication: Clerk
- Database: PostgreSQL with Drizzle ORM
- AI Integration: OpenAI + LangChain
- OCR Processing: Datalab Marker API (optional)
- File Upload: UploadThing
- Styling: Tailwind CSS
- Package Manager: pnpm
Before you begin, ensure you have the following installed:
- Node.js (version 18.0 or higher)
- pnpm (recommended) or npm
- Docker (for local database)
- Git
git clone <repository-url>
cd pdr_ai_v2-2

pnpm install

Create a .env file in the root directory with the following variables:
# Database Configuration
# Format: postgresql://[user]:[password]@[host]:[port]/[database]
# For local development using Docker: postgresql://postgres:password@localhost:5432/pdr_ai_v2
# For production: Use your production PostgreSQL connection string
DATABASE_URL="postgresql://postgres:password@localhost:5432/pdr_ai_v2"
# Clerk Authentication (get from https://clerk.com/)
# Required for user authentication and authorization
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
CLERK_SECRET_KEY=your_clerk_secret_key
# Clerk Force Redirect URLs (Optional - for custom redirect after authentication)
# These URLs control where users are redirected after sign in/up/sign out
# If not set, Clerk will use default redirect behavior
NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL=https://your-domain.com/employer/home
NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL=https://your-domain.com/signup
NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL=https://your-domain.com/
# OpenAI API (get from https://platform.openai.com/)
# Required for AI features: document analysis, embeddings, chat functionality
OPENAI_API_KEY=your_openai_api_key
# LangChain (get from https://smith.langchain.com/)
# Optional: Required for LangSmith tracing and monitoring of LangChain operations
# LangSmith provides observability, debugging, and monitoring for LangChain applications
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langchain_api_key
# Tavily Search API (get from https://tavily.com/)
# Optional: Required for enhanced web search capabilities in document analysis
# Used for finding related documents and external resources
TAVILY_API_KEY=your_tavily_api_key
# Datalab Marker API (get from https://www.datalab.to/)
# Optional: Required for advanced OCR processing of scanned documents
# Enables OCR checkbox in document upload interface
DATALAB_API_KEY=your_datalab_api_key
# UploadThing (get from https://uploadthing.com/)
# Required for file uploads (PDF documents)
UPLOADTHING_SECRET=your_uploadthing_secret
UPLOADTHING_APP_ID=your_uploadthing_app_id
# Environment Configuration
# Options: development, test, production
NODE_ENV=development
# Optional: Skip environment validation (useful for Docker builds)
# Set to "true" to skip validation during build
# SKIP_ENV_VALIDATION=false

# Make the script executable
chmod +x start-database.sh
# Start the database container
./start-database.sh

This will:
- Create a Docker container with PostgreSQL
- Set up the database with proper credentials
- Generate a secure password if using default settings
# Generate migration files
pnpm db:generate
# Apply migrations to database
pnpm db:migrate
# Alternative: Push schema directly (for development)
pnpm db:push

- Create account at Clerk
- Create a new application
- Copy the publishable and secret keys to your `.env` file
- Configure sign-in/sign-up methods as needed
- Create account at OpenAI
- Generate an API key
- Add the key to your `.env` file
- Create account at LangSmith
- Generate an API key from your account settings
- Set `LANGCHAIN_TRACING_V2=true` and add `LANGCHAIN_API_KEY` to your `.env` file
- This enables tracing and monitoring of LangChain operations for debugging and observability
- Create account at Tavily
- Generate an API key from your dashboard
- Add `TAVILY_API_KEY` to your `.env` file
- Used for enhanced web search capabilities in document analysis features
- Create account at Datalab
- Navigate to the API section and generate an API key
- Add `DATALAB_API_KEY` to your `.env` file
- Enables advanced OCR processing for scanned documents and images in PDFs
- When configured, an OCR checkbox will appear in the document upload interface
- Create account at UploadThing
- Create a new app
- Copy the secret and app ID to your `.env` file
pnpm dev

The application will be available at http://localhost:3000
# Build the application
pnpm build
# Start production server
pnpm start

Before deploying, ensure you have:

- ✅ All environment variables configured
- ✅ Production database set up (PostgreSQL with pgvector extension)
- ✅ API keys for all external services
- ✅ Domain name configured (if using custom domain)
Vercel is the recommended platform for Next.js applications:
Steps:
1. Push your code to GitHub

   git push origin main
2. Import repository on Vercel
- Go to vercel.com and sign in
- Click "Add New Project"
- Import your GitHub repository
3. Set up Database and Environment Variables
Database Setup:
Option A: Using Vercel Postgres (Recommended)
- In Vercel dashboard, go to Storage → Create Database → Postgres
- Choose a region and create the database
- Vercel will automatically create the `DATABASE_URL` environment variable
- Enable pgvector extension: Connect to your database and run `CREATE EXTENSION IF NOT EXISTS vector;`
Option B: Using Neon Database (Recommended for pgvector support)
- Create a Neon account at neon.tech if you don't have one
- Create a new project in Neon dashboard
- Choose PostgreSQL version 14 or higher
- In Vercel dashboard, go to your project → Storage tab
- Click "Create Database" or "Browse Marketplace"
- Select "Neon" from the integrations
- Click "Connect" or "Add Integration"
- Authenticate with your Neon account
- Select your Neon project and branch
- Vercel will automatically create the `DATABASE_URL` environment variable from Neon
- You may also see additional Neon-related variables like `POSTGRES_URL`, `POSTGRES_PRISMA_URL`, `POSTGRES_URL_NON_POOLING`
- Your application uses `DATABASE_URL`, so ensure this is set correctly
- Enable pgvector extension in Neon:
  - Go to Neon dashboard → SQL Editor
  - Run: `CREATE EXTENSION IF NOT EXISTS vector;`
  - Or use Neon's SQL editor to enable the extension
Option C: Using External Database (Manual Setup)
- In Vercel dashboard, go to Settings → Environment Variables
- Click "Add New"
- Key: `DATABASE_URL`
- Value: Your PostgreSQL connection string (e.g., `postgresql://user:password@host:port/database`)
- Select environments: Production, Preview, Development (as needed)
- Click "Save"
Add Other Environment Variables:
- In Vercel dashboard, go to Settings → Environment Variables
- Add all required environment variables:
  - `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`
  - `CLERK_SECRET_KEY`
  - `OPENAI_API_KEY`
  - `UPLOADTHING_SECRET`
  - `UPLOADTHING_APP_ID`
  - `NODE_ENV=production`
  - `LANGCHAIN_TRACING_V2=true` (optional, for LangSmith tracing)
  - `LANGCHAIN_API_KEY` (optional, required if `LANGCHAIN_TRACING_V2=true`)
  - `TAVILY_API_KEY` (optional, for enhanced web search)
  - `DATALAB_API_KEY` (optional, for OCR processing)
  - `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL` (optional)
  - `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL` (optional)
  - `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL` (optional)
4. Configure build settings
   - Build Command: `pnpm build`
   - Output Directory: `.next` (default)
   - Install Command: `pnpm install`
5. Deploy
- Click "Deploy"
- Vercel will automatically deploy on every push to your main branch
Post-Deployment:
1. Enable pgvector Extension (Required)
- For Vercel Postgres: Connect to your database using Vercel's database connection tool or SQL editor in the Storage dashboard
- For Neon: Go to Neon dashboard → SQL Editor and run the command
- For External Database: Connect using your preferred PostgreSQL client
- Run: `CREATE EXTENSION IF NOT EXISTS vector;`
2. Run Database Migrations
   - After deployment, run migrations using one of these methods:

     # Option 1: Using Vercel CLI locally
     vercel env pull .env.local
     pnpm db:migrate

     # Option 2: Using direct connection (set DATABASE_URL locally)
     DATABASE_URL="your_production_db_url" pnpm db:migrate

     # Option 3: Using Drizzle Studio with production URL
     DATABASE_URL="your_production_db_url" pnpm db:studio
3. Set up Clerk webhooks (if needed)
- Configure webhook URL in Clerk dashboard
- URL format: `https://your-domain.com/api/webhooks/clerk`
4. Configure UploadThing
- Add your production domain to UploadThing allowed origins
- Configure CORS settings in UploadThing dashboard
Prerequisites:
- VPS with Node.js 18+ installed
- PostgreSQL database (with pgvector extension)
- Nginx (for reverse proxy)
- PM2 or similar process manager
Steps:
1. Clone and install dependencies

   git clone <your-repo-url>
   cd pdr_ai_v2-2
   pnpm install

2. Configure environment variables

   # Create .env file
   nano .env
   # Add all production environment variables

3. Build the application

   pnpm build

4. Set up PM2

   # Install PM2 globally
   npm install -g pm2

   # Start the application
   pm2 start pnpm --name "pdr-ai" -- start

   # Save PM2 configuration
   pm2 save
   pm2 startup

5. Configure Nginx

   server {
       listen 80;
       server_name your-domain.com;

       location / {
           proxy_pass http://localhost:3000;
           proxy_http_version 1.1;
           proxy_set_header Upgrade $http_upgrade;
           proxy_set_header Connection 'upgrade';
           proxy_set_header Host $host;
           proxy_cache_bypass $http_upgrade;
           proxy_set_header X-Real-IP $remote_addr;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header X-Forwarded-Proto $scheme;
       }
   }

6. Set up SSL with Let's Encrypt

   sudo apt-get install certbot python3-certbot-nginx
   sudo certbot --nginx -d your-domain.com

7. Run database migrations

   pnpm db:migrate
Important: Your production database must have the pgvector extension enabled:
-- Connect to your PostgreSQL database
CREATE EXTENSION IF NOT EXISTS vector;

Database Connection:
For production, use a managed PostgreSQL service (recommended):
- Neon: Fully serverless PostgreSQL with pgvector support
- Supabase: PostgreSQL with pgvector extension
- AWS RDS: Managed PostgreSQL (requires manual pgvector installation)
- Railway: Simple PostgreSQL hosting
Example Neon connection string:
DATABASE_URL="postgresql://user:[email protected]/dbname?sslmode=require"
- Verify all environment variables are set correctly
- Database migrations have been run
- Clerk authentication is working
- File uploads are working (UploadThing)
- AI features are functioning (OpenAI API)
- Database has pgvector extension enabled
- SSL certificate is configured (if using custom domain)
- Monitoring and logging are set up
- Backup strategy is in place
- Error tracking is configured (e.g., Sentry)
Health Checks:
- Monitor application uptime
- Check database connection health
- Monitor API usage (OpenAI, UploadThing)
- Track error rates
Backup Strategy:
- Set up automated database backups
- Configure backup retention policy
- Test restore procedures regularly
Scaling Considerations:
- Database connection pooling (use PgBouncer or similar)
- CDN for static assets (Vercel handles this automatically)
- Rate limiting for API endpoints
- Caching strategy for frequently accessed data
# Database management
pnpm db:studio # Open Drizzle Studio (database GUI)
pnpm db:generate # Generate new migrations
pnpm db:migrate # Apply migrations
pnpm db:push # Push schema changes directly
# Code quality
pnpm lint # Run ESLint
pnpm lint:fix # Fix ESLint issues
pnpm typecheck # Run TypeScript type checking
pnpm format:write # Format code with Prettier
pnpm format:check # Check code formatting
# Development
pnpm check # Run linting and type checking
pnpm preview         # Build and start production preview

src/
├── app/                                  # Next.js App Router
│   ├── api/                              # API routes
│   │   ├── predictive-document-analysis/ # Predictive analysis endpoints
│   │   │   ├── route.ts                  # Main analysis API
│   │   │   └── agent.ts                  # AI analysis agent
│   │   ├── services/                     # Backend services
│   │   │   └── ocrService.ts             # OCR processing service
│   │   ├── uploadDocument/               # Document upload endpoint
│   │   ├── LangChain/                    # AI chat functionality
│   │   └── ...                           # Other API endpoints
│   ├── employee/                         # Employee dashboard pages
│   ├── employer/                         # Employer dashboard pages
│   │   ├── documents/                    # Document viewer with predictive analysis
│   │   └── upload/                       # Document upload with OCR option
│   ├── signup/                           # Authentication pages
│   └── _components/                      # Shared components
├── server/
│   └── db/                               # Database configuration and schema
├── styles/                               # CSS modules and global styles
└── env.js                                # Environment validation
Key directories:
- `/employee` - Employee interface for document viewing and chat
- `/employer` - Employer interface for management and uploads
- `/api/predictive-document-analysis` - Core predictive analysis functionality
- `/api/services` - Reusable backend services (OCR, etc.)
- `/api/uploadDocument` - Document upload with OCR support
- `/api` - Backend API endpoints for all functionality
- `/server/db` - Database schema and configuration
- `POST /api/predictive-document-analysis` - Analyze documents for missing content and recommendations
- `GET /api/fetchDocument` - Retrieve document content for analysis
- `POST /api/uploadDocument` - Upload documents for processing (supports OCR via the `enableOCR` parameter; see the example after this list)
  - Standard path: Uses PDFLoader for digital PDFs
  - OCR path: Uses Datalab Marker API for scanned documents
  - Returns document metadata including OCR processing status
- `POST /api/LangChain` - AI-powered document Q&A
- `GET /api/Questions/fetch` - Retrieve Q&A history
- `POST /api/Questions/add` - Add new questions
- `GET /api/fetchCompany` - Get company documents
- `POST /api/deleteDocument` - Remove documents
- `GET /api/Categories/GetCategories` - Get document categories
- `GET /api/metrics` - Prometheus-compatible metrics stream (see `docs/observability.md` for dashboard ideas)
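As a hedged example of the upload endpoint with OCR enabled (referenced above): apart from `enableOCR`, the request fields shown are assumptions about the payload shape, not documented API details.

```ts
// Hedged example: uploading a scanned PDF with OCR enabled.
// Exact request fields other than enableOCR are assumptions.
const uploadRes = await fetch("/api/uploadDocument", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://utfs.io/f/your-uploaded-file.pdf", // URL returned by UploadThing (illustrative)
    name: "Scanned Employee Handbook",
    enableOCR: true, // routes the document through the Datalab Marker OCR path
  }),
});
console.log(await uploadRes.json()); // includes OCR processing status metadata
```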
- View assigned documents
- Chat with AI about documents
- Access document analysis and insights
- Pending approval flow for new employees
- Upload and manage documents
- Manage employee access and approvals
- View analytics and statistics
- Configure document categories
- Employee management dashboard
| Variable | Description | Required | Example |
|---|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string. Format: `postgresql://user:password@host:port/database` | ✅ | `postgresql://postgres:password@localhost:5432/pdr_ai_v2` |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | Clerk publishable key (client-side). Get from Clerk Dashboard | ✅ | `pk_test_...` |
| `CLERK_SECRET_KEY` | Clerk secret key (server-side). Get from Clerk Dashboard | ✅ | `sk_test_...` |
| `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL` | Force redirect URL after sign in. If not set, uses Clerk default. | ❌ | `https://your-domain.com/employer/home` |
| `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL` | Force redirect URL after sign up. If not set, uses Clerk default. | ❌ | `https://your-domain.com/signup` |
| `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL` | Force redirect URL after sign out. If not set, uses Clerk default. | ❌ | `https://your-domain.com/` |
| `OPENAI_API_KEY` | OpenAI API key for AI features (embeddings, chat, document analysis). Get from OpenAI Platform | ✅ | `sk-...` |
| `LANGCHAIN_TRACING_V2` | Enable LangSmith tracing for LangChain operations. Set to `true` to enable. Get API key from LangSmith | ❌ | `true` or `false` |
| `LANGCHAIN_API_KEY` | LangChain API key for LangSmith tracing and monitoring. Required if `LANGCHAIN_TRACING_V2=true`. Get from LangSmith | ❌ | `lsv2_...` |
| `TAVILY_API_KEY` | Tavily Search API key for enhanced web search in document analysis. Get from Tavily | ❌ | `tvly-...` |
| `DATALAB_API_KEY` | Datalab Marker API key for advanced OCR processing of scanned documents. Get from Datalab | ❌ | `your_datalab_key` |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for StudyBuddy/Teacher voice (text-to-speech). Get from ElevenLabs | ❌ | `your_elevenlabs_key` |
| `ELEVENLABS_VOICE_ID` | Default ElevenLabs voice ID (optional). | ❌ | `21m00Tcm4TlvDq8ikWAM` |
| `UPLOADTHING_SECRET` | UploadThing secret key for file uploads. Get from UploadThing Dashboard | ✅ | `sk_live_...` |
| `UPLOADTHING_APP_ID` | UploadThing application ID. Get from UploadThing Dashboard | ✅ | `your_app_id` |
| `NODE_ENV` | Environment mode. Must be one of: `development`, `test`, `production` | ✅ | `development` |
| `SKIP_ENV_VALIDATION` | Skip environment validation during build (useful for Docker builds) | ❌ | `false` or `true` |
- Authentication: `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`, `CLERK_SECRET_KEY`
- Authentication Redirects: `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL`, `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL`, `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL`
- Database: `DATABASE_URL`
- AI Features: `OPENAI_API_KEY` (used for embeddings, chat, and document analysis)
- AI Observability: `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY` (for LangSmith tracing and monitoring)
- Search Features: `TAVILY_API_KEY` (for enhanced web search in document analysis)
- OCR Processing: `DATALAB_API_KEY` (for advanced OCR of scanned documents)
- Study Agent Voice (Optional): `ELEVENLABS_API_KEY`, `ELEVENLABS_VOICE_ID`
- File Uploads: `UPLOADTHING_SECRET`, `UPLOADTHING_APP_ID`
- Build Configuration: `NODE_ENV`, `SKIP_ENV_VALIDATION`
- Ensure Docker is running before starting the database
- Check if the database container is running: `docker ps`
- Restart the database: `docker restart pdr_ai_v2-postgres`
- Verify all required environment variables are set
- Check `.env` file formatting (no spaces around `=`)
- Ensure API keys are valid and have proper permissions
- Clear Next.js cache: `rm -rf .next`
- Reinstall dependencies: `rm -rf node_modules && pnpm install`
- Check TypeScript errors: `pnpm typecheck`
- OCR checkbox not appearing: Verify `DATALAB_API_KEY` is set in your `.env` file
- OCR processing timeout: Documents taking longer than 5 minutes will time out; try with smaller documents first
- OCR processing failed: Check API key validity and Datalab service status
- Poor OCR quality: Enable the `use_llm: true` option in the OCR configuration for AI-enhanced accuracy
- Cost concerns: OCR uses Datalab API credits; use it only for scanned/image-based documents
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes
- Run tests and linting: `pnpm check`
- Commit your changes: `git commit -m 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
This project is private and proprietary.
For support or questions, contact the development team or create an issue in the repository.