A Next.js application that uses advanced AI technology to analyze, interpret, and extract insights from professional documents. Features employee/employer authentication, document upload and management, AI-powered chat, and comprehensive predictive document analysis that identifies missing documents, provides recommendations, and suggests related content.
PDR AI is designed as one connected loop: capture documents → make them searchable → ask questions → spot gaps → act → learn.
1. Authenticate & pick a workspace (Employer / Employee)
   Clerk handles auth and role-based access, so employers can manage documents and employees while employees can view assigned materials.
2. Upload documents (optionally with OCR)
   Documents are uploaded via UploadThing. If a PDF is scanned or image-based, you can enable OCR (Datalab Marker API) to extract clean text.
3. Index & store for retrieval
   The backend chunks the extracted text and generates embeddings, storing everything in PostgreSQL (+ pgvector) so downstream AI features can retrieve the right passages (see the sketch after this list).
4. Interact with documents (RAG chat + viewer)
   Users open a document in the viewer and ask questions. The AI uses RAG over your indexed chunks to answer with document-grounded context, and chat history persists per document/session.
5. Run Predictive Document Analysis (find gaps and next steps)
   When you need completeness and compliance help, the predictive analyzer highlights missing documents, broken references, priority/urgency, and recommended actions (see the deep dive below).
6. Study Agent: StudyBuddy + AI Teacher (learn from your own documents)
   Turn uploaded PDFs into a guided study experience. The Study Agent reuses the same ingestion + indexing pipeline, so both modes can answer questions with RAG grounded in your uploaded documents.
   - StudyBuddy mode: a friendly coach that helps you stay consistent (plan, notes, timer, quick Q&A)
   - AI Teacher mode: a structured instructor with multiple teaching surfaces (view/edit/draw) for lessons
7. Close the loop
   Use insights from chat + predictive analysis + StudyBuddy sessions to upload missing docs, update categories, and keep your organization's knowledge base complete and actionable.
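To make step 3 concrete, here is a minimal sketch of the chunk-and-embed step, assuming LangChain's text splitter and OpenAI embeddings; the module paths, chunk sizes, and returned row shape are illustrative rather than the repo's exact implementation.

```ts
// Minimal sketch of the ingest step (chunk → embed → store for pgvector retrieval).
// Assumes LangChain's splitter + OpenAI embeddings; sizes and row shape are illustrative.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings } from "@langchain/openai";

export async function indexDocument(documentId: number, fullText: string) {
  // 1. Chunk the extracted text into overlapping passages
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitText(fullText);

  // 2. Generate one embedding vector per chunk
  const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  const vectors = await embeddings.embedDocuments(chunks);

  // 3. Rows to store in PostgreSQL (chunk text + pgvector column), e.g. via Drizzle
  return chunks.map((content, i) => ({
    documentId,
    content,
    embedding: vectors[i], // number[] stored in a pgvector column
  }));
}
```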
The Study Agent is the "learn it" layer on top of the same document ingestion + RAG stack.
- Upload or select your study documents (same documents used for document Q&A / analysis)
- Start onboarding at `/employer/studyAgent/onboarding`
- Choose mode: StudyBuddy or AI Teacher
- Create a study session:
  - A new session is created and you're redirected with `?sessionId=...`
  - Your profile (name/grade/gender/field of study) and preferences (selected docs, AI personality) are stored
  - An initial study plan is generated from the documents you selected
- Resume anytime: session data is loaded using `sessionId`, so conversations and study progress persist
StudyBuddy is optimized for momentum and daily studying while staying grounded in your documents.
- Document-grounded help (RAG): ask questions about your selected PDFs, and the agent retrieves relevant chunks to answer.
- Voice chat (see the sketch after this list):
  - Speech-to-text via the browser's Web Speech API
  - Optional text-to-speech via ElevenLabs (if configured)
- Messages are persisted to your session so you can continue later
- Study Plan (Goals):
- Create/edit/delete goals
- Mark goals complete/incomplete and track progress
- Attach "materials" (documents) to each goal and one-click "pull up" the doc in the viewer
- Notes:
- Create/update/delete notes tied to your study session
- Tag notes and keep them organized while you study
- Pomodoro timer:
- Run focus sessions alongside your plan/notes
- Timer state can be synced to your session
- AI Query tab:
- A fast Q&A surface for questions while you keep your call / plan visible
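For reference, here is a hedged sketch of how the browser's Web Speech API can feed dictated text into the chat input; the function name and wiring are assumptions, not the repo's actual component.

```ts
// Hedged sketch: browser speech-to-text feeding a chat input callback.
// SpeechRecognition is a standard browser API (prefixed as webkitSpeechRecognition in Chrome);
// how the transcript reaches the chat UI here is illustrative, not the repo's exact code.
export function startDictation(onTranscript: (text: string) => void): () => void {
  const SpeechRecognitionCtor =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  if (!SpeechRecognitionCtor) {
    console.warn("Web Speech API not supported in this browser");
    return () => undefined;
  }

  const recognition = new SpeechRecognitionCtor();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = false; // only emit finalized phrases

  recognition.onresult = (event: any) => {
    const last = event.results[event.results.length - 1];
    onTranscript(last[0].transcript); // hand the recognized text to the chat UI
  };

  recognition.start();
  return () => recognition.stop();    // caller invokes this to stop listening
}
```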
AI Teacher is optimized for guided instruction and "teaching by doing" across multiple views.
- Voice-led teaching + study plan tracking:
- Voice chat for interactive lessons
- A persistent study plan with material links (click to open the relevant doc)
- Three teaching surfaces (switchable in-session):
- View: document viewer for reading/teaching directly from the selected PDF
- Edit: a collaborative docs editor where you and the AI can build structured notes/explanations and download the result
- Draw: a whiteboard for visual explanations (pen/eraser, undo/redo, clear, export as PNG; see the export sketch after this list)
- AI Query tab:
- Ask targeted questions without interrupting the lesson flow
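Exporting the whiteboard as a PNG can be done with the standard canvas API; the sketch below is illustrative, not the repo's exact code.

```ts
// Hedged sketch: exporting a whiteboard <canvas> as a PNG download.
// The default file name is illustrative.
export function exportWhiteboardAsPng(canvas: HTMLCanvasElement, fileName = "whiteboard.png") {
  const dataUrl = canvas.toDataURL("image/png"); // serialize the current drawing
  const link = document.createElement("a");
  link.href = dataUrl;
  link.download = fileName;
  link.click(); // triggers the browser download
}
```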
Per sessionId, the Study Agent persists:
- messages (StudyBuddy/Teacher conversations)
- study goals (plan items + completion state + attached materials)
- notes (StudyBuddy notes + updates)
- preferences/profile (selected documents and learner context)
Key API surfaces used by the Study Agent:
- `POST /api/study-agent/me/session` (create session)
- `GET /api/study-agent/me?sessionId=...` (load session data)
- `POST /api/study-agent/chat` (RAG chat + optional agentic tools for notes/tasks/timer)
- `POST /api/study-agent/me/messages` (persist chat messages)
- `POST/PUT/DELETE /api/study-agent/me/study-goals` (plan CRUD)
- `POST /api/study-agent/sync/notes` (notes sync)
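As a hedged example, creating and resuming a session from the client might look like this; the request and response field names are assumptions inferred from the endpoint list above.

```ts
// Hedged sketch of the Study Agent session lifecycle from the client.
// Body and response fields are assumptions based on the endpoint list above.
async function createStudySession(selectedDocumentIds: number[]) {
  const res = await fetch("/api/study-agent/me/session", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ mode: "studybuddy", selectedDocumentIds }),
  });
  const { sessionId } = (await res.json()) as { sessionId: string };
  return sessionId; // appended to the URL as ?sessionId=...
}

async function loadStudySession(sessionId: string) {
  // Returns persisted messages, goals, notes, and preferences for this session
  const res = await fetch(`/api/study-agent/me?sessionId=${encodeURIComponent(sessionId)}`);
  return res.json();
}
```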
The Predictive Document Analysis feature is the cornerstone of PDR AI, providing intelligent document management and compliance assistance:
- Document Upload: Upload your professional documents (PDFs, contracts, manuals, etc.)
- AI Analysis: Our advanced AI scans through the document content and structure
- Missing Document Detection: Identifies references to documents that should be present but aren't
- Priority Classification: Automatically categorizes findings by importance and urgency
- Smart Recommendations: Provides specific, actionable recommendations for document management
- Related Content: Suggests relevant external resources and related documents
- Compliance Assurance: Never miss critical documents required for compliance
- Workflow Optimization: Streamline document management with AI-powered insights
- Risk Mitigation: Identify potential gaps in documentation before they become issues
- Time Savings: Automated analysis saves hours of manual document review
- Proactive Management: Stay ahead of document requirements and deadlines
The system provides comprehensive analysis including:
- Missing Documents Count: Total number of missing documents identified
- High Priority Items: Critical documents requiring immediate attention
- Recommendations: Specific actions to improve document organization
- Suggested Related Documents: External resources and related content
- Page References: Exact page numbers where missing documents are mentioned
PDR AI includes optional advanced OCR (Optical Character Recognition) capabilities for processing scanned documents, images, and PDFs with poor text extraction:
- Scanned Documents: Physical documents that have been scanned to PDF
- Image-based PDFs: PDFs that contain images of text rather than actual text
- Poor Quality Documents: Documents with low-quality text that standard extraction can't read
- Handwritten Content: Documents with handwritten notes or forms (with AI assistance)
- Mixed Content: Documents combining text, images, tables, and diagrams
Backend Infrastructure:
- Environment Configuration: Set `DATALAB_API_KEY` in your `.env` file (optional)
- Database Schema: Tracks OCR status with fields (see the sketch after this list):
  - `ocrEnabled`: Boolean flag indicating if OCR was requested
  - `ocrProcessed`: Boolean flag indicating if OCR completed successfully
  - `ocrMetadata`: JSON field storing OCR processing details (page count, processing time, etc.)
- OCR Service Module (`src/app/api/services/ocrService.ts`):
  - Complete Datalab Marker API integration
  - Asynchronous submission and polling architecture
  - Configurable processing options (force_ocr, use_llm, output_format)
  - Comprehensive error handling and retry logic
  - Timeout management (5 minutes default)
- Upload API Enhancement (`src/app/api/uploadDocument/route.ts`):
  - Dual-path processing:
    - OCR Path: Uses Datalab Marker API when `enableOCR=true`
    - Standard Path: Uses traditional PDFLoader for regular PDFs
  - Unified chunking and embedding pipeline
  - Stores OCR metadata with document records
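For orientation, here is a hedged Drizzle sketch of what the OCR-tracking columns could look like; the table name and surrounding columns are assumptions, not the repo's actual schema.

```ts
// Hedged Drizzle sketch of OCR-tracking columns; table/column layout is illustrative.
import { pgTable, serial, boolean, jsonb, text } from "drizzle-orm/pg-core";

export const documents = pgTable("documents", {
  id: serial("id").primaryKey(),
  title: text("title").notNull(),
  ocrEnabled: boolean("ocr_enabled").default(false),     // OCR requested at upload time
  ocrProcessed: boolean("ocr_processed").default(false), // OCR finished successfully
  ocrMetadata: jsonb("ocr_metadata"),                    // page count, processing time, etc.
});
```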
Frontend Integration:
- Upload Form UI: OCR checkbox appears when `DATALAB_API_KEY` is configured
- Form Validation: Schema validates the `enableOCR` field
- User Guidance: Help text explains when to use OCR
- Dark Theme Support: Custom checkbox styling for both light and dark modes
// Standard PDF Upload (enableOCR: false or not set)
1. Download PDF from URL
2. Extract text using PDFLoader
3. Split into chunks
4. Generate embeddings
5. Store in database
// OCR-Enhanced Upload (enableOCR: true)
1. Download PDF from URL
2. Submit to Datalab Marker API
3. Poll for completion (up to 5 minutes)
4. Receive markdown/HTML/JSON output
5. Split into chunks
6. Generate embeddings
7. Store in database with OCR metadata

interface OCROptions {
    force_ocr?: boolean;          // Force OCR even if text exists
    use_llm?: boolean;            // Use AI for better accuracy
    output_format?: 'markdown' | 'json' | 'html'; // Output format
    strip_existing_ocr?: boolean; // Remove existing OCR layer
}
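To make the submit-and-poll architecture concrete, here is a hedged sketch of the polling loop with the 5-minute timeout; the Datalab request/response details are abstracted behind injected functions because the vendor's exact API is not documented here.

```ts
// Hedged sketch of the submit-then-poll pattern used for OCR processing.
// Datalab specifics are injected as functions; only the control flow is shown.
type OCROptions = {
  // Same shape as the OCROptions interface above
  force_ocr?: boolean;
  use_llm?: boolean;
  output_format?: "markdown" | "json" | "html";
  strip_existing_ocr?: boolean;
};

async function runOcrWithPolling(
  submitJob: (opts: OCROptions) => Promise<string>, // submits the PDF, returns a job id
  checkJob: (jobId: string) => Promise<{ done: boolean; text?: string }>,
  opts: OCROptions,
  timeoutMs = 5 * 60 * 1000, // 5-minute limit, matching the service default
  pollIntervalMs = 5_000,
): Promise<string> {
  const jobId = await submitJob(opts);
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const status = await checkJob(jobId);
    if (status.done && status.text) return status.text; // markdown/html/json output
    await new Promise((r) => setTimeout(r, pollIntervalMs)); // wait before next poll
  }
  throw new Error("OCR processing timed out after 5 minutes");
}
```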
1. Configure API Key (one-time setup):
   DATALAB_API_KEY=your_datalab_api_key
2. Upload Document with OCR:
   - Navigate to the employer upload page
   - Select your document
   - Check the "Enable OCR Processing" checkbox
   - Upload the document
   - The system will process the document with OCR and notify you when complete
3. Monitor Processing:
   - OCR processing typically takes 1-3 minutes
   - Progress is tracked in backend logs
   - The document becomes available once processing completes
| Feature | Standard Processing | OCR Processing |
|---|---|---|
| Best For | Digital PDFs with embedded text | Scanned documents, images |
| Processing Time | < 10 seconds | 1-3 minutes |
| Accuracy | High for digital text | High for scanned/image text |
| Cost | Free (OpenAI embeddings only) | Requires Datalab API credits |
| Handwriting Support | No | Yes (with AI assistance) |
| Table Extraction | Basic | Advanced |
| Image Analysis | No | Yes |
The OCR system includes comprehensive error handling:
- API connection failures
- Timeout management (5-minute limit)
- Retry logic for transient errors
- Graceful fallback messages
- Detailed error logging
The predictive analysis feature automatically scans uploaded documents and provides comprehensive insights:
{
"success": true,
"documentId": 123,
"analysisType": "predictive",
"summary": {
"totalMissingDocuments": 5,
"highPriorityItems": 2,
"totalRecommendations": 3,
"totalSuggestedRelated": 4,
"analysisTimestamp": "2024-01-15T10:30:00Z"
},
"analysis": {
"missingDocuments": [
{
"documentName": "Employee Handbook",
"documentType": "Policy Document",
"reason": "Referenced in section 2.1 but not found in uploaded documents",
"page": 15,
"priority": "high",
"suggestedLinks": [
{
"title": "Sample Employee Handbook Template",
"link": "https://example.com/handbook-template",
"snippet": "Comprehensive employee handbook template..."
}
]
}
],
"recommendations": [
"Consider implementing a document version control system",
"Review document retention policies for compliance",
"Establish regular document audit procedures"
],
"suggestedRelatedDocuments": [
{
"title": "Document Management Best Practices",
"link": "https://example.com/best-practices",
"snippet": "Industry standards for document organization..."
}
]
}
}

- Upload Documents: Use the employer dashboard to upload your documents
- Run Analysis: Click the "Predictive Analysis" tab in the document viewer
- Review Results: Examine missing documents, recommendations, and suggestions
- Take Action: Follow the provided recommendations and suggested links
- Track Progress: Re-run analysis to verify improvements
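For API consumers, triggering the analysis from code might look like the hedged example below; the request body shape (a `documentId`) is an assumption mirroring the Q&A call in the next section, and only the response format shown above is documented.

```ts
// Hedged example: calling the predictive analysis endpoint directly.
// The request body shape is an assumption; the response matches the sample above.
const analysisRes = await fetch("/api/predictive-document-analysis", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ documentId: 123 }),
});
const result = await analysisRes.json();
console.log(result.summary.totalMissingDocuments); // e.g. 5
```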
Ask questions about your documents and get AI-powered responses:
// Example API call for document Q&A
const response = await fetch('/api/LangChain', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question: "What are the key compliance requirements mentioned?",
documentId: 123,
style: "professional" // or "casual", "technical", "summary"
})
});

- Contract Management: Identify missing clauses, attachments, and referenced documents
- Regulatory Compliance: Ensure all required documentation is present and up-to-date
- Due Diligence: Comprehensive document review for mergers and acquisitions
- Risk Assessment: Identify potential legal risks from missing documentation
- Employee Documentation: Ensure all required employee documents are collected
- Policy Compliance: Verify policy documents are complete and current
- Onboarding Process: Streamline new employee documentation requirements
- Audit Preparation: Prepare for HR audits with confidence
- Financial Reporting: Ensure all supporting documents are included
- Audit Trail: Maintain complete documentation for financial audits
- Compliance Reporting: Meet regulatory requirements for document retention
- Process Documentation: Streamline financial process documentation
- Patient Records: Ensure complete patient documentation
- Regulatory Compliance: Meet healthcare documentation requirements
- Quality Assurance: Maintain high standards for medical documentation
- Risk Management: Identify potential documentation gaps
- Automated Analysis: Reduce manual document review time by 80%
- Instant Insights: Get immediate feedback on document completeness
- Proactive Management: Address issues before they become problems
- Compliance Assurance: Never miss critical required documents
- Error Prevention: Catch documentation gaps before they cause issues
- Audit Readiness: Always be prepared for regulatory audits
- Standardized Workflows: Establish consistent document management processes
- Quality Control: Maintain high standards for document organization
- Continuous Improvement: Use AI insights to optimize processes
- Document Review Time: 80% reduction in manual review time
- Compliance Risk: 95% reduction in missing document incidents
- Audit Preparation: 90% faster audit preparation time
- Process Efficiency: 70% improvement in document management workflows
- Framework: Next.js 15 with TypeScript
- Authentication: Clerk
- Database: PostgreSQL with Drizzle ORM
- AI Integration: OpenAI + LangChain
- OCR Processing: Datalab Marker API (optional)
- File Upload: UploadThing
- Styling: Tailwind CSS
- Package Manager: pnpm
Before you begin, ensure you have the following installed:
- Node.js (version 18.0 or higher)
- pnpm (recommended) or npm
- Docker (for local database)
- Git
git clone <repository-url>
cd pdr_ai_v2-2

pnpm install

Create a .env file in the root directory with the following variables:
# Database Configuration
# Format: postgresql://[user]:[password]@[host]:[port]/[database]
# For local development using Docker: postgresql://postgres:password@localhost:5432/pdr_ai_v2
# For production: Use your production PostgreSQL connection string
DATABASE_URL="postgresql://postgres:password@localhost:5432/pdr_ai_v2"
# Clerk Authentication (get from https://clerk.com/)
# Required for user authentication and authorization
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=your_clerk_publishable_key
CLERK_SECRET_KEY=your_clerk_secret_key
# Clerk Force Redirect URLs (Optional - for custom redirect after authentication)
# These URLs control where users are redirected after sign in/up/sign out
# If not set, Clerk will use default redirect behavior
NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL=https://your-domain.com/employer/home
NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL=https://your-domain.com/signup
NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL=https://your-domain.com/
# OpenAI API (get from https://platform.openai.com/)
# Required for AI features: document analysis, embeddings, chat functionality
OPENAI_API_KEY=your_openai_api_key
# LangChain (get from https://smith.langchain.com/)
# Optional: Required for LangSmith tracing and monitoring of LangChain operations
# LangSmith provides observability, debugging, and monitoring for LangChain applications
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langchain_api_key
# Tavily Search API (get from https://tavily.com/)
# Optional: Required for enhanced web search capabilities in document analysis
# Used for finding related documents and external resources
TAVILY_API_KEY=your_tavily_api_key
# Datalab Marker API (get from https://www.datalab.to/)
# Optional: Required for advanced OCR processing of scanned documents
# Enables OCR checkbox in document upload interface
DATALAB_API_KEY=your_datalab_api_key
# UploadThing (get from https://uploadthing.com/)
# Required for file uploads (PDF documents)
UPLOADTHING_SECRET=your_uploadthing_secret
UPLOADTHING_APP_ID=your_uploadthing_app_id
# Environment Configuration
# Options: development, test, production
NODE_ENV=development
# Optional: Skip environment validation (useful for Docker builds)
# Set to "true" to skip validation during build
# SKIP_ENV_VALIDATION=false

# Make the script executable
chmod +x start-database.sh
# Start the database container
./start-database.sh

This will:
- Create a Docker container with PostgreSQL
- Set up the database with proper credentials
- Generate a secure password if using default settings
# Generate migration files
pnpm db:generate
# Apply migrations to database
pnpm db:migrate
# Alternative: Push schema directly (for development)
pnpm db:push

- Create account at Clerk
- Create a new application
- Copy the publishable and secret keys to your `.env` file
- Configure sign-in/sign-up methods as needed
- Create account at OpenAI
- Generate an API key
- Add the key to your `.env` file
- Create account at LangSmith
- Generate an API key from your account settings
- Set `LANGCHAIN_TRACING_V2=true` and add `LANGCHAIN_API_KEY` to your `.env` file
- This enables tracing and monitoring of LangChain operations for debugging and observability
- Create account at Tavily
- Generate an API key from your dashboard
- Add `TAVILY_API_KEY` to your `.env` file
- Used for enhanced web search capabilities in document analysis features
- Create account at Datalab
- Navigate to the API section and generate an API key
- Add `DATALAB_API_KEY` to your `.env` file
- Enables advanced OCR processing for scanned documents and images in PDFs
- When configured, an OCR checkbox will appear in the document upload interface
- Create account at UploadThing
- Create a new app
- Copy the secret and app ID to your `.env` file
pnpm dev

The application will be available at http://localhost:3000
# Build the application
pnpm build
# Start production server
pnpm start

Before deploying, ensure you have:

- ✅ All environment variables configured
- ✅ Production database set up (PostgreSQL with pgvector extension)
- ✅ API keys for all external services
- ✅ Domain name configured (if using custom domain)
Vercel is the recommended platform for Next.js applications:
Steps:
1. Push your code to GitHub

   git push origin main
2. Import repository on Vercel
- Go to vercel.com and sign in
- Click "Add New Project"
- Import your GitHub repository
3. Set up Database and Environment Variables
Database Setup:
Option A: Using Vercel Postgres (Recommended)
- In Vercel dashboard, go to Storage → Create Database → Postgres
- Choose a region and create the database
- Vercel will automatically create the `DATABASE_URL` environment variable
- Enable pgvector extension: Connect to your database and run `CREATE EXTENSION IF NOT EXISTS vector;`
Option B: Using Neon Database (Recommended for pgvector support)
- Create a Neon account at neon.tech if you don't have one
- Create a new project in Neon dashboard
- Choose PostgreSQL version 14 or higher
- In Vercel dashboard, go to your project → Storage tab
- Click "Create Database" or "Browse Marketplace"
- Select "Neon" from the integrations
- Click "Connect" or "Add Integration"
- Authenticate with your Neon account
- Select your Neon project and branch
- Vercel will automatically create the `DATABASE_URL` environment variable from Neon
- You may also see additional Neon-related variables like `POSTGRES_URL`, `POSTGRES_PRISMA_URL`, `POSTGRES_URL_NON_POOLING`
- Your application uses `DATABASE_URL`, so ensure this is set correctly
- Enable pgvector extension in Neon:
  - Go to Neon dashboard → SQL Editor
  - Run: `CREATE EXTENSION IF NOT EXISTS vector;`
  - Or use Neon's SQL editor to enable the extension
Option C: Using External Database (Manual Setup)
- In Vercel dashboard, go to Settings → Environment Variables
- Click "Add New"
- Key: `DATABASE_URL`
- Value: Your PostgreSQL connection string (e.g., `postgresql://user:password@host:port/database`)
- Select environments: Production, Preview, Development (as needed)
- Click "Save"
Add Other Environment Variables:
- In Vercel dashboard, go to Settings → Environment Variables
- Add all required environment variables:
  - `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`
  - `CLERK_SECRET_KEY`
  - `OPENAI_API_KEY`
  - `UPLOADTHING_SECRET`
  - `UPLOADTHING_APP_ID`
  - `NODE_ENV=production`
  - `LANGCHAIN_TRACING_V2=true` (optional, for LangSmith tracing)
  - `LANGCHAIN_API_KEY` (optional, required if `LANGCHAIN_TRACING_V2=true`)
  - `TAVILY_API_KEY` (optional, for enhanced web search)
  - `DATALAB_API_KEY` (optional, for OCR processing)
  - `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL` (optional)
  - `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL` (optional)
  - `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL` (optional)
4. Configure build settings
   - Build Command: `pnpm build`
   - Output Directory: `.next` (default)
   - Install Command: `pnpm install`
5. Deploy
- Click "Deploy"
- Vercel will automatically deploy on every push to your main branch
Post-Deployment:
1. Enable pgvector Extension (Required)
- For Vercel Postgres: Connect to your database using Vercel's database connection tool or SQL editor in the Storage dashboard
- For Neon: Go to Neon dashboard → SQL Editor and run the command
- For External Database: Connect using your preferred PostgreSQL client
- Run: `CREATE EXTENSION IF NOT EXISTS vector;`
2. Run Database Migrations
   - After deployment, run migrations using one of these methods:

     # Option 1: Using Vercel CLI locally
     vercel env pull .env.local
     pnpm db:migrate

     # Option 2: Using direct connection (set DATABASE_URL locally)
     DATABASE_URL="your_production_db_url" pnpm db:migrate

     # Option 3: Using Drizzle Studio with production URL
     DATABASE_URL="your_production_db_url" pnpm db:studio
3. Set up Clerk webhooks (if needed)
- Configure webhook URL in Clerk dashboard
- URL format: `https://your-domain.com/api/webhooks/clerk`
4. Configure UploadThing
- Add your production domain to UploadThing allowed origins
- Configure CORS settings in UploadThing dashboard
Prerequisites:
- VPS with Node.js 18+ installed
- PostgreSQL database (with pgvector extension)
- Nginx (for reverse proxy)
- PM2 or similar process manager
Steps:
1. Clone and install dependencies

   git clone <your-repo-url>
   cd pdr_ai_v2-2
   pnpm install

2. Configure environment variables

   # Create .env file
   nano .env
   # Add all production environment variables

3. Build the application

   pnpm build

4. Set up PM2

   # Install PM2 globally
   npm install -g pm2

   # Start the application
   pm2 start pnpm --name "pdr-ai" -- start

   # Save PM2 configuration
   pm2 save
   pm2 startup

5. Configure Nginx

   server {
       listen 80;
       server_name your-domain.com;

       location / {
           proxy_pass http://localhost:3000;
           proxy_http_version 1.1;
           proxy_set_header Upgrade $http_upgrade;
           proxy_set_header Connection 'upgrade';
           proxy_set_header Host $host;
           proxy_cache_bypass $http_upgrade;
           proxy_set_header X-Real-IP $remote_addr;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header X-Forwarded-Proto $scheme;
       }
   }

6. Set up SSL with Let's Encrypt

   sudo apt-get install certbot python3-certbot-nginx
   sudo certbot --nginx -d your-domain.com

7. Run database migrations

   pnpm db:migrate
Important: Your production database must have the pgvector extension enabled:
-- Connect to your PostgreSQL database
CREATE EXTENSION IF NOT EXISTS vector;

Database Connection:
For production, use a managed PostgreSQL service (recommended):
- Neon: Fully serverless PostgreSQL with pgvector support
- Supabase: PostgreSQL with pgvector extension
- AWS RDS: Managed PostgreSQL (requires manual pgvector installation)
- Railway: Simple PostgreSQL hosting
Example Neon connection string:
DATABASE_URL="postgresql://user:[email protected]/dbname?sslmode=require"
- Verify all environment variables are set correctly
- Database migrations have been run
- Clerk authentication is working
- File uploads are working (UploadThing)
- AI features are functioning (OpenAI API)
- Database has pgvector extension enabled
- SSL certificate is configured (if using custom domain)
- Monitoring and logging are set up
- Backup strategy is in place
- Error tracking is configured (e.g., Sentry)
Health Checks:
- Monitor application uptime
- Check database connection health
- Monitor API usage (OpenAI, UploadThing)
- Track error rates
Backup Strategy:
- Set up automated database backups
- Configure backup retention policy
- Test restore procedures regularly
Scaling Considerations:
- Database connection pooling (use PgBouncer or similar)
- CDN for static assets (Vercel handles this automatically)
- Rate limiting for API endpoints
- Caching strategy for frequently accessed data
# Database management
pnpm db:studio # Open Drizzle Studio (database GUI)
pnpm db:generate # Generate new migrations
pnpm db:migrate # Apply migrations
pnpm db:push # Push schema changes directly
# Code quality
pnpm lint # Run ESLint
pnpm lint:fix # Fix ESLint issues
pnpm typecheck # Run TypeScript type checking
pnpm format:write # Format code with Prettier
pnpm format:check # Check code formatting
# Development
pnpm check # Run linting and type checking
pnpm preview         # Build and start production preview

src/
├── app/                                  # Next.js App Router
│   ├── api/                              # API routes
│   │   ├── predictive-document-analysis/ # Predictive analysis endpoints
│   │   │   ├── route.ts                  # Main analysis API
│   │   │   └── agent.ts                  # AI analysis agent
│   │   ├── services/                     # Backend services
│   │   │   └── ocrService.ts             # OCR processing service
│   │   ├── uploadDocument/               # Document upload endpoint
│   │   ├── LangChain/                    # AI chat functionality
│   │   └── ...                           # Other API endpoints
│   ├── employee/                         # Employee dashboard pages
│   ├── employer/                         # Employer dashboard pages
│   │   ├── documents/                    # Document viewer with predictive analysis
│   │   └── upload/                       # Document upload with OCR option
│   ├── signup/                           # Authentication pages
│   └── _components/                      # Shared components
├── server/
│   └── db/                               # Database configuration and schema
├── styles/                               # CSS modules and global styles
└── env.js                                # Environment validation
Key directories:
- `/employee` - Employee interface for document viewing and chat
- `/employer` - Employer interface for management and uploads
- `/api/predictive-document-analysis` - Core predictive analysis functionality
- `/api/services` - Reusable backend services (OCR, etc.)
- `/api/uploadDocument` - Document upload with OCR support
- `/api` - Backend API endpoints for all functionality
- `/server/db` - Database schema and configuration
- `POST /api/predictive-document-analysis` - Analyze documents for missing content and recommendations
- `GET /api/fetchDocument` - Retrieve document content for analysis
- `POST /api/uploadDocument` - Upload documents for processing (supports OCR via the `enableOCR` parameter; see the example after this list)
  - Standard path: Uses PDFLoader for digital PDFs
  - OCR path: Uses Datalab Marker API for scanned documents
  - Returns document metadata including OCR processing status
- `POST /api/LangChain` - AI-powered document Q&A
- `GET /api/Questions/fetch` - Retrieve Q&A history
- `POST /api/Questions/add` - Add new questions
- `GET /api/fetchCompany` - Get company documents
- `POST /api/deleteDocument` - Remove documents
- `GET /api/Categories/GetCategories` - Get document categories
- `GET /api/metrics` - Prometheus-compatible metrics stream (see `docs/observability.md` for dashboard ideas)
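As a hedged example of the upload endpoint with OCR enabled (referenced above): apart from `enableOCR`, the request fields shown are assumptions about the payload shape, not documented API details.

```ts
// Hedged example: uploading a scanned PDF with OCR enabled.
// Exact request fields other than enableOCR are assumptions.
const uploadRes = await fetch("/api/uploadDocument", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://utfs.io/f/your-uploaded-file.pdf", // URL returned by UploadThing (illustrative)
    name: "Scanned Employee Handbook",
    enableOCR: true, // routes the document through the Datalab Marker OCR path
  }),
});
console.log(await uploadRes.json()); // includes OCR processing status metadata
```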
- View assigned documents
- Chat with AI about documents
- Access document analysis and insights
- Pending approval flow for new employees
- Upload and manage documents
- Manage employee access and approvals
- View analytics and statistics
- Configure document categories
- Employee management dashboard
| Variable | Description | Required | Example |
|---|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string. Format: `postgresql://user:password@host:port/database` | ✅ | `postgresql://postgres:password@localhost:5432/pdr_ai_v2` |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | Clerk publishable key (client-side). Get from Clerk Dashboard | ✅ | `pk_test_...` |
| `CLERK_SECRET_KEY` | Clerk secret key (server-side). Get from Clerk Dashboard | ✅ | `sk_test_...` |
| `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL` | Force redirect URL after sign in. If not set, uses Clerk default. | ❌ | `https://your-domain.com/employer/home` |
| `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL` | Force redirect URL after sign up. If not set, uses Clerk default. | ❌ | `https://your-domain.com/signup` |
| `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL` | Force redirect URL after sign out. If not set, uses Clerk default. | ❌ | `https://your-domain.com/` |
| `OPENAI_API_KEY` | OpenAI API key for AI features (embeddings, chat, document analysis). Get from OpenAI Platform | ✅ | `sk-...` |
| `LANGCHAIN_TRACING_V2` | Enable LangSmith tracing for LangChain operations. Set to `true` to enable. Get API key from LangSmith | ❌ | `true` or `false` |
| `LANGCHAIN_API_KEY` | LangChain API key for LangSmith tracing and monitoring. Required if `LANGCHAIN_TRACING_V2=true`. Get from LangSmith | ❌ | `lsv2_...` |
| `TAVILY_API_KEY` | Tavily Search API key for enhanced web search in document analysis. Get from Tavily | ❌ | `tvly-...` |
| `DATALAB_API_KEY` | Datalab Marker API key for advanced OCR processing of scanned documents. Get from Datalab | ❌ | `your_datalab_key` |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for StudyBuddy/Teacher voice (text-to-speech). Get from ElevenLabs | ❌ | `your_elevenlabs_key` |
| `ELEVENLABS_VOICE_ID` | Default ElevenLabs voice ID (optional). | ❌ | `21m00Tcm4TlvDq8ikWAM` |
| `UPLOADTHING_SECRET` | UploadThing secret key for file uploads. Get from UploadThing Dashboard | ✅ | `sk_live_...` |
| `UPLOADTHING_APP_ID` | UploadThing application ID. Get from UploadThing Dashboard | ✅ | `your_app_id` |
| `NODE_ENV` | Environment mode. Must be one of: `development`, `test`, `production` | ✅ | `development` |
| `SKIP_ENV_VALIDATION` | Skip environment validation during build (useful for Docker builds) | ❌ | `false` or `true` |
- Authentication: `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`, `CLERK_SECRET_KEY`
- Authentication Redirects: `NEXT_PUBLIC_CLERK_SIGN_IN_FORCE_REDIRECT_URL`, `NEXT_PUBLIC_CLERK_SIGN_UP_FORCE_REDIRECT_URL`, `NEXT_PUBLIC_CLERK_SIGN_OUT_FORCE_REDIRECT_URL`
- Database: `DATABASE_URL`
- AI Features: `OPENAI_API_KEY` (used for embeddings, chat, and document analysis)
- AI Observability: `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY` (for LangSmith tracing and monitoring)
- Search Features: `TAVILY_API_KEY` (for enhanced web search in document analysis)
- OCR Processing: `DATALAB_API_KEY` (for advanced OCR of scanned documents)
- Study Agent Voice (Optional): `ELEVENLABS_API_KEY`, `ELEVENLABS_VOICE_ID`
- File Uploads: `UPLOADTHING_SECRET`, `UPLOADTHING_APP_ID`
- Build Configuration: `NODE_ENV`, `SKIP_ENV_VALIDATION`
- Ensure Docker is running before starting the database
- Check if the database container is running: `docker ps`
- Restart the database: `docker restart pdr_ai_v2-postgres`
- Verify all required environment variables are set
- Check `.env` file formatting (no spaces around `=`)
- Ensure API keys are valid and have proper permissions
- Clear Next.js cache: `rm -rf .next`
- Reinstall dependencies: `rm -rf node_modules && pnpm install`
- Check TypeScript errors: `pnpm typecheck`
- OCR checkbox not appearing: Verify `DATALAB_API_KEY` is set in your `.env` file
- OCR processing timeout: Documents taking longer than 5 minutes will time out; try with smaller documents first
- OCR processing failed: Check API key validity and Datalab service status
- Poor OCR quality: Enable the `use_llm: true` option in the OCR configuration for AI-enhanced accuracy
- Cost concerns: OCR uses Datalab API credits; use it only for scanned/image-based documents
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes
- Run tests and linting: `pnpm check`
- Commit your changes: `git commit -m 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
This project is private and proprietary.
For support or questions, contact the development team or create an issue in the repository.