Documai

An intelligent README generator that analyzes GitHub repositories and creates professional documentation using local AI models. Built with Node.js, Express, and Ollama.

Overview

Documai is a web-based AI agent that automatically generates high-quality README files for any GitHub repository. Instead of manually writing documentation, simply provide a repository URL, and Documai will:

Fetch important source files from the repository
Analyze the codebase using a local AI model
Generate a professional, well-structured README

The entire process runs locally with no dependency on external AI services: all analysis happens on your machine using Ollama.

Features

Local-first: All AI processing happens on your local machine using Ollama
No cloud AI services: No hosted model APIs required; the only remote service contacted is the GitHub API
GitHub integration: Automatically fetches repository structure and code
Multiple model support: Works with mistral, llama3.2, deepseek-r1, and other Ollama models
Easy to use: Simple web interface for non-technical users
Customizable: Pass a GitHub PAT for private repository access
Download-ready: Generated READMEs download as markdown files

Demo

Note: The whole Documai process may take much longer than in the demo, depending on your machine's resources and the chosen model.

documai-demo.mp4

The full version of the Markdown documentation that Documai generated in the demo video above is available here.

Prerequisites

Before using Documai, ensure you have:

Node.js (v14 or higher)
npm (comes with Node.js)
Ollama installed and running locally
A local Ollama model (e.g., mistral, llama3.2, or deepseek-r1:14b)

Installing Ollama

Download from ollama.ai
Follow the installation guide for your OS
Start the Ollama server: ollama serve
Pull a model: ollama pull mistral (or your preferred model)

Quick Start

1. Install dependencies

npm install

2. Start Ollama

In a separate terminal:

ollama serve

Verify Ollama is running:

curl http://localhost:11434/api/tags

3. Start Documai

npm start

The app will start at http://localhost:3000.

4. Generate a README

Open http://localhost:3000 in your browser
Paste a GitHub repository URL (e.g., https://github.com/owner/repo)
(Optional) Add your GitHub PAT if accessing a private repository
Select a local model (default: mistral)
Click Generate README
Wait for analysis and generation (typically 30-60 seconds)
Download the generated README

Usage

Environment Variables

You can configure Documai using environment variables in a .env file:

# GitHub Personal Access Token (optional, for private repos)
GITHUB_PAT=your_github_pat_here

# Ollama server URL (default: http://127.0.0.1:11434)
OLLAMA_URL=http://127.0.0.1:11434

# Default model (default: mistral)
OLLAMA_MODEL=mistral

# Express server port (default: 3000)
PORT=3000
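
As an illustration, these variables could be resolved with their documented defaults roughly as follows. This is a minimal sketch, not the code in server.js: the function name `loadConfig` and the returned property names are hypothetical.

```javascript
// Hypothetical helper: resolves Documai settings from an environment object,
// falling back to the defaults documented above.
function loadConfig(env = process.env) {
  return {
    githubPat: env.GITHUB_PAT || null, // optional, only needed for private repos
    ollamaUrl: env.OLLAMA_URL || 'http://127.0.0.1:11434',
    ollamaModel: env.OLLAMA_MODEL || 'mistral',
    port: Number(env.PORT) || 3000,
  };
}

// Example: only PORT is overridden, everything else uses the defaults.
console.log(loadConfig({ PORT: '8080' }));
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.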

API Endpoint

You can also call the generation endpoint directly:

curl -X POST http://localhost:3000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "repoUrl": "https://github.com/owner/repo",
    "pat": "your_github_pat",
    "model": "mistral"
  }'

Response:

{
  "owner": "owner",
  "repo": "repo",
  "readme": "# repo\n\n> Repository: https://github.com/owner/repo\n\n..."
}
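
The same call can be made from JavaScript. The sketch below assumes Node.js 18 or newer for the global `fetch`; the helper names `buildGenerateRequest` and `generateReadme` and the `baseUrl` option are illustrative and not part of Documai.

```javascript
// Builds the request for Documai's POST /generate endpoint.
// `baseUrl` and the option names are assumptions made for this sketch.
function buildGenerateRequest(repoUrl, { pat, model = 'mistral', baseUrl = 'http://localhost:3000' } = {}) {
  const body = { repoUrl, model };
  if (pat) body.pat = pat; // only send a token when one is provided
  return {
    url: `${baseUrl}/generate`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the generated README text.
async function generateReadme(repoUrl, opts) {
  const { url, options } = buildGenerateRequest(repoUrl, opts);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Documai returned HTTP ${res.status}`);
  const { readme } = await res.json();
  return readme;
}

// Usage (requires a running Documai server):
// generateReadme('https://github.com/owner/repo').then(console.log);
```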

Project Structure

documai/
├── server.js                 # Express backend and core logic
├── public/
│   ├── index.html           # Web UI (forms, inputs, display)
│   └── app.js               # Client-side JavaScript (form handlers)
├── package.json             # Node.js dependencies
├── .env                     # Environment configuration (local only)
├── .gitignore              # Git ignore rules
├── README.md               # This file
├── LICENSE                 # License information
└── generate-readme.json    # (Legacy) n8n workflow export

Key Files

server.js: REST API server that orchestrates GitHub API calls and Ollama model inference
public/index.html: Web interface with form inputs and README display
public/app.js: Client-side event handling and API communication
.env: Local environment configuration (GitHub token, Ollama URL/model)

Architecture

High-Level Flow

User Input
    ↓
[Browser] → GET/POST requests → [Express Server]
                                    ↓
                            Parse Repository URL
                                    ↓
                            GitHub API: Fetch repo metadata
                                    ↓
                            GitHub API: Fetch file tree
                                    ↓
                            Select important files (.js, .ts, .py, etc.)
                                    ↓
                            For each file:
                                - Fetch file content
                                - Analyze with Ollama model
                                - Extract insights
                                    ↓
                            Combine all insights
                                    ↓
                            Generate final README with Ollama
                                    ↓
                            Return README to client
                                    ↓
                            [Browser] Display & Download
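
The flow above can be sketched as a single orchestration function. This is an assumption about how the steps fit together, not the actual server.js: the name `generateReadmePipeline` and every member of the `steps` object are hypothetical, and each step is injected so the sketch does not depend on real GitHub or Ollama calls.

```javascript
// Hypothetical orchestration of the flow shown above.
async function generateReadmePipeline(repoUrl, steps) {
  const { owner, repo } = steps.parseRepoUrl(repoUrl);
  const meta = await steps.fetchMetadata(owner, repo);
  const tree = await steps.fetchTree(owner, repo, meta.defaultBranch);
  const files = steps.selectFiles(tree);

  const summaries = [];
  for (const path of files) {
    try {
      const content = await steps.fetchFile(owner, repo, path);
      summaries.push({ path, summary: await steps.analyze(path, content) });
    } catch (err) {
      // A file that fails to fetch or analyze is skipped; the rest continue.
    }
  }

  const readme = await steps.writeReadme({ owner, repo, summaries });
  return { owner, repo, readme }; // same shape as the /generate response
}
```

Files are processed one at a time here, which keeps the load on a local model predictable; a real implementation might choose to parallelize.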

Component Breakdown

Frontend (Browser)

Static HTML with form inputs
JavaScript for form submission and file download
Loading indicator for generation progress
Textarea for displaying generated README

Backend (Express)

/generate POST endpoint for README generation
GitHub API integration to fetch repository data
File filtering logic to select important source files
Ollama integration for AI analysis and README generation
Error handling and helpful error messages

External Services

GitHub API: Repository metadata, file tree, file content retrieval
Ollama API: Local AI model inference for analysis and generation

Workflow

Step-by-Step Generation Process

1. Parse Repository URL
   - Extract owner and repo name from GitHub URL
2. Fetch Repository Metadata
   - Call GitHub API to get default branch, owner info
3. Fetch File Tree
   - Retrieve recursive file tree of the repository
4. Filter Important Files
   - Select up to 10 important files based on extensions:
     - Code: .js, .ts, .jsx, .tsx, .py, .java, .go, .rs, .php, .cpp, .c, .h, .hpp, .cs, .rb
     - Config: requirements.txt
   - Exclude folders: node_modules, dist, build, .git, coverage, __pycache__
5. Analyze Each File
   - Fetch file content from GitHub
   - Send to Ollama for analysis:
     - What the file does
     - Technologies and frameworks
     - Architectural insights
     - Concise summary
   - Store summaries
6. Generate Final README
   - Combine all file summaries
   - Send comprehensive prompt to Ollama:
     - Repository name and owner
     - File analysis summaries
     - Instructions for professional README structure
   - Ollama generates sections:
     - Overview
     - Features
     - Technologies Used
     - Architecture
     - Installation
     - Usage
     - Workflow
     - Project Structure
     - Conclusion
7. Return to User
   - Display generated README in textarea
   - Provide download button
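
The file filtering step could look roughly like the sketch below. The extension list, excluded folders, and limit of 10 come from the description above; the helper name `selectImportantFiles` is hypothetical, and keeping files in their original tree order is an assumption, since the README does not say how files are ranked.

```javascript
// Extensions and folders taken from the filtering step described above.
const CODE_EXTENSIONS = ['.js', '.ts', '.jsx', '.tsx', '.py', '.java', '.go', '.rs',
  '.php', '.cpp', '.c', '.h', '.hpp', '.cs', '.rb'];
const CONFIG_FILES = ['requirements.txt'];
const EXCLUDED_DIRS = ['node_modules', 'dist', 'build', '.git', 'coverage', '__pycache__'];
const MAX_FILES = 10;

// Hypothetical helper: takes repository-relative paths and returns at most
// MAX_FILES matching paths, in their original order.
function selectImportantFiles(paths) {
  return paths
    .filter((p) => {
      const segments = p.split('/');
      const name = segments[segments.length - 1];
      // Only directory segments are checked, so a root file named build.js still passes.
      const inExcludedDir = segments.slice(0, -1).some((s) => EXCLUDED_DIRS.includes(s));
      if (inExcludedDir) return false;
      return CONFIG_FILES.includes(name) || CODE_EXTENSIONS.some((ext) => name.endsWith(ext));
    })
    .slice(0, MAX_FILES);
}

console.log(selectImportantFiles([
  'server.js', 'public/app.js', 'node_modules/express/index.js', 'README.md', 'requirements.txt',
]));
// → [ 'server.js', 'public/app.js', 'requirements.txt' ]
```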

Error Handling

Missing repo URL: Returns 400 error
Invalid GitHub URL: Returns 400 error
Ollama unreachable: Returns 503 with helpful message
GitHub API rate limits: Handled by providing clear error messages
File fetch errors: Skipped individually; generation continues
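
One way to express this mapping in code is sketched below. The function name `toHttpError`, the `MISSING_REPO_URL` and `INVALID_GITHUB_URL` codes, the message wording, and the 500 fallback are all assumptions made for illustration; `ECONNREFUSED` is the standard Node.js error code for a refused connection.

```javascript
// Hypothetical mapping from failure conditions to HTTP responses,
// following the list above.
function toHttpError(err) {
  switch (err.code) {
    case 'MISSING_REPO_URL':
      return { status: 400, error: 'Missing repo URL' };
    case 'INVALID_GITHUB_URL':
      return { status: 400, error: 'Invalid GitHub URL' };
    case 'ECONNREFUSED': // nothing is listening on the Ollama port
      return { status: 503, error: 'Local model server unreachable. Is "ollama serve" running?' };
    default:
      // Fallback not described in this README; assumed here.
      return { status: 500, error: err.message || 'Unexpected error' };
  }
}

console.log(toHttpError({ code: 'ECONNREFUSED' }).status); // → 503
```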

Tested Models

This project was built and tested with:

mistral (default) - Fast, reliable, good balance of quality and speed
llama3.2 - Lighter model, faster inference
deepseek-r1:14b - Larger model, more detailed analysis

Other Ollama models should work, but may require different timeout configurations for larger or slower models.

Dependencies

express: Web framework for Node.js
axios: HTTP client for API calls
dotenv: Environment variable management

Troubleshooting

"Local model server unreachable"

Cause: Ollama is not running or not accessible at http://127.0.0.1:11434

Solution:

# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Verify it's running
curl http://localhost:11434/api/tags
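
The same check can be done from Node.js. This sketch assumes Node.js 18 or newer for the global `fetch`; the function name `listOllamaModels` and the injectable `fetchImpl` parameter are hypothetical. It calls the same /api/tags endpoint as the curl command above, which lists the locally available models.

```javascript
// Returns the names of locally available models, or throws if the server
// cannot be reached. `fetchImpl` is injectable so the sketch can be tested.
async function listOllamaModels(baseUrl = 'http://127.0.0.1:11434', fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const data = await res.json();
  return (data.models || []).map((m) => m.name);
}

// Usage (requires a running Ollama server):
// listOllamaModels().then(console.log).catch(() => console.error('Ollama is not reachable'));
```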

"Invalid GitHub URL"

Cause: Repository URL is not in the format https://github.com/owner/repo

Solution: Ensure the URL follows the GitHub format
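
For reference, a URL check along these lines might look like the sketch below. The function name `parseRepoUrl` is hypothetical, and accepting a trailing slash or a ".git" suffix is an assumption; the actual validation in server.js may be stricter or looser.

```javascript
// Hypothetical parser: accepts https://github.com/owner/repo, with an
// optional trailing slash or ".git" suffix; returns null otherwise.
function parseRepoUrl(input) {
  let url;
  try {
    url = new URL(String(input).trim());
  } catch {
    return null; // not a URL at all
  }
  if (url.protocol !== 'https:' || url.hostname !== 'github.com') return null;
  const [owner, rawRepo, ...rest] = url.pathname.split('/').filter(Boolean);
  if (!owner || !rawRepo || rest.length > 0) return null;
  const repo = rawRepo.replace(/\.git$/, '');
  return repo ? { owner, repo } : null;
}

console.log(parseRepoUrl('https://github.com/owner/repo')); // → { owner: 'owner', repo: 'repo' }
console.log(parseRepoUrl('https://example.com/owner/repo')); // → null
```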

"GitHub API rate limit exceeded"

Cause: Too many requests without authentication

Solution: Provide a GitHub PAT in the .env file or UI field
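
To show why a PAT helps, the sketch below builds GitHub API headers with an optional token and detects a rate-limit response. The helper names `githubHeaders` and `isRateLimited` and the `documai` User-Agent value are hypothetical; the response headers are assumed to be a plain object with lower-case keys.

```javascript
// Builds GitHub REST API request headers; the token is optional.
function githubHeaders(pat) {
  const headers = {
    Accept: 'application/vnd.github+json',
    'User-Agent': 'documai', // GitHub requires a User-Agent on API requests
  };
  if (pat) headers.Authorization = `Bearer ${pat}`;
  return headers;
}

// Detects a rate-limit response: GitHub answers 403 or 429 and sets
// the x-ratelimit-remaining header to "0".
function isRateLimited(status, headers) {
  return (status === 403 || status === 429) && headers['x-ratelimit-remaining'] === '0';
}

console.log(githubHeaders('your_github_pat').Authorization); // → Bearer your_github_pat
```

Authenticated requests get a much higher hourly quota than anonymous ones, which is why supplying a token usually resolves this error.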

"Generation takes too long or times out"

Cause: Model is too large or system is slow

Solution:

Use a faster model (e.g., llama3.2 instead of deepseek-r1:14b)
Reduce repository file count by using .gitignore
Increase timeout in Express (modify server.js)
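
This README does not show where the timeout is set in server.js. As one possible approach, the sketch below puts a configurable timeout on the request to Ollama's /api/generate endpoint. It assumes Node.js 18 or newer for the global `fetch` and `AbortSignal.timeout`; the function name `askOllama`, its options, and the 120-second default are hypothetical.

```javascript
// Sends one non-streaming prompt to Ollama and aborts the request if no
// response arrives within `timeoutMs`.
async function askOllama(prompt, {
  model = 'mistral',
  baseUrl = 'http://127.0.0.1:11434',
  timeoutMs = 120000, // assumed default; raise it for larger or slower models
  fetchImpl = fetch,
} = {}) {
  const res = await fetchImpl(`${baseUrl}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: false }),
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const data = await res.json();
  return data.response; // generated text of a non-streaming reply
}

// Usage (requires a running Ollama server):
// askOllama('Summarize this file', { model: 'deepseek-r1:14b', timeoutMs: 600000 });
```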

License

See LICENSE for details.

Coming Improvements

Integration with GitHub to auto-commit READMEs
Code snippet extraction and syntax highlighting
Progress indicator for file analysis
README template customization
