Thank you for your interest in contributing to LiteParse! This document provides guidelines and information for contributors.
To set up a development environment:
- Fork the repository
- Clone your fork: `git clone https://github.com/YOUR_USERNAME/liteparse.git`, then `cd liteparse`
- Install dependencies: `npm install`
- Build the project: `npm run build`
We welcome a wide range of contributions, but we want to preserve the spirit of the project. We are primarily focused on:
- Core algorithms for PDF parsing and text extraction
- OCR integrations and improvements
- New output formats or modifications to existing ones
We are less interested in:
- Markdown output
- Any LLM integration or agent code
- Anything that doesn't directly relate to improving the core parsing and extraction capabilities
While the project is written in TypeScript today, I'm open to porting it to Rust if someone wants to take that on as a contribution. The core algorithms and logic would stay the same; only the implementation language would change.
Common development commands:
- `npm run build`: build the TypeScript sources
- `npm run dev`: watch mode for development
- `npm test`: run tests
- `npm run test:watch`: run tests in watch mode
- `npm run lint`: check for linting issues
- `npm run lint:fix`: fix linting issues
- `npm run format`: format code with Prettier

After building, you can test your changes locally:
- Parse a document: `./dist/src/index.js parse document.pdf`
- Generate screenshots: `./dist/src/index.js screenshot document.pdf -o ./screenshots`

We use Changesets to manage versioning and changelogs. When you make a change to source code that should be released:
- Run `npm run changeset`
- Select the type of change (patch, minor, or major)
- Write a description of your changes
- Commit the generated changeset file with your PR
To submit a pull request:
- Fork the repository and create a feature branch from `main`
- Make your changes
- Add a changeset if needed (`npm run changeset`)
- Ensure all tests pass (`npm test`)
- Ensure linting and formatting pass (`npm run lint:fix` and `npm run format`)
- Submit a pull request
When you submit a PR, a number of CI/CD checks will run. Among these, your code will be tested against a regression suite of documents to ensure that your changes don't break existing parsing capabilities. It is up to the maintainers' discretion to decide whether any changes in output across the regression set are expected (improvements) or unexpected (regressions).
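The idea behind a regression check like this can be pictured with a small sketch. This is not LiteParse's CI code: `compareToBaseline`, `RegressionResult`, and the positional line comparison are hypothetical, and the sketch assumes each document's extracted text is available as a string keyed by file name.

```typescript
// Hypothetical sketch (not LiteParse's CI): compare each document's extracted
// text against a stored baseline and report which outputs changed.

type Status = "unchanged" | "changed" | "missing";

interface RegressionResult {
  document: string;
  status: Status;
  changedLines: number;
}

// Count lines that differ position by position. A real suite would more
// likely use a proper diff; this is a deliberate simplification.
function countChangedLines(expected: string, actual: string): number {
  const a = expected.split("\n");
  const b = actual.split("\n");
  const length = Math.max(a.length, b.length);
  let changed = 0;
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) changed++;
  }
  return changed;
}

// Baselines and current outputs are keyed by document file name (an assumption).
function compareToBaseline(
  baseline: Record<string, string>,
  current: Record<string, string>
): RegressionResult[] {
  return Object.keys(baseline).map((document): RegressionResult => {
    const actual: string | undefined = current[document];
    if (actual === undefined) {
      // No output was produced for a document that has a baseline.
      return { document, status: "missing", changedLines: 0 };
    }
    const changedLines = countChangedLines(baseline[document], actual);
    return {
      document,
      status: changedLines === 0 ? "unchanged" : "changed",
      changedLines,
    };
  });
}

// Example: one document's output changed, the other did not.
const results = compareToBaseline(
  { "invoice.pdf": "Total\n42.00", "memo.pdf": "Hello" },
  { "invoice.pdf": "Total\n42.50", "memo.pdf": "Hello" }
);
console.log(results);
```

Here `invoice.pdf` is reported as `changed` with one changed line and `memo.pdf` as `unchanged`. A report like this only flags differences; as described above, whether a given change is an improvement or a regression is still a judgment call for the maintainers.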
- Keep PRs focused on a single change
- Update documentation if needed
- Add tests for new functionality
- For parsing issues, include a test document if possible
If you're reporting a problem with document parsing:
- You must attach the document or provide a way to reproduce the issue
- Include the command you ran
- Show the expected vs actual output
- Include your LiteParse version (`lit --version`)
Issues without reproducible examples will be closed.
For other bugs:
- Describe what you expected vs what happened
- Include steps to reproduce
- Include error messages/stack traces
- Include version information
See AGENTS.md for detailed documentation about the codebase structure and architecture.
Key directories:
- `src/core/`: main orchestrator and configuration
- `src/engines/`: PDF and OCR engine implementations
- `src/processing/`: text extraction and spatial analysis
- `src/output/`: output formatters
- `cli/`: CLI implementation
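To give a feel for what "spatial analysis" of extracted text can involve, here is a minimal, hypothetical sketch that groups positioned text items into reading-order lines. The `TextItem` shape, the top-left coordinate origin, and the default `tolerance` of 2 units are assumptions for illustration only; see AGENTS.md for the project's actual types and algorithms.

```typescript
// Hypothetical sketch: group positioned text items (as a PDF engine might
// emit them) into lines. Not LiteParse's actual types or algorithm.

interface TextItem {
  text: string;
  x: number; // left edge, in page units
  y: number; // vertical position; assumes a top-left origin (larger y = lower on the page)
}

// Items whose y differs from the first item of the current line by at most
// `tolerance` are treated as belonging to the same line.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  // Sort top-to-bottom, breaking ties left-to-right.
  const sorted = items.slice().sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: TextItem[][] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - item.y) <= tolerance) {
      current.push(item);
    } else {
      lines.push([item]);
    }
  }
  // Within each line, order items by x and join them with single spaces.
  return lines.map((line) =>
    line
      .slice()
      .sort((a, b) => a.x - b.x)
      .map((item) => item.text)
      .join(" ")
  );
}

// Example: "World" sits slightly lower than "Hello" but within tolerance.
const result = groupIntoLines([
  { text: "World", x: 60, y: 100.5 },
  { text: "Hello", x: 10, y: 100 },
  { text: "Second", x: 10, y: 120 },
]);
console.log(result);
```

This returns `["Hello World", "Second"]`. A fixed tolerance is the simplest choice; real layouts with mixed font sizes, columns, or rotated text need more than this.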
- Open a Discussion for questions
- Check existing issues before opening new ones
- Read the README for usage documentation
By contributing, you agree that your contributions will be licensed under the Apache 2.0 License.