Improve documentation files with large language models

argonne-lcf/drdoc

Dr. Doc

Dr. Doc is currently a toy project, but a useful one: it improves documentation files by using large language models to identify and correct grammar errors, formatting errors, and broken links. The project uses the Argo API, which provides access to OpenAI models for Argonne researchers. Planned updates include support for other models, structured output to simplify prompts, and GitHub and GitLab actions for continuous integration.

Note that the version of gpt4o currently served by Argo is limited to 4096 output tokens, so the largest files you can process with Argo/gpt4o are around 15 KB.
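As a rough pre-check, a file's size can be compared against that limit before it is sent to the model. The sketch below assumes the approximate 15 KB figure quoted above. `MAX_DOC_BYTES` and `fits_output_limit` are illustrative names, not part of Dr. Doc.

```python
import os

# Assumption: about 15 KB, taken from the note above; not a value read from Dr. Doc.
MAX_DOC_BYTES = 15 * 1024


def fits_output_limit(path, limit=MAX_DOC_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

Files that fail this check would likely need to be split into smaller pieces before processing.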

Features

  • Fixes grammar, formatting, and link issues in documentation files.
  • Supports Markdown (.md), reStructuredText (.rst), and plain text formats.
  • Provides a detailed explanation of changes made to the documentation.
  • Optional Git integration to commit changes directly.

Requirements

  • Argo API credentials (ARGO_URL and ARGO_USER must be defined in the environment)
  • Python 3.8 or higher
  • requests>=2.25.0

Setup

  1. Clone the repository and navigate to the project directory:

    git clone <repository-url>
    cd drdoc
  2. Define the required environment variables for the Argo API:

    export ARGO_URL=<your-argo-url>
    export ARGO_USER=<your-argo-user>
  3. (Optional) Install the package:

    pip install -e .
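To confirm that step 2 took effect, the environment can be checked from Python. This is a minimal sketch that only tests whether ARGO_URL and ARGO_USER are set and non-empty. The helper name `missing_argo_vars` is hypothetical and is not provided by Dr. Doc.

```python
import os

# The two variables named in the Requirements section.
REQUIRED_VARS = ("ARGO_URL", "ARGO_USER")


def missing_argo_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_argo_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Argo environment variables are set.")
```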

Usage

If you installed Dr. Doc with pip as described above, run it with the drdoc command (drdoc -h shows the help menu). Otherwise, run the Python script directly with python <path_to_drdoc>/drdoc.py.

drdoc <doc_path> [options]

or without installation:

python <path_to_drdoc>/drdoc.py <doc_path> [options]

Command Line Options

  • doc_path: (Required) Path to the documentation file or directory containing files to process.
  • --argo_url: (Optional) Argo API endpoint URL (default: value of ARGO_URL environment variable).
  • --argo_user: (Optional) Argo API user (default: value of ARGO_USER environment variable).
  • --model: (Optional) Model to use (e.g., gpt4o, gpt35; default: gpt4o).
  • --temperature: (Optional) Sampling temperature for the model (default: 0.1).
  • --top_p: (Optional) Top-p sampling for the model (default: 0.9).
  • --max_tokens: (Optional) Max tokens for the prompt (default: 4096).
  • --max_completion_tokens: (Optional) Max tokens for the completion (default: 16000).
  • --inplace: (Optional) Modify the original file in place instead of creating a new one.
  • --commit: (Optional) Commit changes to Git with the explanation as the commit message.
  • --format: (Optional) Format of the documentation file (md, rst, or txt; default: md).
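The options and defaults listed above can be mirrored with `argparse`, which may help when wrapping Dr. Doc in another script. This is a sketch reconstructed from the list only. It is not the parser in drdoc.py, and the help string and the `build_parser` name are assumptions.

```python
import argparse
import os


def build_parser():
    """Build a parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="drdoc")
    parser.add_argument("doc_path", help="File or directory to process")
    # Credentials fall back to the environment variables from Setup.
    parser.add_argument("--argo_url", default=os.environ.get("ARGO_URL"))
    parser.add_argument("--argo_user", default=os.environ.get("ARGO_USER"))
    parser.add_argument("--model", default="gpt4o")
    parser.add_argument("--temperature", type=float, default=0.1)
    parser.add_argument("--top_p", type=float, default=0.9)
    parser.add_argument("--max_tokens", type=int, default=4096)
    parser.add_argument("--max_completion_tokens", type=int, default=16000)
    parser.add_argument("--inplace", action="store_true")
    parser.add_argument("--commit", action="store_true")
    parser.add_argument("--format", choices=["md", "rst", "txt"], default="md")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["doc/sample.md", "--format", "rst", "--inplace"])
print(args.model, args.format, args.inplace)  # gpt4o rst True
```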

Example Commands

Process a Markdown file:

drdoc doc/sample.md

This would create doc/sample_fixed.md.
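The naming in this example (a `_fixed` suffix before the extension) can be reproduced with `pathlib`. The function below is a sketch inferred from that single example, and `fixed_path` is a hypothetical name. Dr. Doc's own code may derive the output path differently.

```python
from pathlib import Path


def fixed_path(doc_path):
    """Return the sibling path with '_fixed' inserted before the extension."""
    p = Path(doc_path)
    return p.with_name(f"{p.stem}_fixed{p.suffix}")


print(fixed_path("doc/sample.md").as_posix())  # doc/sample_fixed.md
```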

Process all reStructuredText documentation files (*.rst files) in the doc directory:

drdoc doc/ --format rst
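To preview which files a directory run might pick up, matching files can be listed by extension. This sketch assumes only the top level of the directory is searched. Whether Dr. Doc descends into subdirectories is not stated here, and `find_docs` is an illustrative name.

```python
from pathlib import Path


def find_docs(directory, fmt="md"):
    """Return sorted paths of files ending in .<fmt> directly under `directory`."""
    # Assumption: non-recursive search; use rglob() instead to include subdirectories.
    return sorted(Path(directory).glob(f"*.{fmt}"))
```

For example, `find_docs("doc", "rst")` would return the `.rst` files directly inside `doc`.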

Process a file and modify it in place:

drdoc doc/sample.md --inplace

Process a file in place and commit the changes (this must be run inside the Git repository):

cd <your_git_repo>
drdoc README.md --inplace --commit
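Conceptually, the --commit step amounts to staging the modified file and committing it with the model's explanation as the message. The sketch below shows one way to do that with `subprocess`. It is an assumption about the general approach, not Dr. Doc's actual implementation, and both function names are hypothetical.

```python
import subprocess


def git_commit_commands(path, message):
    """Return the git commands that stage `path` and commit it with `message`."""
    return [
        ["git", "add", path],
        ["git", "commit", "-m", message],
    ]


def commit_file(path, message):
    """Run the commands in the current directory, raising if git fails."""
    for cmd in git_commit_commands(path, message):
        subprocess.run(cmd, check=True)
```

Passing the commands as argument lists instead of a shell string means the commit message needs no quoting or escaping.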

TODO

  • Add support for LangChain to use other models.
  • Optionally ask for confirmation for each change.
  • Enable using ALCF inference endpoints.
  • Add GitHub and GitLab actions to process documentation files for CI.
  • Improve the prompts and user experience with feedback.

Contributing

We welcome contributions to improve Dr. Doc! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
