Document Q&A

v2ray edited this page May 11, 2023 · 2 revisions

Overview

Starting with v1.2.0, this program can load large document files. It automatically splits the document into chunks, computes an embedding for each chunk, and saves the result as a JSON file in the documentQA folder.
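
The pipeline described above (split, embed, save) can be sketched roughly as follows. This is an illustrative Python sketch, not the program's own code: the function names, the JSON layout (`content` and `embedding` keys), and the use of whitespace-separated words as a stand-in for model tokens are all assumptions. `toy_embed` is a hypothetical placeholder for a real embeddings API call.

```python
import json
from pathlib import Path


def split_into_chunks(text: str, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: whitespace-separated words approximate tokens. A real
    tokenizer would count differently.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def toy_embed(chunk: str) -> list[float]:
    """Hypothetical stand-in for an embeddings API call (not a real embedding)."""
    return [float(len(chunk)), float(chunk.count(" ") + 1)]


def build_records(chunks: list[str], embed) -> list[dict]:
    """Pair each chunk with the embedding returned by the supplied function."""
    return [{"content": chunk, "embedding": embed(chunk)} for chunk in chunks]


def save_records(records: list[dict], path: Path) -> None:
    """Write the records as JSON (the layout here is assumed, not the program's)."""
    path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")
```

For example, `split_into_chunks("a b c d e", 2)` returns `["a b", "c d", "e"]`, and each of those chunks would then be embedded and stored alongside its text.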

Getting Started

Follow these steps to convert and load a document file:

  1. Install the program according to the User Guide. Ensure you have a valid API key and the program works as expected.
  2. Launch the program. You'll be prompted to choose between loading the initial prompt, loading saved chat history, loading saved document Q&A, or creating a new document Q&A. To convert a text file to a JSON format that the program can understand, press C and then Enter.
  3. Place the .txt document you want to convert in the documentQA folder (e.g., doc.txt). When prompted, enter its filename without the extension (e.g., doc) and press Enter.
  4. Specify the maximum number of tokens per chunk when prompted. Because each API request has a token limit (4095 for gpt-3.5-turbo, 8191 for gpt-4), large documents must be split into smaller parts. The program uses embeddings to determine which chunks share context with a new input or question, and injects the relevant snippets into the initial prompt for the language model to reference. The recommended token count per chunk is around 300, but this may vary depending on the document type. Experiment to find the optimal count for your use case.
  5. Enter the initial prompt's filename when prompted. The initial prompt serves as a header message for each input, setting the model's tone and personality. You can create custom initial prompts for different documents. For instance, if the initial folder contains a prompt file named docinitial.txt, enter docinitial and press Enter. To use the default initial prompt, press Enter without typing anything.
  6. Enter the desired filename for the saved document Q&A when prompted. Press Enter without typing anything to reuse the original text file's name, or type a custom filename (e.g., doc1) and press Enter. The converted document Q&A file (e.g., doc1.json) is saved in the documentQA folder.
  7. To load the converted document Q&A file, launch the program again. At the startup prompt (load the initial prompt, load saved chat history, load saved document Q&A, or create a new document Q&A), press D and then Enter, since the file has already been converted.
  8. Enter the document Q&A's filename when prompted (e.g., doc1 from step 6) and press Enter.
  9. You can now ask questions about the content in the document file. The language model will provide answers based on the document, even if its size exceeds the model's maximum token count.
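
The retrieval idea from step 4 (compare the question's embedding with each chunk's embedding, then inject the closest chunks into the initial prompt) might look something like the sketch below. It is written in Python for illustration only. Cosine similarity is an assumed choice of metric, and the record layout, the function names, and the prompt wording are hypothetical rather than taken from the program.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question_embedding: list[float], records: list[dict], k: int = 3) -> list[str]:
    """Return the text of the k chunks most similar to the question."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(question_embedding, r["embedding"]),
        reverse=True,
    )
    return [r["content"] for r in ranked[:k]]


def build_prompt(initial_prompt: str, snippets: list[str], question: str) -> str:
    """Combine the initial prompt, the retrieved snippets, and the question."""
    context = "\n\n".join(snippets)
    return f"{initial_prompt}\n\nRelevant excerpts:\n{context}\n\nQuestion: {question}"
```

Because only the top few chunks are sent with each request, the prompt stays within the model's token limit regardless of how large the original document is. With chunks of about 300 tokens, a handful of snippets still leaves room for the question and the answer.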

Supported File Types

This program currently supports the following file type:

  • .txt files with UTF-8 encoding.
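
Before converting a document, it can be worth confirming that the file really is a UTF-8 .txt file. The following Python sketch shows one way to check this outside the program. The helper name `is_supported_document` is hypothetical and is not part of the program.

```python
from pathlib import Path


def is_supported_document(path) -> bool:
    """Return True if path is an existing .txt file whose bytes decode as UTF-8."""
    p = Path(path)
    if p.suffix.lower() != ".txt" or not p.is_file():
        return False
    try:
        p.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A file saved in another encoding (for example UTF-16) fails the decode step, so it would need to be re-saved as UTF-8 first.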

There are no plans to add support for other file types in the near future. However, pull requests that add support for additional file types are welcome, and I'll be glad to merge them.

Unsupported File Types

The following file types are currently not supported:

  • .pdf
  • .doc
  • .docx
  • .rtf
  • .odt
  • .epub
  • .mobi
  • .csv
  • .xls
  • .xlsx
  • etc.