Document Q&A
v2ray edited this page May 11, 2023 · 2 revisions
Starting with version v1.2.0, this program supports loading large document files. It automatically splits the document into chunks, gets an embedding for each chunk, and saves the data in a JSON file within the `documentQA` folder.
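The split-embed-save flow described above can be sketched roughly as follows. This is only an illustration, not the program's actual implementation: the function names, the JSON layout, and the use of whitespace-separated words as a stand-in for model tokens are all assumptions, and `placeholder_embedding` merely stands in for a real embeddings API call.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: whitespace-separated words approximate tokens. A real
    tokenizer would count tokens differently.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def placeholder_embedding(chunk: str) -> List[float]:
    """Hypothetical stand-in for an embeddings API call."""
    return [float(len(chunk)), float(len(chunk.split()))]


def build_document_qa(
    text: str, max_tokens: int, embed: Callable[[str], List[float]]
) -> Dict[str, list]:
    """Pair every chunk with its embedding (hypothetical JSON layout)."""
    chunks = split_into_chunks(text, max_tokens)
    return {"chunks": [{"text": c, "embedding": embed(c)} for c in chunks]}


def save_document_qa(data: Dict[str, list], path: str) -> None:
    """Write the chunk data to a JSON file, creating the folder if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    demo = build_document_qa("one two three four five", 2, placeholder_embedding)
    print(len(demo["chunks"]), "chunks")
```

Passing the embedding function as a parameter keeps the chunking logic testable without network access; a real setup would pass in a function that calls the embeddings API instead.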
Follow these steps to convert and load a document file:
1. Install the program according to the User Guide. Ensure you have a valid API key and the program works as expected.
2. Launch the program. You'll be prompted to choose between loading the initial prompt, loading saved chat history, loading saved document Q&A, or creating a new document Q&A. To convert a text file to a JSON format that the program can understand, press `C` and then `Enter`.
3. Enter the filename of the `.txt` document you want to convert when prompted. Place a `.txt` file in the `documentQA` folder (e.g., `doc.txt`). Enter the filename without the extension (e.g., `doc`) and press `Enter`.
4. Specify the maximum number of tokens per chunk when prompted. Due to token limitations for each API request (4095 for `gpt-3.5-turbo`, 8191 for `gpt-4`), large documents must be split into smaller parts. The program uses embeddings to determine shared contexts between chunks and new inputs/questions, injecting relevant snippets into the initial prompt for the language model to reference. The recommended token count per chunk is around 300, but this may vary depending on the document type. Experiment to find the optimal count for your use case.
5. Enter the initial prompt's filename when prompted. The initial prompt serves as a header message for each input, setting the model's tone and personality. Create custom initial prompts for different documents. For instance, if you have an initial prompt file in the `initial` folder named `docinitial.txt`, enter `docinitial` and press `Enter`. To use the default initial prompt, press `Enter` without inputting anything.
6. Enter the desired filename for the saved document Q&A when prompted. Press `Enter` to use the original document text file's name, or input a custom filename (e.g., `doc1`). Press `Enter` to save the converted document Q&A file (e.g., `doc1.json`) in the `documentQA` folder.
7. To load the converted document Q&A file, open the program and choose between loading the initial prompt, loading saved chat history, loading saved document Q&A, or creating a new document Q&A. Since the file is already converted, press `D` and then `Enter`.
8. Enter the document Q&A's filename when prompted (e.g., `doc1` from step 6) and press `Enter`.
9. You can now ask questions about the content in the document file. The language model will provide answers based on the document, even if its size exceeds the model's maximum token count.
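To illustrate how embeddings can be used to pick the chunks that are injected into the initial prompt, here is a minimal sketch. It assumes cosine similarity as the relevance measure, which is a common choice but is not confirmed as what this program uses; the function names, the prompt layout, and the two-dimensional toy embeddings are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(
    question_embedding: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the text of the k chunks most similar to the question."""
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(question_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(initial_prompt: str, snippets: List[str], question: str) -> str:
    """Place the selected snippets after the initial prompt (assumed layout)."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        f"{initial_prompt}\n\n"
        f"Relevant document snippets:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy data: (chunk text, made-up embedding).
toy_chunks = [
    ("Cats sleep most of the day.", [1.0, 0.0]),
    ("The engine needs oil.", [0.0, 1.0]),
    ("Kittens nap often.", [0.9, 0.1]),
]
selected = top_chunks([1.0, 0.0], toy_chunks, k=2)
prompt = build_prompt("Answer using the document.", selected, "How much do cats sleep?")
```

Because only the top `k` chunks are injected, the request stays within the model's token limit regardless of the document's total size. With chunks of about 300 tokens, a handful of snippets plus the question fits comfortably in a single request.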
This program currently supports the following file type:
- .txt files with UTF-8 encoding.
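If you are unsure whether a text file is UTF-8 encoded, a small check like the one below can help before converting it. This is a standalone helper sketch, not part of the program; the function names are hypothetical, and the source encoding passed to `convert_to_utf8` is something you need to know or determine yourself.

```python
from pathlib import Path


def is_utf8_text(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a text file from a known source encoding to UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

For example, `is_utf8_text("documentQA/doc.txt")` returns `False` for a file saved as Latin-1 that contains accented characters, in which case `convert_to_utf8("documentQA/doc.txt", "documentQA/doc_utf8.txt", "latin-1")` writes a UTF-8 copy.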
There are no plans to add support for other file types in the near future. However, pull requests that add support for additional file types are welcome, and I'll be glad to merge them.
The following file types are currently not supported:
- .doc
- .docx
- .rtf
- .odt
- .epub
- .mobi
- .csv
- .xls
- .xlsx
- etc.