This project demonstrates a simple Retrieval-Augmented Generation (RAG) pipeline using DSPy and ChromaDB. The pipeline processes a PDF document, stores the text in a ChromaDB collection, and uses a language model to answer questions based on the retrieved context.
- pipeline.ipynb: Jupyter notebook containing the code for the RAG pipeline.
- data/tesla10K.pdf: Sample PDF file used for text extraction and processing.
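The retrieve-then-answer flow described above can be sketched with the standard library alone. This is a toy illustration, not the notebook's code: word-overlap ranking stands in for ChromaDB's vector search, the final prompt would normally go to a language model through DSPy, and the function names (`chunk_text`, `retrieve`, `build_prompt`) and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def chunk_text(text, size=40):
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(chunks, question, k=2):
    """Return the k chunks sharing the most words with the question.

    Assumption: this overlap score is a stand-in for the embedding
    similarity search that a vector store such as ChromaDB performs.
    """
    question_counts = Counter(tokenize(question))

    def score(chunk):
        chunk_counts = Counter(tokenize(chunk))
        return sum(min(n, chunk_counts[w]) for w, n in question_counts.items())

    return sorted(chunks, key=score, reverse=True)[:k]


def build_prompt(context, question):
    """Assemble the text that would be sent to the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Made-up sentences standing in for text extracted from the PDF.
    chunks = [
        "The company reported revenue growth in the automotive segment.",
        "Risk factors include supply chain disruptions.",
        "The board met in Austin.",
    ]
    question = "What are the risk factors?"
    print(build_prompt(retrieve(chunks, question, k=1), question))
```

In the real pipeline, the chunks would come from the PDF, retrieval would query the ChromaDB collection, and the prompt would be handled by the language model rather than printed.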
- Clone the repository:

      git clone https://github.com/marioyordanoff/dspy

- Create a virtual environment and activate it:

      python -m venv venv
      source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

- Create a .env file:

      touch .env

- Add your environment variables to the .env file. For example:

      OPENAI_API_KEY=your_openai_api_key
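To check that the .env file is picked up before running the notebook, a small standard-library sketch like the following can help. It is an assumption about how the key might be read, not the notebook's actual loading code (projects often use the python-dotenv package for this instead); `load_env_file` and `require_key` are hypothetical helper names.

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from `path`, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(values, name="OPENAI_API_KEY"):
    """Return the named setting from the parsed file or the process environment."""
    value = values.get(name) or os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file")
    return value
```

Calling `require_key(load_env_file())` from the repository root raises a clear error if the key is missing, which is easier to diagnose than an authentication failure later in the pipeline.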
This project is licensed under the MIT License. See the LICENSE file for details.