This example application demonstrates how to perform web crawling, semantic search, and Retrieval-Augmented Generation (RAG) using Korvus and Firecrawl.
Features:
- Web crawling using Firecrawl
- Semantic search over crawled content
- RAG for question answering
Prerequisites:
- Python 3.7+
- Firecrawl API key
- PostgresML database URL
Setup:
- Clone this repository:
    git clone https://github.com/postgresml/example-korvus-firecrawl
    cd example-korvus-firecrawl
- Install the required packages:
    pip install -r requirements.txt
- Create a .env file in the project root and add your credentials:
    FIRECRAWL_API_KEY=your_firecrawl_api_key
    KORVUS_DATABASE_URL=your_postgresml_database_url
    CRAWL_URL=https://example.com
    CRAWL_LIMIT=100
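
As a rough illustration of how these four settings might be read at startup, here is a minimal sketch. It assumes the values are already present in the process environment (for example, loaded from the .env file by python-dotenv's `load_dotenv()`); the `load_settings` helper, the returned key names, and the fallback of 100 for `CRAWL_LIMIT` are hypothetical and not taken from this repository.

```python
import os

# Settings the sketch treats as mandatory (assumption: CRAWL_LIMIT is optional).
REQUIRED = ("FIRECRAWL_API_KEY", "KORVUS_DATABASE_URL", "CRAWL_URL")


def load_settings(env=None):
    """Collect settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Report every missing variable at once rather than failing on the first.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {
        "firecrawl_api_key": env["FIRECRAWL_API_KEY"],
        "database_url": env["KORVUS_DATABASE_URL"],
        "crawl_url": env["CRAWL_URL"],
        # Environment values are strings, so the limit is converted explicitly.
        "crawl_limit": int(env.get("CRAWL_LIMIT", "100")),
    }
```

Passing the mapping as a parameter keeps the function easy to exercise without touching the real environment.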
The application supports three main actions: crawl, search, and rag.
- Crawl a website:
    python main.py crawl
- Perform semantic search:
    python main.py search
- Use RAG for question answering:
    python main.py rag
For search and RAG, you'll be prompted to enter queries. Type 'q' to quit the input loop.
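
The action argument and the query prompt described above could be wired together roughly as follows. This is a sketch only: `parse_action`, `query_loop`, and the prompt text are hypothetical names chosen for illustration, and the actual `main.py` may be organized differently.

```python
import argparse


def parse_action(argv):
    """Return the requested action: 'crawl', 'search', or 'rag'."""
    parser = argparse.ArgumentParser(description="Crawl, search, or run RAG")
    parser.add_argument("action", choices=["crawl", "search", "rag"])
    return parser.parse_args(argv).action


def query_loop(handle_query, read_line=input):
    """Call handle_query for each entered line until the user types 'q'."""
    handled = 0
    while True:
        query = read_line("Enter your query (or 'q' to quit): ").strip()
        if query == "q":
            break
        if not query:
            continue  # ignore empty input and prompt again
        handle_query(query)
        handled += 1
    return handled
```

For example, `query_loop(print)` echoes each query back until `q` is entered; a search or RAG handler would take the place of `print`.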
How it works:
- The application uses Firecrawl to crawl the specified website and extract markdown content.
- Crawled data is processed and stored using Korvus.
- Semantic search retrieves the stored documents most similar in meaning to your query.
- RAG combines retrieved context with a language model to answer questions.
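
To make the step between crawling and storing more concrete, the sketch below turns crawled pages into flat documents ready to be stored. The input shape (a `markdown` field plus a `metadata` dictionary containing `sourceURL`) and the output fields (`id`, `url`, `markdown`) are assumptions made for illustration; the `to_documents` helper is hypothetical and is not this repository's code.

```python
import hashlib


def to_documents(pages):
    """Turn crawled pages into documents with a stable id and markdown text."""
    documents = []
    for page in pages:
        markdown = (page.get("markdown") or "").strip()
        if not markdown:
            continue  # skip pages with no extracted content
        url = (page.get("metadata") or {}).get("sourceURL", "")
        # Hash the URL (or the text, if no URL is present) so that
        # re-crawling the same page yields the same id.
        key = url or markdown
        doc_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        documents.append({"id": doc_id, "url": url, "markdown": markdown})
    return documents
```

Deriving the id from the URL means a repeated crawl would overwrite a page's earlier version instead of storing a duplicate, provided the store upserts by id.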
Contributions are welcome! Please feel free to submit a Pull Request.