The Multi-Agent Automation System is a modular Python-based system designed to automate web scraping, AI-powered use case generation, and Kaggle dataset collection. The system utilizes three independent agents, each performing a specific task:
- Agent 1: Web Scraping - Extracts text from websites based on a list of keywords.
- Agent 2: Use Case Generator - Uses AI to generate relevant use cases from the extracted text.
- Agent 3: Resource Collector - Uses the Kaggle API to search and download datasets related to the use cases.
The system is built using the Retrieval-Augmented Generation (RAG) framework for data retrieval and processing, LangChain for handling the flow of data between agents, and the Gemini AI API for generating AI-based outputs.
Technologies Used:
- RAG (Retrieval-Augmented Generation): Improves the quality of AI output by retrieving relevant data before generation.
- LangChain: Manages the system's workflow, handling the flow of data and tasks between agents.
- Gemini AI API: Generates use cases and performs text-processing tasks, authenticated via an API key.
By leveraging these powerful tools, this multi-agent system can efficiently scrape websites, generate useful insights, and retrieve datasets for further analysis.
The system is coordinated by a central controller script (`main_agent.py`), and it stores its output in files within the `data/` directory.
- Project Overview
- Architecture Diagram
- Data Storage Files
- Sample Input and Output
- Execution Instructions
- License
Below is the architecture diagram of the system:
```
                     +-------------------------------+
                     |        Main Controller        |
                     |         main_agent.py         |
                     +-------------------------------+
                        |           |           |
         +--------------+           |           +--------------+
         |                          |                          |
+------------------------+ +------------------------+ +------------------------+
|  Agent 1: Web Scraper  | | Agent 2: Use Case Gen  | | Agent 3: Resource Coll |
|   agent1_webscrap.py   | |   agent2_usecase.py    | |   agent3_resource.py   |
+------------------------+ +------------------------+ +------------------------+
            |                          |                          |
+------------------------+ +------------------------+ +------------------------+
|  Extracted Text File   | |  Generated Use Cases   | |  Downloaded Datasets   |
|   extracted_text.txt   | |     use_cases.txt      | |   resource_links.csv   |
+------------------------+ +------------------------+ +------------------------+
            |
+------------------------+
|      keywords.txt      |
+------------------------+
```
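As a rough illustration, the controller's orchestration could look like the following sketch. The `run(...)` functions and their keyword arguments are hypothetical, since each agent script defines its own entry point; only the file names come from the project layout.

```python
# main_agent.py -- hypothetical orchestration sketch, not the project's actual code
import sys

from agents import agent1_webscrap, agent2_usecase, agent3_resources


def main(urls):
    # Agent 1: scrape the given URLs, guided by data/keywords.txt
    extracted_text = agent1_webscrap.run(urls, keywords_file="data/keywords.txt")

    # Agent 2: generate use cases from the extracted text via the AI API
    use_cases = agent2_usecase.run(extracted_text)

    # Agent 3: search Kaggle for datasets matching the use cases
    agent3_resources.run(use_cases, output_csv="data/resource_links.csv")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("Usage: python main_agent.py <url> [<url> ...]")
    main(sys.argv[1:])
```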
The following files are created by the system to store the output:
| File Name | Data Type | Description |
|---|---|---|
| `extracted_text.txt` | Plain text | Text extracted from web pages based on the keywords. |
| `use_cases.txt` | Plain text | Use cases generated by AI from the extracted text. |
| `keywords.txt` | Plain text | List of keywords for web scraping. |
| `resource_links.csv` | CSV | Resource links related to Kaggle datasets. |
| Company Name | Industry | Segment | Wikipedia Link |
|---|---|---|---|
| Tata Motors | Automotive | Manufacturing of commercial and passenger vehicles | Visit Wikipedia |
| Mahindra & Mahindra | Automotive | Manufacturing of SUVs, tractors, and commercial vehicles | Visit Wikipedia |
| Bajaj Auto | Automotive | Manufacturing of motorcycles, scooters, and three-wheelers | Visit Wikipedia |
| Larsen & Toubro (L&T) | Construction | Engineering, construction, and infrastructure development | Visit Wikipedia |
| DLF Limited | Real Estate | Residential and commercial property development | Visit Wikipedia |
| Godrej Properties | Real Estate | Residential and commercial real estate projects | Visit Wikipedia |
| Reliance Industries | Energy | Oil refining, petrochemicals, and renewable energy | Visit Wikipedia |
| Indian Oil Corporation | Energy | Oil refining, distribution, and marketing | Visit Wikipedia |
| NTPC Limited | Energy | Power generation and renewable energy | Visit Wikipedia |
| Infosys | IT Services | Software development, consulting, and IT outsourcing | Visit Wikipedia |
| TCS (Tata Consultancy Services) | IT Services | IT services, consulting, and business solutions | Visit Wikipedia |
| Wipro | IT Services | IT services, consulting, and digital transformation | Visit Wikipedia |
| HDFC Bank | Finance | Retail banking, corporate banking, and loans | Visit Wikipedia |
| ICICI Bank | Finance | Retail banking, corporate banking, insurance | Visit Wikipedia |
| Bajaj Finserv | Finance | Financial services including lending, insurance, and wealth management | Visit Wikipedia |
| Apollo Hospitals | Healthcare | Multispecialty hospitals and healthcare services | Visit Wikipedia |
| Fortis Healthcare | Healthcare | Multispecialty hospitals and diagnostics | Visit Wikipedia |
| Dr. Reddy's Laboratories | Pharmaceuticals | Manufacturing of generic drugs, active pharmaceutical ingredients (APIs) | Visit Wikipedia |
| Cipla | Pharmaceuticals | Manufacturing of generic drugs and respiratory care products | Visit Wikipedia |
| Flipkart | E-commerce/Retail | Online retail platform for electronics, fashion, groceries, etc. | Visit Wikipedia |
| BYJU'S | Education | Online learning platform for K-12 students | Visit Wikipedia |
| Vedantu | Education | Online tutoring platform for school students | Not available on Wikipedia |
| Unacademy | Education | Online education platform for competitive exams | Not available on Wikipedia |
| Simplilearn | Education | Online certification training courses in technology and business | Not available on Wikipedia |
| Toppr (acquired by BYJU'S) | Education | Online learning app for K-12 students | Not available on Wikipedia |
`extracted_text.txt`

This file contains the raw text extracted from websites based on the keywords provided.

| Extracted Text |
|---|
| "Startup ideas are crucial for growing a business..." |
| "Entrepreneurship requires a strong vision and strategy..." |
`use_cases.txt`

This file contains the AI-generated use cases derived from the extracted text.

| Use Case |
|---|
| "AI for business growth" |
| "Use of AI in marketing" |
`keywords.txt`

This file contains the list of keywords the scraper uses to identify relevant content for extraction from websites.

| Keyword |
|---|
| startup ideas |
| entrepreneurship |
| AI use cases |
`resource_links.csv`

This file contains links to Kaggle dataset resources related to the generated use cases.

| Dataset Name | Link |
|---|---|
| "Business Growth AI" | https://www.kaggle.com/dataset-xyz |
| "Marketing AI" | https://www.kaggle.com/dataset-abc |
```
/Multi-agent architecture
│
├── main_agent.py            # Main controller that triggers agent execution
├── agents/                  # Folder containing agent scripts
│   ├── agent1_webscrap.py   # Web scraping agent
│   ├── agent2_usecase.py    # Use case generation agent
│   ├── agent3_resources.py  # Resource collection agent
│
├── data/                    # Folder where output files are saved
│   ├── extracted_text.txt   # Raw text scraped from URLs
│   ├── use_cases.txt        # Generated use cases
│   ├── keywords.txt         # Extracted keywords
│   ├── resource_links.csv   # Collected resource links in CSV format
│
├── .env                     # Contains API keys and environment variables
├── requirements.txt         # Python dependencies
├── myenv/                   # Virtual environment directory
```
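Based on the technologies named above, `requirements.txt` would plausibly include entries such as the following; this is an illustrative list, not the project's actual pinned dependencies.

```
requests
beautifulsoup4
langchain
google-generativeai
kaggle
python-dotenv
```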
When running `main_agent.py`, you will provide one or more URLs as input:

```bash
python main_agent.py "https://example1.com" "https://example2.com"
```
Sample console output:

```
Text extracted from the website https://example1.com:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Text extracted from the website https://example2.com:
Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
```

Sample `use_cases.txt` output:

```
Use Case 1: Text extraction from websites.
Description: This use case involves extracting relevant data from multiple sources.
...
```

Sample `keywords.txt` output:

```
Text, Website, Extraction, Use Case, AI, Automation
```

Sample `resource_links.csv` output:

```
URL,Category,Description
https://resource1.com,AI,Resource for AI research
https://resource2.com,Web Scraping,Guides for efficient web scraping
```
First, clone the repository to your local machine:

```bash
git clone https://github.com/your-username/multi-agent-automation.git
cd multi-agent-automation
```
Create a virtual environment (optional but recommended):

```bash
python -m venv myenv
```

Activate the virtual environment:

- On Windows:

  ```bash
  .\myenv\Scripts\activate
  ```

- On macOS/Linux:

  ```bash
  source myenv/bin/activate
  ```
Install the required Python dependencies:

```bash
pip install -r requirements.txt
```
Create a `.env` file in the root directory and add your API keys for Google Generative AI, Kaggle, and Jina:

```
GOOGLE_API_KEY=your-google-api-key
KAGGLE_API_KEY=your-kaggle-api-key
JINA_API_KEY=your-jina-api-key
```
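These keys can be loaded into the process environment at runtime with `python-dotenv`, as in this short sketch (assuming that package is among the installed dependencies):

```python
# Load variables from .env into the process environment.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

google_key = os.getenv("GOOGLE_API_KEY")
kaggle_key = os.getenv("KAGGLE_API_KEY")
jina_key = os.getenv("JINA_API_KEY")
```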
To run the multi-agent automation system, execute the `main_agent.py` script with the list of URLs to scrape:

```bash
python main_agent.py "https://example1.com" "https://example2.com"
```

This will trigger the sequence of tasks:
- Web scraping (Agent 1)
- Use case generation (Agent 2)
- Resource collection (Agent 3)
- Python 3.9 or later
- Libraries listed in `requirements.txt`
- Virtual environment for isolating dependencies
This project is licensed under the MIT License - see the LICENSE file for details.