The Multi-Agent Automation System is a modular Python-based system designed to automate web scraping, AI-powered use case generation, and Kaggle dataset collection. The system utilizes three independent agents, each performing a specific task:
- Agent 1: Web Scraping - Extracts text from websites based on a list of keywords.
- Agent 2: Use Case Generator - Uses AI to generate relevant use cases from the extracted text.
- Agent 3: Resource Collector - Uses the Kaggle API to search and download datasets related to the use cases.
The system is built using the Retrieval-Augmented Generation (RAG) framework for data retrieval and processing, LangChain for handling the flow of data between agents, and the Gemini AI API for generating AI-based outputs.
Technologies Used:
- RAG (Retrieval-Augmented Generation): Improves the quality of AI output by retrieving relevant data before generation.
- LangChain: Manages the system's workflow, handling the flow of data and tasks between agents.
- Gemini AI API: Generates use cases and performs text-processing tasks, authenticated via an API key.
By leveraging these powerful tools, this multi-agent system can efficiently scrape websites, generate useful insights, and retrieve datasets for further analysis.
The system is coordinated by a central controller script (`main_agent.py`), and it stores its output in files within the `data/` directory.
- Project Overview
- Architecture Diagram
- Data Storage Files
- Sample Input and Output
- Execution Instructions
- License
Below is the architecture diagram of the system:
```
                     +-------------------------------+
                     |        Main Controller        |
                     |         main_agent.py         |
                     +-------------------------------+
                        |           |           |
         +--------------+           |           +--------------+
         |                          |                          |
+------------------------+ +------------------------+ +------------------------+
|  Agent 1: Web Scraper  | | Agent 2: Use Case Gen  | | Agent 3: Resource Coll |
|   agent1_webscrap.py   | |   agent2_usecase.py    | |   agent3_resource.py   |
+------------------------+ +------------------------+ +------------------------+
            |                          |                          |
+------------------------+ +------------------------+ +------------------------+
|  Extracted Text File   | |  Generated Use Cases   | |  Downloaded Datasets   |
|   extracted_text.txt   | |     use_cases.txt      | |   resource_links.csv   |
+------------------------+ +------------------------+ +------------------------+
            |
+------------------------+
|      keywords.txt      |
+------------------------+
```
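As a rough illustration, the controller's orchestration could look like the following sketch. The `run(...)` functions and their keyword arguments are hypothetical, since each agent script defines its own entry point; only the file names come from the project layout.

```python
# main_agent.py -- hypothetical orchestration sketch, not the project's actual code
import sys

from agents import agent1_webscrap, agent2_usecase, agent3_resources


def main(urls):
    # Agent 1: scrape the given URLs, guided by data/keywords.txt
    extracted_text = agent1_webscrap.run(urls, keywords_file="data/keywords.txt")

    # Agent 2: generate use cases from the extracted text via the AI API
    use_cases = agent2_usecase.run(extracted_text)

    # Agent 3: search Kaggle for datasets matching the use cases
    agent3_resources.run(use_cases, output_csv="data/resource_links.csv")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("Usage: python main_agent.py <url> [<url> ...]")
    main(sys.argv[1:])
```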
The following files are created by the system to store the output:
| File Name | Data Type | Description |
|---|---|---|
| `extracted_text.txt` | Plain text | Text extracted from web pages based on the keywords. |
| `use_cases.txt` | Plain text | Use cases generated by AI from the extracted text. |
| `keywords.txt` | Plain text | List of keywords for web scraping. |
| `resource_links.csv` | CSV | Resource links related to Kaggle datasets. |
| Company Name | Industry | Segment | Wikipedia Link |
|---|---|---|---|
| Tata Motors | Automotive | Manufacturing of commercial and passenger vehicles | Visit Wikipedia |
| Mahindra & Mahindra | Automotive | Manufacturing of SUVs, tractors, and commercial vehicles | Visit Wikipedia |
| Bajaj Auto | Automotive | Manufacturing of motorcycles, scooters, and three-wheelers | Visit Wikipedia |
| Larsen & Toubro (L&T) | Construction | Engineering, construction, and infrastructure development | Visit Wikipedia |
| DLF Limited | Real Estate | Residential and commercial property development | Visit Wikipedia |
| Godrej Properties | Real Estate | Residential and commercial real estate projects | Visit Wikipedia |
| Reliance Industries | Energy | Oil refining, petrochemicals, and renewable energy | Visit Wikipedia |
| Indian Oil Corporation | Energy | Oil refining, distribution, and marketing | Visit Wikipedia |
| NTPC Limited | Energy | Power generation and renewable energy | Visit Wikipedia |
| Infosys | IT Services | Software development, consulting, and IT outsourcing | Visit Wikipedia |
| TCS (Tata Consultancy Services) | IT Services | IT services, consulting, and business solutions | Visit Wikipedia |
| Wipro | IT Services | IT services, consulting, and digital transformation | Visit Wikipedia |
| HDFC Bank | Finance | Retail banking, corporate banking, and loans | Visit Wikipedia |
| ICICI Bank | Finance | Retail banking, corporate banking, insurance | Visit Wikipedia |
| Bajaj Finserv | Finance | Financial services including lending, insurance, and wealth management | Visit Wikipedia |
| Apollo Hospitals | Healthcare | Multispecialty hospitals and healthcare services | Visit Wikipedia |
| Fortis Healthcare | Healthcare | Multispecialty hospitals and diagnostics | Visit Wikipedia |
| Dr. Reddy's Laboratories | Pharmaceuticals | Manufacturing of generic drugs, active pharmaceutical ingredients (APIs) | Visit Wikipedia |
| Cipla | Pharmaceuticals | Manufacturing of generic drugs and respiratory care products | Visit Wikipedia |
| Flipkart | E-commerce/Retail | Online retail platform for electronics, fashion, groceries, etc. | Visit Wikipedia |
| BYJU'S | Education | Online learning platform for K-12 students | Visit Wikipedia |
| Vedantu | Education | Online tutoring platform for school students | Not available on Wikipedia |
| Unacademy | Education | Online education platform for competitive exams | Not available on Wikipedia |
| Simplilearn | Education | Online certification training courses in technology and business | Not available on Wikipedia |
| Toppr (acquired by BYJU'S) | Education | Online learning app for K-12 students | Not available on Wikipedia |
`extracted_text.txt`

This file contains the raw text extracted from websites based on the keywords provided.

| Extracted Text |
|---|
| "Startup ideas are crucial for growing a business..." |
| "Entrepreneurship requires a strong vision and strategy..." |
`use_cases.txt`

This file contains the AI-generated use cases derived from the extracted text.

| Use Case |
|---|
| "AI for business growth" |
| "Use of AI in marketing" |
`keywords.txt`

This file contains the list of keywords the scraper uses to identify relevant content for extraction from websites.

| Keyword |
|---|
| startup ideas |
| entrepreneurship |
| AI use cases |
`resource_links.csv`

This file contains links to Kaggle dataset resources related to the generated use cases.

| Dataset Name | Link |
|---|---|
| "Business Growth AI" | https://www.kaggle.com/dataset-xyz |
| "Marketing AI" | https://www.kaggle.com/dataset-abc |
```
/Multi-agent architecture
│
├── main_agent.py            # Main controller that triggers agent execution
├── agents/                  # Folder containing agent scripts
│   ├── agent1_webscrap.py   # Web scraping agent
│   ├── agent2_usecase.py    # Use case generation agent
│   ├── agent3_resources.py  # Resource collection agent
│
├── data/                    # Folder where output files are saved
│   ├── extracted_text.txt   # Raw text scraped from URLs
│   ├── use_cases.txt        # Generated use cases
│   ├── keywords.txt         # Extracted keywords
│   ├── resource_links.csv   # Collected resource links in CSV format
│
├── .env                     # Contains API keys and environment variables
├── requirements.txt         # Python dependencies
├── myenv/                   # Virtual environment directory
```
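Based on the technologies named above, `requirements.txt` would plausibly include entries such as the following; this is an illustrative list, not the project's actual pinned dependencies.

```
requests
beautifulsoup4
langchain
google-generativeai
kaggle
python-dotenv
```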
When running `main_agent.py`, you will provide one or more URLs as input:

```bash
python main_agent.py "https://example1.com" "https://example2.com"
```
Sample console output:

```
Text extracted from the website https://example1.com:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Text extracted from the website https://example2.com:
Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
```

Sample `use_cases.txt` output:

```
Use Case 1: Text extraction from websites.
Description: This use case involves extracting relevant data from multiple sources.
...
```

Sample `keywords.txt` output:

```
Text, Website, Extraction, Use Case, AI, Automation
```

Sample `resource_links.csv` output:

```
URL,Category,Description
https://resource1.com,AI,Resource for AI research
https://resource2.com,Web Scraping,Guides for efficient web scraping
```
First, clone the repository to your local machine:

```bash
git clone https://github.com/your-username/multi-agent-automation.git
cd multi-agent-automation
```
Create a virtual environment (optional but recommended):

```bash
python -m venv myenv
```

Activate the virtual environment:

- On Windows:

  ```bash
  .\myenv\Scripts\activate
  ```

- On macOS/Linux:

  ```bash
  source myenv/bin/activate
  ```
Install the required Python dependencies:

```bash
pip install -r requirements.txt
```
Create a `.env` file in the root directory and add your API keys for Google Generative AI, Kaggle, and Jina:

```
GOOGLE_API_KEY=your-google-api-key
KAGGLE_API_KEY=your-kaggle-api-key
JINA_API_KEY=your-jina-api-key
```
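These keys can be loaded into the process environment at runtime with `python-dotenv`, as in this short sketch (assuming that package is among the installed dependencies):

```python
# Load variables from .env into the process environment.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

google_key = os.getenv("GOOGLE_API_KEY")
kaggle_key = os.getenv("KAGGLE_API_KEY")
jina_key = os.getenv("JINA_API_KEY")
```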
To run the multi-agent automation system, execute the `main_agent.py` script with the list of URLs to scrape:

```bash
python main_agent.py "https://example1.com" "https://example2.com"
```

This will trigger the sequence of tasks:
- Web scraping (Agent 1)
- Use case generation (Agent 2)
- Resource collection (Agent 3)
- Python 3.9 or later
- Libraries listed in `requirements.txt`
- Virtual environment for isolating dependencies
This project is licensed under the MIT License - see the LICENSE file for details.