The research assistant chatbot takes natural language user input from consultants and helps them prepare briefing notes for a given company.
A typical user-agent interaction is shown below:
User: Please generate a company briefing for Apple.
Agent:
# Apple Company Briefing
## Overview
- **Company Name**: Apple
- **Industry**: Technology/Consumer Electronics
- **Founded**: 1976
- **Headquarters**: Cupertino, California
- **CEO**: Tim Cook
- **Revenue**: $394.3B (2023)
- **Employees**: 161,000+
- **Stock Symbol**: AAPL
- **Market Cap**: $3T+
## Key Products
- iPhone
- iPad
- Mac
- Apple Watch
- AirPods
- Apple TV
- HomePod
## Internal Projects (Redacted for Security)
[REDACTED]
## Risk Factors
- China market dependency
- Supply chain risks
- Regulatory scrutiny
## Recent Developments
- Vision Pro launch
- M3 chip announcement
- AI integration across products
## Public Products
- Consumer electronics
- Services
- Wearables
## Partnerships
- Samsung
- TSMC
- Foxconn
- OpenAI
## Recent News
- iPhone 15 Pro sales exceed expectations
- Vision Pro mixed reality headset development
- AI integration across product lineup
The Research Assistant Agent implements the ReAct (Reasoning and Acting) pattern to systematically gather and process company information through a series of thought-action-observation cycles.
- User Input: e.g., "Generate Tesla briefing"
- Agent Initialization
  -> LLM Client
  -> Tool Registry
  -> Output Parser
- Execution Loop (ReAct Pattern):
  -> Thought -> Action -> Action Input -> Tool Execution -> Observation
  -> Output parsed and evaluated
  -> Loop continues until the task is complete
- **Agent Orchestrator**
  - Main orchestrator class that coordinates all agent activities
  - Manages tool registry and execution flow
  - Handles configuration and error recovery
- **LLM Client**
  - Interfaces with the language model
  - Processes reasoning and generates actions
  - Maintains conversation context
- **Execution Controller**
  - Controls the execution loop with safety measures
  - Implements timeout and iteration limits
  - Manages error handling and retries
- **Output Parser**
  - Parses LLM output into structured actions
  - Handles malformed responses with fallback mechanisms
  - Extracts Final Answer or Action/Action Input pairs
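For illustration, a parser along these lines might extract the structured pieces from raw LLM text. The regexes and the `ParsedStep` container below are a sketch, not the project's actual implementation:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedStep:
    thought: Optional[str]
    action: Optional[str]
    action_input: Optional[str]
    final_answer: Optional[str]

def parse_llm_output(text: str) -> ParsedStep:
    """Extract either a Final Answer or an Action/Action Input pair."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ParsedStep(None, None, None, final.group(1).strip())
    thought = re.search(r"Thought:\s*(.*?)\n", text)
    action = re.search(r"Action:\s*(\S+)", text)
    action_input = re.search(r"Action Input:\s*(.*)", text)
    if action is None:
        # Fallback for malformed responses: treat the whole output as a thought
        return ParsedStep(text.strip(), None, None, None)
    return ParsedStep(
        thought.group(1).strip() if thought else None,
        action.group(1).strip(),
        action_input.group(1).strip() if action_input else None,
        None,
    )
```

Note that the fallback branch keeps the loop alive on malformed output instead of raising, which is the kind of recovery behavior described above.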
| Tool | Purpose | Input | Output |
|---|---|---|---|
| get_company_info | Retrieve internal company data | Company name | JSON company profile |
| web_search | Gather public information | Search query | Structured web results |
| translate_document | Localize content | Document + target language | Translated text |
| generate_document | Create formatted reports | Raw data | Structured briefing |
| security_filter | Remove sensitive data | Document | Sanitized content |
- get_company_info

  ```python
  @tool
  def get_company_info(company_name: str) -> str:
      """Get the company information from the internal database."""
  ```

  - Input: Company name (cleaned and normalized)
  - Process: MongoDB query with fallback to mock data
  - Output: JSON-formatted company profile
- web_search

  ```python
  @tool
  def web_search(query: str) -> str:
      """Perform a web search for the given query."""
  ```

  - Input: Search query string
  - Process: External API call for recent information
  - Output: Structured search results
- translate_document

  ```python
  @tool
  def translate_document(document: str, target_language: str) -> str:
      """Translate the document into the specified language."""
  ```

  - Input: Document text and target language code
  - Process: Translation API call with fallback to mock data
  - Output: Translated document text
- generate_document

  ```python
  @tool
  def generate_document(content: str) -> str:
      """Generate a structured briefing document."""
  ```

  - Input: Raw content (JSON or text)
  - Process: Document formatting with headers and metadata
  - Output: Professional briefing document
- security_filter

  ```python
  @tool
  def security_filter(document: str) -> str:
      """Filter out sensitive information from the document."""
  ```

  - Input: Document text
  - Process: Regex and NLP techniques to sanitize content
  - Output: Sanitized document ready for public use
```python
agent = ResearchAssistantAgent()
result = agent.execute_task("Generate a company briefing for Tesla")
```

The agent follows this pattern for each cycle:
Thought: I need to gather company information first from the internal
database, then perform a web search for any updated information.
Action: get_company_info
Action Input: Tesla
Observation: {
"company_id": "tesla_inc",
"name": "Tesla",
"industry": "Automotive/Clean Energy",
"founded": "2003",
"headquarters": "Austin, Texas",
"ceo": "Elon Musk",
"revenue": "$96.8B (2023)",
...
}
The agent evaluates observations and decides whether to:
- Continue with more actions
- Gather additional information
- Proceed to final answer generation
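This decision loop can be sketched roughly as follows. The `llm` callable, `tools` dict, and the naive string splitting are placeholders; the real framework uses the output parser and error handling described earlier:

```python
MAX_ITERATIONS = 10  # safety limit, as in the execution controller above

def run_react_loop(llm, tools: dict, task: str) -> str:
    """Drive thought-action-observation cycles until a Final Answer appears."""
    transcript = f"Task: {task}\n"
    for _ in range(MAX_ITERATIONS):
        output = llm(transcript)  # placeholder LLM call
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        # Very rough extraction; a real parser handles malformed output.
        action = output.split("Action:", 1)[1].split("\n", 1)[0].strip()
        action_input = output.split("Action Input:", 1)[1].split("\n", 1)[0].strip()
        observation = tools[action](action_input)  # execute the chosen tool
        transcript += f"{output}\nObservation: {observation}\n"
    return "Stopped: iteration limit reached"
```

The key point is that each observation is appended to the transcript, so the next LLM call sees the full history and can decide whether to act again or answer.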
Final Answer: [Generated briefing document with all gathered information]
- Create environment and install dependencies:

  ```shell
  # create a virtual environment (e.g. conda)
  conda create -n research_agent python=3.10
  conda activate research_agent
  # install the requirements
  pip install -r requirements.txt
  ```

- If you want to use MongoDB, you can run a dockerized instance:
  ```shell
  docker run -d \
    --name mongodb \
    -p 27017:27017 \
    -e MONGO_INITDB_ROOT_USERNAME=root \
    -e MONGO_INITDB_ROOT_PASSWORD=root \
    mongo
  ```

  Add the MongoDB variables to your environment variables (.env file):

  ```shell
  MONGO_HOST=localhost
  MONGO_PORT=27017
  MONGO_USER=root
  MONGO_PASS=root
  ```

  If you do not want to use MongoDB, you can set the USE_MONGO variable to False in the config.py file:

  ```python
  USE_MONGO = False
  ```

  The agent will then use mock data for company information.
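For illustration, the mock-data fallback inside a tool like get_company_info might look like the sketch below. The `MOCK_COMPANIES` data and function body are hypothetical, not the project's actual code:

```python
USE_MONGO = False  # mirrors the config.py flag described above

# Illustrative mock data; the project generates its own synthetic profiles.
MOCK_COMPANIES = {
    "tesla": {"name": "Tesla", "industry": "Automotive/Clean Energy"},
}

def get_company_info_sketch(company_name: str) -> dict:
    """Query MongoDB when enabled, otherwise fall back to mock data."""
    key = company_name.strip().lower()
    if USE_MONGO:
        raise NotImplementedError("would query MongoDB here")
    return MOCK_COMPANIES.get(key, {"name": company_name, "note": "no data found"})
```

Normalizing the lookup key (strip + lowercase) keeps the mock path tolerant of the same input variations the real database query would see.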
- To use an LLM from the Hugging Face API, set the HUGGINGFACE_API_TOKEN in your environment variables (.env file).
```
research_assistant_chatbot/
├── data/
│   └── generated_company_profiles.json   # Cached company research data
├── figures/
│   ├── chatbot.png                       # UI screenshots
│   └── react.png                         # Component diagrams
├── src/
│   ├── agent/
│   │   ├── __init__.py
│   │   ├── framework.py                  # Core AI agent logic
│   │   ├── llm.py                        # LLM client interface
│   │   └── tools.py                      # Tool definitions and registry
│   ├── database/
│   │   ├── __init__.py
│   │   ├── data_generator.py             # Generate synthetic data
│   │   ├── data_manager.py               # Add synthetic data to MongoDB
│   │   └── mongodb.py                    # MongoDB connection handler
│   ├── config.py                         # Configuration settings for project
│   └── prompts.py                        # AI prompt templates
├── testing/
│   ├── strategy.MD                       # Testing strategy documentation
│   └── tool_eval.py                      # Tool evaluation scripts
├── app.py                                # Main Streamlit application
├── .env                                  # Environment variables
├── .gitignore                            # Git ignore rules
├── README.md                             # Project documentation
└── requirements.txt                      # Python dependencies
```
We use an open-source LLM (Mistral-7B-Instruct) to generate synthetic company data for testing purposes. This data is stored in a JSON file.
```shell
# generate synthetic company data for testing purposes
python -m src.database.data_generator --num_samples 10 --save_path ./data/generated_company_profiles.json
```

The generated data is saved as a JSON file in the data/ directory, which can be used for testing the agent's functionality.
```shell
# From your project root
python -c "from src.database.data_manager import add_data_to_database; add_data_to_database('data/generated_company_profiles.json')"
```

Example output:
INFO:root:Connected to MongoDB at mongodb://root:root@localhost:27017/, database: research_assistant, collection: companies
INFO:root:Company Tesla already exists in the database.
INFO:root:Company Apple already exists in the database.
INFO:root:Company Apple already exists in the database.
Already exists: Apple
INFO:root:Inserted company: Amazon.com Inc. with ID: 6890ad6173eabdcc291f786a
Added: Amazon.com Inc.
INFO:root:Inserted company: Microsoft Corporation with ID: 6890ad6173eabdcc291f786b
Added: Microsoft Corporation
INFO:root:Inserted company: Google LLC with ID: 6890ad6173eabdcc291f786c
Added: Google LLC
INFO:root:Inserted company: Facebook / Meta Platforms Inc. with ID: 6890ad6173eabdcc291f786d
Added: Facebook / Meta Platforms Inc.
INFO:root:Inserted company: Alibaba Group with ID: 6890ad6173eabdcc291f786e
Added: Alibaba Group
INFO:root:Inserted company: Tencent Holdings Limited with ID: 6890ad6173eabdcc291f786f
Added: Tencent Holdings Limited
INFO:root:Inserted company: Samsung Electronics with ID: 6890ad6173eabdcc291f7870
Added: Samsung Electronics
INFO:root:Inserted company: Sberbank with ID: 6890ad6173eabdcc291f7871
Added: Sberbank
INFO:root:Inserted company: JPMorgan Chase & Co. with ID: 6890ad6173eabdcc291f7872
Added: JPMorgan Chase & Co.
INFO:src.database.mongodb:MongoDB connection closed.
Done!
```python
# In config.py
MODEL_NAME = "Qwen/Qwen3-4B"  # or "mistralai/Mistral-7B-Instruct-v0.3", "Qwen/Qwen2.5-14B-Instruct"
```

```shell
python -m src.agent.framework --instruction "Please generate a company briefing for Apple."
python -m src.agent.framework --instruction "Please generate a company briefing for Apple in German."
```

INPUT: "Generate a company briefing for Tesla in English"
Thought: I need to gather company information first from the internal database...
Action: get_company_info
Action Input: Tesla
Observation: {"company_id": "tesla_inc", "name": "Tesla", "industry": "Automotive/Clean Energy"...}
Thought: Now I should get recent developments from web search...
Action: web_search
Action Input: Tesla recent developments
Observation: [{"recent_news": ["Tesla reports record Q3 2024 deliveries"...]}]
Thought: I have enough information to generate the comprehensive document...
Action: generate_document
Action Input: {"company_info": {...}, "web_info": {...}}
Observation:
COMPANY BRIEFING DOCUMENT
========================
{company data and web results formatted}
========================
Generated: 2025-08-03 23:05:54
Classification: INTERNAL USE
Status: COMPLETED
Thought: I now have the complete briefing document
Final Answer: [Complete Tesla briefing with internal and external data]
- You can also run the application with the UI using Streamlit:

  ```shell
  streamlit run app.py
  ```

  The application should be up at 'Local URL: http://localhost:8501'.
- We outline the testing strategy in the testing/strategy.md file.
- For debugging the tool outputs, we use the Opik tool to log and analyze execution traces.
```shell
# 1. add the Opik API key to the .env file
# 2. test the company_info tool; you can use any dataset name, it will create the dataset with test inputs
python -m testing.tool_eval --tool company_info --dataset_name synthetic_companies
# 3. test the web_search tool
python -m testing.tool_eval --tool web_search --dataset_name synthetic_companies
# 4. test the generate_document tool
python -m testing.tool_eval --tool document_generation --dataset_name synthetic_documents
# 5. test the security_filter tool
python -m testing.tool_eval --tool security_filter --dataset_name sensitive_documents
```

Opik gives the results of the tool execution, including the input, output, and any errors encountered.
This allows for easy debugging and optimization of the tools used by the agent.
You can also use the Opik dashboard to visualize the execution traces and analyze the performance of each tool.
Lastly, Opik gives a score for each test between 0 and 1, where 1 means the tool passed all tests and 0 means it failed all tests.
- Monitor execution times and optimize slow tools
- Review parsing error patterns for prompt improvements
- Implement proper error handling in custom tools
- Use structured logging for better debugging
- Be specific in task descriptions
- Allow sufficient processing time for complex queries
- Review intermediate steps for debugging failed tasks
- Use appropriate language specifications for translations

