An AI-powered web agent built with Mastra that can search, scrape, and extract data from the open web.
This agent combines OpenAI's GPT-4o-mini model with Bright Data's powerful SDK to create an intelligent web research assistant that can:
- Search the web using Google, Bing, or Yandex with anti-bot protection
- Scrape website content in clean markdown format
- Extract detailed Amazon product information (pricing, reviews, specifications)
- Collect LinkedIn profile data (experience, education, skills)
- Maintain conversation context with persistent memory
- Provide accurate, well-sourced responses with citations
This is the easiest way to start building Mastra agents with web access: simply add your API keys and start exploring the web!
- AI-Powered Intelligence: Uses OpenAI GPT-4o-mini for intelligent reasoning and responses
- Bright Data SDK Integration: Direct integration with Bright Data's powerful web data tools
- Multiple Data Sources: Search engines, web scraping, Amazon products, LinkedIn profiles
- Anti-Bot Protection: Automatic CAPTCHA and bot detection bypass
- Persistent Memory: Maintains conversation history using LibSQL database
- Source Citations: Instructed to include URLs and source attribution in its answers
- Structured Logging: Built-in logging with Pino for monitoring and debugging
- TypeScript Support: Fully typed for better development experience
- Web Agent (src/mastra/agents/web-agent.ts): Main AI agent with web research capabilities
- Bright Data Tools (src/mastra/tools/web-tools.ts): SDK integration for web scraping, search, and data extraction
- Mastra Core (src/mastra/index.ts): Central configuration and orchestration
Dependencies:
- @mastra/core: Main framework for agents and workflows
- @brightdata/sdk: Bright Data SDK for web scraping and data collection
- @mastra/memory: Persistent memory management
- @mastra/libsql: LibSQL database adapter
- @mastra/loggers: Logging utilities
- zod: Schema validation for tool inputs
- Node.js: >= 20.9.0
- npm: Latest version
- Bright Data API Key: For web scraping and data collection capabilities
- OpenAI API Key: For GPT-4o-mini model access
git clone https://github.com/brightdata/brightdata-mastra-tools
cd brightdata-tools-test
npm install
Copy the example environment file:
cp .env.example .env
Edit .env and add your API keys:
OPENAI_API_KEY=your_openai_api_key_here
BRIGHTDATA_API_KEY=your_brightdata_api_key_here
Bright Data API Key:
- Sign up at Bright Data
- Navigate to your dashboard
- Generate an API key for SDK access
- Enable the zones you need (they are created automatically if autoCreateZones: true is set)
OpenAI API Key:
- Go to OpenAI Platform
- Navigate to API keys section
- Create a new API key
- Ensure you have access to GPT-4o-mini model
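The project already reports missing keys through the errors listed under troubleshooting. If you prefer to fail fast before any tool or model is created, a minimal sketch of an up-front check for the two variables from .env.example could look like this. The helper names missingEnvVars and assertEnv are hypothetical, not part of the project.

```typescript
// Hypothetical startup check for the two keys this README asks you to set.
// The variable names come from .env.example; the helpers are illustrative only.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "BRIGHTDATA_API_KEY"] as const;

type EnvLike = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: EnvLike): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws one error that lists every missing variable at once.
function assertEnv(env: EnvLike): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

For example, calling assertEnv(process.env) once at startup, after .env has been loaded, surfaces both missing keys in a single message instead of one failure at a time.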
The agent uses LibSQL for persistent memory:
- Development: Uses an in-memory database (:memory:) in src/mastra/index.ts
- Agent Memory: Uses file-based storage (file:../mastra.db) for conversation history
No additional setup required - the database will be created automatically.
Start the development server with hot reloading:
npm run dev
Build the application:
npm run build
Start the built application:
npm run start
The Web Agent is designed to provide comprehensive web research capabilities:
Web search:
- Web Search: Search across Google, Bing, or Yandex
- Localized Results: Specify country codes for region-specific results
- Multiple Formats: Get results in HTML or clean markdown format
- Anti-Bot Protection: Automatic bypass of CAPTCHAs and bot detection
Web scraping:
- Clean Markdown: Extract website content in readable markdown format
- Proxy Support: Use country-specific proxies for geo-restricted content
- CAPTCHA Bypass: Automatically handle anti-bot protection
- Raw Content: Access unprocessed website data
Amazon products:
- Product Details: Get pricing, ratings, reviews, and specifications
- Location-Specific: Provide ZIP codes for regional pricing and availability
- Comprehensive Data: Access detailed product information and reviews
- Structured Output: Receive data in clean JSON format
LinkedIn profiles:
- Professional Data: Collect work experience, education, and skills
- Batch Processing: Fetch multiple profiles in a single request
- Multiple Formats: Get results in JSON or JSONL format
- Detailed Profiles: Access comprehensive professional information
Memory:
- Conversation History: Remember previous interactions
- Context Awareness: Build on prior research results
- Session Persistence: Maintain context across sessions
- Research Tracking: Store search and scraping history for reference
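The LinkedIn tool's two output formats differ in framing: JSON is a single document (typically an array), while JSONL puts one JSON object on each line. If you post-process batch results yourself, a minimal sketch that normalizes either format into an array is shown here. It assumes the raw result is available as a string; the Profile type is a hypothetical placeholder, not the dataset's actual schema.

```typescript
// Hypothetical record shape; the real LinkedIn dataset fields are not shown here.
type Profile = Record<string, unknown>;

// Normalize a batch result delivered as JSON or as JSONL into one array.
function parseProfiles(raw: string, format: "json" | "jsonl"): Profile[] {
  if (format === "json") {
    const parsed: unknown = JSON.parse(raw);
    // Accept either an array of records or a single record.
    return Array.isArray(parsed) ? (parsed as Profile[]) : [parsed as Profile];
  }
  // JSONL: every non-empty line is an independent JSON document.
  return raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Profile);
}
```

JSONL is convenient for large batches because each line can be parsed (or streamed) on its own, while JSON requires the whole document before parsing.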
The agent has access to four powerful Bright Data tools. The snippets below show the input each tool accepts; where a field takes one of several values, the alternatives are listed in a comment.
searchTool({
  query: "your search query",
  searchEngine: "google", // or "bing", "yandex"
  country: "us", // optional 2-letter country code
  dataFormat: "markdown" // or "html"
})
scrapeTool({
  url: "https://example.com",
  country: "us" // optional 2-letter country code
})
amazonProductTool({
  url: "https://amazon.com/dp/PRODUCTID",
  zipcode: "10001" // optional, for location-specific data
})
linkedinCollectProfilesTool({
  urls: ["https://www.linkedin.com/in/profile1", "https://www.linkedin.com/in/profile2"],
  format: "json" // or "jsonl"
})
Project structure:
brightdata-tools-test/
├── src/
│ └── mastra/
│ ├── agents/
│ │ └── web-agent.ts # Main AI agent
│ ├── tools/
│ │ └── web-tools.ts # Bright Data SDK tools
│ └── index.ts # Mastra configuration
├── .env.example # Environment template
├── .gitignore # Git ignore rules
├── package.json # Dependencies and scripts
├── tsconfig.json # TypeScript configuration
└── README.md # This file
The agent is configured in src/mastra/agents/web-agent.ts:
- Model: OpenAI GPT-4o-mini
- Memory: Persistent LibSQL file storage (file:../mastra.db)
- Tools: Four Bright Data tools (search, scrape, Amazon, LinkedIn)
- Instructions: General-purpose web research with clear citation guidelines
Tools are configured in src/mastra/tools/web-tools.ts:
- API Key: Required for all tools
- Auto Create Zones: Automatically creates Bright Data zones when needed
- Input Validation: Zod schemas ensure correct input format
- Error Handling: Comprehensive error messages for debugging
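The searchTool inputs are validated by a Zod schema in web-tools.ts. That schema is not reproduced in this README, so the following is only a dependency-free sketch of equivalent checks for the fields listed earlier, not the project's code. Accepting upper- or lower-case country codes and treating dataFormat as required are assumptions.

```typescript
// Illustrative stand-in for the searchTool input schema (the project uses Zod).
const SEARCH_ENGINES = ["google", "bing", "yandex"] as const;
const DATA_FORMATS = ["markdown", "html"] as const;

type SearchEngine = (typeof SEARCH_ENGINES)[number];
type DataFormat = (typeof DATA_FORMATS)[number];

interface SearchInput {
  query: string;
  searchEngine: SearchEngine;
  country?: string; // optional 2-letter country code
  dataFormat: DataFormat;
}

type ValidationResult = { ok: true; value: SearchInput } | { ok: false; errors: string[] };

// Collects every problem instead of stopping at the first one.
function validateSearchInput(input: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  const { query, searchEngine, country, dataFormat } = input;

  if (typeof query !== "string" || query.trim() === "") {
    errors.push("query must be a non-empty string");
  }
  if (!SEARCH_ENGINES.includes(searchEngine as SearchEngine)) {
    errors.push(`searchEngine must be one of: ${SEARCH_ENGINES.join(", ")}`);
  }
  // Assumption: exactly two ASCII letters, either case.
  if (country !== undefined && (typeof country !== "string" || !/^[A-Za-z]{2}$/.test(country))) {
    errors.push("country must be a 2-letter code");
  }
  if (!DATA_FORMATS.includes(dataFormat as DataFormat)) {
    errors.push(`dataFormat must be one of: ${DATA_FORMATS.join(", ")}`);
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: input as unknown as SearchInput };
}
```

Returning all errors together mirrors how schema validators typically report issues, which makes a malformed tool call easier to diagnose from a single log line.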
Storage settings in src/mastra/index.ts:
- Observability: In-memory database for observability data
- Agent Memory: File-based storage for conversation history (relative to the .mastra/output directory)
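Because the agent memory URL is relative to the .mastra/output directory, the database file does not land in the project root. A minimal sketch of that path arithmetic, assuming the URL is resolved against .mastra/output exactly as stated in this README (the helper name resolveFileDbUrl is hypothetical):

```typescript
import * as path from "node:path";

// Hypothetical helper: where a relative LibSQL "file:" URL points
// when resolved against a given base directory.
function resolveFileDbUrl(url: string, baseDir: string): string {
  const prefix = "file:";
  if (!url.startsWith(prefix)) {
    // ":memory:" and remote URLs have no on-disk location to compute.
    throw new Error(`Not a file: URL: ${url}`);
  }
  // POSIX joining keeps the result identical on every OS.
  return path.posix.join(baseDir, url.slice(prefix.length));
}
```

Under that assumption, resolveFileDbUrl("file:../mastra.db", ".mastra/output") yields ".mastra/mastra.db", which is the file to delete if you want to clear conversation history.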
"Bright Data API key is required to initialize tools"
- Ensure the .env file exists with a valid BRIGHTDATA_API_KEY
- Check that the API key is not expired
- Verify the API key has permissions for SDK access
"OpenAI API key not found"
- Verify OPENAI_API_KEY is set in .env
- Ensure the API key has access to the GPT-4o-mini model
- Check your OpenAI account has available credits
Tool initialization failures
- Check which specific tools failed (error message will list them)
- Verify your Bright Data account has access to required datasets
- Ensure zones are properly configured (or that autoCreateZones: true is set)
Search or scrape timeouts
- Check internet connectivity
- Verify Bright Data service status
- Consider network latency and proxy location
Amazon or LinkedIn tool errors
- Ensure URLs are in the correct format (Amazon URLs must contain /dp/ or /gp/product/)
- Verify LinkedIn URLs are valid profile URLs
- Check that your Bright Data plan includes dataset access
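To catch malformed URLs before spending a tool call, you can run a quick pre-flight check that encodes the rules in this list. This is a hypothetical sketch, not part of web-tools.ts: the Amazon path rule comes from this README, while the host checks and treating /in/&lt;slug&gt; as the LinkedIn profile pattern (based on the example URLs in this README) are assumptions.

```typescript
// Parse safely: returns undefined instead of throwing on malformed input.
function tryParseUrl(raw: string): URL | undefined {
  try {
    return new URL(raw);
  } catch {
    return undefined;
  }
}

// Amazon product URLs must contain /dp/ or /gp/product/ (rule from this README).
// Assumption: the host is amazon.<tld>, optionally with a subdomain such as www.
function isAmazonProductUrl(raw: string): boolean {
  const url = tryParseUrl(raw);
  if (!url) return false;
  const isAmazonHost = /(^|\.)amazon\.[a-z.]+$/i.test(url.hostname);
  return isAmazonHost && /\/(dp|gp\/product)\//.test(url.pathname);
}

// Assumption: profile URLs look like https://www.linkedin.com/in/<slug>.
function isLinkedInProfileUrl(raw: string): boolean {
  const url = tryParseUrl(raw);
  if (!url) return false;
  const isLinkedInHost = url.hostname === "linkedin.com" || url.hostname.endsWith(".linkedin.com");
  return isLinkedInHost && /^\/in\/[^/]+\/?$/.test(url.pathname);
}
```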
The application uses Pino for structured logging. Logs include:
- Agent responses and reasoning
- Tool calls and responses
- Error details and stack traces
- Performance metrics
- API interactions
Check logs in development mode for detailed debugging information.
- New Tools: Add to src/mastra/tools/ directory
- Agent Modifications: Update src/mastra/agents/web-agent.ts
- Configuration Changes: Modify src/mastra/index.ts
The brightDataTools function in web-tools.ts supports:
- Selective Tool Loading: Use excludeTools to disable specific tools
- Custom Clients: Modify createBrightDataClient for custom configuration
- Additional Tools: Add new tool creators following the existing patterns
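One way selective loading and per-tool failure reporting can fit together is sketched here. This is a hypothetical illustration, not the contents of web-tools.ts: the loadTools name, the factory shape, and the "Failed to initialize tools" message are assumptions; only the missing-key message is copied from the troubleshooting section.

```typescript
// Hypothetical factory shape: each tool is created from the API key.
type ToolFactory<T> = (apiKey: string) => T;

interface LoadOptions {
  apiKey: string;
  excludeTools?: string[];
}

// Builds every tool whose name is not excluded, and reports all failures together.
function loadTools<T>(
  factories: Record<string, ToolFactory<T>>,
  { apiKey, excludeTools = [] }: LoadOptions,
): Record<string, T> {
  if (!apiKey) {
    throw new Error("Bright Data API key is required to initialize tools");
  }
  const excluded = new Set(excludeTools);
  const tools: Record<string, T> = {};
  const failed: string[] = [];

  for (const [name, create] of Object.entries(factories)) {
    if (excluded.has(name)) continue; // skipped on purpose, not a failure
    try {
      tools[name] = create(apiKey);
    } catch {
      failed.push(name);
    }
  }

  if (failed.length > 0) {
    // Hypothetical message; lists every tool that could not be created.
    throw new Error(`Failed to initialize tools: ${failed.join(", ")}`);
  }
  return tools;
}
```

Filtering by name before construction means an excluded tool never touches its dataset or zone, so a plan without LinkedIn access can still load the remaining tools.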
Example:
const tools = brightDataTools({
  apiKey: process.env.BRIGHTDATA_API_KEY!,
  excludeTools: ['linkedinCollectProfiles'] // Disable LinkedIn tool
});
Currently no tests are configured. To add testing, install Jest with TypeScript support:
npm install --save-dev jest @types/jest ts-jest
Create a jest.config.js:
module.exports = {
  preset: 'ts-jest',
  testEnvironment: 'node',
};
Example use cases:
Ask the agent to research any topic and it will use search and scrape tools to gather current information:
"What are the latest developments in quantum computing?"
Get detailed Amazon product information:
"Compare the features and reviews of the top 3 noise-cancelling headphones"
Scrape competitor websites and analyze their offerings:
"Analyze the pricing structure on example.com and compare it to industry standards"
Collect LinkedIn profiles for research:
"Get the professional background of executives at [company name]"
- API Keys: Never commit the .env file to version control
- Data Privacy: Use tools responsibly and respect privacy regulations
- Rate Limiting: Bright Data handles rate limiting automatically
- Personal Data: Avoid collecting unnecessary personal information
License: ISC
For issues with:
- Mastra: Visit Mastra Documentation
- Bright Data SDK: Check Bright Data Documentation
- This Project: Open an issue in the repository
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Built with Mastra and Bright Data