This repository provides two approaches for extracting business intelligence data from Crunchbase:
- Basic Scraper Script: Lightweight, browser-automated scraper for limited data collection.
- Bright Data Crunchbase Scraper API: A robust, scalable, and maintenance-free solution for high-volume and reliable data extraction.
- Basic Crunchbase Scraper
- Bright Data Crunchbase Scraper API
- API Configuration & Delivery Options
- No-Code Scraper Interface
- Alternative: Pre-Collected Crunchbase Datasets
- Resources & Support
A Python implementation demonstrating how to extract fundamental company data from Crunchbase profiles.
This script collects publicly available data points, including:
- Company fundamentals (description, website, founding date)
- Contact information (email, phone)
- Operational metrics (status, employee count, location)
- Leadership information (founders)
- Industry classifications
- Python 3.x installed
- SeleniumBase library:
pip install seleniumbase
-
Get the Code: Access the script file here: free-crunchbase-scraper/crunchbase-scraper.py
-
Set Target URL: Open the script and modify the
target_url
variable to the specific Crunchbase company profile you wish to scrape.target_url = "https://www.crunchbase.com/organization/your-target-company"
-
Run the Script: Execute the script from your terminal:
python crunchbase-scraper.py
💡 Note: This script uses SeleniumBase, an advanced Selenium wrapper with built-in tools for handling CAPTCHAs and other browser challenges. Learn more: Web Scraping with SeleniumBase and SeleniumBase with Proxies.
The script extracts structured data in the following format:
This approach encounters significant web scraping challenges that make it unsuitable for production-scale data collection:
-
IP Blocking & Rate Limiting: Crunchbase actively monitors and limits requests from individual IP addresses. Your IP will likely be blocked quickly after some scraping attempts.
-
Sophisticated Anti-Bot Measures: Crunchbase employs advanced security, including CAPTCHAs (like Cloudflare Turnstile) and behavioral analysis, specifically designed to detect and block automated scripts.
-
Dynamic Website Structure: Crunchbase frequently updates its website layout and code. Any change can break the script, requiring constant, time-consuming maintenance.
-
Scalability Issues: This method cannot scale to handle multiple URLs efficiently or process large volumes of data.
-
Maintenance Overhead: You are responsible for managing infrastructure, handling blocks, updating the script, and ensuring compliance.
The Bright Data Crunchbase Scraper API provides a robust, scalable, and hassle-free way to extract comprehensive data from Crunchbase without dealing with the complexities of scraping.
- Bypasses Technical Challenges: Automatically handles IP blocks, CAPTCHAs, and rate limits using advanced proxy rotation and web unlocking technology.
- Enterprise Scalability: Designed for high-volume data collection.
- High Reliability: Ensures consistent data delivery with enterprise-grade uptime.
- Developer-Friendly: Simple API integration eliminates complex scraper development and maintenance.
- Structured Data Format: Delivers clean, normalized data ready for analysis.
- Regulatory Compliance: Adheres to data privacy regulations, including GDPR and CCPA.
- Flexible Pricing: Pay-as-you-go model based on successful data delivery.
- Dedicated Support: Access 24/7 expert technical support.
- Implementation Options: Use the API programmatically or through the No-Code Scraper interface.
- Create Account: Sign up for a Bright Data account (New users receive $5 credits after adding a payment method).
- Generate API Token: Obtain your unique API key from your dashboard.
- Implementation Guide: For detailed configuration steps for both API methods and No-Code interface, see: setup-bright-data-crunchbase-scraper.md
The API offers two primary data collection approaches:
Retrieves comprehensive profile information for specific Crunchbase company URLs.
Input Parameters:
Parameter | Required | Description |
---|---|---|
url |
Yes | The full Crunchbase company URL. |
Example Request (Python):
config = {
"api_token": "YOUR_API_TOKEN", # Replace with actual token
"organizations": [
{"url": "https://www.crunchbase.com/organization/apple"},
{"url": "https://www.crunchbase.com/organization/brightdata"},
],
"output_file": "crunchbase-company-profiles.json", # Optional custom filename
}
# ... rest of the script uses this config
- Replace
"YOUR_API_TOKEN"
with your actual Bright Data API token. - Modify the
organizations
list with your target Crunchbase URLs. - See the full runnable script: crunchbase-scraper-api/crunchbase-profile-fetcher.py
Example Request (cURL):
curl -X POST \
"https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1vijqt9jfj7olije&include_errors=true" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '[{"url":"https://www.crunchbase.com/organization/apple"},{"url":"https://www.crunchbase.com/organization/brightdata"}]'
Sample Output Snippet:
The API returns comprehensive, structured data. Below is a small fraction of the available fields for a single company:
{
"companyName": "Bright Data",
"legalName": "Bright Data",
"website": "https://brightdata.com",
"description": "Offers a platform for ethical web data collection and analysis...",
"foundedDate": "2014-01-01",
"location": {"city": "New York", "state": "New York", "country": "United States"},
"companyType": "For-Profit",
"operatingStatus": "Active",
"ipoStatus": "Private (Acquired)",
"employeeSizeRange": "251-500",
"industries": ["Business Intelligence", "Cloud Data Services", "..."],
"keyPersonnel": {
"ceo": {"name": "Or Lenchner", "...": "..."},
"founders": [{"name": "Derry Shribman", "...": "..."}, {"name": "Ofer Vilenski", "...": "..."}]
},
"webTraffic": {"monthlyVisits": 865525, "source": "Semrush", "...": "..."},
"technology": {"activeTechCount": 19, "exampleTechUsed": ["Cloudflare Hosting", "..."]},
"products": {"totalActive": 23, "exampleProductNames": ["Residential Proxies", "..."]},
"acquisitionDetails": {"acquiredBy": "EMK Capital", "priceUSD": 200000000, "...": "..."},
"intellectualProperty": {"patentsGranted": 199, "trademarksRegistered": 18}
// Additional data fields available
}
View complete sample response: crunchbase-data/crunchbase-company-profiles.json
Identifies companies associated with specific keywords or industries (e.g., "AI", "Venture Capital", "SaaS").
Input Parameter:
Parameter | Required | Description |
---|---|---|
keyword |
Yes | The keyword(s) to search for related companies. |
Example Request (Python):
config = {
"api_token": "YOUR_API_TOKEN", # Replace with actual token
"keywords": [
{"keyword": "AI"},
{"keyword": "Venture Capital"},
{"keyword": "SaaS"}
# Add more keywords as needed
],
"output_file": "crunchbase-keyword-results.json", # Optional: Customize output filename
}
# ... (script uses this config to make the API call)
- Replace
"YOUR_API_TOKEN"
. - Modify the
keywords
list. - See the full runnable script:
crunchbase-scraper-api/crunchbase-keyword-search.py
Example Request (cURL):
curl -X POST \
"https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1vijqt9jfj7olije&include_errors=true&type=discover_new&discover_by=keyword" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '[{"keyword":"AI"},{"keyword":"Venture Capital"}]'
Sample Output Snippet:
The response includes data for multiple companies matching the keyword search. This shows the structure for one result:
{
"companyName": "Airbus", // Example result for "AI" keyword
"legalName": "Airbus Defense and Space Holdings, Inc.",
"website": "https://us.airbus.com",
"description": "Airbus designs, manufactures, and delivers aerospace products...",
"foundedDate": "2014-01-01",
"location": {
"city": "Herndon",
"state": "Virginia",
"country": "United States"
},
"companyType": "For-Profit",
"operatingStatus": "Active",
"ipoStatus": "Private",
"employeeSizeRange": "10001+",
"industries": [
"Aerospace",
"Commercial",
"Manufacturing"
],
// ... includes similar detailed fields as the 'Collect by URL' method
}
View complete sample response: crunchbase-data/crunchbase-keyword-results.json
Customize your data collection jobs using additional parameters within the API request:
Parameter | Type | Description | Example |
---|---|---|---|
limit |
integer |
Sets the maximum number of results per input (URL or keyword). | limit=50 |
include_errors |
boolean |
Includes detailed error information in the response if issues occur. | include_errors=true |
format |
enum |
Specifies the desired output format (json , csv , ndjson ). |
format=csv |
notify |
url |
Provides a webhook URL to receive notifications upon job completion. | notify=https://... |
Data can be delivered directly to your preferred external storage or via a webhook.
For comprehensive documentation on the Web Scraper API and triggering collections, see:
For users who prefer a visual, point-and-click approach, Bright Data also offers the No-Code Scraper. This interface allows you to configure and launch Crunchbase data collection tasks using the same powerful underlying infrastructure, without writing any code. See the Setup Guide for guidance.
If you require immediate access to large amounts of structured Crunchbase data without running scraping jobs yourself, consider Bright Data's pre-collected Crunchbase Datasets.
- Ready-to-Use: Access validated and structured Crunchbase data instantly.
- Comprehensive Coverage: Datasets include over 100 data points per company profile.
- Regular Updates: Choose from various data freshness options (daily, weekly, monthly, or custom).
- Flexible Purchase Options: Acquire the entire dataset or specific subsets tailored to your needs and budget.
- Easy Integration: Integrate datasets seamlessly via API or direct download.
- Sample Data Available: Request a sample to evaluate data quality and fit.
- Bright Data Documentation:
- Guides & Blog Posts:
- Technical Support: Contact the Bright Data support team 24/7 via your account dashboard or email at [email protected].