A fast and efficient JavaScript-based web scraping tool for extracting transportation company leads in Texas, including comprehensive contact information.

spence709/scraping-tool

Transportation Lead Scraper 🚚

A fast and efficient JavaScript-based web scraping tool for extracting transportation company leads in Texas, including comprehensive contact information.

Features ✨

  • Multi-Source Scraping: Yellow Pages, business directories, and company websites
  • Contact Extraction: Automatically extracts emails, phone numbers, and owner/executive information
  • Website Enrichment: Visits company websites to find additional contact details
  • CSV Export: Outputs clean, formatted CSV files ready for use
  • Mock Data Generation: Fast testing mode with realistic sample data
  • Deduplication: Removes duplicate entries automatically
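The README does not show how duplicates are detected. As a minimal sketch, assuming records carry `businessName` and `phone` fields (both names are hypothetical, as are the helpers `normalizeKey` and `dedupeCompanies`), deduplication could key each company on a normalized name plus phone digits:

```javascript
// Hypothetical sketch: the repository's actual deduplication logic is not shown in this README.
// Field names (businessName, phone) are assumptions made for illustration.

// Build a comparison key that ignores case, punctuation, and phone formatting.
function normalizeKey(company) {
  const name = (company.businessName || '').toLowerCase().replace(/[^a-z0-9]/g, '');
  const phone = (company.phone || '').replace(/\D/g, '');
  return `${name}|${phone}`;
}

// Keep the first record seen for each key and drop later repeats.
function dedupeCompanies(companies) {
  const seen = new Set();
  const unique = [];
  for (const company of companies) {
    const key = normalizeKey(company);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(company);
    }
  }
  return unique;
}

const sample = [
  { businessName: 'Lone Star Freight, LLC', phone: '(512) 555-0100' },
  { businessName: 'lone star freight llc', phone: '512-555-0100' },
  { businessName: 'Gulf Coast Logistics', phone: '713-555-0199' },
];

console.log(dedupeCompanies(sample).length); // 2
```

Keying on both name and phone is a design choice: it avoids merging distinct companies that happen to share a name, at the cost of missing duplicates listed under different phone numbers.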

Installation 📦

# Install dependencies
npm install

Quick Start 🚀

Generate Sample Data (Fast)

npm run example

This generates 200 mock transportation companies instantly for testing.

Run Full Scraper

npm run scrape

Usage Examples 💡

Basic Usage

import { scrapeTransportationCompanies } from './scraper.js';

const results = await scrapeTransportationCompanies({
  location: 'Texas',
  targetCount: 200,
  useMockData: true
});

Advanced Usage with Web Enrichment

import { scrapeTransportationCompanies } from './scraper.js';

const results = await scrapeTransportationCompanies({
  location: 'Texas',
  searchTerms: ['transportation', 'trucking', 'freight', 'logistics'],
  targetCount: 200,
  useMockData: false,   // Scrape real sources instead of generating mock data
  enrichWebsites: true  // Extract emails from company websites
});

Output Format 📊

The scraper generates CSV files with the following columns:

| Column | Description |
| --- | --- |
| Business Name | Company name |
| Email | Primary business email |
| Phone | Primary business phone |
| Contact Name | Owner/Executive name |
| Contact Title | Job title (Owner, CEO, President, etc.) |
| Contact Email | Direct contact email |
| Contact Phone | Direct contact phone |
| Address | Street address |
| City | City name |
| State | State (TX) |
| Website | Company website URL |
| Source | Data source |
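To show how rows with these columns can be serialized safely, here is a minimal sketch of CSV formatting with standard quoting (fields containing commas, quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled). It is not the code in `utils/csvWriter.js`; the helper names and the assumption that each record is keyed by the column header text are illustrative only.

```javascript
// Hypothetical sketch: not the repository's csvWriter.js.
// Assumes each record is an object keyed by the column headers listed above.
const COLUMNS = [
  'Business Name', 'Email', 'Phone', 'Contact Name', 'Contact Title',
  'Contact Email', 'Contact Phone', 'Address', 'City', 'State', 'Website', 'Source',
];

// Quote a field only when it contains a comma, a double quote, or a line break.
function escapeCsvField(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize records to CSV text: one header line, then one line per record.
function toCsv(rows) {
  const header = COLUMNS.map(escapeCsvField).join(',');
  const lines = rows.map((row) => COLUMNS.map((col) => escapeCsvField(row[col])).join(','));
  return [header, ...lines].join('\n');
}

const csvText = toCsv([
  { 'Business Name': 'Acme Freight, Inc.', State: 'TX', Source: 'mock' },
]);
console.log(csvText);
```

Missing fields are written as empty strings, so every row keeps the same twelve columns.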

Configuration Options ⚙️

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| location | string | 'Texas' | State to search |
| searchTerms | array | ['transportation', 'trucking', 'freight', 'logistics'] | Keywords to search |
| targetCount | number | 200 | Number of companies to extract |
| useMockData | boolean | true | Use generated data for testing |
| enrichWebsites | boolean | false | Visit websites to extract contact info |
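The defaults in the table can be pictured as an object that caller-supplied options are merged over. The sketch below assumes that behavior; `DEFAULT_OPTIONS` and `resolveOptions` are hypothetical names, and the validation is an illustrative addition rather than something the README documents.

```javascript
// Hypothetical sketch of option handling; the default values come from the table above,
// but the function name and validation rules are assumptions.
const DEFAULT_OPTIONS = {
  location: 'Texas',
  searchTerms: ['transportation', 'trucking', 'freight', 'logistics'],
  targetCount: 200,
  useMockData: true,
  enrichWebsites: false,
};

// Merge caller overrides over the defaults and reject obviously invalid values.
function resolveOptions(overrides = {}) {
  const options = { ...DEFAULT_OPTIONS, ...overrides };
  if (!Number.isInteger(options.targetCount) || options.targetCount <= 0) {
    throw new TypeError('targetCount must be a positive integer');
  }
  if (!Array.isArray(options.searchTerms) || options.searchTerms.length === 0) {
    throw new TypeError('searchTerms must be a non-empty array');
  }
  return options;
}

console.log(resolveOptions({ useMockData: false, enrichWebsites: true }));
```

Only the options passed in are overridden, so a call such as `resolveOptions({ useMockData: false })` still uses `targetCount: 200` and the four default search terms.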

Project Structure 📁

scraping-tool/
├── scrapers/
│   ├── yellowPagesScraper.js      # Yellow Pages scraper
│   ├── businessDirectoryScraper.js # Business directory scraper
│   ├── websiteScraper.js          # Website enrichment
│   └── mockDataGenerator.js       # Test data generator
├── utils/
│   ├── extractors.js              # Email/phone extraction
│   └── csvWriter.js               # CSV export
├── scraper.js                     # Main scraper
├── example.js                     # Example usage
└── output/                        # Generated CSV files
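The tree lists `utils/extractors.js` as handling email and phone extraction, but its contents are not shown here. As a rough sketch of the general technique, assuming regular-expression matching over page text and US-style 10-digit phone numbers (the patterns and function names below are hypothetical):

```javascript
// Hypothetical sketch: not the repository's extractors.js.
// The patterns are deliberately simple and will miss some valid formats.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_PATTERN = /\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b/g;

// Return unique, lowercased email addresses found in the text.
function extractEmails(text) {
  const matches = text.match(EMAIL_PATTERN) || [];
  return [...new Set(matches.map((email) => email.toLowerCase()))];
}

// Return unique phone numbers, normalized to the form (XXX) XXX-XXXX.
function extractPhones(text) {
  const matches = text.match(PHONE_PATTERN) || [];
  const normalized = matches.map((raw) => {
    const digits = raw.replace(/\D/g, '');
    return `(${digits.slice(0, 3)}) ${digits.slice(3, 6)}-${digits.slice(6)}`;
  });
  return [...new Set(normalized)];
}

const pageText =
  'Contact Dispatch@Example.com or dispatch@example.com. Call (512) 555-0100 or 512.555.0100.';

console.log(extractEmails(pageText)); // [ 'dispatch@example.com' ]
console.log(extractPhones(pageText)); // [ '(512) 555-0100' ]
```

Normalizing before collecting into a `Set` means the same address or number written in different styles is reported once.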

Performance ⚡

  • Mock Data Mode: Generates 200 companies in < 1 second
  • Basic Scraping: 200 companies in ~10-15 minutes
  • With Website Enrichment: 200 companies in ~30-60 minutes
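These figures work out to roughly 3 to 4.5 seconds per company for basic scraping and 9 to 18 seconds with enrichment. For planning other target counts, a back-of-the-envelope estimate could look like the sketch below. It assumes run time scales linearly with `targetCount`, which the README does not claim; `estimateMinutes` is a hypothetical helper.

```javascript
// Hypothetical estimate based on the timings quoted above for 200 companies.
// Assumes linear scaling, which real runs may not follow.
const BASELINE_COUNT = 200;
const QUOTED_TIMINGS = {
  basic: { minMinutes: 10, maxMinutes: 15 },
  enriched: { minMinutes: 30, maxMinutes: 60 },
};

// Scale the quoted range for 200 companies to another target count.
function estimateMinutes(mode, targetCount) {
  const { minMinutes, maxMinutes } = QUOTED_TIMINGS[mode];
  const scale = targetCount / BASELINE_COUNT;
  return { min: minMinutes * scale, max: maxMinutes * scale };
}

console.log(estimateMinutes('basic', 500)); // { min: 25, max: 37.5 }
```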

Important Notes ⚠️

  1. Rate Limiting: The tool includes delays to respect website rate limits
  2. Legal Compliance: Ensure compliance with website terms of service and data protection laws
  3. Data Accuracy: Real scraping depends on the structure of the source websites, which may change
  4. Mock Mode: Mock data is the default; set useMockData: false to scrape real sources
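The rate-limiting delays mentioned in note 1 are not specified in this README. One common pattern, sketched here under assumed values (a 2 second base delay plus up to 1 second of random jitter; all names and numbers are hypothetical), is to process URLs one at a time and pause between requests:

```javascript
// Hypothetical sketch of polite request pacing; the tool's real delays are not documented here.

// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Base delay plus random jitter so requests do not arrive at a fixed rhythm.
// The random source is injectable to make the function easy to test.
function jitteredDelay(baseMs, jitterMs, random = Math.random) {
  return baseMs + Math.floor(random() * jitterMs);
}

// Call fetchOne for each URL in order, waiting between consecutive requests.
async function fetchSequentially(urls, fetchOne, { baseMs = 2000, jitterMs = 1000 } = {}) {
  const results = [];
  for (const [index, url] of urls.entries()) {
    if (index > 0) {
      await sleep(jitteredDelay(baseMs, jitterMs));
    }
    results.push(await fetchOne(url));
  }
  return results;
}
```

Here `fetchOne` stands for whatever function loads a single page; passing it in keeps the pacing logic independent of the scraping library.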

Troubleshooting 🔧

Issue: Module errors

Solution: Make sure to run npm install first

Issue: Puppeteer fails to launch

Solution: On Windows, you may need to install Chrome/Chromium

Issue: No data extracted

Solution: Website structures change, so the selectors in the scrapers may need updating

Sample Output 📄

Check the output/ directory for generated CSV files. Each run creates a timestamped file.

Example: texas_transportation_companies_2025-10-10.csv
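A filename in this shape can be produced from the current date. The sketch below is an assumption about how that might be done, not the tool's actual code; `buildOutputFilename` is a hypothetical helper, and it uses the UTC date, which can differ from the local date near midnight.

```javascript
// Hypothetical sketch: builds a name matching the example above.
// Uses the UTC calendar date (YYYY-MM-DD) taken from the ISO timestamp.
function buildOutputFilename(date = new Date(), prefix = 'texas_transportation_companies') {
  const stamp = date.toISOString().slice(0, 10);
  return `${prefix}_${stamp}.csv`;
}

console.log(buildOutputFilename(new Date('2025-10-10T12:00:00Z')));
// texas_transportation_companies_2025-10-10.csv
```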

License 📜

MIT

Support 💬

For issues or questions, please review the code comments and examples provided.
