TerraForma - Business Open/Closed Prediction

Predicting whether businesses listed in Overture Maps are currently open or permanently closed, using a 6-signal ensemble approach built on top of Overture Places data.

See also: Approach 2 (terraforma-v1 branch) — a separate CatBoost + LightGBM pipeline that trains on 45+ Overture features with a web crawl + LLM (Llama) feedback loop. That approach uses signals only for training labels and needs no API calls at inference, making it cheaper to scale. It reached 70.2% balanced accuracy with 80.6% closed recall.

Overview

This project builds a metamodel that combines 6 independent signals to predict business status:

| Signal | Weight | Description |
| --- | --- | --- |
| XGBoost | 2.258 (highest) | 19-feature model trained on Overture place attributes |
| Foursquare | 1.017 | Cross-references Foursquare venue data |
| Website | 1.008 | Checks whether the business website is alive or dead |
| Yelp | 0.420 | Yelp review activity and status |
| Text/OCR | 0.270 | Text signals (OCR from Mapillary was explored but dropped; the imagery was too outdated) |
| TomTom | 0.006 | TomTom POI cross-reference |

A logistic regression metamodel combines these signal scores into a final open/closed prediction.
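The combination step can be sketched as an ordinary scikit-learn logistic regression over the six signal scores. The toy scores and labels below are illustrative, not the repo's actual data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

SIGNALS = ["xgboost", "foursquare", "website", "yelp", "text", "tomtom"]

# One row per business, one column per signal's open-probability score.
# Hypothetical toy data: first three businesses look open, last three closed.
X = np.array([
    [0.9, 0.8, 1.0, 0.7, 0.6, 0.5],
    [0.8, 0.9, 1.0, 0.8, 0.5, 0.5],
    [0.7, 0.7, 1.0, 0.9, 0.7, 0.5],
    [0.2, 0.1, 0.0, 0.2, 0.3, 0.5],
    [0.3, 0.2, 0.0, 0.1, 0.4, 0.5],
    [0.1, 0.3, 0.0, 0.3, 0.2, 0.5],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = open, 0 = closed

meta = LogisticRegression().fit(X, y)
weights = dict(zip(SIGNALS, meta.coef_[0]))   # learned per-signal weights
probs = meta.predict_proba(X)[:, 1]           # final open probability
```

The learned coefficients play the role of the per-signal weights in the table above.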

Results

  • 85-93% accuracy across 5 test cities (SF, LA, Chicago, Miami, Philadelphia)
  • XGBoost model-only accuracy: 51.8% baseline -> 62.5% after retraining with signal labels
  • Trained on 6,367 labeled samples (4,977 open, 1,390 closed)

Training Data

The XGBoost model was trained on 6,367 Overture places with known open/closed labels from 3 sources:

| Source | Samples | Closed Rate | Description |
| --- | --- | --- | --- |
| Overture Project C (original) | 3,179 | 8.7% | Overture's own labeled sample dataset, mostly open businesses |
| Overture Project C (updated) | 2,740 | 39.7% | Updated batch with more balanced closed representation |
| Yelp API | 448 | 5.8% | Yelp is_closed field for businesses matched to Overture |

Each sample has 8 raw Overture features (confidence, source_age_days, has_website, has_phone, has_brand, address_complete, category, fields_populated) which get expanded into 19 features via engineering. Only 448 samples (7%) have Yelp rating/review data — the rest use NaN (XGBoost handles missing values natively).
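A minimal sketch of that expansion step. The specific rules here (thresholds, derived-feature logic) are invented for illustration; the real feature_engineering.py will differ:

```python
import numpy as np

def engineer_features(raw: dict) -> dict:
    """Expand raw Overture fields into model features (illustrative rules only)."""
    feats = dict(raw)  # keep the 8 raw features
    feats["sparse_record"] = int(raw["fields_populated"] < 4)
    feats["old_source"] = int(raw["source_age_days"] > 365)
    feats["contact_richness"] = raw["has_website"] + raw["has_phone"]
    # Yelp enrichment is missing for ~93% of samples; leave it NaN and let
    # XGBoost route missing values natively at each split.
    feats["yelp_rating"] = raw.get("yelp_rating", np.nan)
    return feats

sample = {"confidence": 0.8, "source_age_days": 420, "has_website": 1,
          "has_phone": 0, "has_brand": 0, "address_complete": 1,
          "category": "restaurant", "fields_populated": 5}
engineered = engineer_features(sample)
```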

The metamodel was evaluated on 407 test samples across 5 cities using Leave-One-City-Out cross-validation, where all 6 signals (XGBoost, Foursquare, Website, Yelp, Text, TomTom) score each business independently.
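Leave-One-City-Out cross-validation maps directly onto scikit-learn's LeaveOneGroupOut, with the city as the group label. A sketch on synthetic data (the real pipeline's arrays and model will differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 6))            # 6 signal scores per business
y = (X[:, 0] > 0).astype(int)           # toy labels driven by one signal
cities = np.repeat(["SF", "LA", "Chicago", "Miami", "Philadelphia"], 5)

logo = LeaveOneGroupOut()
scores = {}
for train_idx, test_idx in logo.split(X, y, groups=cities):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    held_out_city = cities[test_idx][0]
    # Each city is scored by a model that never saw that city's data.
    scores[held_out_city] = model.score(X[test_idx], y[test_idx])
```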

XGBoost Features (19 total)

Base features (10): category present, has phone, has website, has email, source count, has social media, address completeness, has brand, name length, has hours

Engineered features (9): old source flag, sparse record, category closure rate, multi-source agreement, contact richness, chain indicator, address quality, digital presence, data completeness

Top feature by importance: category_closure_rate (28.1%)

Iterative Retraining

The retraining pipeline uses high-confidence signal outputs as training labels to progressively improve the XGBoost model-only accuracy:

| Round | Training Samples | Avg Accuracy | Best Improvement |
| --- | --- | --- | --- |
| R0: Baseline | 6,367 | 51.8% | -- |
| R1: +Yelp labels | 6,655 | 62.5% | Miami 47 -> 73.5%, Philly 60 -> 89.2% |
| R2: +Foursquare | 6,861 | 56.2% | -- |
| R3: +Website | 7,146 | 54.1% | -- |
| R4: +Metamodel | 7,369 | 60.0% | SF 50 -> 60.5%, Miami 47 -> 76.5% |

Yelp labels provided the single biggest accuracy boost. The full ensemble still outperforms model-only predictions, but retraining narrows the gap.
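Each round is a self-training step: adopt only the signal outputs the signal is confident about, add them to the labeled pool, and refit. A hedged sketch; the threshold and function shape are assumptions, not retrain_pipeline.py's actual API:

```python
def retrain_round(model, labeled, unlabeled, signal_scorer, threshold=0.9):
    """One self-training round: adopt high-confidence signal scores as labels."""
    labeled = list(labeled)
    for place in unlabeled:
        score = signal_scorer(place)       # open-probability from one signal
        if score >= threshold:
            labeled.append((place, 1))     # confidently open
        elif score <= 1 - threshold:
            labeled.append((place, 0))     # confidently closed
        # mid-range scores are discarded: too uncertain to train on
    X = [features for features, _ in labeled]
    y = [label for _, label in labeled]
    model.fit(X, y)
    return model, labeled
```

Running R1 through R4 amounts to calling this with a different signal_scorer each round, which is why the training-sample count grows round over round.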

Recent Improvements

  • SMOTE oversampling: Addresses the 3.6:1 class imbalance (4,977 open vs 1,390 closed) by synthetically generating minority-class samples, improving closed-business detection
  • Early stopping: XGBoost now uses early stopping (30 rounds) during both grid search and final training to prevent overfitting
  • Expanded hyperparameter search: Added deep-tree + strong regularization and shallow-wide ensemble configs to the grid search
  • SMOTE-aware cross-validation: SMOTE is applied per-fold during CV (only on training splits) to avoid data leakage

Project Structure

training/                  # Model training
  train_xgboost.py         # XGBoost classifier (19 features, grid search, Platt scaling, SMOTE)
  train_metamodel.py        # Logistic regression metamodel over 6 signals
  retrain_pipeline.py       # Iterative retraining using signal labels
  feature_engineering.py    # Feature extraction from Overture place data

signals/                   # External signal checkers
  check_website_liveness.py
  check_facebook.py
  check_tomtom.py
  enrich_yelp.py
  ocr_model.py             # OCR from Mapillary (dropped)
  run_vision.py

scoring/                   # Prediction & optimization
  predict.py
  generate_predictions.py
  optimize_*.py            # Threshold optimization variants

data/
  ingest/                  # Data download & extraction
  labeling/                # Ground truth collection
  candidates/              # Overture candidate JSONs per city
  training_data/           # Training datasets (yelp_training_data.json)

model/                     # Saved models & weights
  metamodel.json           # Metamodel weights & LOCO-CV results
  xgboost_model.json
  xgb_feature_importance.json

evaluation/                # Evaluation outputs
  retrain_results.json     # Per-round retraining accuracy
  confusion_matrix.png
  feature_importance.png

analysis/                  # Error analysis & evaluation scripts
pipeline/                  # Pipeline orchestration
frontend/                  # React + Vite map visualization
tests/                     # Test files

Usage

Train XGBoost model

pip install xgboost scikit-learn imbalanced-learn
python training/train_xgboost.py

Train metamodel

python training/train_metamodel.py

Run iterative retraining

python training/retrain_pipeline.py

Results saved to evaluation/retrain_results.json.

Generate predictions

python scoring/predict.py

Run frontend map

cd frontend && npm install && npm run dev

How It Scales

This approach is designed to scale to Overture's 100M+ places:

  1. XGBoost model runs on Overture attributes alone -- no external API calls needed
  2. Signal ensemble adds accuracy where external data is available
  3. Retraining pipeline allows the model to learn from signal outputs, gradually reducing dependence on expensive API calls
  4. Per-city evaluation ensures the model generalizes across geographies

Limitations

Training Data

  • Small dataset: 6,367 samples is small for a model meant to generalize to 100M+ places. Most ML models for this task would use 50k-500k labeled samples
  • Class imbalance: 3.6:1 open-to-closed ratio means the model sees far fewer closed businesses. SMOTE helps but synthetic samples aren't real-world closed businesses
  • No city labels: All 6,367 training samples lack city metadata — the model can't learn geographic patterns (e.g., NYC restaurants close faster than rural ones)
  • Overture-labeled data quality: The bulk of training data (5,919 samples) comes from Overture's own Project C labeled sets, which may have labeling inconsistencies or biases toward certain business types
  • Yelp bias: The 448 Yelp-labeled samples skew heavily open (94.2% open) because Yelp's API mostly returns active businesses. Closed businesses get delisted, so the Yelp source underrepresents closures
  • Category skew: Hotels (356 samples) and professional services (201) are overrepresented. Common categories like "restaurant" only have 96 samples — the model likely performs worse on underrepresented categories

Model

  • XGBoost model-only is weak: 51.8% baseline accuracy (barely better than a coin flip) means the model alone isn't useful — it relies on the 6-signal ensemble to reach 85%+
  • No temporal features: The model sees a single snapshot of each business. It can't detect changes over time (e.g., a place that just lost its phone number vs one that never had one)
  • OCR signal dropped: Mapillary street-level imagery was too outdated to be useful — this was meant to be a strong signal but contributed nothing
  • TomTom near-zero weight: TomTom's signal (0.006 weight) adds almost no value, making this effectively a 5-signal ensemble
  • Test set is small: 407 test samples across 5 US cities. Performance on international cities, rural areas, or non-English businesses is unknown
  • Metamodel is simple: Logistic regression can't learn non-linear signal interactions (e.g., "Foursquare says open BUT website is dead" should be weighted differently than either signal alone)

Signals

  • API dependency at inference: Unlike approach 2, this approach needs to call Foursquare, Yelp, and website-check APIs for every prediction. This is expensive at scale and adds latency
  • Foursquare deprecation risk: Foursquare has changed API access terms before — the signal could break if they restrict access
  • Website liveness is noisy: A dead website doesn't always mean a closed business (site might be temporarily down), and a live website doesn't always mean open (abandoned sites stay up for years)
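That noisiness is visible even in a minimal liveness check: several failure modes have to be treated as "inconclusive" rather than "closed". The error taxonomy below is an assumption for illustration, not check_website_liveness.py's actual logic:

```python
from typing import Optional

import requests

def website_alive(url: str, timeout: float = 10.0) -> Optional[bool]:
    """True = site responds, False = clearly dead, None = inconclusive."""
    try:
        resp = requests.get(url, timeout=timeout, allow_redirects=True,
                            headers={"User-Agent": "liveness-check/0.1"})
    except (requests.ConnectionError, requests.Timeout):
        return False          # DNS failure, refused connection, or timeout
    except requests.RequestException:
        return None           # malformed URL etc.; don't count as a signal
    if resp.status_code in (404, 410):
        return False          # page explicitly gone
    if resp.status_code >= 500:
        return None           # server trouble, possibly temporary
    return True
```

Even then, False only means the site is dead, not that the business is; the metamodel's modest 1.008 weight on this signal reflects that.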

Future Improvements

  • More training data: 6,367 samples is too small. Scraping 50k+ Yelp/Google labels across more cities and categories would directly improve generalization
  • Overture release deltas: Diff consecutive Overture releases (free) — places that lose sources, change categories, or drop confidence between releases are strong closure signals. This is the single highest-ROI improvement
  • Google Places signal: Google's business_status field would be the single highest-accuracy signal, but requires API costs at ~$5/1000 lookups
  • Temporal features: Track how features change over time (e.g., a place losing its phone number between releases is more predictive than never having one)
  • Category-specific models: Train separate models for food/retail/services — closure patterns differ significantly by industry (restaurants close at ~15%, hospitals at ~1%)
  • Active learning: Instead of random signal lookups, prioritize checking businesses where the model is least confident to maximize label value per API call
  • Metamodel upgrade: Replace logistic regression with a gradient-boosted metamodel that can learn non-linear signal interactions
  • Better Yelp sampling: Actively search for closed businesses on Yelp (filter by "closed" status) to balance the Yelp label source
  • International training data: Current data is US-only. Adding European/Asian labeled data would test whether the model transfers across geographies
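The release-delta idea above can be sketched as a pandas merge keyed on Overture's stable place IDs. The column names and toy snapshots are assumptions about the schema, not actual Overture data:

```python
import pandas as pd

# Two hypothetical snapshots of the same places from consecutive releases.
prev = pd.DataFrame({
    "id": ["a", "b", "c"],
    "confidence": [0.9, 0.8, 0.7],
    "source_count": [3, 2, 2],
})
curr = pd.DataFrame({
    "id": ["a", "b"],               # place "c" vanished entirely
    "confidence": [0.9, 0.5],       # place "b" lost confidence
    "source_count": [3, 1],         # ...and lost a source
})

delta = prev.merge(curr, on="id", how="left", suffixes=("_prev", "_curr"))
delta["disappeared"] = delta["confidence_curr"].isna()
delta["confidence_drop"] = delta["confidence_prev"] - delta["confidence_curr"]
delta["lost_sources"] = delta["source_count_prev"] - delta["source_count_curr"]
# Disappearance, confidence drops, and lost sources become candidate
# closure features -- computed entirely from free release downloads.
```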

Test Cities

| City | Test Samples |
| --- | --- |
| San Francisco | 76 |
| Los Angeles | 76 |
| Chicago | 76 |
| Miami | 68 |
| Philadelphia | 111 |
| Total | 407 |

About

The code in this repo predicts whether a location on a map is open (in business) or closed (out of business).
