Joanne edited this page Sep 30, 2025 · 24 revisions

PSRC Household Travel Survey Data Cleaning Project

Project Purpose and Goals

  • Purpose: create tools to clean consultant-delivered Household Travel Survey (HTS) data
  • Goals: bring the HTS data to a level of quality sufficient for PSRC analysis and modeling

Scope of work

  1. Assessment of data quality

    • develop metrics (number of error flags, NAs) to assess whether the data needs cleaning and how much cleaner it is after each process
  2. Rulesy

    • a set of SQL scripts for the automatic data cleaning process: they identify error flags and generate tables ready for Shiny-Fixie
  3. Manual data cleaning with Shiny-Fixie

    Shiny-Fixie includes

  4. Post-Fixie cleaning

    • update all derived variables from cleaned data
  5. hhts_cleaning Database (hhts_cleaning repo)

    • a database that stores all data tables, views and stored procedures
    • temporal data tables in the hhts_cleaning database that track all data edits (they keep all previous records and the period each record was valid from and valid to, but not who made the edits)
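
The data quality assessment in step 1 could be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual implementation (Rulesy itself is written in SQL). The record layout and the `error_flag` and `mode` field names are assumptions made for the example; only `dest_purpose` is a name taken from this page.

```python
def quality_metrics(trips, fields):
    """Count flagged trips and missing values (NAs) per field.

    `trips` is a list of dicts; the `error_flag` key is a hypothetical
    field holding the flag text, or None when the trip has no flag.
    """
    n_flagged = sum(1 for trip in trips if trip.get("error_flag"))
    n_missing = {
        field: sum(1 for trip in trips if trip.get(field) is None)
        for field in fields
    }
    return {"n_trips": len(trips), "n_flagged": n_flagged, "n_missing": n_missing}


# Made-up records standing in for the trip table before and after a cleaning step.
before = [
    {"trip_id": 1, "mode": None, "dest_purpose": "Home", "error_flag": "missing mode"},
    {"trip_id": 2, "mode": "Walk", "dest_purpose": None, "error_flag": "missing purpose"},
    {"trip_id": 3, "mode": "Bus", "dest_purpose": "Work", "error_flag": None},
]
after = [
    {"trip_id": 1, "mode": "Walk", "dest_purpose": "Home", "error_flag": None},
    {"trip_id": 2, "mode": "Walk", "dest_purpose": None, "error_flag": "missing purpose"},
    {"trip_id": 3, "mode": "Bus", "dest_purpose": "Work", "error_flag": None},
]

fields = ["mode", "dest_purpose"]
metrics_before = quality_metrics(before, fields)
metrics_after = quality_metrics(after, fields)

# How many error flags the cleaning step cleared.
flags_cleared = metrics_before["n_flagged"] - metrics_after["n_flagged"]

print(metrics_before)
print(metrics_after)
print("flags cleared:", flags_cleared)
```

Computing the same metrics before and after each process gives a simple measure of how much that process improved the data.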

Revision Code

Codes 1-6 are recodes for a missing or erroneous dest_purpose:

  • 1 - Return leg of loop trip to 'Home'
  • 2 - Pickup/dropoff by behavior
  • 4 - School purpose by location
  • 5 - Home or work purpose by location
  • 6 - Missing purpose assigned by common destination within household

Other codes:

  • 7 - Impute missing mode by speed
  • 8 - Link trip
  • 12 - Revise excessive speed (with distance matrix API travel time)
  • 13 - Impute purpose from destination (using location recognition API)
  • 15 - Shiny-Fixie manual data edit

Code 3 is obsolete (we don't collect license data anymore). I've chosen not to run the procedures behind codes 9-11 (silent passenger insertions; recode when work purpose is assigned to accompanying dependents), and code 14 isn't relevant either (split trip from traces; trace data changed in 2023).
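
For analysis scripts, the codes listed above could be kept in a small lookup table. The sketch below is illustrative only: the descriptions are copied from the lists on this page, while the names `REVISION_CODES`, `describe_revision` and `is_dest_purpose_recode` are hypothetical, and it assumes each code is handled as a single integer (how codes are actually stored in the database is not stated here).

```python
# Revision codes currently in use, as listed on this page.
REVISION_CODES = {
    1: "Return leg of loop trip to 'Home'",
    2: "Pickup/dropoff by behavior",
    4: "School purpose by location",
    5: "Home or work purpose by location",
    6: "Missing purpose assigned by common destination within household",
    7: "Impute missing mode by speed",
    8: "Link trip",
    12: "Revise excessive speed (with distance matrix API travel time)",
    13: "Impute purpose from destination (using location recognition API)",
    15: "Shiny-Fixie manual data edit",
}


def describe_revision(code):
    """Return the description for a listed code, or None if it is not listed."""
    return REVISION_CODES.get(code)


def is_dest_purpose_recode(code):
    """True for listed codes in the 1-6 range (dest_purpose recodes)."""
    return code in REVISION_CODES and 1 <= code <= 6


print(describe_revision(8))
print(is_dest_purpose_recode(5))
print(is_dest_purpose_recode(7))
```

Codes that are not listed (for example 9-11 and 14) simply return `None`, which makes it easy to spot records carrying a code that is no longer applied.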
