Cluster Analysis

Alex Bettinardi edited this page Jan 9, 2025 · 1 revision

Clustering Analysis

The cluster analysis groups the census households into clusters based on household category (income and size), dwelling type, dwelling floor area, and occupation. K-means was chosen as the clustering method because of its effectiveness and accuracy. K-means clustering partitions the observations into k clusters such that each observation belongs to the cluster with the closest centroid, and it does so in a way that minimizes the within-cluster sum of squares. Similar projects in the past have used two-step clustering for performance reasons, but that was not necessary in this case.
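To make the assign-and-update idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not the code used for this analysis. The function names, the random initialization, and the assumption that each household has already been encoded as a tuple of numbers are all hypothetical; this page does not say how the categorical attributes (dwelling type, occupation) were encoded or scaled.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative k-means; returns (labels, centroids).

    points: list of numeric tuples (hypothetical encoded household features).
    Initial centroids are k randomly sampled points (an assumption).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        centroids = updated
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity k-means tries to minimize."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
```

Calling `kmeans(points, k=2)` on a list of numeric tuples returns one label per observation plus the final centroids, and `within_cluster_ss` reports the objective value for that partition. K-means finds a local minimum of this objective, so different starting centroids can give different results.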

Certain unusual households were grouped into small clusters. Clusters with fewer than four observations were combined, respecting the constraint that each cluster can contain observations for only one dwelling type.
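The combining step can be sketched as follows. This is a hypothetical illustration: the exact merge rule is not specified on this page, so the sketch simply pools all clusters below the size threshold that share a dwelling type into one combined cluster per dwelling type. The function name, the label format of the combined cluster, and the example dwelling type values are assumptions.

```python
from collections import Counter

MIN_CLUSTER_SIZE = 4  # clusters with fewer observations than this are combined


def combine_small_clusters(labels, dwelling_types, min_size=MIN_CLUSTER_SIZE):
    """Return new cluster labels with small clusters pooled by dwelling type.

    labels: cluster label for each observation.
    dwelling_types: dwelling type for each observation, in the same order.
    Observations in clusters of at least min_size keep their label. All others
    are relabeled ("combined", dwelling_type), so a combined cluster never
    mixes dwelling types.
    """
    sizes = Counter(labels)
    return [
        label if sizes[label] >= min_size else ("combined", dwelling)
        for label, dwelling in zip(labels, dwelling_types)
    ]


# Hypothetical example: clusters "b" and "c" are small and share a dwelling type.
example_labels = ["a"] * 5 + ["b"] * 2 + ["c"] * 1 + ["d"] * 3
example_dwellings = ["house"] * 8 + ["apartment"] * 3
example_result = combine_small_clusters(example_labels, example_dwellings)
```

In this example, clusters `b` and `c` end up in a single combined cluster for `house`, while `d` becomes its own combined cluster for `apartment`. Under this simple rule, a combined cluster can still have fewer than four observations when a dwelling type has few small-cluster households.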

This is further described in "Development of Household Choice Clusters for Oregon SWIM 2".
