1 Overview

Cluster analysis is an exploratory, descriptive, “bottom-up” approach to structuring heterogeneity. From a “data mining” perspective, cluster analysis is an “unsupervised learning” approach.

A key underpinning of cluster analysis is an assumption that a sample is NOT homogeneous. The method is used to examine and describe distinct sub-populations in the sample.

Can groups of individuals (observations) be identified whose members (a) are similar on group-defining variables, and (b) differ from members of other groups?
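To make criteria (a) and (b) concrete, here is a minimal sketch in Python. The data, the group labels "A" and "B", and the helper `mean_pairwise_distance` are all hypothetical and used only for illustration. Euclidean distance is assumed as the (dis)similarity measure, which is one common choice among many. In an actual cluster analysis the group labels are unknown and must be found; here they are given so that the two criteria can be checked directly.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: (group label, scores on two group-defining variables).
people = [
    ("A", (1.0, 1.2)),
    ("A", (0.8, 1.0)),
    ("A", (1.1, 0.9)),
    ("B", (4.0, 4.2)),
    ("B", (4.3, 3.9)),
    ("B", (3.8, 4.1)),
]


def mean_pairwise_distance(pairs):
    """Average Euclidean distance over a list of (point, point) pairs."""
    distances = [dist(x, y) for x, y in pairs]
    return sum(distances) / len(distances)


# Split all pairs of persons into same-group and different-group pairs.
within = [(x, y) for (gx, x), (gy, y) in combinations(people, 2) if gx == gy]
between = [(x, y) for (gx, x), (gy, y) in combinations(people, 2) if gx != gy]

within_mean = mean_pairwise_distance(within)
between_mean = mean_pairwise_distance(between)

# (a) members are similar: small within-group distance
# (b) groups differ: large between-group distance
print(f"mean within-group distance:  {within_mean:.2f}")
print(f"mean between-group distance: {between_mean:.2f}")
```

A grouping that satisfies both criteria shows a within-group mean that is clearly smaller than the between-group mean.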

Cluster analysis (and other “mixture” methods) is often considered a person-oriented approach, where a research objective is to identify “types of persons”. A contrast is made with variable-oriented approaches, such as factor analysis and regression, where the research objective is to identify groups of variables or relations among variables. An intuitive way to see the contrast is to ask whether one is interested in data reduction across columns (= variable-oriented) or rows (= person-oriented) of a persons x variables data matrix.
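The rows-versus-columns contrast can be sketched with a small, hypothetical persons x variables matrix. The values and the helper `distance_matrix` are assumptions made for illustration. Euclidean distance stands in for whatever (dis)similarity measure an analysis would actually use; variable-oriented methods such as factor analysis typically work from correlations or covariances rather than distances.

```python
from math import dist

# Hypothetical persons x variables matrix: 4 persons (rows), 3 variables (columns).
data = [
    [1.0, 2.0, 1.5],
    [1.2, 1.9, 1.4],
    [4.0, 5.0, 4.5],
    [4.1, 5.2, 4.4],
]


def distance_matrix(vectors):
    """Euclidean distance between every pair of vectors."""
    return [[dist(u, v) for v in vectors] for u in vectors]


rows = data                                  # one vector per person
columns = [list(col) for col in zip(*data)]  # one vector per variable

# Person-oriented: compare rows, giving a persons x persons matrix.
person_distances = distance_matrix(rows)

# Variable-oriented: compare columns, giving a variables x variables matrix.
variable_distances = distance_matrix(columns)

print(len(person_distances), "x", len(person_distances[0]))
print(len(variable_distances), "x", len(variable_distances[0]))
```

The same data matrix feeds both views; what changes is whether persons or variables are the units being compared and grouped.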

Note: Generally, the “person-specific” terminology used here at PSU is fundamentally different from the “person-oriented” or “person-centered” terminology used elsewhere. (We would rather label the approach used elsewhere as a “sub-group-oriented” approach.)

This is the general idea …