Archetypal Analysis in Segmentation

October 19, 2023

Archetypal Analysis is an unsupervised learning method that is gaining popularity in marketing research. Rather than describing data by “average” observations (cluster centers), it represents the data by extremal points in the multidimensional space (the archetypes). Here we compare Archetypal Analysis with traditional approaches to market segmentation and show how it might lead to higher-quality, more actionable solutions.

The objective of segmentation is to divide consumers into approachable groups based on demographics, needs, attitudes, interests, and other psychographic and behavioral criteria. Within each group, we want consumers to be similar to each other and as different as possible from consumers in other groups.

But how do we define “similar” and “different” so that the heterogeneity we uncover can be exploited efficiently in product or service marketing? If a small set of variables were responsible for the groups, the problem would be trivial. If we have lots of respondents described by many variables, however, the task becomes significantly more challenging. Traditionally, to develop a segmentation, researchers use a wide range of unsupervised machine learning algorithms called clustering methods. Cluster analysis focuses on groupings within the cloud of individual respondents. Generally, groups are formed around “average” members that serve as a “prototype” for each cluster.

Archetypal Analysis offers an alternative approach to grouping that might offer several benefits for market segmentation. It searches the periphery of the data cloud and focuses on extreme individuals. The goal of the analysis is to identify the pure types (the archetypes) and then to describe each point in the data set as a convex combination of those archetypes.
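
To make “convex combination” concrete, here is a short sketch in symbols. The notation is an assumption introduced only for illustration and does not come from the analysis above: x_i is the trait vector of individual i, z_1, ..., z_K are the K archetypes, and α_ik is the weight of archetype k for individual i.

```latex
x_i \approx \sum_{k=1}^{K} \alpha_{ik} \, z_k ,
\qquad \alpha_{ik} \ge 0 ,
\qquad \sum_{k=1}^{K} \alpha_{ik} = 1
```

Because the weights are non-negative and sum to one, the reconstructed point always lies within the region spanned by the archetypes.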

Archetypal Analysis approximates the convex hull of a set of data. For a large dataset, the number of points on the convex hull might be relatively large and the dimensionality might be high, so the first step is to look for an approximate hull with a manageable number of vertices that minimizes the residual of the approximation. As with a clustering approach, a researcher might need to consider and compare multiple solutions derived using Archetypal Analysis and choose the most appropriate one for the segmentation.

Points inside the approximate hull can be represented as a convex combination of archetypes (points outside it are represented by their nearest point on the archetype hull). Similar to methods like latent class analysis, Archetypal Analysis leads to a fuzzy segmentation. For any individual, we know exactly the share of each archetype contributing to their traits. We can interpret these shares as probabilities of belonging to each of the archetypes and assign each individual to the archetype with the highest probability.
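
The following Python sketch illustrates the share calculation for archetypes that have already been chosen. It is a minimal illustration under stated assumptions, not the procedure used in any particular study. The archetype coordinates and the two example points are hypothetical, the function names are made up for this example, and the shares are found with a simple projected gradient descent. A dedicated archetypal analysis package would also estimate the archetypes themselves.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) == 1}."""
    u = np.sort(v)[::-1]                       # values in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index kept positive
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)


def archetype_shares(x, archetypes, n_iter=2000):
    """Shares (non-negative, summing to 1) that best reconstruct x.

    Minimizes 0.5 * ||x - shares @ archetypes||^2 by projected gradient
    descent. `archetypes` has one row per archetype.
    """
    k = archetypes.shape[0]
    shares = np.full(k, 1.0 / k)               # start from equal shares
    gram = archetypes @ archetypes.T
    step = 1.0 / np.linalg.eigvalsh(gram)[-1]  # 1 / largest eigenvalue
    for _ in range(n_iter):
        grad = archetypes @ (shares @ archetypes - x)
        shares = project_to_simplex(shares - step * grad)
    return shares


def residual(x, archetypes, shares):
    """Squared distance between x and its reconstruction."""
    return float(np.sum((x - shares @ archetypes) ** 2))


# Hypothetical archetypes in a two-trait space (one row per archetype).
archetypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

inside = np.array([0.2, 0.3])   # a respondent inside the archetype hull
outside = np.array([1.0, 1.0])  # a respondent outside the archetype hull

shares_inside = archetype_shares(inside, archetypes)
shares_outside = archetype_shares(outside, archetypes)

# Hard assignment: the archetype with the largest share.
segment_inside = int(np.argmax(shares_inside))

print("shares (inside): ", np.round(shares_inside, 3))
print("shares (outside):", np.round(shares_outside, 3))
print("assigned archetype (inside):", segment_inside)
print("residual (outside):", round(residual(outside, archetypes, shares_outside), 3))
```

The inside point is reconstructed with essentially zero residual, while the outside point is mapped to its nearest point on the hull and keeps a positive residual. Summing such residuals over all respondents gives one way to compare solutions with different numbers of archetypes.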

Archetypal Analysis offers an interesting perspective on market segmentation.

Should we define and describe each segment by its extreme (archetype) or by its average (prototype)? If the goal is accuracy and the clusters are compact, then the average is a good group descriptor. However, the average does not perform well when clusters are elongated or sparse, which is a realistic scenario in marketing applications. In some cases, traditional clustering techniques leave researchers with solutions and personas that are hard to define. Our clients are often looking for a set of contrastive categories to describe the market, and Archetypal Analysis can be especially helpful in achieving this goal in segmentation studies.

Let us consider a simple example.

Imagine two matchmakers evaluating their pool of 100 candidates in order to introduce the best matches to their most demanding clients (Figure 1). The candidates are described by multiple traits, but for visualization we represent each candidate as a point on a two-dimensional chart.


Figure 1. Comparison of Archetypal Analysis and Cluster Analysis

The first matchmaker decides to use Archetypal Analysis to better understand the candidate pool. The approximate convex hull she has chosen has five vertices, that is, five archetypes. By definition, it is easy to point to the trait describing each archetype, even if you are not an expert in matchmaking. After the archetypes are selected and defined, each candidate in the pool can be viewed as a combination of the archetypal traits and, if needed, can be assigned to the single archetype they are closest to.

Let us assume that the second matchmaker decides to perform a traditional cluster analysis. As you can see in Figure 1 (2), she has found five relatively tight clusters in the data, but she might run into problems describing each cluster and matching candidates with her clients based on her solution. The first cluster (C1) includes both athletic and attractive candidates, and one of these two traits might not be strongly represented in some candidates. Cluster C4 might present the same problem as C1, since it includes both wealthy and smart people. There could be difficulties even with cluster C5, since the matchmaker will be looking at an average person in this cluster, and the “creative” trait might not be that obvious. Clusters C2 and C3 would just include candidates with no particular traits standing out. Moreover, these two clusters are somewhat similar to each other. It would be hard for the matchmaker to define these two clusters and match them with any of her clients.

Pure types, or archetypes, are a form of contrastive categorization. We define groups in terms of idealized types to magnify the contrast with all other groups and competing options. The researcher seeking to find and exploit consumer heterogeneity is not looking for a perfect mathematical solution with the most compact clusters. The idea is to capture differences between groups and individuals that translate into actionable marketing strategies. When consumers are trying to make sense of a seemingly infinite space of varying products and offerings, we need to look for meaningful distinctions to be able to reach consumers and lead them to a purchase. For many marketing strategies it is helpful to see consumers and products as more distinct and well-separated, and we want the segments to be more extreme too. Archetypes are much easier to interpret than standard segment personas and can better describe the specifics of each segment and help define a potential market strategy.

At Big Village, we believe that Archetypal Analysis could be a promising approach to market segmentation and other applications, and that it can help maximize outcomes for your business.

Written by Faina Shmulyian, VP Insights.