‘Recommended for You’: How Well Does Personalized Marketing Work?

Anyone who shops regularly online has encountered recommender systems that point out one or two other products or pieces of content we might like, based on past purchases or other behavior. In two new papers, Wharton operations, information and decisions professor Kartik Hosanagar examined when recommender systems work well, and when they don’t, and whether certain types of products tend to do particularly well when included in such systems. He also looked at how recommender systems interact with other features designed to drive purchases, such as customer reviews.

Both “‘People Who Liked This Study Also Liked’: An Empirical Investigation of the Impact of Recommender Systems on Sales Volume and Diversity” and “When Do Recommender Systems Work The Best? The Moderating Effects Of Product Attributes And Consumer Reviews On Recommender Performance” were co-authored by Hosanagar and Carnegie Mellon business analytics professor Dokyun Lee.

Hosanagar recently sat down with Knowledge at Wharton to discuss what his research reveals about when we’re most likely to be influenced by those clever algorithms and how these systems are changing the way we discover new products.

An edited transcript of the conversation appears below.

The Influence of Recommendations

An important theme of my research is how personalized recommendations and similar algorithms affect consumer choice. We’re all flooded by product choices today, and these kinds of personalized recommendations play an important role in helping us discover new products and sort through large choice sets. We see personalized recommendations in a number of industries, whether in retail (for example, Amazon’s “People who bought this also bought …”) or in media, such as Netflix or YouTube. We see them with news as well: Google News, for example, recommends personalized news stories.

We know these recommendations have a pretty significant impact on consumer choice. For example, at Amazon, they drive anywhere from a quarter to a third of the choices that consumers make.

But although we know that they have a big impact on consumer choice, we don’t fully understand what kinds of products are more likely to be accepted by consumers when recommended, nor do we know when recommenders work well and when they don’t. So in my research with professor Dokyun Lee at Carnegie Mellon, we look at two main questions.

The first is: What kinds of products are more likely to benefit from recommendations? Specifically, are mainstream products or niche products more likely to benefit? The other question we look at is: What is it about a product that makes it more likely to elicit a response from a consumer when it’s recommended? For example, do a product’s ratings, its price or its type influence whether recommendations are effective for it?


Surprising Discoveries about Discovering

In one of our research studies, we looked at whether recommendation systems help us discover novel and niche items that we might not otherwise discover, but are a great fit for us personally. What we found is that, because common recommendation systems are based on sales and ratings — for example, people who bought this also bought this — they’re unable to surface truly novel items that have not been discovered by many other people. This tends to create a “rich gets richer” effect for popular items, and it might also prevent consumers from finding better product matches because of this bias for items that have been purchased by others or that have been rated well by others.
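The “people who bought this also bought” logic described above can be sketched as simple item-to-item co-purchase counting. This is a minimal illustration under stated assumptions, not the retailer’s or Amazon’s actual algorithm; the function names, item names and baskets are hypothetical. It shows the two effects Hosanagar describes: an item that appears in many baskets rises to the top of many lists, and an item no one has bought yet can never be recommended.

```python
from collections import Counter, defaultdict
from itertools import permutations

def co_purchase_counts(baskets):
    """For each item, count how many baskets also contain each other item."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts

def also_bought(counts, item, k=2):
    """Return up to k items most often bought together with `item`.

    An item nobody has bought has no co-purchase counts, so it can never
    appear in any list: the system can only surface what others already bought.
    """
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]

# Hypothetical purchase histories; "bestseller" shows up in most baskets.
baskets = [
    ["tent", "bestseller"],
    ["tent", "bestseller"],
    ["tent", "niche_guide"],
    ["stove", "bestseller"],
    ["stove", "bestseller"],
    ["lantern", "bestseller"],
]
counts = co_purchase_counts(baskets)
for item in ["tent", "stove", "lantern", "new_arrival"]:
    print(item, also_bought(counts, item, k=1))
```

Here “bestseller” is the top recommendation for tent, stove and lantern alike, while the never-purchased “new_arrival” gets an empty list: a small version of the “rich gets richer” effect.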

Now, this is a finding that a lot of people find surprising. Many of my friends tell me that they do find very new items that they previously did not know about through recommendations. And in line with that, we find that these recommendation systems can push us as individuals to new items. But they push all of us toward the same new items, and thus at the aggregate level, we don’t see a large increase in the diversity of what consumers buy.
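The gap between individual and aggregate diversity can be made concrete with a concentration measure such as the Gini coefficient of sales across a catalog (0 when every item sells equally, higher as sales pile onto fewer items). The scenario below is hypothetical: four users each buy their usual item and then one new recommended item. In both cases every user discovers something new, but when all four are pushed to the same new item, aggregate sales become more concentrated.

```python
from collections import Counter

def gini(values):
    """Gini coefficient of non-negative values: 0 = perfectly even, (n-1)/n = all in one."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical catalog and each user's usual purchase.
catalog = list("ABCDEFGH")
usual = {"u1": "A", "u2": "B", "u3": "C", "u4": "D"}

def sales_gini(new_picks):
    """Sales Gini over the whole catalog after each user adds one recommended item."""
    sales = Counter(usual.values()) + Counter(new_picks.values())
    return gini([sales[item] for item in catalog])

spread_out = {"u1": "E", "u2": "F", "u3": "G", "u4": "H"}  # each user finds a different new item
same_item = {"u1": "E", "u2": "E", "u3": "E", "u4": "E"}   # everyone is pushed to the same new item
print(sales_gini(spread_out))  # 0.0
print(sales_gini(same_item))   # 0.5625
```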

In another research study, we looked at what it is about products that makes them more likely to elicit a response from a consumer when they are recommended. For example, we looked at interactions between a product’s rating and the recommendation response, and we found that, as one would expect, recommendations help all kinds of products, whether they’re rated high or whether they’re rated low.

But interestingly, we find that it’s the products that have a low average rating that elicit a greater response from the consumer — that is, the purchase probability of a product goes up a lot more for low-rated products than for higher-rated products. This tells us that recommendations and ratings are in some ways substitutes for each other. If a product has high ratings to begin with, then the recommendation has an impact, but it’s not as great. But when it has low ratings, in the absence of the recommendation, we might not even respond to that product. But when that product is recommended, then we are willing to give the product the benefit of the doubt. We think: Maybe the product isn’t right for everyone out there, but perhaps it’s right for me.
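One common way to express “recommendations and ratings are substitutes” is a purchase model with an interaction term between rating and recommendation: a negative interaction means a recommendation adds less for products that are already highly rated. The logistic form and the coefficients below are hypothetical, chosen only to illustrate the pattern; they are not the model or estimates from the study.

```python
import math

# Hypothetical coefficients chosen for illustration only; not estimates from the study.
B0, B_RATING, B_REC, B_INTERACT = -4.0, 0.5, 2.5, -0.4

def purchase_prob(rating, recommended):
    """Logistic purchase probability with a rating x recommendation interaction."""
    rec = 1.0 if recommended else 0.0
    logit = B0 + B_RATING * rating + B_REC * rec + B_INTERACT * rating * rec
    return 1.0 / (1.0 + math.exp(-logit))

def recommendation_lift(rating):
    """Absolute increase in purchase probability when the product is recommended."""
    return purchase_prob(rating, True) - purchase_prob(rating, False)

for rating in (2.0, 5.0):
    print(rating, round(recommendation_lift(rating), 3))
```

With these illustrative numbers, a recommendation raises purchase probability by roughly 17 percentage points for a product rated 2 but only about 9 points for a product rated 5, even though the higher-rated product is more likely to be bought in either case.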

Another aspect we looked at is whether the type of product matters. We classified all the products in our data set into two groups: what we call utilitarian products and hedonic products. Utilitarian products serve some functional purpose, for example, appliances or groceries. Hedonic products don’t serve a functional purpose but instead appeal to the senses, for example, jewelry. We found that recommendations have a low to moderate impact for utilitarian products, but for hedonic products, they have a very significant impact. For hedonic products such as jewelry, which we don’t really need, a recommendation suggesting that something is a great fit for us, or that people with similar tastes like it, really moves the needle in terms of making us respond.

We also looked at other factors, such as whether the product’s description or its price mattered. We find what you would expect: People respond more to recommendations for lower-priced products than for higher-priced ones, and more when a product has a good description than when its description is very limited.


Key Findings

There were two conclusions that surprised us. One was that recommendations don’t necessarily help us discover niche products. That’s interesting, because there has been a lot of discussion for at least a decade now about how online systems, whether they use search engines or personalized recommendations, will help us find niche items. The theory is that they will help benefit what we call the long tail — products that are not super-popular, that almost don’t get produced, that may get produced but don’t really sell much. The promise of recommendation systems is that they really give a fair opportunity for these kinds of products. But we find that for the common designs, it doesn’t happen, and that’s pretty surprising.

There are some designs where you make modifications so that you favor niche items. You can make it work. But the most common designs that are used by most retailers don’t do that, and we found that it might have the opposite effect.
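As a hedged illustration of the kind of modification mentioned here, one simple adjustment (hypothetical, and not necessarily a design evaluated in the study) divides each co-purchase count by the candidate’s overall popularity raised to a tuning exponent `alpha`. With `alpha = 0` the ranking is the plain “also bought” list; as `alpha` grows, items that are popular everywhere are penalized, and a niche item bought alongside this one can move to the top. Item names and baskets are made up.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_stats(baskets):
    """Return (co-purchase counts, item popularity) from purchase baskets."""
    co = defaultdict(Counter)
    popularity = Counter()
    for basket in baskets:
        items = set(basket)
        popularity.update(items)
        for a, b in permutations(items, 2):
            co[a][b] += 1
    return co, popularity

def discounted_also_bought(co, popularity, item, alpha, k=1):
    """Rank co-purchased items by count / popularity**alpha.

    alpha = 0 reproduces the plain "also bought" ranking; larger alpha
    penalizes items that are popular overall, giving niche items a chance.
    """
    scores = {
        other: count / popularity[other] ** alpha
        for other, count in co.get(item, {}).items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical purchase histories.
baskets = [
    ["tent", "bestseller"],
    ["tent", "bestseller"],
    ["tent", "niche_guide"],
    ["stove", "bestseller"],
    ["stove", "bestseller"],
    ["lantern", "bestseller"],
]
co, popularity = build_stats(baskets)
print(discounted_also_bought(co, popularity, "tent", alpha=0.0))  # ['bestseller']
print(discounted_also_bought(co, popularity, "tent", alpha=1.0))  # ['niche_guide']
```

The exponent makes the trade-off explicit: a retailer could tune it to balance relevance against exposure for long-tail items rather than accept the popularity bias of the default design.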

Now, another result that surprised us was that recommendations and ratings are substitutes for each other. Again, prior to the study, we expected that people would respond to recommendations for things when they are highly rated, and we did find that. But what surprised us was that their response is even greater when the products have a lower rating. That suggests that there’s this effect of recommendations as substitutes for ratings, and we hadn’t predicted that beforehand.

Key Takeaways

Our research has implications for retailers, producers and even consumers. For retailers whose strategy is to offer a wide product assortment, our research suggests that their technology choices may not always be consistent with that strategy.

For example, Amazon’s strategy is that you can find any product on Earth at Amazon, and many other online retailers also offer wide product assortments. Our research suggests that if you offer a wide product assortment, you also need to think about how consumers will discover it, and recommendations are an important part of the solution. But in practice they often don’t deliver that discovery, because they are biased toward products that have already been bought and rated. So retailers need to think carefully about their technology choices and about how to modify this common design so that it is consistent with their product strategy.

For producers, our research shows that recommendations and similar algorithms drive consumer choice in a big way. Today, producers are used to thinking about, how does our product get discovered by consumers? They need to also ask, how does our product get discovered by algorithms?

For consumers, we find that these systems are great at helping us as individuals discover new products, but at the aggregate level, we’re not seeing that diversity. That is not necessarily troublesome for consumers, but it does suggest that there may be products out there that could be the needle-in-the-haystack, perfect product for you but that are not surfaced by recommendations. So one has to be open to other sources of discovery as well.

The Consequences of Built-in Biases

There’s a lot of talk these days about big data and analytics, and how there’s so much data, and companies are building intelligent algorithms that can help them be smart, help consumers find products they like, and so on. Our research shows that these efforts do work, but at the same time, we have to be cautious about unintended consequences.

For example, the idea behind recommendations is that they help us find novel items, so we don’t want them to have built-in biases that favor certain kinds of items over others; to the extent that they do, there may be unintended consequences. Think about that in the context of news: If we all consume news through personalized recommendations, we may not get the breadth of perspective we want. Algorithms may be driving many of our media choices, so we need to think hard about how big data and algorithms can be de-biased. Largely, they work, but they do have some biases we need to be cautious of.

A Data-rich Empirical Approach

One of the things that’s really novel about our work is that our research is informed by really large-scale data and analysis of that data. There have been a lot of theories about how recommendations impact consumer choice, what kinds of products they favor, when consumers respond to them and so on. But there had been very little empirical evidence. That’s partly because, to answer these questions (for example, do recommendations favor niche items or mainstream items?), we need a contrast between people who are exposed to recommendations and people who are not. Most retailers observe consumers only after they come to their websites and are exposed to recommendations, so they have no such contrast.


Our study is based on an A/B experiment we ran with a large North American retailer, in which some people got recommendations and some did not, across hundreds of different product categories. So we were able not only to get the contrast needed to answer the question, but also to generalize beyond a single product category, because we had so many different products. That, I think, is what sets this research apart: It’s based on concrete evidence from a large and very representative sample of consumers.
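The exposed-versus-unexposed contrast can be sketched as follows, assuming users are split by hashing a user ID so each person stays in the same arm across visits. The salt string, function names and the “top-item share” concentration metric are hypothetical illustrations; the study’s actual randomization and diversity measures may differ.

```python
import hashlib
from collections import Counter

def assign_arm(user_id, salt="recs-experiment"):
    """Deterministically map a user to 'treatment' (sees recommendations) or 'control'."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

def top_item_share(items):
    """Fraction of all units sold that went to the single best-selling item."""
    sales = Counter(items)
    total = sum(sales.values())
    return max(sales.values()) / total if total else 0.0

def compare_arms(purchase_log):
    """purchase_log: iterable of (user_id, item). Returns top-item share per arm."""
    by_arm = {"treatment": [], "control": []}
    for user_id, item in purchase_log:
        by_arm[assign_arm(user_id)].append(item)
    return {arm: top_item_share(items) for arm, items in by_arm.items()}

# Hypothetical purchase log of (user_id, item) pairs.
log = [("u1", "bestseller"), ("u2", "tent"), ("u3", "bestseller"), ("u4", "stove")]
print(compare_arms(log))
```

Hashing rather than drawing a fresh random number on each visit keeps assignment stable, so a returning shopper does not drift between seeing and not seeing recommendations; the same comparison can be repeated per product category.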

Looking Ahead

In terms of next steps, we’re trying to generalize some of our results beyond recommendations to all kinds of online discovery tools, including search engines and social tools such as social news feeds; on Facebook, for example, we also discover products. We’re also trying to understand this world from the producer’s perspective. As I mentioned, producers need to think hard about how their products will be discovered not only by consumers but also by algorithms. There’s very limited understanding of what it is that makes a recommendation algorithm pick one product among thousands of potential candidates. We’re trying to study that, and hopefully we’ll provide some insights to producers so that they can be active participants in helping their products get discovered rather than passive observers.
