As social media grows more popular, it’s increasingly becoming a way for industries, companies and brands to figure out what the cultural zeitgeist is thinking — and from there, to recommend other things consumers might like.
Wharton operations and information management professor Shawndra Hill has extensively studied how social TV — i.e., people contributing or consuming commentary about their favorite programs — can be used by Hollywood and advertisers to better reach their target demographics. In this interview, she discusses her method and findings, and also responds to a recent announcement by Nielsen that it will now include demographic information as part of its Twitter TV ratings service. That service offers data that gauge which shows are generating the most chatter on the social network.
An edited transcript of the conversation appears below.
Knowledge at Wharton: Many people today tweet to provide commentary as they are watching TV shows. Why is this important to the media industry — as part of figuring out how many people are actually watching a show — and to advertisers?
Shawndra Hill: There are a number of reasons why it is interesting and also important. The first one is that TV shows can observe, in real time, immediate response and engagement to the content in the show. That can include organic response to [what is happening in the plot], or TV shows actually incorporating social media content into the shows, and asking people to, for example, vote while watching TV. But then in addition to looking at real-time engagement, one could also look at the viewers and estimate things like demographics and interest, and get a sense for who is actually watching the TV show.
Knowledge at Wharton: The Nielsen company has just announced that it will start providing demographic data as part of its Twitter TV ratings product, which looks at which U.S. TV shows have the largest audience on Twitter. Why is this notable?
Hill: We [Hill’s research team] have been estimating demographics from tweets for quite a long time, as the tweets relate to television viewers and television content. So, while it’s notable in that it’s an added service for Nielsen customers, it’s something that can actually be done with publicly available data for free. What’s nice about it is that if you have a methodology to infer demographics of groups or individuals, you can do so for a really large number of people and Twitter handles. Twitter handles represent shows and brands. And so, you can infer demographics for a much wider range of Twitter handles than just the popular television shows that Nielsen typically follows.
Knowledge at Wharton: Can you tell us a little bit more about your approach and how it differs from what Nielsen has announced that it’s going to do?
Hill: Our approach works in the following way: We start with Twitter handles. Usually we focus on television shows and brands. But the Twitter handles can represent anyone. For example, I could use the Twitter handle of my [personal] account. From the Twitter handles, we grab [each person’s] followers. For those followers, we grab their tweets. And so, each Twitter handle is represented by all of the tweets of all of the followers. You can think of this as, for a particular show we have all the follower tweets — not just tweets about the show, but tweets about their daily lives.
Once we have this document of all of the tweets of the followers of a particular handle, then we basically create what’s called a bag of words. Think of it as just creating one big vector of words and the associated counts with them.
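The bag-of-words step described above can be sketched in a few lines of Python. This is a minimal illustration, not the research team’s code: the handles, follower names, tweets and the simple tokenizer are all hypothetical stand-ins.

```python
import re
from collections import Counter

# Hypothetical toy data: handle -> follower -> that follower's tweets.
FOLLOWER_TWEETS = {
    "@show_a": {
        "user1": ["Cooking dinner tonight", "Love gardening on weekends"],
        "user2": ["Gardening tips please"],
    },
    "@show_b": {
        "user3": ["Big game tonight", "What a game"],
    },
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(tweets_by_follower):
    """Pool every tweet of every follower into one vector of word counts."""
    counts = Counter()
    for tweets in tweets_by_follower.values():
        for tweet in tweets:
            counts.update(tokenize(tweet))
    return counts


# One bag of words per handle, built from all of its followers' tweets.
profiles = {
    handle: bag_of_words(followers)
    for handle, followers in FOLLOWER_TWEETS.items()
}
```

Each handle ends up represented by a single word-count vector, regardless of whether the pooled tweets mention the show.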
We then normalize the counts in a special way and correlate them with aggregate-level demographics of the shows. We get that demographic data in an interesting way: from Facebook, through the Facebook advertising API, which allows us to get estimates of the aggregate-level demographic proportions of the people who follow a particular thing. It could be a television show like I mentioned, or a brand or a person. Then we correlate those proportions for a brand or television show or person with these words [from the collected tweets], and find the words that are correlated with different proportions of demographics. Examples of those demographics that we’ve looked at are gender, age and education level. In addition to that, Facebook also allows you to target different interests. So, we could even look at estimates of the population for people who like gardening, for example, or cooking.
Even at this aggregate level, where we take the words associated with the people who follow the shows and these aggregate-level demographics, we can do a really good job of predicting the demographics of held-out groups of people when we build models to correlate the words and the demographics. Why this is so powerful is that you don’t have to restrict yourself to just the popular things for which a company like Nielsen would typically have reliable estimates. You can make estimates for just about anything that has a group of people who talk about their daily lives….
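A rough sketch of the correlate-then-predict idea follows. Several things here are assumptions rather than details from the interview: the normalization is plain relative frequency (Hill does not say what the "special way" is), the model is a one-word least-squares line, and every handle, count and demographic share is invented for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical training data: handle -> (follower word counts, share of female followers).
TRAIN = {
    "@show_a": (Counter({"makeup": 8, "game": 2, "the": 10}), 0.80),
    "@show_b": (Counter({"makeup": 5, "game": 5, "the": 10}), 0.60),
    "@show_c": (Counter({"makeup": 2, "game": 8, "the": 10}), 0.35),
    "@show_d": (Counter({"makeup": 1, "game": 9, "the": 10}), 0.25),
}
# Hypothetical held-out handle whose demographics we pretend not to know.
HELD_OUT_COUNTS = Counter({"makeup": 6, "game": 4, "the": 10})


def relative_freq(counts):
    """Normalize raw counts to relative frequencies (an assumed normalization)."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (xs must vary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


freqs = {handle: relative_freq(counts) for handle, (counts, _) in TRAIN.items()}
shares = [share for _, share in TRAIN.values()]
vocab = sorted(set().union(*(f.keys() for f in freqs.values())))

# Correlate each word's frequency across handles with the demographic share.
correlations = {
    word: pearson([freqs[h].get(word, 0.0) for h in TRAIN], shares)
    for word in vocab
}

# Fit a line on the most strongly correlated word and predict the held-out handle.
best_word = max(vocab, key=lambda w: abs(correlations[w]))
slope, intercept = fit_line([freqs[h].get(best_word, 0.0) for h in TRAIN], shares)
x_new = relative_freq(HELD_OUT_COUNTS).get(best_word, 0.0)
predicted_share = min(1.0, max(0.0, slope * x_new + intercept))
```

A real model would use many words at once and far more handles; the point of the sketch is only that word frequencies measured at the group level can be regressed against group-level demographics and then applied to a handle that was held out.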
Knowledge at Wharton: You have something that you call the talkographic profile. Can you explain exactly what that is, and how it works?
Hill: We’ve just coined this term to basically mean that people are what they say. Groups and individuals … use specific language on Twitter and on social media…. The words that people use are highly indicative of both their demographics and interests.
Knowledge at Wharton: The Nielsen service offers demographic data to the industry, but your approach takes it one step further and actually makes recommendations based on the data. How do you do that, and why is it different than traditional recommendation systems?
Hill: We have built a recommendation engine on top of what we call these profiles for shows. We wanted to show that these profiles actually had value. What we do is simply calculate the similarity between shows — or really, Twitter handles — based on the words that people who follow the shows [use]. In doing so, we can calculate the similarity between anything, any Twitter handle. And when we have a new set of users, we can then feed one of the shows or Twitter handles that the user follows into our big correlation matrix of items, or shows, and ask, based on the similarity between this item that we give our system and all of the calculations that we’ve done, what are the things that the user would be most likely to follow? We find that calculating the similarity between shows in this way, just by using the words, is highly predictive [of] what people will follow.
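The similarity-based recommendation described above might look something like the following sketch. Cosine similarity over word counts is an assumption chosen for illustration (the interview refers only to a "correlation matrix" without naming the measure), and the handles and counts are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-count profiles, one per Twitter handle.
PROFILES = {
    "@cooking_show": Counter({"recipe": 9, "dinner": 6, "garden": 1}),
    "@baking_show": Counter({"recipe": 7, "dinner": 3, "cake": 5}),
    "@football_show": Counter({"game": 8, "score": 6, "dinner": 1}),
}


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(seed, profiles, k=2):
    """Return the k handles whose follower language is most similar to the seed's."""
    scores = {
        handle: cosine(profiles[seed], profile)
        for handle, profile in profiles.items()
        if handle != seed
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Given one handle a user already follows, suggest the most similar other handle.
suggestions = recommend("@cooking_show", PROFILES, k=1)
```

Because the score depends only on the words followers use, no follower network or co-follow data is needed to rank the candidates.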
The nice thing about this is, while there are a lot of strategies for building recommendation engines — for example, using the product network or the network of Twitter handles that form by looking at Twitter handles that are commonly followed by a lot of people — using the text means that you don’t need those networks. So, while Twitter has a large network of users, there are a lot of websites that don’t. [What we’re] saying is that perhaps tweets or texts can be used as a substitute for this network data when it’s not there. And when it is there, it can be used to complement it. In addition to showing the value there, it also helps with what’s called a cold start problem for recommendation engines.
Knowledge at Wharton: What exactly is that?
Hill: The problem is that when you have an item or a product (or, in the case of TV as we’re talking about now, a show) that doesn’t really have that many followers, then you’re not going to recommend it, right? It’s quicker to get these tweets. You only need a few followers to start calculating the similarity between the Twitter handles, so the [show or product] is likely to be recommended sooner. The approach also tends toward making more diverse recommendations, as opposed to more popular ones….
Knowledge at Wharton: Can you give me an example of a specific instance where you’ve been able to make some recommendations using your recommendation engine, and what the results were?
Hill: We test this on Twitter users, so we’re assuming that the things that Twitter users follow are representative of the things that they’re interested in. We basically build our models on one subset of users, and then make predictions on a hold-out set of users. We’ve done this in the context of television shows, reliably. And then we’ve also done this in the context of brands.
We’ve focused mostly on television shows and brands…. Our approach is generalizable [because] we take all of the words that people say, without restricting them in any way. What that means is, we can calculate the similarity between any two things. Not just television shows, not just brands. And so it makes the approach very flexible.
Knowledge at Wharton: Could a company or a brand use your approach in-house? And if so, how would they go about it?
Hill: Absolutely. What’s nice about the approach that we’ve developed is that it relies strictly on publicly available data, which means the data is free. We’ve tested our approach on Twitter users who have revealed their preferences by following [certain people or entities]. If a firm is just trying to infer the demographics of Twitter users, then it would work as is. But if they wanted to use the approach for making recommendations in their own context, there would be an extra step: They would have to test it against their own users.
Knowledge at Wharton: A TechCrunch story about Nielsen’s new offering says that the company has found that while there are a significant number of people tweeting about TV, an even larger number are passively consuming that content. Sometimes, those consumers shed new light on what type of demographic groups watch a particular show. Have you seen that in your research? Why is it so important to tease out passive observers along with people who are actively creating content?
Hill: We haven’t focused on that distinction, mostly because we decide that somebody is interested in a show based on the fact that they follow that show, not based on the fact that they’re tweeting about it. All of the users that follow a show would be included in our data set, both those who actively tweet and those who don’t.
We could easily compare them. My guess is that they’re not extremely different, but perhaps they are. The nice thing is that, if there are in fact differences, the comparison would provide insights to a company. We’ve focused mostly on this question: Of the people who follow your brand or TV show, can we infer the overall demographics? But because our approach infers information for groups of users, it would be easy to infer the demographics of the subset of people who tweet and the subset of people [who merely observe], for a particular show.
Knowledge at Wharton: People inside and outside the media industry have complained for years that the traditional system of Nielsen ratings doesn’t accurately count how many people are actually watching TV shows. What are the stakes here, and why is doing that so important? How can social media be a game-changer in this?
Hill: [Social media] enables us to observe a larger number of viewers for free, and therefore make inferences about a larger number of users and TV shows for free. There are many criticisms of Nielsen data, but one of the main ones is that there is not a lot of coverage for niche shows, or shows that aren’t that popular. That is because Nielsen has historically collected data from a relatively small panel of users who have a device in their home to track their viewing patterns. What social media does is open up the space of users to pretty much anybody who is tweeting. And so, you’re not restricted to inferring only the demographics of popular shows, because there will be coverage for all shows and all things on Twitter.
Now, I say that with the main caveat that not everybody’s on Twitter, right? There’s going to be a bias, of course, if you use only the social media data. While you can make inferences for a larger number of shows, you’re going to be biased, and you’ll have to correct for that bias, for the fact that not everybody is on Twitter and that it skews younger. And that has to be accounted for. But the promise is that you can make inferences cheaper, faster, for more people and for more shows.
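Hill does not say how the correction would be done. One common option is post-stratification: estimate a quantity within each age group, then reweight the groups by their share of the general population instead of their share of the Twitter sample. The sketch below assumes that technique, and all group sizes, rates and population shares are invented for illustration.

```python
# Hypothetical Twitter sample: age group -> (number of users, share who follow a show).
SAMPLE = {
    "18-34": (600, 0.30),
    "35-54": (300, 0.20),
    "55+": (100, 0.10),
}

# Hypothetical age mix of the general population (shares sum to 1.0).
POPULATION_SHARE = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}


def raw_estimate(sample):
    """Overall follow rate in the sample, which over-represents younger users."""
    total_users = sum(n for n, _ in sample.values())
    return sum(n * rate for n, rate in sample.values()) / total_users


def poststratified_estimate(sample, population_share):
    """Reweight each group's rate by its population share, not its sample share."""
    return sum(population_share[group] * rate for group, (_, rate) in sample.items())


raw = raw_estimate(SAMPLE)
corrected = poststratified_estimate(SAMPLE, POPULATION_SHARE)
```

In this toy example the younger group is both over-represented and more likely to follow the show, so the corrected estimate comes out lower than the raw one. The correction only works if each group has enough users in the sample to estimate its rate reliably.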
Knowledge at Wharton: Any last comments? Would you like to say anything about a question that I should have asked, but I haven’t?
Hill: I think maybe just talking about the future and predicting what all this means for the TV industry. It’s nice to see Nielsen and other companies beginning to combine different types of data to make more comprehensive pictures of their viewing audience. But what would be nice to see from companies like Nielsen are data partnerships: partnerships with people who have different types of data that the masses couldn’t otherwise get.
So, with Twitter, inferring demographics is something that we do. It’s something that a lot of researchers now are starting to do for various reasons. That’s kind of easy. What would be nice to see would be — and perhaps customers wouldn’t want this — to see them do things like partner with credit card companies, and partner with companies for which it has typically been really difficult to get that data, and then see what insights can be drawn about TV viewers by combining data from disparate sources that aren’t easy to get.