Wharton's Anoop Menon and Jaeho Choi discuss their research on using natural language processing to analyze competitive strategy.

Every business would love to know the minds of its competitors, and what they are likely to do next. Strategy analysts have thus far used simple tools that employ mostly financial and other structured data to try to predict competitors’ moves. But new research at Wharton has shown how natural language processing techniques could be used to parse tomes of unstructured data, such as text buried in conference calls or annual reports, to more accurately anticipate competitor strategies.

The research opens new pathways to measure and test assumptions firms make in their competitive strategies, and to “visualize how firms are positioned with respect to each other, and then map that on to performance consequences,” says Wharton management professor Anoop Menon. His research paper, “What You Say Your Strategy Is and Why It Matters: Natural Language Processing of Unstructured Texts,” is co-authored with Jaeho Choi, a Wharton doctoral student, and Haris Tabakovic, an associate at The Brattle Group, a Boston-based international arbitration services firm.

For their study, the researchers used natural language processing (NLP) techniques to measure “strategic change, positioning, and focus” across their sample of 50,506 business descriptions of publicly held companies, drawn from their 10-K annual reports from 1997 to 2016.

Menon and Choi shared the main takeaways for business strategy analysis from their research with Knowledge at Wharton.

An edited transcript of the conversation follows.

Knowledge at Wharton: Anoop, could you tell us what led you to explore this topic in your research? What was your objective?

Anoop Menon: The notion that there is a lot of information that is buried in unstructured text has been around for a while. We know that strategy is very complicated, but we tend to measure it using very simple metrics, like a few financials here and there. But we all agree and understand there is a huge amount of information that is buried in text like conference calls and annual reports that gets at the meat of the strategy, how the strategists are thinking about competition and product market choices.

Sadly, we currently don’t have a really good technique or set of techniques to get at that information. So that was the starting point. About six or seven years ago, my co-author Haris [Tabakovic] and I came across this burgeoning line of research in computer science about using natural language processing techniques to extract [information from] text, but in very different fields – not ours. [There were] some applications to political science but not at all to strategy. We said we should be able to take some of those techniques and get at the information that is buried in the text.

Knowledge at Wharton: Jaeho, what exactly are these natural language processing techniques that Anoop was talking about?

Jaeho Choi: In computer science, natural language processing is the technique that enables machines to learn and understand human languages. [There is] a vast array of techniques that do certain kinds of tasks in terms of understanding human language, and we are applying a few of those in our research.

Knowledge at Wharton: The use of natural language processing and artificial intelligence in understanding strategy seems to be an interesting idea. What were some of the main takeaways? What did you learn in your research?

Menon: We have been pleasantly surprised by the amount of traction we were able to get. We weren’t sure there would be much here. We took the annual reports that publicly traded companies have to file and extracted their business descriptions. [Next], we compared changes in their strategies across the years, or their relative position in the marketplace with respect to their competition, [to see] how focused their strategies were.

These are all classic constructs in strategy and until now, we didn’t have a simple way to measure them across industries. For example, [for] airlines we might have specific metrics like the percentage of seats utilized per route or something like that to get at different aspects of strategy, but that would not apply to, say, cement.

But now we are able to put all these firms in the same vector space model in some ways, and see, oh who is close to whom? We can see how, say, Pepsi, Coke and Dr. Pepper move in the strategy space with respect to each other, the relative distance between them, and how that [impacts] performance.
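As a rough illustration of the vector space idea Menon describes, the sketch below turns each firm's business description into a TF-IDF vector and compares firms by cosine distance. It is a minimal example under stated assumptions, not the researchers' actual pipeline: the firm names, the toy descriptions, the crude tokenizer and the smoothed TF-IDF weighting are all invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def tfidf_vectors(docs):
    """Map {name: text} to {name: {term: tf-idf weight}}."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (freq / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, freq in c.items()
        }
    return vectors

def cosine_distance(u, v):
    """1 minus cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Invented toy descriptions, not real 10-K text.
descriptions = {
    "FirmA": "We sell carbonated soft drinks and bottled water worldwide.",
    "FirmB": "We sell carbonated soft drinks, snacks and breakfast cereals.",
    "FirmC": "We operate low-cost passenger airline routes across the country.",
}
vectors = tfidf_vectors(descriptions)
dist_ab = cosine_distance(vectors["FirmA"], vectors["FirmB"])
dist_ac = cosine_distance(vectors["FirmA"], vectors["FirmC"])
print(f"A-B: {dist_ab:.3f}  A-C: {dist_ac:.3f}")
```

In this toy case the two beverage descriptions end up closer to each other than either is to the airline. Applying the same distance to one firm's descriptions from consecutive years would give a rough measure of how much its stated strategy has changed.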

We are able to see and verify some basic claims that strategy has made. The bigger the change [a company] makes over the years, the more dangerous it is in terms of performance consequences. In terms of your positioning with respect to rivals, it is good to be differentiated, but if you are too differentiated, you start losing legitimacy and your performance suffers. It is important to be focused. A certain degree of diversification is important for various reasons, but [it may not help] if you are too diversified.

We can put numbers on those constructs in a general way, measure those, visualize how firms are positioned with respect to each other, and then map that on to performance consequences to verify some of the claims that strategy has made until now.

Knowledge at Wharton: How did you go about conducting your research? What were some of the challenges in using natural language processing to study strategy?

Choi: In our research we use a specific kind of corporate filing called the 10-K annual report, which all public firms in the U.S. submit to the Securities and Exchange Commission (SEC). The SEC has made this data set accessible since 1994, and over the past 24 years a huge amount of data has accrued in its database.

Oftentimes, unstructured textual data is very dirty in a sense, so we had to tackle some encoding problems, and we also faced some problems in applying some of our core techniques to this unstructured data. It took us some time to adapt the techniques that we gleaned from computer science to our textual data.
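To give a sense of what cleaning “dirty” filing text can involve, here is a small sketch that decodes raw bytes, strips markup, resolves HTML entities and normalizes whitespace. The function name `clean_filing_text` and the sample input are hypothetical, and the steps are generic text hygiene rather than the procedure the researchers used.

```python
import html
import re
import unicodedata

def clean_filing_text(raw_bytes):
    """Return a plain-text version of a raw filing fragment."""
    # Decode as UTF-8, substituting a placeholder for undecodable bytes.
    text = raw_bytes.decode("utf-8", errors="replace")
    # Drop anything that looks like an HTML/SGML tag.
    text = re.sub(r"<[^>]+>", " ", text)
    # Turn entities such as &amp; or &nbsp; into the characters they stand for.
    text = html.unescape(text)
    # Fold compatibility characters (e.g. non-breaking spaces) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

# Invented sample input, not taken from a real filing.
raw = b"<p>Item&nbsp;1. Business</p>\n<p>We make cement &amp; aggregates.</p>"
cleaned = clean_filing_text(raw)
print(cleaned)
```

Entities are resolved only after tags are stripped, so an escaped angle bracket in the text is not mistaken for markup.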

Knowledge at Wharton: Could natural language processing help overcome some of the limitations of traditional analytic methods, and if so, how?

Menon: Yes, it can. What might be some examples? How do we quantify analytic methods, and what does that even mean? Right now, for the most part, these are financial fundamentals that you can pull out from the annual filings [of public companies].

But you [also] have detailed deep dives that scholars or analysts do. They go deep into the context, they talk to a bunch of industry participants, and get a sense of what is going on. That is a very high-touch method and is just not scalable. One of the hopes we have with this technique is that we can straddle the two there.

[Take] a certain industry, say nanotechnology. [You could] count the number of nanotechnology instances in [conversations], and then plot that over time and say, “Nanotechnology is trending right now.” That is very simple, we would argue.
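The simple counting approach mentioned here can be sketched in a few lines. The example below assumes the documents are available as (year, text) pairs; the function name `mentions_by_year` and the sample sentences are invented for illustration.

```python
import re
from collections import Counter

def mentions_by_year(documents, term):
    """Count case-insensitive whole-word occurrences of `term` per year.

    `documents` is an iterable of (year, text) pairs.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, text in documents:
        counts[year] += len(pattern.findall(text))
    # Return the years in chronological order, ready for plotting.
    return dict(sorted(counts.items()))

# Invented sample sentences, not real transcripts or filings.
sample_docs = [
    (2014, "Our materials unit is exploring nanotechnology for coatings."),
    (2015, "Nanotechnology drove new products; nanotechnology patents rose."),
    (2015, "We expanded our logistics network."),
    (2016, "Nanotechnology, nanotechnology research and nanotechnology hiring all grew."),
]
trend = mentions_by_year(sample_docs, "nanotechnology")
print(trend)  # {2014: 1, 2015: 2, 2016: 3}
```

A rising count like this says only that the word is used more often, which is why the approach is described as very simple.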

The more complicated versions of it are cognitive maps. [They represent] cause-effect relationships that you can map out in terms of the belief systems of the industry participants, and what they think is going on [as in] the causes and the consequences [of an endeavor].
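A cognitive map is commonly represented as a directed graph in which each edge records a belief that one factor raises or lowers another. The sketch below is one possible data structure for that idea, assuming signed edges (+1 or -1); the class name `CognitiveMap`, its methods and the airline beliefs are hypothetical and are not drawn from the research.

```python
from collections import defaultdict

class CognitiveMap:
    """Directed graph: an edge cause -> effect carries a sign (+1 or -1)."""

    def __init__(self):
        self.edges = defaultdict(dict)

    def add_belief(self, cause, effect, sign):
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        self.edges[cause][effect] = sign

    def downstream_effects(self, start):
        """Return {concept: net sign} for every concept reachable from `start`.

        The sign along a path is the product of its edge signs. Only the
        first path found to each concept is kept (breadth-first), so
        conflicting paths are not reconciled in this sketch.
        """
        result = {}
        queue = [(start, 1)]
        while queue:
            node, sign = queue.pop(0)
            for effect, edge_sign in self.edges.get(node, {}).items():
                if effect not in result and effect != start:
                    result[effect] = sign * edge_sign
                    queue.append((effect, sign * edge_sign))
        return result

# Invented beliefs for illustration only.
beliefs = CognitiveMap()
beliefs.add_belief("lower fares", "passenger demand", +1)
beliefs.add_belief("passenger demand", "seat utilization", +1)
beliefs.add_belief("seat utilization", "cost per passenger", -1)
effects = beliefs.downstream_effects("lower fares")
print(effects)
```

Here the map implies that, in this hypothetical belief system, lower fares are expected to reduce cost per passenger by way of higher demand and fuller planes.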

Knowledge at Wharton: What are some of the implications for business practitioners of what you have learned? Why should your research matter to a busy executive who is involved in strategy?

Menon: There is a big set of implications. In talking to executives who do strategy for a living, [we learned that] it is hard to keep track of all of your competition — who is doing what, where they are, and what they are up to these days.

We could refine these techniques, build upon them and create a set of tools where you can actually visualize where you are in the strategy space with respect to everybody else, and how that is changing over time. We already have a simple version of that, and some surprising insights come from that.

For example, in pretty much all of the strategy cases we use, Southwest Airlines was the classic example of a low-cost, highly focused strategy. But that analysis stopped in the 1980s and the 1990s. What we see is that since the 1990s and the 2000s in particular, Southwest has converged a lot more with the rest of the airline industry than we thought. It is not nearly as much of an outlier as it used to be. [With] its acquisition of [low-cost airline] AirTran, it spread its network into Latin America as well as the East Coast.

So [Southwest Airlines] is starting to converge more, but it is still different and there is a performance gap because of that difference. We could visualize that [in our model]. Or how when PepsiCo acquired Quaker Oats [in 2001], that started a trend of it diversifying into snacks. We can see that differentiation and change in focus in some of our metrics as well.

The hope is, from a practitioner perspective, this can be a quick [representation of trends] — almost like a dashboard to see what my strategy is, and how it compares to what the competition is doing.

Choi: For financial analysts who look into these corporate reports, our tool could be a way to analyze large sets of companies … and understand visually how firms compare with each other.

Knowledge at Wharton: Natural language processing and AI seem to be becoming more and more pervasive, and the technology is getting better and better. As this process continues, what are some of the implications for the ways in which firms could either formulate strategy or even better calibrate their strategy towards better business outcomes?

Menon: Ideally, these techniques would help us identify some of the core beliefs of the competition. [For example,] how they are thinking about the world compared to how we are thinking about the world. Or, what areas seem to be exciting, and why. Another stream of my research looks at cognitive biases and people’s mental models, and the amazing effect that they have on competitive outcomes without really understanding how the competition is thinking about a certain situation.

So how can we get inside of [competitors’] minds? Some of these techniques might be helpful in getting at that. But then, we get to questions such as where we would get the data from, since people don’t declare how they are thinking publicly. We are working with a computer science faculty member to try to tackle some of these issues in another project.

Choi: AI tools could be used to discover patterns that we don’t know about from the data that we are accumulating in our servers. By applying AI techniques, we may be able to ask questions we haven’t thought about, and also arrive at findings that we haven’t figured out yet. Those would be the directions in using AI in business research.

Knowledge at Wharton: Of all that you learned and discovered through your research, what are some of the insights that surprised you the most?

Menon: In this first paper, the attempt was to not create too many deviations. This is almost a proof-of-concept [to show] that we can do the things that we have been talking about and trying to do for decades, and that we can bring techniques from AI to demonstrate those.

Many of the “findings” are more to demonstrate that we can replicate [what we already know] using these new techniques. It is more of a translation and proof-of-concept attempt. In that sense, the meta implication of the findings is that while [firm-level] strategy is extremely complicated, we can start sinking our teeth into that, even by using some simple techniques.

We were using vector space models and topic models, and these are not necessarily the cutting edge of natural language processing. Yet, we are able to get some traction with those, and that was surprising.
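A topic model typically describes each document as a mix of topics. One way such output could be turned into a “focus” number is a Herfindahl-style concentration index over a firm's topic weights, sketched below. This is an assumption made for illustration: the source does not say how the researchers computed focus, and the function name `focus_score` and the example weights are invented.

```python
def focus_score(topic_weights):
    """Herfindahl-style concentration of a firm's topic weights.

    Weights are normalized to sum to 1; the score ranges from 1/k
    (evenly spread over k topics) to 1.0 (a single topic).
    """
    total = sum(topic_weights)
    if total <= 0:
        raise ValueError("topic weights must sum to a positive number")
    shares = [w / total for w in topic_weights]
    return sum(s * s for s in shares)

# Invented topic mixes: one concentrated firm, one evenly diversified firm.
focused = focus_score([0.9, 0.05, 0.05])
diversified = focus_score([0.25, 0.25, 0.25, 0.25])
print(f"focused: {focused:.3f}  diversified: {diversified:.3f}")
```

A higher score means the description concentrates on fewer topics, which is one plausible reading of a focused strategy.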

Knowledge at Wharton: What are some of the new questions for research that have come up in your mind that you would like to pursue in the future?

Menon: One is, if these tools can be refined, can we get deeper in terms of understanding [competitors’] cognition and their decision-making process as well? If you had a strategic advisor, that is what you would want them to be telling you. For example, if you are Google, [we might be able to show that] Facebook is thinking about AI in this particular way, and this is where they seem to be going based on what we see. Could we create these mental models from text, based on how companies are describing their world? Could we create cause-effect relationship maps and then use them for much more quantitative analysis?

All that would require us to develop new tools — this is the project I mentioned earlier. We are working with a computer science professor to create [what we call] hierarchical nested semantic networks.

Choi: I am also interested in developing these more sophisticated tools to get a more qualitative and conceptual understanding from the unstructured texts that we are examining. In the NLP world, there are sophisticated techniques called named-entity recognition, semantic networks and text classification. I am interested in how we could apply and tweak these techniques for our purpose and create a more complicated and more sophisticated tool to understand strategy.