Over the past six weeks, Wharton statistics professor Abraham Wyner and several MBA students, along with Wharton practice professor of legal studies and business ethics Scott Rosner, studied data provided by ESPN to determine whether the amount of money that sports teams pay for their players can predict how well the team will perform. Here is what they found:
Surprisingly, among the major American professional sports, ice hockey is the one in which team salary data best predicts performance. In baseball, salary matters a lot, but outcomes remain highly uncertain: teams that don’t spend a lot can actually win more games than teams that do. In football, how much a team spends on its players has very slight predictive power, due to a strong revenue sharing system and salary caps, among other reasons.
Basketball is not as easy to predict as hockey or baseball, but it’s easier than football. Such factors as individual coaches, strategies and combinations of teams and players can all influence outcomes. The fifth sport, European soccer, was by far the most predictable of the five. Based on data related to the European premier league, Wyner and his students concluded that the teams that spend the most money on players win year after year. “Not only do they win, but we know they are going to win,” says Wyner.
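One common way to put a number on how well payroll "predicts" performance is the share of variation in wins that a straight-line fit on payroll explains (R-squared). The article does not say which measure Wyner's team used, so the sketch below is only an illustration of the idea. The function name `r_squared` and both toy leagues are invented for this example and do not come from the ESPN data.

```python
from statistics import mean


def r_squared(payrolls, wins):
    """Share of the variance in wins explained by a straight-line fit on payroll."""
    mean_pay, mean_wins = mean(payrolls), mean(wins)
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_pay) * (y - mean_wins) for x, y in zip(payrolls, wins))
    sxx = sum((x - mean_pay) ** 2 for x in payrolls)
    syy = sum((y - mean_wins) ** 2 for y in wins)
    # For a one-variable linear fit, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical numbers for two imaginary leagues (payroll in $ millions, wins per season).
# They are NOT taken from the ESPN data or from Wyner's analysis.
league_a = {"payroll": [40, 55, 70, 85, 100], "wins": [30, 38, 44, 53, 60]}
league_b = {"payroll": [40, 55, 70, 85, 100], "wins": [45, 38, 52, 41, 49]}

r2_a = r_squared(league_a["payroll"], league_a["wins"])
r2_b = r_squared(league_b["payroll"], league_b["wins"])

print(f"League A (wins track payroll closely): R^2 = {r2_a:.2f}")
print(f"League B (wins barely track payroll):  R^2 = {r2_b:.2f}")
```

In this toy setup, league A plays the role of a sport where spending is a strong predictor and league B the role of one where it tells you little. A value near 1 means payroll alone pins down the standings, while a value near 0 means it explains almost nothing.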
While Wyner’s analysis of the five different sports is relevant to owners and managers of sports teams, it is not part of his statistical research, which focuses more on scientific fields. But one research area overlaps both business and science: data mining, the process of discovering patterns in large data sets or, as Wyner describes it, “extracting gems from a mine full of stuff, some of it valuable and some of it not, and looking for patterns that you can apply to your business.”
Wyner doesn’t like the term “data mining,” because “it means that you don’t know what you are looking for.” In fact, many companies don’t. As has been widely noted over the past few years, businesses collect such a huge amount of data, often on their customers, that they are drowning in information they have no idea how to effectively use. The right approach, says Wyner, is for companies to ask themselves what kinds of problems they have and, then, what data is available to solve those problems. “Instead, companies tend to say, ‘Here is this data: What can we find in it that might help us solve this problem?’ The term ‘data mining’ is pejorative because it suggests that managers let the data drive the problem, instead of letting the problem drive the data analysis.”
Referring to his study of the five sports, Wyner says that the owners and managers of the teams “should rely on data to make business decisions — rather than let these decisions be driven by tradition, history, what was done last year, inertia, convention or custom.” Some of the conclusions he and his team reached in their analysis go against conventional wisdom. For example, “we found that, in baseball, spending money on pitching is far more effective than spending money on hitting,” Wyner notes. “Convention says that you spend more money on hitting because more of your players are hitters. But we found that you get a more productive dollar when it’s spent on pitching.”
In basketball, spending money on the centers and forwards was more productive than spending it on the guards. “Part of that could be due to salary caps,” says Wyner. “[Miami Heat pro guard] LeBron James gets the same salary as a good but not great guard because of a cap. But James is so much better than everyone else. So it’s hard to argue that spending money on a guard is going to be a good investment for a team that doesn’t have LeBron James.” In other words, teams should put their salary cap maximum on a center because they get more value than they would putting it on a guard.
In hockey and baseball, Wyner concludes that defense is a more productive investment than offense (pitching is considered defense). In football, however, Wyner and his team found no obvious value proposition. “We’re not sure why, except that maybe it’s because on-field performance is so unpredictable on a yearly basis. Coaching, and the entirety of the team, matter more than any individual player you can acquire, even a quarterback. The problem is, there are too many quarterbacks who are paid a lot of money who don’t do well. That distorts the statistical view.” Football teams are built to be on parity with one another, Wyner adds. “Whatever they are doing to make football competitive, it’s working. It’s good for the fans and the owners, probably not so great for the players.”
Everything that the research team has done for baseball “can be done for your company,” Wyner adds. “You just need to collect the data that drives your business and analyze it.”
Wyner’s research interests include probabilistic modeling, information theory, data compression, boosting and temperature reconstructions. He is also a huge baseball fan, which explains the title of one of his co-authored papers: “Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball.” For those who skipped statistics in school, Bayesian methodology integrates information from a number of different sources (in this case, knowledge of player distribution as well as historical data on all players) to infer or predict the performance of one unit, here the fielding ability of one player.
The paper, according to its abstract, focuses on a relatively under-explored topic: “the use of statistical models for the analysis of fielding based on high-resolution data consisting of on-field location of batted balls.” The abstract notes that the authors (Wyner, Wharton statistics professor Shane T. Jensen and Kenneth E. Shirley, a statistics researcher at AT&T Labs) “combine spatial modeling with a hierarchical Bayesian structure in order to evaluate the performance of individual fielders while sharing information between fielders at each position.”
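The idea of "sharing information between fielders at each position" can be illustrated with a much simpler model than the one in the paper. The sketch below is not the authors' spatial model. It is a basic Beta-Binomial estimate in which each fielder's success rate is pulled toward the average for his position, and pulled harder when he has had few chances. The function name `shrunk_rate`, the position average of 0.70, the prior strength of 50 and both players are assumptions made up for illustration.

```python
def shrunk_rate(successes, chances, group_rate, prior_strength):
    """Posterior mean of a success rate under a Beta prior centered on group_rate.

    prior_strength behaves like a count of pseudo-observations: the larger it is,
    the more a player's estimate is pulled toward the group average.
    """
    alpha = group_rate * prior_strength          # prior "successes"
    beta = (1.0 - group_rate) * prior_strength   # prior "failures"
    return (alpha + successes) / (alpha + beta + chances)


# Hypothetical inputs, not taken from the paper or from any real player.
position_average = 0.70   # assumed average success rate at this position
prior_strength = 50       # assumed weight given to the position average

# Both players converted 90% of their chances, but on very different sample sizes.
rookie = shrunk_rate(9, 10, position_average, prior_strength)
veteran = shrunk_rate(450, 500, position_average, prior_strength)

print(f"Rookie, 9 of 10 chances:     raw 0.90, estimate {rookie:.3f}")
print(f"Veteran, 450 of 500 chances: raw 0.90, estimate {veteran:.3f}")
```

The rookie's estimate stays close to the position average because ten chances say little, while the veteran's estimate stays close to his own record. That is the practical effect of combining knowledge about the whole group with data on one player.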