What Role Can AI Play in Sports?

Wharton professors Adi Wyner and Cade Massey join Eric Bradlow, vice dean of Analytics at Wharton, to discuss how AI — along with statistics and data science — can assist with tracking data, player evaluation, making predictions, and more. This interview is part of a special 10-part series called “AI in Focus.”

Watch the video or read the full transcript below.

Transcript

Eric Bradlow: Welcome to the next episode of the Analytics at Wharton, AI at Wharton podcast series on artificial intelligence. While I’ve enjoyed all the episodes, today is really a special one for me. Not just because of my love of sports, but of course because I’m interviewing two of my colleagues and co-hosts of another show here on Sirius XM, Wharton Moneyball.

It’s my honor to introduce my colleague, Cade Massey. Cade is a practice professor in our operations, information and decisions department. He teaches, researches, and consults on how to improve decision-making in organizations, especially (and you could not have written it better than this, thank you, Cade) by blending models and experts. He’s also the faculty director of Wharton People Lab, faculty co-director of the Wharton Sports Analytics and Business Initiative. And of course, as I mentioned, co-host of Wharton Moneyball. Cade, welcome to our podcast.

Cade Massey: Thanks, Eric. Delighted to be here.

Bradlow: I’m also joined by my longtime friend and colleague in the statistics and data science department, Adi Wyner. I really think it’s important to read Adi’s short bio here, because of all the different things he’s doing around data science, statistics, and artificial intelligence at the school. Adi is a professor of statistics and data science whose research has spanned many areas including applied probability modeling, information theory, and machine learning. He’s written many articles on statistical methodology and on applications, including topics from other episodes we’ve had, such as neuroscience, medicine, and climate science, and of course, extensively in sports analytics, where along with Cade, he’s the co-director of our Sports Analytics and Business Initiative. He also runs the Penn Sports Research Seminar, the summertime Moneyball Academy, and he’s also one of my co-hosts of Wharton Moneyball. And he’s the director of the undergraduate program in statistics and data science. So Adi, welcome to the podcast.

Adi Wyner: Yes, thank you. Great to be here.

Bradlow: And that completes our 28 minutes of talking today. But let’s get serious. So Adi, why don’t we start with the following, and this is a question I get all the time. And I imagine you do, too. What is AI? And how does it differ, if at all, from what you and I might call statistics?

Wyner: Well, it is actually a great, great blending of the two. I mean, statistics has been around a long time. If we go back, historically, AI was designed to try to figure out how people think and build models that do that. And then sometime in the ’90s, people realized that the smart way to do artificial intelligence is not try to reconstruct the thought process, but just get tons of data. And this was a huge breakthrough. I think it really happened in the ’90s in machine translation and in speech recognition.

Bradlow: And then eventually vision.

Wyner: And then eventually vision. What happened was, instead of trying to actually build machine learning and artificial intelligence, people discovered that the right way to do that was statistics.

Now fast forward to today. We still have statistics, and we have machine learning and AI. And they are still somewhat similar, and also different. I have a rule of thumb about what kinds of problems are machine learning and what kinds of problems are statistics, although you have to remember, there’s enormous feedback and intersection between the two. And we all use each other’s methods.

The way I like to think about it is this. What is the difficulty in solving the problem? And if the difficulty is complexity — for example, vision, self-driving cars, image recognition, trying to take the entirety of the web and use that to answer a question, like you think of large language models — that’s AI. Something that a human can do, and now you’re trying to get the machine to do it.

Statistics is all about what I would think of as noisy problems. We have a colleague in criminology who tries to predict whether or not someone commits a crime after they’ve been in jail for a while. Recidivism. Well, you’re never going to get a perfect prediction. Sports is a classic place where a lot of our problems are statistics. For example, trying to figure out, you know, what your batting average is going to be next year, or things like that. But also, other problems come up in areas which are common to both, where there is a lot of signal, but also you can handle them with statistical methods.

The way I generally describe it is what we would call signal versus noise. The machine learning problems have high signal and low noise, and the statistics problems have high noise and lower signal.

Bradlow: Cade, let me ask you, related to that, maybe one could argue that statistics has gotten its footing in companies and decision-making. Maybe you could argue machine learning has, because people like to predict things that can help predict the future. AI is kind of new. How do you see companies reacting to these different methods? And are we now in such a — I’ll call it an “enlightened explosion world,” like everyone’s just saying, “Of course I have to adopt that.” Or is that not the reality of today?

Massey: Well, certainly organizations are looking at opportunities for it, and they’re excited about it. And certainly, vendors who think they have a new application for it are selling the potential of that. But ultimately it always comes back to, is somebody going to rely on that algorithm or model for a decision? They have to use the dang thing. It’s one thing for it to spin and do pretty things. It’s another for a human to depend on it.

And what we’ve seen, reliably, is that that’s a pretty steep hill to climb. That people are loath to trust decisions to models when models aren’t perfect. And in many, many applications, models are inevitably imperfect. As soon as we see them be imperfect, we’re reluctant to lean on them. Even if we know humans are imperfect. There’s an asymmetry between the penalties humans apply to models that are imperfect, versus humans that are imperfect. As soon as they see that imperfect performance, they’d rather lean on a human.

Bradlow: If I’m going to use an algorithm and offload some of it, that has to be much more accurate than in some sense, we as fallible humans.

Massey: That’s right. That’s what we’ve observed. We, along with some colleagues at Penn, we’ve called that algorithm aversion. And again, it’s not to say that people just innately don’t trust models. We’re happy to play with models, to lean on models. But if we see them performing imperfectly, we’re much harsher in our treatment of them than we are of humans.

Bradlow: Adi, I know you wanted to jump in here. And let’s get to you before we jump into the main topic of today, which of course are all of our passion, which is AI and sports. But please, you had a follow up.

Wyner: I guess my follow up to you, Cade, is that I think we might want to privilege a human versus a model when they both make mistakes at approximately the same rate. But how much is that disadvantage? Do we really kill models, even if they’re better than humans, just because they make mistakes?

Massey: Yeah. I can’t speak to the exact threshold where you might go back to the model, but where we’ve run experiments, we’ve manipulated exactly that. And people can observe that the model outperforms. But they hold models to a higher standard. There seems to be a standard of perfection for models that they don’t use when it comes to humans.

Bradlow: Well, Adi, let’s jump in first with you with the application of AI and sports analytics. What do you see, as you mentioned statistics being the field of — let me just see if I got this right. Clearly, AI is, you’re going to have low noise, massive data. Statistics, we’re going to have typically smaller data sets, high noise, which means you have to rely more on mathematical models for decision-making. How do you see sports analytics today? How much of it do you see the use of statistics? How much do you see the use of AI? Is there some combination? What do you see happening right now?

Wyner: Certainly, there’s an incredible amount of statistics. And the reason why statistics is being used so much today, primarily, is people see the value in it. I think that’s been a giant sea change over the years brought about by the Moneyball revolution, Oakland A’s, et cetera, et cetera. People realize that using data to make decisions is a great thing to do. And that data wasn’t advanced. I mean, Bill James really revolutionized sports analytics with counting and percentages and clever ways of adding and subtracting things. Not fancy. So, statistics has certainly become essential to the operation of successful sports teams. And in lots of ways, player evaluations. None of this is particularly complex.

There are, of course, machine learning advances that have been made that have been useful. Certainly, the tracking data. And that’s for decision-making on something that we argue about on our radio show all the time. Should we have a robotic umpire? We certainly have the Hawk-Eye data in baseball. We have all this tracking data from Sportvision and Sportsview, that tracks where the ball is and where the players are.

And that leads to mixture models, statistics mixed with machine learning, applied to these massive tracking data sets. Terabytes of information that need to be processed so you can still build statistical models. They’re still high noise, in the sense that we’re not able yet to predict whether or not, say, a running back is going to break through and get a large gain. That’s still high noise, but we have so much information. So, we have to use machine-like models to fit them.

So, that’s what’s happening. I would still say that we’re probably 90% in the statistics domain, although a lot of the flashy stuff is coming out of the machine learning and the AI. The things on TV where everything is labeled and you see all these great predictions. They’ll tell you things like, what’s the probability of a catch? That’s a mixture of statistics and an ML, or machine learning, model.

Bradlow: Cade, one of the things you even talked about in your own bio was the blending of models and experts. How do you see that blend happening in the field? Let’s talk about sports since I know you do work with a lot of sports teams. How do you see that blending of machine learning AI and human experts today? What are some good examples?

Massey: Well, sadly, it’s largely uncomfortable. That historically, these come from very different communities and they don’t necessarily play well together. The organizations that actually are blending them most successfully are in some way forcing the groups to work together. Because, you know, a traditional decision-maker in the NFL, say, isn’t super interested in dialing in the computer science guy who just graduated from MIT. That’s not the way they usually make decisions. The organizations that get it well have leadership to say, “We are going to do this.” They don’t make it one person’s model or another. It’s more the team’s model.

The sport that is furthest along, especially on the personnel side, is baseball. In baseball, they do have models that are the team’s model. They lean on them heavily when they evaluate personnel. They’ve built those things up over years. These things are not something someone writes down one time and they’re off and running. They tend to be highly iterative. The modelers get lots of input from the traditional decision-makers. They get hypotheses from the decision-makers. The decision-makers point out things that models are missing. These things should be iterative, and a dialogue between the two sides. That’s tough to pull off, and so it’s relatively rare.

Bradlow: Cade, let me just follow up with a question on that. You mentioned the idea of using, whether it’s humans or theory, to come up with hypotheses that are tested. Just you as a scientist, how important is that? Or can’t I just use AI and machine learning algorithms to find patterns in the data, and that’s what it is? I’m saying this in a facetious way, because I have an answer. But this is about you guys, not me. Why not? Why need theory? Why not just explore the data and see what we find?

Massey: Adi is going to have a deeper answer than I am for this, because it’s very much in line with what he’s been talking about. But sadly, for some of us, in many domains of sports, we just don’t have the data to support that kind of exploration. So, we have to bring more structure to the conversation. We can’t just set the algorithm loose and see what it tells us. There are certain places where that can happen. But by and large, we just don’t have enough data for an algorithm to learn reliably. For example, I’m often worrying about personnel. And we just don’t see enough people — there are a lot of variables that you could lean on. We don’t have that many observations. We need to bring a little structure. We need that human hypothesis in order to make traction. That’s the way I would go about it. I’m curious how Adi’s going to talk about it.

Wyner: Well, I’ll start with just simply agreeing. I mean, we think we have a lot of data because we have lots and lots and lots of observations. But it’s not nearly as many as you think you’d want because you have so many variables as well. And a lot of times the data is not new data, it’s just more and more data of more or less the same thing. And what you need in statistics to do good forecasting and good model building is lots of independent data. We really don’t have that. You think about the incredible amounts of observations you can take using tracking data — it’s terabytes in a game. But that’s still one game. It’s still only a couple hundred plays and only one victory, right? So, all of this is highly correlated, which means we just don’t have as much data as you think. So, I’ll start with that.

The second thing is, when you’re dealing with noisy models — and player performances are fundamentally noisy. Meaning, you cannot look at a player in the minor leagues, or watch them swing or pitch or run, and predict what they’re going to do on the field. It’s just not possible. They’re human beings. They’re not robots. Which means that — and this is the technical word — “overfitting” is easy to do. And if you throw a machine learning AI model, at a high noise, not so many independent observations, you are going to get it wrong. This is a fundamental area of research. And this is what we’re doing, and what I’m working on with my grad students and my students to figure out how to use AI and statistics in this exact setting to build good models. And it’s just not trivial. You just can’t throw it in and crank it as if it will just automatically run for you.
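Wyner’s point about overfitting can be sketched with a small simulation. This is only an illustration under made-up assumptions, not anything from his research: the data are synthetic (a weak linear signal plus heavy noise), the sample is small, and the names `fit_line`, `nearest_neighbor_predict`, and `simulate` are hypothetical. A flexible model that memorizes the training data (one-nearest-neighbor) is compared with a simple straight-line fit.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_neighbor_predict(train_x, train_y, x):
    """Flexible model: copy the outcome of the closest training observation."""
    closest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[closest]

def mse(predictions, outcomes):
    """Mean squared error between predictions and observed outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)

def simulate(n_train=40, n_test=2000, noise_sd=1.0, seed=1):
    """Train both models on a small noisy sample, then score them on fresh data."""
    rng = random.Random(seed)

    def draw(n):
        # Synthetic data (an assumption): weak signal 0.5 * x, large noise.
        xs = [rng.random() for _ in range(n)]
        ys = [0.5 * x + rng.gauss(0, noise_sd) for x in xs]
        return xs, ys

    train_x, train_y = draw(n_train)
    test_x, test_y = draw(n_test)
    slope, intercept = fit_line(train_x, train_y)
    return {
        "line_train": mse([slope * x + intercept for x in train_x], train_y),
        "line_test": mse([slope * x + intercept for x in test_x], test_y),
        "nn_train": mse([nearest_neighbor_predict(train_x, train_y, x) for x in train_x], train_y),
        "nn_test": mse([nearest_neighbor_predict(train_x, train_y, x) for x in test_x], test_y),
    }

errors = simulate()
for name, value in errors.items():
    print(f"{name}: {value:.3f}")
```

The memorizing model reproduces its training data exactly, so its training error is zero, yet on fresh data it should do noticeably worse than the straight line, because most of what it memorized was noise. That gap between training and test error is the signature of overfitting.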

Bradlow: Cade, let me ask you. I would think that a lot of the work that firms are doing today is about collecting these novel data sets. Whether it’s, you know, we have an episode on neuroscience, so maybe it’s brain data. Or maybe it’s motion tracking data, as Adi said. Or maybe it’s eye tracking data. How much do firms recognize that, in some sense, we don’t have the data we need? And that we need to get it, you know, whether it’s putting sensors on players, or measuring sleep or rest? How much do firms realize that in some sense it’s better data that’s going to solve these problems, not necessarily more correlated data?

Massey: I think there is pretty healthy appreciation of that now. Of course, there’s variation. Teams vary in their quality of ownership, quality of management, and not everyone sees the opportunity there. But increasingly, teams do see the opportunity. And of course, some sports are, again, farther ahead on this than others. I mean, a sport like baseball, they are way down the path. And because they’re so far down the path, they are getting quite inventive in the kinds of data they’re looking for in search of an edge.

Sports like football and hockey, there’s much more variance. But the sophisticated teams in those sports are, in fact, being creative. And one way to think about it — I think we tend to think that there’s one big model sitting inside the firm somewhere that’s spitting out answers, whether it’s play calls or who to draft. But in fact, these organizations have a bunch of little models. And they’re mostly going out and solving one problem, getting one insight, adding one measure to a player’s evaluation, using different little models, whatever they can get their hands on. It’s much more the propagation of these smaller models. Eventually, they’ll be integrated. But right now, it’s the propagation of smaller models, all in search of small edges.

Bradlow: Yeah, the one thing we hope, and I don’t know, maybe you guys would disagree with this. Adi, I’d love your thoughts. There can be different models for different purposes. As a matter of fact, there should be, probably. But there should be a common data set. I know you work with a lot of students. Let’s say you want to solve a problem. Like, for example, I know we’re going to be doing a high school sports competition, and you’re going to be doing something around soccer. How do you collect a data set? Like, suppose you have one data set, which is motion tracking, and you have another data set, which might be training. And another data set. How do you even think about integrating these disparate data sets together when you’re trying to solve a problem?

Wyner: So, interesting. I actually would call that data science.

Bradlow: Ah. So, that way you’ve got statistics, and now we’ve got machine learning, we’ve got AI. Now you want to add data science?

Wyner: Yeah. So, our department has been the statistics department for many years. That was my appointment. And all of a sudden, I got an extra title, statistics and data science. And people still wonder, “What is it? And how is it different from statistics?” And obviously, these things overlap. And I would say that the new direction is that, because we have so much data, it’s so large. And it needs to be integrated and managed and curated, if you will. “Wrangled” sometimes is the word people use. That task is data science. And that pushes us in the direction of CS, computer science, engineering. And that’s a hard task. I personally don’t work in it. My colleagues in engineering and CS and some of my colleagues in statistics do more of that. It’s a challenge.

And I would say that teams invest heavily in the kinds of personnel who can do those things. It’s a big direction. And it’s expensive because they’re highly in demand in lots of areas. Having people who can collect data, integrate them, make dashboards, software — this is not modeling. This is not statistics. Not even machine learning, or AI. It’s just the support structure, which is expensive and needs to be done. We don’t even have a lot of the basic datasets.

One of the things that we in the university land — we have to deal with only what’s public. And sometimes that’s hard. And there’s often very good data with the individual teams, and some consulting firms have it. And you can, of course, buy it. But a lot of that data set is at a cost that is outside of our budgets. This is always a challenge, and we’re working on it.

Bradlow: Cade, I know you have some thoughts on this as well.

Massey: I just wanted to emphasize — Adi got there eventually. But I wanted to emphasize that there’s the back-end side of that, and there is the front-end side as well. And as analysts, we tend to lose sight of both of those things. The computer science part of integrating these datasets, building the plumbing and the infrastructure, is vital. You can’t do anything. And often the first hire in the space for an organization going down this road is on the CS side.

And then Adi eventually talked about the dashboard, the front end. These organizations all have database systems that people drop reports into and pull reports out of, and that is the single primary way that the non-analysts interact with the data. So, that front end, the dashboard and related issues, is vital. And that’s development. I mean, this is, again, not the traditional data science that we think about. And yet it’s vital to the way data are used in the organizations.

Bradlow: Adi, let me start with you. When you think about the most sophisticated use, or interesting use for you, of AI, machine learning, statistics, or data science today? What is it for you? What’s the one that you find the most interesting? And if you want you could rely on, what’s interesting to you is what you and your students are doing research on. What’s most interesting to you that’s going on today. And then, Cade, I’d like to ask you after that, if we’re sitting here in 10 years, what do you see us talking about? Well, we’ll still be on our Wharton Moneyball show in 10 years and we’ll just talk about it there. But Adi, what’s the most sophisticated application you see today?

Wyner: Well, there’s a lot of really sophisticated ones that I’m not sure have really borne fruit yet. So, there have been a bunch of competitions in football that have released tracking data. And some of that information has been used and incorporated by teams, by ESPN or the NFL, in what they call Next Gen Stats, to provide you the kind of information you couldn’t have dreamed of years ago. One example would be, you see a running back is handed a ball and you want to know, well, on average, what would an average running back do in that situation. And then you can compare what the actual running back in that situation did, and you post that delta, that differential. And that’s an example of either poor performance or great performance. But you wonder whether or not that model was really high quality.

And some of our students have looked at this. And it’s been interesting, but I don’t think we’re there yet to see that these things have really provided value, because the data is hard to make good sense out of it. So that’s one kind of problem. The problem that I’ve been working with, with our students, is actually on uncertainty. So, a model produces a forecast.

Bradlow: All right. Well, let me get right to it, then. We’ll probably talk about this on our radio show tomorrow, but let’s talk about uncertainty for a second. The New York Giants are fourth and one from the 17-yard line in the game yesterday. And the question is, should they kick the field goal? Forget that they missed it. There has to be — a lot of people are like, “You always go fourth and one.” And you’ve shown that’s not true. Can you talk about, briefly, the role of uncertainty in these models, and that most people think it’s a yes/no decision when it’s really not?

Wyner: OK, there are a couple of issues here. When you build a model, the model will give you what we call a point estimate. And the point estimate will tell you, for example, the probability of winning the game if you go for it, and the probability of winning the game if you don’t go for it. And then you can easily just pick the one that’s higher.

Now, the problem with that is we don’t really know the probability. We’ve had to estimate it using a model. And there is a whole bunch of actual probabilities that are all kind of equally supported by the data. And what we want to do is take all that information and say, “Wait a minute. If I had gotten a different set of games, a different universe of information, and tried to build the same model with that, would I have come up with the same decision?” And if every single time I do that, it’s always, “Go for it,” then you can be clear that that’s a firm decision. If, on the other hand, only 55% of the simulations of the historical data (the actual word for them is bootstraps) produce one result, then you have to throw up your hands and say, “I don’t really know.” And you have to communicate that to the coaches on the field to say, you know, “We don’t really know. You decide.”
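The resampling idea Wyner describes can be sketched in a few lines. This is a deliberately simplified illustration, not his actual method: instead of refitting a full win-probability model on each resample, it just compares raw win rates, and every number in it is invented. The function name `bootstrap_go_share` and the data are hypothetical.

```python
import random

def bootstrap_go_share(go_outcomes, kick_outcomes, n_boot=2000, seed=0):
    """Share of bootstrap resamples in which "go for it" has the higher win rate.

    Each list holds 1 (game won) or 0 (game lost) for past games in which
    the team made that choice in a comparable situation.
    """
    rng = random.Random(seed)
    go_preferred = 0
    for _ in range(n_boot):
        # Resample the historical games with replacement: "a different set of games".
        go_sample = rng.choices(go_outcomes, k=len(go_outcomes))
        kick_sample = rng.choices(kick_outcomes, k=len(kick_outcomes))
        go_rate = sum(go_sample) / len(go_sample)
        kick_rate = sum(kick_sample) / len(kick_sample)
        if go_rate > kick_rate:
            go_preferred += 1
    return go_preferred / n_boot

# Invented data: a lopsided situation and a nearly even one.
clear_share = bootstrap_go_share([1] * 70 + [0] * 30, [1] * 40 + [0] * 60)
close_share = bootstrap_go_share([1] * 52 + [0] * 48, [1] * 50 + [0] * 50)

print(f"lopsided case: go preferred in {clear_share:.0%} of resamples")
print(f"close case:    go preferred in {close_share:.0%} of resamples")
```

In the lopsided case nearly every resample should favor going for it, which is the "firm decision" scenario. In the close case the share should land much nearer one half, which is the situation where the honest message to the coach is "we don’t really know."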

Bradlow: Cade, let me ask you, since I believe in some of your research, you’ve covered the topic — and then we’ll get to the future in 10 years — you’ve covered the topic, or you’ve done some research on, you talked about algorithm aversion. But how about risk aversion? So how does uncertainty play into, you know, you’re asking a human decision-maker to trust a model, when there’s massive potential uncertainty. He, she, they see the situation in the game, and then decide, “Hey, you know what? My read of this is, the uncertainty does not swamp out what I’ve historically done.” How do decision-makers think about risk aversion in using models?

Massey: Well, I think the primary role of risk aversion is there’s an asymmetry in the way you get it wrong. If you get it wrong doing the conventional thing, the punishment is less harsh than if you get it wrong doing the unconventional thing. That’s the strongest dynamic there that affects, especially, coaches using these models or not in decision-making on the field.

But what I love about what Adi and Ryan are doing in this research is they’re introducing, for model output, an idea that we’ve been talking about for humans for a long time: people need to know what they know and what they don’t know. They need to know when they’re sure and when they’re less sure. And they need to be able to say and own and explain, “As part of my recommendation, this is one with high confidence, this is one of moderate confidence, or this is one of light confidence.” We’ve talked about that with humans for decades.

And now, Adi and Ryan are saying, “Hey, we should do the same thing with our models.” And it’s vital, because those guys on the field are going to be incorporating whatever comes out of the model with other factors. And we as analysts have to recognize, there’s always other factors. The models rarely, if ever, have every possible consideration. There’s always other factors, and so it does depend on how sure the model is. If the model is absolutely certain, then it’s going to swamp other factors. But it’s not always going to be absolutely certain. And we haven’t talked about models in these terms in the past. It’s a fantastic new development from these guys.

Bradlow: In the last minute or so, Adi, tell me. If we were sitting here 10 years from now, what have you been working on? And what do you think the field of AI and machine learning and sports has done over the next 10 years? Or if we’re sitting here 10 years from now, over the past 10 years. What are the big new advances?

Wyner: Well, I’m fairly certain that we’re going to have unbelievable graphics. We’re going to have great statistics that are going to be computed on the fly. I think our broadcast experience will be really different. I’m hoping that we’ve made progress on some of the really big questions which are still amazingly unanswered, which is injuries. We have really no clue how to predict an injury, although there are lots of startups trying to offer that product, if you will, even in its infant stage, to teams. Because we just really have a hard time with that.

I think player evaluation in the complex sports will become much better. Really good at it in baseball. There’s still some things to work on. I’m working on them. Can’t keep my hands off of it. But I think in football and in basketball and soccer, you’ll see much better ways of evaluating the players, building rosters. And this is going to become necessary. Every team is going to have to have it. It’ll be baseline. Right now, in some sports, you don’t have to have anyone and you’re not automatically behind. Except in baseball. I think it’s going to be everywhere. Every team will have a substantial staff doing the routine things that just need to get done. That’s the extent of my imagination. I’m curious to know what Cade has to say.

Bradlow: Me too.

Massey: Well, I’m on the same page with you. The newest frontier is biomechanics, and the greatest opportunity there is in injury reduction. That’s the biggest development we’re going to see. The steepest development will be in biomechanics over the next five years.

I agree with Adi that player evaluation is going to get better, and that’s the one I’m hopeful about. I think, in particular, the reason baseball has always been ahead is that we can kind of just add up the players in a linear model and get a reasonable output for the team. We suspect there are interactions on the basketball courts, football fields, soccer pitches, and we don’t have a great way of modeling those interactions right now. That means there are players that we think are doing more than they actually are to advance the team’s success. And importantly, there are players who are making very important contributions that are being underappreciated right now. And I’m hopeful, and I believe, that the models will get better at identifying those in the next five, maybe 10 years.

Bradlow: Well, I’d like to thank my colleagues, friends, Wharton Moneyball co-hosts. Cade Massey, practice professor here at OID, also the faculty director of Wharton People Lab, faculty co-director of WSABI, and co-host of Wharton Moneyball. And my colleague, Adi Wyner, professor of statistics and data science and, wow, he’s running everything we’re doing here in sports and statistics. I’d like to thank both Adi and Cade for joining me for this episode on AI and sports.
