Wharton professors Hamsa Bastani and Marissa King join Eric Bradlow, vice dean of Analytics at Wharton, to discuss AI’s widespread application in health care: where it’s already being used successfully, like radiology, and where challenges loom large, like implementation and clinical resistance. This interview is part of a special 10-part series called “AI in Focus.”
Watch the video or read the full transcript below.
Eric Bradlow: Welcome to the Analytics at Wharton AI at Wharton podcast series on artificial intelligence. My name is Eric Bradlow, professor of marketing, statistics, and data science here at the Wharton School. I’m also the vice dean of analytics, and I’m the one that’s been hosting this podcast series. I think that artificial intelligence and machine learning in combination with industry is going to solve the health care problems we have today. AI in health care is such an important area, and I can’t imagine two colleagues better to talk with me about that topic.
First, I have my colleague, Marissa King. Marissa is the Alice Y. Hung President’s Distinguished Professor at the Wharton School. Her research has significantly contributed to our understanding of a wide range of pressing health care issues, ranging from the prescription drug abuse crisis to clinician burnout. Welcome to our podcast.
Marissa King: It’s a pleasure to be here.
Bradlow: And then next and certainly not last but not least is my friend and colleague Hamsa Bastani. Hamsa is associate professor in operations, information and decisions, as well as a colleague in the statistics and data science department. Her research focuses on developing machine learning algorithms for learning and optimization in health care. And as I know very well, because I’ve interviewed her for other podcasts, she’s done a lot of work with Greece and Sierra Leone to deploy algorithms at a country-wide scale. Hamsa, welcome to the podcast as well.
Hamsa Bastani: Thanks so much for having me.
Bradlow: Marissa, I’ll start with you. For those people that aren’t familiar with the applications of AI in health care, what are they? If you could just give us a broad overview of the kinds of problems in health care people are trying to use artificial intelligence to solve, and maybe it’s in combination with machine learning and other types of algorithms.
King: Machine learning and artificial intelligence have touched almost all aspects of health care at this point. If you think of everything from how you get reminders to pick up prescriptions, from who’s reading your radiology reports, to even how you’re being triaged in the emergency department, machine learning plays a key role in all of those facets.
Bradlow: Before I jump to Hamsa here, let me ask you a question. Are the reminders that we’re all getting, are those being determined using some optimal algorithm? Is triage being done in a not entirely 100% human basis? And in terms of who is reading my charts, is that being done a lot in an automated way? So maybe just give us a baseline there.
King: Pretty much in every one of those applications, machine learning and AI is playing a critical role. When you get those reminders, it’s almost certainly coming from an AI-powered reminder. The same is true if you’re thinking about reminders to pick up labs and get your blood work done. That’s the point of care for patients.
But clinicians are also starting to deploy this for a wide range of uses. If you think about radiology reports, that’s arguably the place where AI has had the greatest penetration. Many, many of our radiology reports are read now by machines. And then finally, if we think about triaging in the emergency department, that’s another important area of application that’s really reducing overall length of stay within the emergency department. And we know that when the length of stay is reduced, that has important implications from everything from complications to long-term mortality. So, AI is already playing a critical role in a lot of domains of health care.
Bradlow: Hamsa, as someone who is both a statistical methodologist, but also cares about the practical application of the work, how do you think about the kinds of problems that Marissa just laid out? Like, do you try to solve them in some idyllic setting? Do you try to solve them with real data? Do you try to kind of develop algorithms and then try to run field experiments, actually launch them in the field? How do you think about your role as — similar to me as we’re statistical methodologists — how do you think about your role in helping solve these problems?
Bastani: I think it has to start with the data. I think in health care, there’s a lot of variation. Only recently in the last couple of decades have we started digitizing the whole health record. But even now EKG readings, for example, aren’t digitized in most health systems. So, I think the first thing we need to figure out is, is this a use case where an algorithm, given the data that is digitized, is able to do better than a human? Or at least comparatively to a human, in a way that’s equitable and also transports well to other health systems. And if that’s the case, then we start thinking about algorithms and RCTs. Because there’s lots of other things that come up, like whether we’re able to effectively integrate it into the workflow, whether there’s buy-in from stakeholders and so on. But I think it has to start with the data.
Bradlow: So Marissa, Hamsa said a lot of things I want to ask you about. Let me fire these through in a rapid-fire kind of way. How much are people in the industry worried about equity? It’s easy to say you care about it, but how much does it impact? Like, let’s imagine I could have an algorithm that improves the outcomes for some population of people but not others. It’s not equitable. But from a population perspective, it could still benefit society. How are people thinking about equity?
King: I think it’s certainly a point of concern, in large part because some of the key issues arising around equity with algorithms that have already been deployed have been made quite public. But at the same time, I do think it’s a second-order consideration, and that’s going to, I think, have really important long-term business applications. Because in the long run, as businesses start to deploy these algorithms, when issues around equity come to light, I think it’s going to have really significant impact for their bottom line. I think in very short order, it will go from being a second-order consideration to a primary consideration. Or it should.
Bradlow: One of the things I’ve been talking to all of our guests on the podcast series about is what are the hindrances to actually getting this stuff implemented? Whether you want to call it algorithm aversion, there’s like, “Well, how do I know this machine learning algorithm is right?” What kind of barriers are you seeing or hesitancy you’re seeing in the field, especially when it’s health care?
King: I think it is a first-order consideration. One of the biggest issues is that, as Hamsa mentioned, data is certainly an issue. But a bigger issue is that to deploy these algorithms well, you need to actually have a very deep understanding of clinical workflows for them to be incorporated. Even when clinicians are willing to accept them, you still have to integrate them into workflows. And that point of integration seems to be one of the biggest challenges at the moment.
Bradlow: That’s a perfect segue to my question for Hamsa. That’s probably the biggest problem I’ve had in my career, and you’re going to tell me now how to solve that, where I develop algorithms. But the fact is, getting them actually in someone’s workflow is really tough. How do you think about that? How did you deal with that when you’re working with Greece, with Sierra Leone? You know, you have an algorithm, but you can’t just hand it off to someone and say, “Good luck.”
Bastani: Right. I think that’s an excellent question. I think it depends on the complexity of the setting and how well a human is trained to answer that particular question, compared to how well an algorithm might be. I think a lot of public health questions, where you’re trying to forecast demand for health resources or you’re trying to figure out which population to screen, and it’s kind of rapidly evolving. That’s kind of the work we’ve done. It makes sense. Even policymakers or public health experts think that algorithms are better suited to assessing those situations because there’s large volumes of data. I mean, assuming that you’ve built an interpretable system that they can look into and check the reasoning of the algorithm.
I think in health care, it’s harder. A very famous example is the sepsis alarm that’s in a lot of ICUs that’s being deployed now. I think a big challenge is those algorithms, they’re not always aware of the private information the physician has. So, a lot of doctors expressed frustration that when the alarm goes off, they already knew that the patient was crashing, they’re working actively to stabilize the patient. And this thing is just irritating them. This causes something called alarm fatigue. I think algorithms need to be, exactly as Marissa said, designed in a way that is aware of what knowledge the human decision-maker has, and is able to complement it in a useful way. And that’s not how we do machine learning.
Bradlow: That’s a fascinating idea. Let me give you my example that I always like to give in sports, and then I’ll translate it to health care. Which is, what is the role of scouts in sports? Like, can’t I just measure everything, and then just the algorithm is going to tell me who the better player is. But they may have private information that the algorithm doesn’t see. I always talk about blending the two. Can you talk to me or us a little bit about how it shouldn’t really be AI or humans, it really should be AI and humans, in health care?
King: In health care in particular, this is not negotiable in many ways, in large part because of regulation. If you think about what’s happening at the forefront of algorithm development, almost all the models at this point are thinking about a physician or clinician with an AI co-pilot. That model, I think, is going to be the one that is most likely to prevail, both for regulatory reasons, which in many ways you can’t get around, but also for getting clinician buy-in, which is absolutely essential and seems to be a huge hurdle at the moment.
Bradlow: Marissa just mentioned a word, which is an interesting word when we develop algorithms, which is whether it’s regulation or restrictions. How do you think about that when you’re saying, “Well, here would be the optimal solution in an unrestricted world with unlimited data and the ability to do whatever you want?” Or now that there are restrictions, here’s — whether you want to call it the loss of efficacy — how do you think about restrictions when you’re thinking about building algorithms?
Bastani: I think it’s a great question. I think in most of these cases, humans still have this valuable private signal. We do want them to override the algorithm. But I’ve heard a lot of concerns that people are worried about malpractice lawsuits and things like that. If an algorithm is, for example, FDA approved, they would rather be more conservative and err towards the algorithm. We’ve had a lot of talk about algorithm aversion. But I think it also goes the other way, that sometimes there’s overreliance on algorithms because it creates a more established paper trail.
But ideally, we would have better training so physicians are able to understand what are the limits and the capacities of these algorithms? What is the correlation between the information that it’s bringing to the table, and their own private signal? So that they’re able to more effectively combine these signals in a, I guess, Bayesian way. And I think that will be necessary to get actually good outcomes.
Bradlow: You just mentioned something I was not aware of. Maybe, Marissa, you could educate me and our listeners here on Sirius XM and in this AI at Wharton podcast series. Do these algorithms have to be FDA approved? And if the answer is yes, usually the gold standard for approval is randomized experiments. I mean, imagine running a randomized experiment, and now people are dying. So how do you think about getting kind of the gold standard of evidence in cases where you’re testing an algorithm? How is that thought about?
King: Yeah. All these algorithms require FDA approval. There have been more than 500 algorithms that have already been approved for use in clinical settings, so you can get a sense of just how many algorithms there are that exist. I think the lack of clinical integration is also highlighted by how few of those are actually used in practice. So certainly, regulation is a key point for this. And there’s a lot of debate over how well they’re actually being regulated. The current standard is trying to compare them to existing clinician performance. But we know that algorithms, particularly when they’re exposed to new data, will oftentimes deteriorate. So, the question is both how well does it work compared to clinicians, but also on how large of data sets. And those are two current criteria that are really key.
The other interesting piece on the regulation side — and I’m not a lawyer. But my understanding is that, ultimately, responsibility still lies with the clinician. So, if there is going to be a lawsuit, even if an algorithm does have FDA approval, the responsibility finally lies with the clinician. That’s another regulatory and legal challenge in terms of getting large scale deployment.
Bradlow: Can you give our listeners here on Sirius XM just a sense of what are these algorithms? If you want to think about them from the most impactful and efficacious to the least, what would sit near the top? For example, let’s imagine you had something that could read EKGs that could prevent heart attacks at a much higher rate than a human. That would seem to be, given the frequency of heart attacks, pretty efficacious and important. Can you give us a sense of — or the way I like to describe it, I’m an effect size person. Tell me the algorithms that you think are having the big effect sizes on a large population?
King: The algorithms that seem to be enjoying the greatest success and having the biggest impact on health care and health care outcomes do seem to be the ones that are focused on radiology. If you imagine that you show up at the emergency department and that you may be having a stroke, the ability to get that scan and read that scan quickly is critical. Time is of the essence in saving lives. And the deployment of AI to read radiology reports — which then do set an alert that speeds up the rest of clinical care. So, a human is looking at it. But that acceleration seems to have a huge impact on both health outcomes and the cost and quality of care.
I would put the more radiology-focused machine ability to read various scans, of whatever nature those may be, or radiology reports, as the area where there’s been the greatest penetration. I think that’s where the greatest impact lies. Where things get trickier is when you think about things that require a deeper integration into clinical care. And I think where we’re seeing the biggest point of resistance is actually if we think about things that are directly patient-facing.
Bradlow: Marissa just mentioned something that I haven’t thought about for a while. I used to spend a lot of time working on methods that I called real-time approximation methods. I haven’t worked on these in a while. But as Marissa said, if someone comes in — you know, I hate to put it this way, but I’ll use the word Bayesian. I can’t run my Bayesian MCMC sampler overnight to get some result that gives me something and of course, the patient may have died by then. How much do you think about — as we as academics are supposed to, in theory. We’re supposed to come up with good answers. The fact that it may not be real time, that’s — I hate to put it this way, but that’s someone else’s problem. How much do you think about that when you’re trying to come up with solutions? Like when you’re working with COVID testing. Someone’s coming through the scanner. You can’t say, “Well, give me a few hours. Why don’t you just sit over here on the side while we decide what the likelihood of you having COVID is.” How do you think about real-time nature of things?
Bastani: I think sometimes that changes the algorithm that you use, so sometimes it’s better to use something that’s computationally easier, that might be slightly less accurate, because it’s actually practical. And another big thing we do is batching. Like batched updates. Like Marissa was saying, one of the issues I think the FDA is not monitoring is making sure that the algorithms actually evolve over time as the patient population changes and adapts to events like COVID. I think that should be part of the regulation but isn’t yet. But we do need to monitor these algorithms as the ICD system changes, as scanning imagery changes, and so on. And I think doing batch updates as fresh data comes in is kind of a critical part of that. And that solves the computational issue.
Bradlow: I see. Let me ask you a question. When we as academics approach these problems, maybe even approach companies, I can imagine one of two reactions. “What are you doing here?” Or number two, “Oh, thank you. The academics have come to help us.” I think I can safely say the three people in this room care more about the way our research impacts practice than probably — we’re in the top decile of Wharton faculty. But what’s the reaction of industry when academics want to get engaged here?
King: Yeah, I think that the engagement of academics is absolutely critical. And I think this is where Analytics at Wharton has particularly a huge role to play. If you think about the nature of the health care system, in general, it’s highly fragmented, with stakeholders having various strong positions for a variety of different reasons. You have insurers, you have regulators, you have the people delivering care. There’s many, many players in this space, and to solve health care’s biggest challenges, you need to get them all to work together collaboratively.
And many times, given the positions from which they’re arguing, they may have different incentives or misaligned incentives. Having a neutral convener who can bring together all those parties and tackle them from a place of scientific basis and a place of neutrality, and act as really a convening organization, is really, really critical. Certainly, algorithmic tools can be useful. But in order for them to have broad penetration to tackle the most pressing challenges, you need coordination among stakeholders. And I think that’s where academia can play a really important role.
Bradlow: Marissa reminded me of something that’s in neither of your bios, but it’s going to change after today. I should have mentioned, even more importantly, that Analytics at Wharton is launching a Healthcare Analytics Lab under Hamsa and Marissa’s leadership. For those people interested, which is, I would think, everybody, you could go to the website and see about all the work that Hamsa and Marissa will be leading us in.
Could you talk, Hamsa, about the role of uncertainty? An algorithm comes up with a suggestion or a recommendation. But one of the things that’s important is, if it’s 60-40, does the clinician have the right to know that this is 60-40? The model is saying, “This is better than this.” But maybe it’s not that much better. How do you think about that, whether it’s a dashboard you’re creating, or you’re providing recommendations, you’re doing some form — since you’re in OID, which means you also care about optimization. When you think about optimization, what is the role, and how do you think about uncertainty?
Bastani: I think it is super important, as you probably agree. I’ll talk about the humans first. When we are thinking about human-AI collaboration, I think uncertainty is one of those critical pieces of information that you have to convey so they know when they should override it and when they shouldn’t. But I think one of the challenges has been that in behavioral experiments, when you show uncertainty, people often tend to over-trust the algorithm because they think, “Oh, not only did it give me a point estimate, but it also gave me a measure of uncertainty.” I think this is part of the training thing that has to happen, that we want people to intervene in a preferential way when the algorithm is uncertain.
For optimization, it’s a little bit easier. We’ve built a lot of tools in stochastic optimization that account for uncertainty so that we’re targeting, for example, the right quantile of uncertainty rather than just using the mean or point estimates. Because typically in underserved populations, we’ll have a lot more uncertainty, and we don’t want that to result in them getting fewer resources.
Bradlow: The next question I’m going to ask Marissa is my favorite question to ask anybody when it comes to analytics. And usually with me, it’s about some sort of food or beverage. I’m taking you outside my personal life. I’m talking about my professional life here. If you think about the way that AI can impact health care, I’m going to give you one of three options. You can have better data. You can have better mathematical models. Or you could have better adherence by people in the field to what we as academics — you can’t pick all three. We all want all three. Which one is the big, if you’d like, impedance right now to advances we’re making? Is it lack of data, lack of better models? Or is it, we’ve got all that stuff, just these damn people just won’t listen to us.
King: I think it’s the latter. If you think about the data challenges, the data challenges still loom large. But we have now the ability to work with large enough data sets that this is starting to become a solvable problem. I think, particularly on the electronic health record front, it’s still a challenge in the sense that most of the time we’re going hospital by hospital to deploy these things. But it’s still like — the data is OK.
The second piece, the algorithms I don’t think are the challenge. If you just even look at how many are FDA approved, right? And I feel like Hamsa and I could sit down probably tomorrow and write an algorithm that would certainly improve care in many, many ways.
Bradlow: I’m counting on that.
King: The biggest challenge is really implementation and integration. And the same is true as was true with data, in the sense that most of the time these things have to be rolled out system by system, hospital by hospital, doctor’s office by doctor’s office. And even within those rollouts, there’s a lot of clinical resistance. So, as I watch these things be deployed in various settings, right, I can’t tell you — I don’t even want to disclose how often the algorithm is overridden. Medicine is particularly challenging in the sense that clinicians have a lot of expertise and a lot of authority. And there’s a strong deference to that expertise and authority, as there should be. But it makes changing the way that they think and questioning their judgment particularly difficult. So, I think it’s the latter. If I could improve anything, it would be adoption and implementation.
Bradlow: Hamsa, I know an issue you’ve thought quite a bit about and are planning on doing a lot of work on is educating people in these methods. What’s the process? My brother’s a cardiologist. He’s also a researcher. I’m pretty sure if you present him the ideas of confidence intervals and prediction methods, he might get it. But many doctors, that’s not their job. So, how do you present information? Or how do we even educate this large population on these algorithms? Because you used the word before, I think, “interpretable and explainable.” Maybe they’d be like, “How do I know? This is just some black box. Data is coming in, something’s coming out.” How do you think about your role in educating, if you’d like, the distribution channel — in this case, the physicians — on these methods?
Bastani: I think one big challenge is that people don’t know what training data was used to train the algorithm. And I mean, giving it to them wouldn’t be super-useful anyway because it’s not interpretable. But I think a lot of the reasons that we want humans to override these algorithms is because the data that they’re seeing is possibly an outlier or has a different distribution than the data that it was trained on. For example, maybe they didn’t see people of this particular type. Or maybe the imaging systems they’re using now are a little bit different from the one that was used in the training data. And I think partly it’s on us to provide these signals. But I don’t think they’re immediately interpretable.
I think some kind of training where we show them historical examples of when the algorithm went wrong, and when they might have had a right prior, and when the algorithm didn’t go wrong, and when they overrode it. Even just showing them their own decisions historically and seeing when they should or shouldn’t have overridden the algorithm would help a lot. Because people need feedback. Health care is very expensive, and these people’s time is very expensive. Setting aside the time to do that training, I think is important and costly.
Bradlow: Well, that was going to be my next question for Marissa. Do you see whether it’s hospital groups? Or do you see, I don’t know, the American Medical Association, or the American College of Surgeons, do you see them coming to people like yourselves and saying, “Help us. Train us en masse. Come up with video series.” Or something, so that these people that are overburdened, overworked — in general, you could argue underpaid. How are we going to train them? Because we want to save lives.
King: Yeah. I think, unfortunately — or fortunately, depending on how you look at it — the greatest opportunity is actually just going to come from pain. We know that we’re facing a clinical health care worker shortage. By 2033, there’s going to be a massive shortage of health care workers. And if you already look at the issues around burnout, with a majority of clinicians burnt out, there’s already an enormous amount of pain and overwork. I think in many ways that we’re most likely to see adoption coming from the bottom up, where the clinician is asking. And I didn’t talk about clinical notation, but that’s a huge area in which large language models and machine learning can play a role by starting to do some of the work, and the tasks that clinicians don’t need to be doing. And by saving them that time, then you can allow them to reconnect with patients, deliver higher quality care. I think in many ways, education is certainly going to be more important. But based on my experience, people are really much more willing to adopt things when there’s a real need for them. And right now in health care, there’s a strong need for help to augment clinical workflows, particularly with the things that clinicians don’t need to be doing. I think very soon we’re going to be seeing it.
Bradlow: That’s a great point. I just had an example the other day where someone — I’ll just say, with a large investment bank — told me that all of their meetings are now recorded. Now the agent or the investment advisor doesn’t need to take notes, because all of that is automatically put into the system. Any types of decisions they made get automatically implemented, because they’re now, in this case, voice recorded automatically. Now the investment advisor can spend time on training and other forms of doing her, his or their job actually better, which is a really great point.
Maybe in the last minute or two that we have, let me ask each of you. I’m trying to ask each person in this podcast series about this. Let’s say we’re sitting here 10 years from now. That’ll be my 38th year at Wharton. What are we talking about that either you, or you think the field of AI in health care from an algorithmic or data perspective — what have we seen over the past 10 years? What’s your hope and dream at least, even if it’s not going to happen?
Bastani: We will definitely have adoption on notes, for example, or things that physicians don’t want to do. I think we’ll have more adoption in developing countries where there isn’t — I know we’re short staffed even in the U.S. — but where there isn’t as much health worker staff to reach underserved communities. So, automation is being more adopted in those locations. And I think definitely for things like radiology. I think we’ll still be facing challenges with things like alarm fatigue and adoption for places where we have clinician experts. And I think with models like GPT coming out, I think it’ll be easier to educate people on machine learning and be able to better enable this human-AI collaboration. But I feel like that is going to be a big challenge even 10 years from now.
Bradlow: What do you think we’re going to see out there, whether it’s in the field or what we as academics are doing?
King: I mean, both Hamsa’s and my goal at the Analytics Lab is to try to improve access to care and quality care for all, and I think that that’s hopefully where analytics will take us.
Bradlow: Well, I’d like to thank both of you for our podcast series episode here on AI and health care. I’d like to thank again my colleague, Marissa King, and Hamsa Bastani. Thank you again for joining me.