Can Algorithms Diagnose Disease Better Than Doctors?

Artificial intelligence has major implications for medicine. Proponents say the technology holds great potential for predicting drug interactions and infection risk factors, and even for diagnosing cancer. But how can scientists use AI to make a real difference in patient outcomes? Two University of Pennsylvania researchers offer a plan that focuses on adherence to high standards and strong regulation.

Ravi Parikh is a fellow in hematology and oncology at the Perelman School of Medicine, and Amol Navathe is a senior fellow at the Leonard Davis Institute of Health Economics and Wharton professor of health policy and medicine. They spoke on the Knowledge at Wharton radio show on SiriusXM about their paper titled “Regulation of Predictive Analytics in Medicine.” The paper, written with co-author Ziad Obermeyer, acting professor of health policy and management at the University of California, Berkeley, was published in the journal Science.

An edited transcript of the conversation follows.

Knowledge at Wharton: Can you set the stage for us about the use of algorithms in the medical field right now?

Amol Navathe: Algorithms and data have been used in medicine for about as long as we can remember. As clinicians, we use decision rules all the time that are based on fairly traditional statistical models. What’s new is the expanse of data, troves and troves of it, being collected in the electronic health record as a byproduct of providing care. All of a sudden, we have the techniques to harness all of that data to try to improve clinical care.

Ravi Parikh: Some of these algorithms can generate predictions in real time, in an automated fashion. The decision rules I use in clinical practice often require entering variable after variable by hand, taking up time I could spend talking to patients. One of the promises of these algorithms is that they can provide predictions right at the point of care.

Knowledge at Wharton: How do you differentiate between the algorithms of the past and the more advanced ones that we have now?

Parikh: There are a couple of points. One is simply the number of variables that today’s algorithms can account for. The decision rules I use usually have fewer than 10 inputs. These algorithms can account for tons of vital signs, the variability in those vital signs, and things we’ve entered into electronic medical records over years or even decades. That capacity to interpret all of this information is probably what most sets advanced algorithms apart.

Knowledge at Wharton: Does that open a lot of doors in medicine?

Navathe: It does, absolutely. For example, there’s a type of heart arrhythmia called atrial fibrillation. Right now, we can use a point score. We say, “The patient is over 75 years old. They’ve had a history of a stroke.” We add up the points, and that gives us some sense of the patient’s risk. Then we know what to do from a clinical perspective: what medications to prescribe or how to manage that patient. However, those scores come from retrospective studies done five or 10 years ago. They’re not necessarily customized to my patient. What we have now is the ability to use potentially thousands of variables to ask, “What is the recommendation, based on what’s happening locally and customized to my patient?” That’s the potential, but it doesn’t come without challenges.
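
To make the idea of a point score concrete, here is a minimal Python sketch. It assumes the score Navathe has in mind is something like CHA2DS2-VASc, a widely used stroke-risk score for patients with atrial fibrillation; he does not name a specific score, and the class, function and field names below are our own, chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record holding only the handful of inputs the score needs.
    age: int
    female: bool
    congestive_heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_or_tia: bool  # stroke, TIA or thromboembolism
    vascular_disease: bool


def cha2ds2_vasc(p: Patient) -> int:
    """Add up CHA2DS2-VASc points (0 to 9) for a patient with atrial fibrillation."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:
        score += 2  # age 75 or older counts double
    elif p.age >= 65:
        score += 1
    score += 1 if p.diabetes else 0
    score += 2 if p.prior_stroke_or_tia else 0  # prior stroke also counts double
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0
    return score


# The example from the interview: a patient over 75 with a history of stroke.
patient = Patient(age=78, female=False, congestive_heart_failure=False,
                  hypertension=False, diabetes=False,
                  prior_stroke_or_tia=True, vascular_disease=False)
print(cha2ds2_vasc(patient))  # 4
```

Seven inputs and fixed weights drawn from past studies: that is the kind of rule Navathe contrasts with models that can weigh thousands of variables and be tuned to local data. How a given score translates into treatment is set by clinical guidelines and is not shown here.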

Knowledge at Wharton: What is the scope of AI in medicine at this point?

Navathe: In some areas, we’ve seen pretty dramatic possibilities and perhaps even some realization of them. For example, an interesting application has come out in which software can look at images of the retina and diagnose certain diseases as well as an eye doctor can. That’s pretty neat, because it means we could potentially do this at scale and very rapidly.

The point that Ravi and I have noted, however, is that when you take these algorithms and you apply them to clinical practice, we have to be very careful because just identifying something as “abnormal” doesn’t mean that we can change clinical outcomes. We have to think about regulating these algorithms or these interventions much in the same way that we think of anything else — a drug, a device, what have you. The Food and Drug Administration hasn’t really picked up on that. That’s where we wanted to articulate this concern and say, let’s be careful here. Let’s make sure we’re pegging this to the same standards we use to evaluate any other medical technology.

Knowledge at Wharton: Medicine is a heavily regulated industry, but there isn’t a focus yet on digital regulation. Is one of the goals of your research to get people to pause and think about where we’re going with this?

Parikh: Yes, absolutely. Amol and I are coming at this from unique perspectives. Both of us do research related to developing some of these algorithms, but both of us have interests and expertise in health policy as well. Merging those two things is how our article and our framework came together.

I also want to mention that this is still a burgeoning field, ripe for innovation, and in some ways it is much more poorly understood than the components behind a drug or a medical device. For example, I participate in research designing a machine-learning algorithm that helps predict mortality for patients with cancer. When we feed in that sheer number of variables and the algorithm generates an extremely accurate prediction of a patient’s risk of short-term mortality, it’s sometimes hard to determine which variables matter most within the algorithm.

Some people have called this the ‘black box’ of AI and machine learning. So there has to be an entirely different framework for how we interpret the explainability of these algorithms and what we expect of them, because they can give us better predictions. But sometimes, when we look under the hood, we don’t really understand what’s going into them.
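
One widely used, model-agnostic way to probe which inputs a black-box model relies on is permutation importance: shuffle one variable at a time and measure how much performance drops. The Python sketch below is a generic illustration with a toy model and hypothetical feature names; it is not the method used in Parikh’s mortality research, which the interview does not describe.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, feature, repeats=10, seed=0):
    """Average drop in accuracy when one feature's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Copy each row, replacing only the shuffled feature.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return sum(drops) / repeats


# Made-up toy data: the label depends only on feature_a.
rows = [{"feature_a": i % 2, "feature_b": (i * 3) % 5} for i in range(8)]
labels = [r["feature_a"] for r in rows]


def toy_model(row):
    # Hypothetical "black box" that, unknown to the user, looks only at feature_a.
    return row["feature_a"]


for name in ("feature_a", "feature_b"):
    print(name, permutation_importance(toy_model, rows, labels, name))
```

Shuffling breaks the link between a variable and the outcome while keeping the variable’s distribution intact, so a variable the model ignores scores zero. Correlated variables can share credit in ways this simple version does not untangle.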

Navathe: We’re coming at this not as critics of artificial intelligence or its applications in medicine. In fact, we’re proponents, but we understand that we have to carry this out in a very responsible fashion, one that really protects patient safety.

Knowledge at Wharton: Then what role does the FDA or other agencies need to take?

Navathe: There’s an important responsibility to set the right standards, and those standards have to be around what benefits patients. This takes a bit of nuance, because I’m not saying the FDA hasn’t been well-intentioned in how it has approved these algorithms to date.

Let me take a step back and explain one of the key things that makes this really challenging. On the face of it, it seems easy. We look at how Netflix or Amazon makes recommendations, and it seems like, “Hey, can’t we just use AI in medicine? Maybe we can diagnose patients better than humans can.” But when we’re trying to improve on human clinicians’ decision-making, the only reason algorithms see the data at all is that somebody like Ravi or me was in the hospital ordering tests.

The algorithms essentially are seeing data that’s filtered through my eyes, filtered through Ravi’s decision-making. If we turn around and say, ‘We’re going to use that same data to beat Ravi in the clinic,’ that is challenging because all of it is coming filtered through Ravi’s lens in the first place. So, we risk doing things that clinicians would think of as stupid.

Let me give you an example. A patient coming into the emergency department could have sepsis, a really severe infection that could kill them. We risk designing algorithms that essentially tap the physician on the shoulder and say, “Hey, your patient has sepsis.” And the only reason the algorithm says so is that you ordered antibiotics, blood cultures and other tests that indicate you’re already worried about sepsis. So the clinician basically says, “Well, duh. That’s why I ordered those tests.” We risk building things that irritate physicians, but more importantly, we just won’t have real clinical impact.
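
The sepsis example describes what machine-learning practitioners often call target leakage: the model learns from signals that exist only because the clinician already suspected the outcome. One common safeguard, sketched below in Python under our own assumptions, is to build features only from data recorded before the clinician first acted on that suspicion. The event names, the timestamps and the choice of which orders count as “suspicion” are all hypothetical and purely illustrative.

```python
from datetime import datetime

# Hypothetical orders that reflect the clinician's own suspicion of sepsis.
SUSPICION_ORDERS = {"blood_culture_ordered", "antibiotics_ordered"}


def features_before_suspicion(events):
    """Keep only features recorded before the clinician first signaled suspicion.

    `events` is a list of (timestamp, feature_name) pairs for one visit.
    If no suspicion order exists, every event is kept.
    """
    suspicion_times = [t for t, name in events if name in SUSPICION_ORDERS]
    cutoff = min(suspicion_times) if suspicion_times else None
    return [name for t, name in sorted(events)
            if cutoff is None or t < cutoff]


# Made-up event log for a single emergency-department visit.
visit = [
    (datetime(2000, 1, 1, 9, 0), "heart_rate_high"),
    (datetime(2000, 1, 1, 9, 5), "temperature_high"),
    (datetime(2000, 1, 1, 9, 20), "blood_culture_ordered"),
    (datetime(2000, 1, 1, 9, 25), "antibiotics_ordered"),
    (datetime(2000, 1, 1, 9, 40), "lactate_high"),
]
print(features_before_suspicion(visit))  # ['heart_rate_high', 'temperature_high']
```

Whether a cutoff like this is the right one depends on the clinical question; the point is that a model restricted to pre-suspicion data has to find something the clinician has not already noticed.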

Parikh: I completely agree. One offshoot of this, which we raise as a concern in the paper, is that a lot of this data is filtered through clinicians’ current practices, which means we run the risk of some of these algorithms perpetuating bad practices.

I’ll use an example from another industry. Amazon’s facial recognition software was trained on thousands of people. It was very accurate for the people it was trained on, namely white men, but it did pretty poorly when it came to interpreting the faces of women or minorities. In medicine, we run the same risk of reinforcing biases that we may already be perpetuating through the health care system, whether unintentionally or intentionally.
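
Parikh’s point suggests a simple check: instead of reporting one overall accuracy figure, evaluate a model separately for each subgroup of patients. The Python sketch below assumes you already have true labels, model predictions and a group label for each patient; the function name is our own, and the data are made-up toy values, not results from any real system.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return accuracy computed separately for each subgroup label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Made-up toy labels: a respectable overall accuracy hides a gap between groups.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall)                                    # 0.625
print(accuracy_by_group(y_true, y_pred, groups))  # {'A': 1.0, 'B': 0.25}
```

A gap like the one between groups A and B is exactly what a single summary number can conceal, which is why subgroup reporting is a natural candidate for the kind of standard the authors argue regulators should apply.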

Knowledge at Wharton: How do you control for factors outside of the health care provider, such as device manufacturers and pharmaceutical companies? Wouldn’t they need to be involved?

Navathe: Yes, absolutely. They’re largely proponents, because this could increase evidence-based practice. The manufacturers are behind us because this could actually drive better outcomes and better use of their products. There’s an opportunity to take these predictive, analytic, machine-learning algorithms and blend them with their products to improve the quality of care and patient outcomes.

Knowledge at Wharton: But there’s a big difference between the better use of their products and more use of their products. There’s an economic component for them.

Navathe: True. We don’t want to be so naive as to not recognize that; it’s a fair point. But having worked with some of these manufacturers on these issues, I’d say there is a general sense that many chronic conditions are under-recognized and under-diagnosed. To the extent that better care does equate to greater use of their products, there is some alignment there. Now, we need to be careful about it.

Parikh: Part of the promise here is that we have something of a playbook to follow. We interact with pharma and device-makers all the time when it comes to regulating drugs and devices, and proprietary concerns always come into that regulatory process. The challenge is to preserve patient safety and improve patient outcomes while preserving the drive for innovation that goes into creating these amazing tools.

We can apply a similar framework to these algorithms, stipulating rules that preserve the opportunity for algorithm developers, to put it bluntly, to profit from their products, while still ensuring that those products put the best patient outcomes first.

Knowledge at Wharton: Are you confident that the FDA can handle this?

Parikh: Yes, I’m confident. I’m optimistic about it. One reason I’m optimistic is that the FDA has already recognized this, for example in its work under the 21st Century Cures Act and in its recent Digital Health Innovation Action Plan, which sets up something of a pre-market incubator for promising digital health technologies, including AI, to be ushered through the market-clearance process in ways that show they improve patient outcomes. That’s a first step that, I would argue, we didn’t take decades ago with other poorly understood technologies, such as genetic and biomarker screening for cancer.

Navathe: When we think about the FDA as a regulatory agency and what it’s really trying to do, there are some early pieces that I think are incubating these ideas and really promoting them. That’s what Ravi just spoke to.

The other piece of it is, before we put things out into clinical practice, like allowing a medication to go on the market or allowing a device to go into clinical practice, we want to take a step back and say, ‘There are some first principles here. Do no harm. Make sure there’s patient safety. Make sure that it’s actually improving clinical outcomes.’

This is easily understood by the FDA because that’s largely what it’s doing already for medications and devices. Part of our message here is, let’s not lose our way. Let’s not get too intimidated by all these fancy algorithms and this new technology. A lot of the first principles still apply. Let’s just make sure that we keep applying those first principles, and we’ll do just fine.

Knowledge at Wharton: Do you think an outside body will need to work in conjunction with the FDA for oversight?

Navathe: There are three parts of this that are important to disentangle. The first part of it is incubating — creating the right environment for innovation. We have to be careful that we don’t use clinical trial-type standards very early in the process when we’re initially developing these technologies and algorithms. If we do that, then we’re going to stifle innovation.

The second piece is that once we move past that process into the traditional clinical trial phases, we want to apply the same types of clinical trial standards to these algorithms and devices. It’s not about reinventing the standards; we want to use the same ones.

Then there’s the post-market piece. Right now, through the Sentinel Initiative, the FDA does a lot of post-market surveillance of drugs that were initially vetted in controlled clinical trials and are now being used in the real world. Are bad things happening? Are good things happening? That’s where I think we’ll need a partnership.

The implementation of these algorithms is going to happen in real-world clinical settings where some of that data may make its way back to the FDA, but a lot of it may not. Health delivery organizations are going to have to be key partners in making sure that we adjudicate what’s working and what’s not working after we’ve already put things into clinical practice.

Knowledge at Wharton: It sounds like you’re both very positive about the future of AI in medicine.

Parikh: It’s important how we frame this to doctors and to the public because this is a very sensitive issue. When we talk about AI in radiology, for example, a lot of tools have been framed as, ‘AI is going to replace the radiologist.’ Similar discussions have been had with pathologists, as well as in a variety of medical fields. If we keep framing those discussions in terms of replacements or efficiency, as opposed to an added tool of potential benefit when doctors talk to patients, then we run the risk of turning people off to this technology before it has the chance to show its benefit.

When we actually test these tools in clinical trials, the way to show their benefit is to integrate them into clinicians’ existing workflows and show how they improve things, rather than pitting the algorithm against the clinician.

Navathe: AI offers an extremely exciting vision for the future. Many in the health care community rushed into artificial intelligence with unbridled enthusiasm that it was going to change health care. After many years of working in this space, we now have the humility to take a step back and say, ‘This is really hard to do in health care, for a variety of reasons.’

There are ethical reasons. There are some technological and methodological reasons that I outlined earlier. It’s really hard. We need to be very clear-eyed about what we’re trying to do and what we consider success. When are we going to raise the banner of mission accomplished, and when are we going to have the humility to say, ‘No, we have more work to do here?’ The FDA can be a key partner in doing that.