This article was originally published by the Wharton AI & Analytics Initiative.

Human-AI collaboration is often presented as the gold standard for modern organizations. AI systems handle scale and speed, while humans provide judgment, oversight, and accountability. In theory, these hybrid teams should outperform humans or AI working alone.

New research from Wharton professors Hamsa Bastani and Gérard Cachon reveals a counterintuitive challenge: As AI systems become more reliable, organizations may find it increasingly difficult, and costly, to motivate humans to oversee them effectively.

The Human-AI Contracting Paradox

Modern AI tools tend to fail rarely, but unpredictably. When failures occur, they can be expensive, reputationally damaging, or even dangerous. That is why organizations insist on “human-in-the-loop” designs.

The research shows that vigilance is not free. When AI errors are infrequent, humans must expend effort reviewing outputs that are almost always correct. As a result, the compensation required to ensure consistent oversight rises sharply as AI reliability improves.

This creates what the researchers call a human-AI contracting paradox. Even when human-AI collaboration would produce the best outcomes for the organization, rational leaders may choose to:

  • Limit or delay adoption of advanced AI tools,
  • Rely entirely on AI and accept occasional failures, or
  • Prefer less reliable AI systems because they keep humans more engaged.

In short, better AI can create worse economic incentives for oversight.
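The mechanism can be illustrated with a toy moral-hazard sketch (this is an illustration of the incentive logic, not the authors' actual model; the parameter values and the limited-liability bonus scheme are assumptions for exposition). A reviewer bears an effort cost on every task but only catches an error on the rare tasks where the AI fails, so a bonus paid per caught error must scale inversely with the error rate, and the net value of oversight can turn negative as the AI improves:

```python
def min_bonus_per_catch(effort_cost: float, error_rate: float) -> float:
    """Smallest bonus b per caught error that makes vigilance worthwhile
    under limited liability: the expected bonus from reviewing
    (error_rate * b) must cover the per-task effort cost,
    so b >= effort_cost / error_rate."""
    return effort_cost / error_rate

def oversight_net_value(error_rate: float, failure_cost: float,
                        effort_cost: float) -> float:
    """Expected value of vigilant review per task: expected failure cost
    avoided (error_rate * failure_cost) minus the review effort."""
    return error_rate * failure_cost - effort_cost

# As the AI improves (error rate 10% -> 1% -> 0.1%), the bonus needed
# per caught error grows tenfold at each step, while the net value of
# oversight shrinks and eventually goes negative.
for p in (0.10, 0.01, 0.001):
    b = min_bonus_per_catch(effort_cost=1.0, error_rate=p)
    v = oversight_net_value(error_rate=p, failure_cost=500.0,
                            effort_cost=1.0)
    print(f"error rate {p:.3f}: min bonus {b:,.0f}, "
          f"net value of oversight {v:+.1f}")
```

With these illustrative numbers, oversight is clearly worth paying for at a 10% error rate, marginal at 1%, and value-destroying at 0.1%, which is the point where a rational leader either drops the human or tolerates a less reliable AI.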

Why Organizations Get Stuck

This paradox helps explain why many AI deployments stall after early success. Even when employees trust AI and do not resist it, a misalignment remains between how AI performs and how humans are incentivized to oversee it.

Oversight is often treated as a passive responsibility rather than an active role. When incentives fail to reflect the real cost of vigilance, organizations either overpay for supervision or quietly lose it.

Key Takeaways for Senior Leaders

  • Human oversight is an economic decision, not just a design choice. If oversight is essential, it must be explicitly rewarded. Professional norms alone are not enough to sustain attention when AI rarely fails.
  • More reliable AI does not automatically lead to better outcomes. As AI error rates decline, the cost of motivating human vigilance increases. Leaders should anticipate this trade-off rather than assuming reliability solves governance problems.
  • Specialization beats uniform reliability. Organizations benefit when AI is predictably strong at some tasks and predictably weak at others. When humans can tell when AI is likely to fail, oversight becomes more targeted and less costly.
  • Redesign roles around judgment, not constant monitoring. Humans add the most value when they decide whether AI should be trusted, not when they are asked to check everything.
  • Align AI governance with incentives. Mandating “human-in-the-loop” processes without rethinking compensation and accountability can create the illusion of control without the reality.

This article was partially generated by AI and edited (with additional writing) by Kyle Kearns.
