What is the likelihood that couples who had previously met through speed dating would want to pursue a second date? Beyond age, gender, and race, this may depend on a range of factors such as the other person’s attractiveness, sincerity, intelligence, shared interests, fun, and ambition. While data scientists can build AI models for difficult prediction tasks such as a couple’s chances of a second date, will users trust AI recommendations or simply rely on their own assessments?
Researchers at Wharton and elsewhere used this setting of predicting speed-dating outcomes to find out what drives, and what does not drive, trust in AI. The study is motivated by research showing that despite the high performance of AI systems, users often don’t readily trust them.
They detailed their findings in a paper titled “Will We Trust What We Don’t Understand? Impact of Model Interpretability and Outcome Feedback on Trust in AI.” The paper’s authors are Kartik Hosanagar, Wharton professor of operations, information and decisions; Daehwan Ahn, professor of financial planning, housing, and consumer economics at the University of Georgia; Abdullah Almaatouq, professor of information technology at the MIT Sloan School of Management; and Monisha Gulabani, a former research assistant at Wharton AI for Business.
The factors that influence trust in AI typically fall into two baskets. The first is information about AI performance, such as model accuracy (how accurate the AI model’s results are) and outcome feedback (how the actual outcome of a prediction event compares with the AI model’s prediction). The second is information about the AI model itself, such as interpretability (the ability to explain how an AI model arrives at its predictions) and transparency.
“Users may be hesitant to trust a system whose decision-making process they do not understand,” the paper explained. As a result, recent research in computer science has focused on developing better explanations and visualizations of the factors that strongly influence an AI model’s predictions (that is, on improving interpretability). On the performance side, existing research suggests that user trust can get a boost from information about the realized outcomes of prediction events. But, the paper added, there has been no formal analysis of either hypothesis.
Key Findings on Trust Generation
The study’s findings rejected the widely held belief that users will trust AI more if they better understand how an AI model arrived at its prediction, i.e., interpretability. A bigger driver of trust was outcome feedback on whether the AI’s predictions were right or wrong, the study found. Study participants tended to build trust over time based on whether following the AI helped or hurt their performance on recent predictions. Significantly, the paper is the first to compare these two sets of factors to understand their impact on human trust in AI and, consequently, on user performance.
But there was another surprise in the findings: both interpretability and outcome feedback had only “modest effects” on users’ performance at the AI-assisted prediction task. The upshot: “Augmenting human performance via AI systems may not be a simple matter of increasing trust in AI, as increased trust is not always associated with equally sizable performance improvements,” the paper noted.
The study arrived at its findings from two web-based experiments — one with 800 participants and the second with 711 participants. The participants were asked to predict the outcomes of speed-dating events, first without and then with AI predictions. Each participant received a payment as an incentive for participating in and performing well in the experiments.
The authors used “weight of advice” (WoA) as a behavioral measure of trust, quantifying the extent to which users adjust their initial predictions toward the AI’s advice: a WoA of 0 means the advice was ignored, 1 means it was fully adopted, and values above 1 mean the user overshot it. They found evidence of irrational behavior, ranging from distrusting AI after even a single failure (i.e., expecting perfection from AI, which is unrealistic) to “overtrusting” AI when it was not rational to do so; for example, a user who initially predicts an event has a 25% probability, sees the AI predict a higher probability of 50%, and then updates their prediction to 75%.
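To make the measure concrete, here is a minimal Python sketch of the weight-of-advice calculation as it is commonly defined in the advice-taking literature; the paper’s exact operationalization may differ, and the function name and the extra example values are illustrative only.

```python
def weight_of_advice(initial: float, advice: float, final: float) -> float:
    """Weight of advice (WoA) as commonly defined in the advice-taking literature.

    WoA = (final - initial) / (advice - initial)
      0  -> the advice was ignored
      1  -> the advice was adopted fully
      >1 -> the user overshot the advice ("overtrust")
      <0 -> the user moved away from the advice
    """
    if advice == initial:
        raise ValueError("WoA is undefined when the advice equals the initial estimate")
    return (final - initial) / (advice - initial)


# The overtrust example from the article: initial prediction 25%,
# AI advice 50%, final prediction 75%.
print(weight_of_advice(initial=0.25, advice=0.50, final=0.75))  # 2.0 -> overshoots the AI

# Fully adopting the AI's advice would give WoA = 1.
print(weight_of_advice(initial=0.25, advice=0.50, final=0.50))  # 1.0

# Ignoring the advice entirely would give WoA = 0.
print(weight_of_advice(initial=0.25, advice=0.50, final=0.25))  # 0.0
```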
Why Trust Is So Important in AI
Trust has been the biggest hurdle to AI adoption, so understanding how trust works is a top-of-mind issue for developers and users of AI solutions. While AI interpretability did not have a significant impact on trust, the paper suggests that interpretability may have uses beyond driving user trust, such as helping developers debug models or meeting legal requirements around explainability. The findings could also encourage further research into better interpretability methods and new user interfaces for explaining how AI models work to lay users, so that interpretability can have a greater impact on trust and performance in practice, the paper stated.
An analysis of 136 studies that compared algorithmic and human predictions of health-related phenomena found that algorithms outperformed human clinicians in 64 studies (about 47% of the time) and demonstrated roughly equal performance in 64 more studies. Human clinicians outperformed algorithms in only eight studies — about 6% of the time. Despite those compelling findings, algorithms were not widely used in making health-related decisions. Similarly, other studies show that AI has not been widely adopted in medical settings, clinical psychology, and firms; by professional forecasters across various industries; or in a variety of tasks typically performed by humans, the paper noted.