As more companies look to tools like ChatGPT to supercharge creativity, a new study out of Wharton offers a word of caution: Generative AI may boost individual performance, but it can also limit how teams think.
New research co-authored by Wharton professors Gideon Nave and Christian Terwiesch finds that while ChatGPT improves the quality of individual ideas, it also leads groups to generate more similar ideas, reducing the variety that’s essential for breakthrough innovation.
The takeaway? AI might sharpen your pitch, but it could flatten your team’s thinking. As Terwiesch, co-director of Wharton’s Mack Institute, put it: “The ideas are great, but not as diverse as human-generated ideas. That points to a trade-off to be aware of: If you rely on ChatGPT as your only creative advisor, you’ll soon run out of ideas, because they’re too similar to each other.”
The study, led by Mack Institute research fellow Lennart Meincke, revisits and extends earlier experiments by researchers who found that participants using ChatGPT during creative tasks produced more original and useful ideas, outperforming both unaided individuals and those using search engines.
But Meincke and his co-authors took a broader view, focusing not just on the quality of individual ideas but on the diversity of ideas generated across participants. Their concern: Even if every suggestion scores well on its own, do too many people end up thinking alike?
AI vs. Human Creativity
The original experiments, conducted by academics Byung Cheol Lee and Jaeyeon (Jae) Chung, asked participants to complete creative tasks under different conditions, with or without the help of ChatGPT. Reanalyzing the data, Meincke and his colleagues found that participants using the chatbot were more likely to produce overlapping responses, often in strikingly similar language, even though each participant worked independently.
In one experiment, people were asked to invent a toy using a fan and a brick. Among those using the AI, nearly all suggestions clustered around the same concept, with several participants even naming their toy “Build-a-Breeze Castle.” By contrast, the human-only group generated entirely unique ideas. In fact, just 6% of the AI-generated ideas were considered unique, compared with 100% in the human group.
As Meincke explained: “When you give the model the same prompt, it tries to average the most likely completions based on that input. So, if you repeat the task across multiple sessions, it’s not surprising that you get fewer distinct ideas, because they all come from the same underlying distribution.”
Meincke and his colleagues measured diversity using a tool developed by Google that assesses how closely pieces of content are related in meaning. This helped identify subtle patterns of overlap that might not be obvious at first glance. In 37 out of 45 comparisons, ideas generated with ChatGPT were significantly less diverse than those from other methods, and this pattern held even when the researchers used different techniques to measure similarity.
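For readers who want to see what a similarity-based diversity score can look like, here is a minimal Python sketch. It averages the pairwise cosine similarity across a set of ideas, so a higher score means a less diverse set. The article does not name the Google tool or describe its exact scoring, so everything here is an assumption for illustration: plain word counts stand in for a real semantic embedding, the function names are hypothetical, and the sample ideas are invented.

```python
import math
from collections import Counter
from itertools import combinations


def embed(text):
    """Toy stand-in for a sentence embedding: lowercase word counts.
    (Assumption: the study used a semantic model, not word counts.)"""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(ideas):
    """Average similarity over every pair of ideas; higher = less diverse."""
    vectors = [embed(idea) for idea in ideas]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two ideas to compare")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented examples, not data from the study.
    clustered = [
        "a brick castle with a fan breeze",
        "a castle built from bricks with a fan breeze",
        "a breeze castle built with bricks and a fan",
    ]
    varied = [
        "a wind powered race car",
        "brick bowling with spinning pins",
        "a kite launcher for tiny parachutes",
    ]
    print("clustered set:", round(mean_pairwise_similarity(clustered), 3))
    print("varied set:   ", round(mean_pairwise_similarity(varied), 3))
```

Word counts only catch literal overlap. A semantic embedding would also flag ideas that express the same concept in different words, which is the kind of subtle overlap the researchers were looking for.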
One exception came from a single experiment, which didn’t show a clear drop in diversity. The researchers suggest this may be due to a “ceiling effect” — meaning the task was already so constrained that there wasn’t much room for variation in the first place. This underscores a key insight: Creativity is especially vulnerable when the prompt is overly narrow or repeated too often.
A Better Way to Brainstorm With AI
These findings have important implications for businesses. In functions like product development, marketing, and strategy, success often depends not just on generating strong ideas, but also on generating a wide range of them, so as to tackle problems from different angles. As the study’s authors wrote: “The true value of brainstorming stems from the diversity of ideas rather than multiple voices repeating similar thoughts.”
Yet that diversity doesn’t happen on its own. “Diversity is often overlooked, but it needs special protection,” said Terwiesch. “If you don’t solve for it explicitly, you won’t get it.”
One reason for the lack of diversity, the researchers note, is that participants often used similar prompts when interacting with ChatGPT — suggesting that some of the convergence in ideas may come from how users engage with the tool, not just how the model generates responses.
Even small changes in how questions are framed can lead to more varied results, they argue. “The cost of varying prompts is low, and given how important diversity is, it would be foolish not to do it,” said Terwiesch.
One technique the researchers highlight is “chain-of-thought prompting.” Rather than asking the chatbot for a single idea all at once, this method breaks the task into smaller, structured steps. It can increase the variety of responses and reduce repetition, said Terwiesch.
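As a rough illustration of the step-by-step approach described above, the Python sketch below assembles a single prompt that walks a chatbot through smaller stages instead of asking for finished ideas in one shot. The function name, the step wording, and the number of steps are all hypothetical; the article does not give the prompts the researchers used, and no particular chatbot API is assumed.

```python
def build_stepwise_prompt(task, n_ideas=5):
    """Return one prompt that splits an ideation task into ordered steps.
    The step wording is illustrative, not the researchers' actual prompt."""
    steps = [
        f"List {n_ideas} short, rough ideas for the task.",
        "Review the list and revise any ideas that resemble each other "
        "so that each one is clearly different.",
        "Expand each revised idea into a short description.",
    ]
    numbered = "\n".join(
        f"Step {i}: {text}" for i, text in enumerate(steps, start=1)
    )
    return f"Task: {task}\nWork through the following steps in order.\n{numbered}"


if __name__ == "__main__":
    # The task is the toy example from the experiments described earlier.
    print(build_stepwise_prompt("Invent a toy using a fan and a brick."))
```

The resulting text can be pasted into any chatbot. Changing the task framing or the number of ideas between sessions is one low-cost way to vary prompts.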
Meincke added that starting with human ideas may help teams move in different directions before introducing AI. He also suggested using multiple AI models to inject greater variety into the brainstorming process. “It would be equally foolish not to try five models,” said Meincke. “Throw them all into the mix and go crazy.”
The paper arrives as generative AI moves deeper into business workflows, not just for writing and coding, but also for creative tasks like ideation, product naming, and brand development. “People have been dreaming about AI being creative, but we have never been closer to having a system reaching a point where we can be at human creativity,” said Terwiesch. “That’s a big deal.”
Still, the researchers caution against mistaking fluency for originality. The best ideas, it seems, are still born from disagreement, divergence, and a bit of creative mess.