Social media may seem like a jumbled sea of smiley faces, selfies and status updates, but when scientists cast their nets, they pull up a haul of data that brings empirical order to the chaos.

Researchers at the University of Pennsylvania and other institutions are finding that the well-being of a community can be determined through the collected posts of its individuals. And the information derived from the data has practical applications across a broad range of disciplines, from marketing to medicine to national security.

“As people are more and more migrating toward social media for their social lives, social media increasingly becomes the platform for researchers to understand social trends, to understand psychological trends and to understand public health threats,” said Johannes Eichstaedt, a doctoral student in the department of psychology at Penn and a founding research scientist of the World Well-Being Project, which is pioneering techniques for using language in social media to measure well-being.

Eichstaedt discussed his work recently on the Knowledge at Wharton radio show on SiriusXM channel 111.

The Power of Words

Eichstaedt and his colleagues have found that words, both positive and negative, are strong indicators of personality. When scientists use algorithms to sift through social media messages, patterns begin to emerge.

First, the researchers needed to harness the data. Their sample came from 100,000 Facebook users who gave permission to take part in the study. Once they agreed, the users were “de-identified” and their posts were collected through an app. The participants then took a standard personality survey. The scientists also analyzed Twitter by volume, focusing on the content of one billion tweets rather than on individual users.

When it comes to language, one of the most common words used by extroverts was “party.” Introverts most frequently wrote “computer” or “Internet” in their posts. “The Facebook savvy introvert is somebody who spends a lot of time on the computer and then writes about it,” Eichstaedt said.

Emoticons, typographic symbols used to convey emotion, were also part of the study. “Emoticons are highly predictive of certain things,” Eichstaedt noted. “Frowny faces are generally, predictively speaking at a population level, associated with neuroticism. Neuroticism is a tendency toward negative emotionality, emotional reactivity, being grumpy, being sort of hard to be around. Those people spike in their use of frowny faces.”

Interestingly, one of the most predictive language features found in the data is the heart character, denoted as <3. The presence of the character almost always indicates the user is female. “To the point that if you only had one piece of information to identify whether male or female, you’ll want to know the number of heart characters [they used],” Eichstaedt said.
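To make the idea of counting language features concrete, here is a minimal Python sketch. It is an illustration only, not the World Well-Being Project's actual code: the feature list, the `count_features` function and the sample posts are all hypothetical, chosen to mirror the examples quoted above (“party,” “computer,” “Internet,” the frowny face and the <3 heart).

```python
import re
from collections import Counter

# Hypothetical feature list, taken from the examples mentioned in the article.
FEATURES = ["party", "computer", "internet", "<3", ":("]


def count_features(posts):
    """Count how often each feature appears across one user's posts."""
    counts = Counter({feature: 0 for feature in FEATURES})
    for post in posts:
        text = post.lower()
        for feature in FEATURES:
            if feature.isalpha():
                # Plain words: match whole words only, so "partying" is not "party".
                pattern = r"\b" + re.escape(feature) + r"\b"
                counts[feature] += len(re.findall(pattern, text))
            else:
                # Emoticons: count every literal occurrence of the symbol.
                counts[feature] += text.count(feature)
    return counts


if __name__ == "__main__":
    # Made-up posts for illustration; not real study data.
    sample_posts = [
        "Party tonight <3 <3",
        "My computer died :(",
        "Planning a party on the Internet",
    ]
    for feature, n in count_features(sample_posts).items():
        print(feature, n)
```

Per-user counts like these would only be a starting point. Linking them to personality traits or gender, as the researchers describe, would require pairing the counts with survey responses and fitting a statistical model, which this sketch does not attempt.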

Social media language is also an indicator of emotional stability. Researchers found more emotionally stable people post often about sports, both as spectators and participants, and make references to religion. “Emotional stability is the opposite of neuroticism,” Eichstaedt said. “It’s living a calm, chill, well-adjusted life. It’s being connected to something larger than yourself.”

Social Media and GDP

Eichstaedt and his contemporaries are urging governments to use data gleaned from social media as a cost-effective way to measure a society’s well-being beyond its gross domestic product. That goal is one of the project’s founding intentions.

“The GDP doesn’t care about the nature of transactions in society,” Eichstaedt said. “It’s one index among many. We’ve been working — and other people have been working — to convince governments to take on these other indices.”

In practical terms, Eichstaedt doesn’t see data-heavy agencies, such as the U.S. Bureau of Labor Statistics or the Centers for Disease Control and Prevention, switching to social media samples to drive policy. But information gathered from social interaction has been fueling national security concerns for years.

“This is small potatoes for the intelligence community,” Eichstaedt said. “The budgets that the intelligence community has thrown at population surveillance through these methods is to the order of a hundred, if not a thousand-fold, of what we will spend on this project.”

The Search for Acceptance

As with any empirical research, scientists with the World Well-Being Project worry about response bias. In social media, the bias looks different from, say, that of a person who always checks the fourth box on a paper survey.

The problem with social media, Eichstaedt said, is that people typically are searching for acceptance.

“You want yourself to be approved of by your peers, by people you’re connected with on Facebook,” he noted. “So as a result, you might selectively present the information about your life to your peers in such a way that they respond favorably.”

That means users may suppress their negative emotions or post benign status updates that mask the upheaval going on in their lives.

“The trickiness of managing a social identity is that it exists at the interface between these different worlds, and some people follow the safe route, which means they turn their Facebook into their LinkedIn, which means you post updates about your job and about your very predictable, generic accomplishments,” Eichstaedt said.

When that happens, Facebook loses some of its social function, he added.

Despite these drawbacks, the data mined from social media is proving invaluable for researchers. With more than half the U.S. population using Facebook at least once a month and traffic on Twitter measuring in the billions, Eichstaedt sees the burgeoning use of social media data as progress for social science. He is currently evaluating how the data can be used to detect depression.

“The penetration of social media over the past few years has been incredible,” Eichstaedt said. “Ten years ago, it was a novelty. Five years ago, it was sort of an annoyance for researchers. Now, it’s pretty clear that the next generation of big data sets that you can use for population health comes from these sources.”