What have you shared on social media today? Did you comment on last night’s election results; mention that you’re going to the gym later; sympathize with a friend who’s been in the hospital; describe your meal at a favorite burger joint, or display pictures of your daughter’s jazz dance recital?

And what do those posts reveal about your health and your risk for serious medical conditions?

That last question may seem odd, but not to the researchers at the Penn Social Media & Health Innovation Lab at the University of Pennsylvania. Director Raina Merchant and her team are investigating how people’s social media language on sites such as Facebook, Twitter and Yelp can be used to assess their health and predict diseases. The conditions they are looking at are some of the main culprits for premature death and disability (not to mention skyrocketing health care costs) in America, including heart disease, diabetes, hypertension, obesity, chronic lung problems, depression and drug abuse.

Part of the larger Penn Medicine Center for Health Care Innovation, the lab also has a partnership with the Leonard Davis Institute of Health Economics (LDI), which studies ways to improve America’s health care system. Merchant is a senior fellow at LDI as well as an assistant professor of emergency medicine at Penn.

Merchant explains that there are differences in people’s language structure, or the kinds of words they use, that might indicate a disorder or cognitive decline. “Someone might post directly about having a condition, or some [conditions] may be more revealed when people talk about it,” says Merchant. “If someone has a lot of posts that may suggest that they’re depressed, they may not be as overt as ‘feeling sad,’ or ‘blue,’ or ‘unhappy,’ but there may be other words … that suggest depression, that aren’t as obvious.”

While much of the lab’s research is at a relatively early stage, there have been some intriguing initial findings. The team published a study involving Facebook in October 2015 in BMJ (formerly British Medical Journal) in which more than 1,000 patients in the University of Pennsylvania Health System agreed to have their social media data compared with their electronic health records.

One finding: Individuals who were clinically obese according to their medical records were significantly more likely to use words related to being stationary: “sitting, being still, planted, at rest; these sorts of things,” says Merchant. The results were not what the team had predicted; they had thought this group might make frequent references to food or exercise.

David Asch, who directs Penn’s Center for Health Care Innovation, mentions an even more unexpected association that was revealed by another of the team’s ongoing studies: Patients with high blood pressure post more frequently about their children than do people without the condition.

“Dealing with your kids doesn’t cause high blood pressure, although people think it does, colloquially,” notes Asch, who is also a professor of health care management and of operations, information and decisions at Wharton. “We find associations that are on the surface hard to explain, [and which] we wouldn’t have thought of in advance.”

The Privacy Question

Would most Americans agree to this type of surveillance, if they were told it was for the purpose of improving their health? Data mining is not new, of course; marketers have been using it for years to stealthily capture our online behavior and tempt us with ads. Some of this research may even call to mind the 2014 controversy involving Facebook and “emotional contagion.” The company reportedly manipulated the news feeds of nearly 700,000 people without their knowledge, to test whether it could influence individuals to post more positive or negative content. (Facebook asserted that consent was given via its stated Data Use Policy.)

By contrast, in Merchant’s research the idea is to obtain explicit consent and to funnel “actionable” data to patients. “Our hope is, can we collect this information and give it back to patients so that they could really learn from these assumptions we’re making? And how do we also make this available for health care providers, if patients wanted to share with them?”

In the lab’s Facebook study, a large percentage of individuals were in fact willing to participate. The study showed that of 1,432 patients in the University of Pennsylvania Health System who were Facebook and Twitter users and expressed interest in the study, the majority — about 71% — consented to share their social media activity and have it compared with their electronic medical records.

“That was a big finding,” says Merchant. “We don’t know of anyone really having done that before — being able to demonstrate [that people would give consent] and to engage in a very transparent way for data collection.”

Asch says that in his experience with the lab’s experiments so far, people seem to feel comforted by the idea that their health might be “watched over” by their local hospital or health system. “My intuition was that people would think of this as Big Brother,” he said, but he found that the opposite appears to be true. Plus, “a main finding is that although people do care about privacy, they also recognize the value of sharing, to themselves or to society.”

With 3,000 patients currently in the database, the team plans to collect data over the next decade and, according to Merchant, “build this map, this database of digital footprints that people are sharing as information.”

Separating the Signal from the Noise

Is it really possible to get useful health data from social media posts? People say a lot of spur-of-the-moment things online. How does a computer program cope with human beings’ colloquial language, metaphors, sarcasm, and humor? What if the lab’s computer program interprets “BTW, I could have died!” as “I’m depressed and thinking about killing myself”?

“I think [those questions] get at the crux of this,” agrees Merchant. But even joking comments may be relevant. “Even something that is said in jest may be more likely to be used by people with certain conditions than others.”
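To make the misreading concrete, here is a minimal sketch of a naive keyword matcher, assuming a hypothetical two-word keyword list. It is not the lab's method; it only illustrates why matching words without context would flag a figure of speech.

```python
def naive_flag(post, keywords):
    """Flag a post if it contains any keyword as a substring; context is ignored."""
    text = post.lower()
    return any(keyword in text for keyword in keywords)


# Hypothetical keyword list, invented for illustration only.
keywords = ["died", "depressed"]

joke_flagged = naive_flag("BTW, I could have died!", keywords)
neutral_flagged = naive_flag("Great recital tonight.", keywords)

print(joke_flagged)     # True: the figure of speech is flagged
print(neutral_flagged)  # False
```

A matcher like this cannot tell exaggeration from a genuine warning sign, which is the gap the signal-versus-noise work has to close.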

The team’s task, she says, is to try to separate the signal from the noise. This effort is spearheaded by the lab’s computer scientists, including Lyle Ungar and Andy Schwartz. Ungar, whose expertise also spans biomolecular engineering and operations, runs the group that performs natural language processing: using computers to automatically “read” people’s social media posts. Schwartz is based at Stony Brook University and works remotely with the Penn Social Media & Health Innovation Lab.

“Social media is an unstructured data source. It doesn’t come with these variables that you can just cleanly plug into your statistical software,” Schwartz points out. “So you have to, first of all, run algorithms that turn the social media — these strings of characters — into some sort of meaningful piece of statistical information.” He also applies the latest machine learning techniques from computer and information science. But even so, the process is challenging.
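As a rough illustration of what turning strings of characters into statistical information can look like, the sketch below counts how often chosen words appear in a user's posts and converts the counts to relative frequencies. The posts, the vocabulary, and the function names are hypothetical, and this simple word-count approach is an assumption for illustration, not a description of the lab's actual pipeline.

```python
import re
from collections import Counter


def tokenize(post):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z']+", post.lower())


def relative_frequencies(posts, vocabulary):
    """Return each vocabulary word's share of all tokens in a user's posts."""
    counts = Counter(token for post in posts for token in tokenize(post))
    total = sum(counts.values())
    if total == 0:
        return {word: 0.0 for word in vocabulary}
    return {word: counts[word] / total for word in vocabulary}


# Hypothetical posts and vocabulary, invented for illustration only.
posts = ["Sitting on the couch all day", "Still sitting. Long day at the desk."]
vocabulary = ["sitting", "day", "gym"]

features = relative_frequencies(posts, vocabulary)
print(features)
```

Each user ends up with a fixed-length set of numbers, which is the kind of variable that statistical software can compare against health records.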

Tracking Public Health

In addition to looking at individuals, the team also conducts studies involving broad public health trends. Other groups have taken this route as well: A widely reported example is Google’s effort in the late 2000s to analyze search queries to predict flu outbreaks earlier than the CDC. According to Smithsonian Magazine, the project was not very successful: It consistently overestimated flu rates by a wide margin. But some believe that although the project’s execution was flawed, its essential concept holds promise.

Merchant says the team is involved in studies using Twitter to look at heart disease. One focus is “to learn something about how people think about heart disease,” Asch says. “How do people understand terms like heart attack, hypertension, and diabetes?” If there are mistaken views out there, perhaps Twitter could be used to push out health-promoting messages. “It’s just so costless to do it, so if it works, what a great thing.”

Research by Penn’s social media lab may also help hospitals obtain useful feedback about their services. Schwartz talks about the team’s Yelp study, published in April in Health Affairs, which analyzes people’s reviews of their hospital stays. U.S. hospital visits are typically assessed with a standard patient satisfaction survey called HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems). But according to Schwartz, the Yelp study shows that HCAHPS fails to ask about some issues that are very important to patients, such as parking, and dealing with the billing staff.

“Billing, for example, correlates with how well patients rate the hospital. So not only do they talk about it a lot, but actually we find that if they mention billing in their review, they’re more likely to give a negative review.” Schwartz notes that these kinds of findings could be used by hospitals to improve their services and their national rankings.
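The kind of comparison Schwartz describes can be sketched by splitting reviews on whether they mention a topic and comparing average star ratings. The reviews and ratings below are invented for illustration, and the simple substring check is an assumption, not the method used in the Yelp study.

```python
from statistics import mean


def mean_rating_by_mention(reviews, keyword):
    """Split (text, stars) reviews by whether the text mentions keyword.

    Returns (mean stars with mention, mean stars without), or None for an empty group.
    """
    with_kw = [stars for text, stars in reviews if keyword in text.lower()]
    without_kw = [stars for text, stars in reviews if keyword not in text.lower()]
    return (
        mean(with_kw) if with_kw else None,
        mean(without_kw) if without_kw else None,
    )


# Hypothetical reviews, invented for illustration only.
reviews = [
    ("The billing office double charged me.", 1),
    ("Nurses were kind and attentive.", 5),
    ("Great surgeon, but billing was a nightmare.", 2),
    ("Parking was hard to find.", 3),
]

mentions, others = mean_rating_by_mention(reviews, "billing")
print(mentions, others)
```

On real data, a gap between the two averages would only suggest an association; a proper analysis would also need to account for sample size and other topics mentioned in the same review.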

From the Genome to the “Social Mediome”

The team has coined the term “social mediome” to describe the area they are studying. “It’s kind of a play on words,” Merchant explains. As the genome is reflective of a person’s genes, the social mediome is reflective of his or her online behavior.

But can attempts to parse our random chatter on social media really compare with the “hard science” of DNA research? Revolutionary breakthroughs have been made over the past few years in sequencing the human genome, leading to new treatments for cancer and other diseases. And yet, Asch points out, “Human behavior, [according to] estimates, is responsible for 40% of early mortality.” What we do, or fail to do, in our day-to-day lives matters. “I would submit that actually our social media probably tells us far more about our health than our DNA,” he says.

Ungar agrees. “What can we do to be healthier and live longer? Don’t smoke. Exercise. Wear a seatbelt and don’t drive drunk. Don’t be depressed. People who are happy and in good relationships live some five years longer than those who are not.” What all these behaviors have in common, says Ungar, is they are fundamentally psychological, not genetic.

If self-destructive behaviors can be identified earlier, continues Ungar, this can cut down on the cost of health care. “Most of American health care money is spent too late in the process,” he observes. “Giving someone a stent is expensive; using social media to help people exercise so they don’t get cardiovascular disease is much cheaper.” He gives the similar example of drug addiction: Identifying those at risk early costs less than trying to rehabilitate them after years of substance abuse.

Asch remarks on the tremendous research opportunity offered by social media. Before its advent, “so much of our behavior was ‘unwitnessable’,” he says. “Private communications were important, but we couldn’t observe them.” Now, “we’re in a position to learn a lot more about the associations of various forms of behavior with health. And that’s very exciting.”