Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crisis, data determines the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly affect outcomes.

In a conversation with Knowledge at Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and human considerations in data collection and artificial intelligence, and how we can work toward removing these biases. This interview is part of an editorial collaboration between Knowledge at Wharton and the SWIFT Institute.

An edited transcript of the conversation follows.

Knowledge at Wharton: Could you share with us how you got interested in artificial intelligence and what you are doing at present?

Alexandra Olteanu: In 2013, when I was working on how we could leverage social media during humanitarian crises, we realized that there are a lot of problems around how we collect data, what is in that data, and whether that data is even helpful for, let’s say, the U.N. and other actors in that space. The first problem we became aware of is that people tend to collect data using hashtags. For instance, when Hurricane Sandy arrived in the United States in 2012, there were a lot of hashtags like #Sandy. But some people posted about the power going off, and others posted pictures of the event, without using those hashtags. So a lot of people are left out of data sets, and therefore a lot of information that may be relevant for the U.N. Office for Humanitarian Affairs may not be visible. This is how I personally started to pay more attention to who is represented in these data sets, who ends up receiving more help, and who ends up being overlooked.

Knowledge at Wharton: What did you find in your studies?

Olteanu: In our initial study, we found that a lot of the data collected by humanitarian agencies and researchers tends to be biased toward those who know how to use social platforms and social media, and toward mainstream media content. But any first responder is probably already aware of what CNN or The New York Times covers, so that’s not interesting to them. Eyewitness reports, on the other hand, were not as visible. We needed to come up with a better way to capture their data in those circumstances.

Knowledge at Wharton: Did you come up with a better way?

Olteanu: Yes. A better way is to build a lexicon or a language model that generalizes from crisis to crisis. For example, donating blood is not specific to one type of crisis, and neither are requests for volunteers, reports that someone is injured, or reports that a house is damaged.
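To make that idea concrete, here is a minimal sketch, in Python, of what such a crisis-agnostic lexicon could look like. The categories and keywords are illustrative placeholders, not the actual lexicon from this research, and a real system would also handle spelling variants, phrases, and multiple languages.

```python
# Minimal sketch of a crisis-agnostic lexicon matcher.
# The categories and keywords below are illustrative placeholders,
# not the actual lexicon used in this research.

CRISIS_LEXICON = {
    "donations": ["donate blood", "blood donation", "relief fund"],
    "volunteering": ["volunteers needed", "volunteer to help"],
    "casualties": ["injured", "casualties", "missing person"],
    "damage": ["house damaged", "building collapsed", "power is out"],
}

def match_crisis_terms(post: str) -> list[str]:
    """Return the lexicon categories whose terms appear in a post.

    Unlike collection based on event-specific hashtags (#Sandy, ...),
    these terms describe needs and damage, so they transfer across crises.
    """
    text = post.lower()
    return [
        category
        for category, terms in CRISIS_LEXICON.items()
        if any(term in text for term in terms)
    ]

posts = [
    "Our house damaged by the storm, power is out since last night",
    "Volunteers needed at the downtown shelter, please come help",
    "Beautiful sunset over the bay today",
]
for post in posts:
    print(match_crisis_terms(post), "->", post)
```

Because the terms describe needs and damage rather than a specific event, the same matcher can be reused for a hurricane, an earthquake, or a flood without waiting for event-specific hashtags to emerge.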

Knowledge at Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?

Olteanu: A good example is a new law in New York state under which insurance companies can now use social media to help set the level of your premiums. But they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmer’s market, but those purchases might leave no trace on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you eat only cookies. This shows how incomplete data can affect you.

Knowledge at Wharton: You have mentioned that fairness is an important part of what you do. Could you explain your approach to some of these issues regarding the use of artificial intelligence?

“We have hundreds of metrics that could be used, but we don’t always have a good understanding of when to use which metric.”

Olteanu: My work is focused on data biases and how they can affect the outcome of a system. Let me give you a high-level overview. There are two [main] kinds of research approaches, and both focus on outcome fairness. One is called “individual fairness,” where the idea is to ensure that similar individuals are treated similarly. The challenge here is how to identify that two individuals are similar: what types of attributes should one include, and what type of mathematical function should one use? The second approach is focused on what is known as “group fairness.” Here, the idea is to ensure that the error rates for different groups of people are similar. In this approach, the problem is how you define an error, how you compute [or] aggregate errors across groups, and so on. There are a lot of challenges. Even when you want to do the right thing, oftentimes it may not be clear how to do it.

The other distinction is that this type of approach focuses on outcome fairness, but there is also the aspect of process fairness: when we make a decision, is the process we have in place to make that decision fair for everyone involved? So, these are the different types of work, and this is mostly work-in-progress. Because of these distinctions, we have hundreds of metrics that could be used, but we don’t always have a good understanding of when to use which metric.
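As a concrete illustration of the “group fairness” idea, here is a minimal sketch that compares one possible error metric, the false positive rate, across groups. The toy data and the choice of metric are assumptions for illustration; as noted above, deciding which error to measure and how to aggregate it is exactly where the open questions lie.

```python
# Minimal sketch of a "group fairness" check: compare error rates across groups.
# The data, group labels, and the chosen error metric (false positive rate)
# are illustrative assumptions, not a prescribed fairness definition.

from collections import defaultdict

def false_positive_rate_by_group(y_true, y_pred, groups):
    """Compute the false positive rate separately for each group."""
    fp = defaultdict(int)   # true negatives incorrectly predicted positive
    neg = defaultdict(int)  # total true negatives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            neg[group] += 1
            if pred == 1:
                fp[group] += 1
    return {g: fp[g] / neg[g] for g in neg if neg[g] > 0}

# Toy example: 1 = "flagged as high risk", two groups A and B.
y_true = [0, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
groups = ["A", "A", "A", "B", "B", "B", "B", "A"]

rates = false_positive_rate_by_group(y_true, y_pred, groups)
print(rates)                                   # e.g. {'A': 0.33, 'B': 0.67}
print("gap:", max(rates.values()) - min(rates.values()))
```

Swapping in a different error definition, such as false negatives or calibration error, can change which group appears disadvantaged, which is one reason there are hundreds of candidate metrics and no single obvious choice.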

Knowledge at Wharton: So if we use historical data about the people who have received different kinds of services, and that becomes the basis for decision-making because of the way in which the algorithms are written, there may be an inherent bias in the system that discriminates against certain groups, especially minorities or immigrants. How can this problem be corrected?

Olteanu: This is a huge problem. Many of these systems know only what you show them. If you understand that there were some problems in the past, and you want to improve the future, then you need to go back to the data and understand who is represented in that data set, but even more importantly, how they are represented. Maybe everyone is in the data, but the way in which you gathered data about them, and the way in which you decided which signals were important when you made a decision, are equally important.
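One way to start answering “who is represented in that data set” is a simple representation audit, sketched below with made-up groups and shares. It only addresses who is in the data; how each group is represented, that is, which signals were recorded about them, still needs separate scrutiny.

```python
# Minimal sketch of a representation audit: before training on historical data,
# compare how often each group appears in the data set against a reference
# population. The groups and numbers are invented for illustration.

from collections import Counter

def representation_report(records, reference_shares):
    """Compare group shares in the data against reference population shares."""
    counts = Counter(r["group"] for r in records)
    total = sum(counts.values())
    report = {}
    for group, ref_share in reference_shares.items():
        data_share = counts.get(group, 0) / total
        report[group] = {
            "data_share": round(data_share, 3),
            "reference_share": ref_share,
            "under_represented": data_share < ref_share,
        }
    return report

records = [{"group": "A"}] * 70 + [{"group": "B"}] * 25 + [{"group": "C"}] * 5
reference_shares = {"A": 0.60, "B": 0.25, "C": 0.15}  # hypothetical population
print(representation_report(records, reference_shares))
```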

Knowledge at Wharton: Is there a way to use transparency to make more ethical decisions about these matters?

Olteanu: I think it’s important for AI systems to be able to justify and explain their decisions. This is increasingly difficult, at least with state-of-the-art machine learning techniques, in particular with what are known as “neural networks.” They tend to be black boxes, even for the researchers who develop them. We need better ways to debug them. But this is just one side. The other side, which I think people don’t often recognize, is that these systems are integrated in extremely complex scenarios. They are large-scale. They depend on other systems. They interact in ways that we are not always aware of. So even if you don’t have this kind of black-box system, it is sometimes extremely difficult to trace where an issue originated. This is a hot area of research and extremely important.

“It’s important for AI systems to be able to justify and explain their decisions.”

Knowledge at Wharton: Is there anything that the field of financial services can learn from this area of research?

Olteanu: The first thing is to understand that each data point is a person. The medical industry has a lot of strong principles around human research and working with human data, and these should be adopted by the financial industry. Also, the financial industry needs to understand that AI won’t offer magical solutions. We tend to evaluate the AI and the human separately and compare them with each other. Sometimes we conclude that the AI is better, but what we should do is compare what happens when we combine the AI with the human. Research studies show that doctors tend to lose their skills when they rely on AI. They tend to pay less attention because they trust the recommendations the AI makes, so we may actually see a decrease in performance, not an increase. So the way in which you assess how well this combination of AI plus human works is extremely important. I think in the medical domain we have failed to do that.

Knowledge at Wharton: Is there a way to correct that failure, or to go beyond it?

Olteanu: We need to constantly evaluate [the systems], understand that people need to learn how to interact with a new system, and train them accordingly. This has been the case in the past, too, but because of the feeling that AI is a bit magical, people trust it more than they should. It’s incredibly important for institutions to think about processes around what happens if we make a mistake. Do our clients or users have a way to effectively let us know when a mistake is made? Do they have a way to check if a mistake has been made? What are our processes to fix those mistakes?

Knowledge at Wharton: Somebody who was working on AI systems in the medical industry told me that doctors often are drilled so hard in their training that in order to qualify for their medical expertise, they almost have to lose their humanity in some sense. But if there are AI-driven systems that can help with the diagnostics, then maybe doctors can become more empathetic. They will have time to cultivate their empathy and compassion and develop other human qualities that are not there in an artificial system. Is this thinking valid? And does it have parallels in the financial industry?

Olteanu: Yes, the nature of work will change. I personally don’t think that AI will replace knowledge-driven jobs but it might transform them. The focus of the employees may change. I do agree that doctors may have more time for thinking about how the patients feel and empathize more with them. There might be other aspects also. For instance, pathologists don’t interact directly with patients so the point about empathy is not that relevant for them, but thanks to AI they may be able to read more slides and also do it more effectively.

Knowledge at Wharton: Do you have any final comments that you think are important for people to know?

“Often, we don’t know the relationship between what we measure — like clicks or time spent on the webpage — and what we actually care about, i.e. relevance.”

Olteanu: There are two things that are extremely important but often overlooked. First, when you start implementing these systems, you should pay more attention to the data-generation process. How did we end up with that data set? What types of decisions were made along the way about what to collect, from whom to collect it, how to store it, how to represent the data, and so on? The other aspect, which is given even less thought, is the evaluation of the systems and metrics. Many of the metrics we care about are what are called “unobservable concepts.” You cannot really measure them. For instance, in search engines like Google, you want to measure something called “relevance.” But you can define and measure relevance in many ways. Did the user click on that webpage? How much time did they spend on the page they clicked on? There are also instances in which these activities don’t reflect relevance. So, often we don’t know the relationship between what we measure — like clicks or time spent on the webpage — and what we actually care about, i.e. relevance.
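The relevance example can be made concrete with a small sketch that computes two common proxies, click-through rate and dwell time, from invented interaction logs. The point is that these are proxies for an unobservable concept: the code can compute them exactly, yet says nothing about whether they actually track relevance for every query or every group of users.

```python
# Minimal sketch of the gap between proxy metrics and the concept we care about:
# compute click-through rate and average dwell time for one search result,
# two common proxies for "relevance." The log records below are invented.

def proxy_relevance_metrics(interactions):
    """Aggregate click and dwell-time proxies for one search result."""
    impressions = len(interactions)
    clicks = [i for i in interactions if i["clicked"]]
    ctr = len(clicks) / impressions if impressions else 0.0
    avg_dwell = (
        sum(i["dwell_seconds"] for i in clicks) / len(clicks) if clicks else 0.0
    )
    return {"click_through_rate": ctr, "avg_dwell_seconds": avg_dwell}

# A result can be clicked and abandoned in seconds (not relevant), or never
# clicked because its snippet already answered the question (relevant);
# these are exactly the cases where the proxy and the true concept diverge.
interactions = [
    {"clicked": True, "dwell_seconds": 4},
    {"clicked": True, "dwell_seconds": 180},
    {"clicked": False, "dwell_seconds": 0},
]
print(proxy_relevance_metrics(interactions))
```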

This also happens in many industries where you care about customer satisfaction, which is again hard to define and even harder to measure. The gaps in how we measure can affect certain groups. These are the key issues that I would like people to think more about: how they ended up with a certain data set, which outcomes they ended up measuring, and the way in which they are measuring them.