Wharton’s Katherine Klein interviews Maoz (Michael) Brown, head of research for the Wharton Social Impact Initiative, about a study revealing some of the problems with measurement in impact investing.

Most impact investors report meeting or exceeding their impact performance metrics, but is this truly the case? Maoz (Michael) Brown, head of research at the Wharton Social Impact Initiative, collaborated with Wharton PhD candidate Lauren Kaufmann on a study that reveals it is common practice for impact investors to report metrics primarily to market their success, rather than to evaluate and understand where their impact may fall short. Brown attributes this practice to the challenges of measuring impact rigorously while simultaneously running an investment fund.

Brown recently joined Katherine Klein, vice dean for the Wharton Social Impact Initiative, for an episode of the Dollars and Change podcast to talk about the research findings. He offers recommendations for measuring impact and explains why the field needs to get more comfortable with impact underperformance in order to learn and grow.

Following is an edited transcript of the conversation.

Katherine Klein: Michael, I’m going to start by reading a few definitions of impact investing, because one of the things that is really striking is how much measuring impact is part of the very definition that people in the field use to describe it.

The GIIN is the Global Impact Investing Network. It defines impact investments as “investments made with the intention to generate positive, measurable social and environmental impact alongside a financial return.”

The International Finance Corporation has a very similar definition. They define impact investing as “an approach that aims to contribute to the achievement of measured, positive social and environmental impacts.”

And we’ll give one more definition. The Impact Investing Hub says, “Intentionality and measurable impact are the fundamental concepts that differentiate impact investing from traditional forms of investment.”

As researchers, we spend a lot of time thinking about how you measure things, so this commitment to measurement in impact investing is really interesting. What made you want to dig into this topic and learn more?

Michael Brown: There are two reasons that I wanted to get into this with my colleague, Lauren Kaufmann, who’s a PhD candidate at Wharton. It’s really two puzzles. One is a higher-level puzzle, and one is a more specific puzzle. The higher-level puzzle is what you described — that impact measurement is such a critical component of impact investing as it’s defined by all of these advocacy organizations and trade associations. But impact measurement done rigorously is quite difficult. So, how do impact investors go about threading that needle — measuring and documenting their impact and doing so in a way that’s compelling but not draining off all of their resources on impact evaluation? That was the first big puzzle.

The more specific puzzle is this finding that has surfaced in all of the annual impact investor surveys that the GIIN has done, which is that impact investors consistently report that their portfolios have performed either in line with their expectations or even above their expectations. Underperformance is extremely rare, reported by on the order of 1% to 3% of the respondents to these surveys. That was interesting because our initial thought was that meaningful impact should be pretty hard to achieve, so how is it that underperformance is so uncommon? Those two puzzles are really the impetus for getting into this research.

“How do impact investors go about … measuring and documenting their impact and doing so in a way that’s compelling but not draining off all of their resources on impact evaluation?”

Klein: I want to underscore what you described with the GIIN survey. This annual survey of hundreds of impact investors asks respondents, “How well are your portfolios doing with regard to impact? Are they meeting your expectations for impact, exceeding expectations for impact, or underperforming?”

About 20% say, “We’re doing better than we expected.” About 80% say, “We’re in line.” And that leaves almost no one — just 1% to 3% — saying, “We’re not quite meeting expectations,” which is a remarkable statistic. It’s fair to say you were skeptical. I look at these numbers and I’m skeptical. You and I know impact is difficult to measure. It’s difficult to achieve. What did you do to get below the surface and dig into what’s actually going on with impact measurement?

Brown: The reaction that I initially had — and I think that most PhD-trained researchers would have — is, “Well, that’s just BS. These respondents are trying to paint a rosy picture.” Even though that’s an understandable initial reaction, it’s not necessarily the most constructive one.

The first step was to hear out impact investors. We wanted to assume that they’re being forthright and truthful and give them the benefit of the doubt, even while maintaining some degree of skepticism. Surveys are really powerful research tools, but Lauren and I decided that to really get under the hood of this topic, we needed to talk to these professionals. We went about interviewing staff at 135 impact investing organizations — mostly fund managers, but also their own investors, so limited partners and others who invest in these funds. We also interviewed a dozen or so consultancies that are part of the impact investment process and supply chain.

Klein: I love what you did to start to unpack what was going on. A key request you made of these people was to describe two deals they had done: one that they regarded as an impact success story, and another where there was some disappointment. You saw some striking things when you started to reflect on the kinds of stories people were telling you. Tell us more about what you learned.

Brown: I’ll go into our findings in just a moment, but I want to give a bit of the rationale for using this approach. There is a temptation or a tendency in impact measurement to escape into abstraction, to say, “Oh, I use this framework,” or “I collect these metrics,” or “I use a logic model,” or something like that, which is useful to some extent, but it doesn’t really give you the full color of what’s going on. That’s why we asked for specific examples.

After the interviews were concluded, Lauren and I listened to those stories again and classified the kinds of evidence that our interviewees used to explain why a given deal was deemed a success or deemed a case of underperformance from an impact perspective. We wanted to pay attention to whether this evidence was quantitative or qualitative. Was it focused on the business or more focused on the impact? What we found was a pretty striking inconsistency, especially in the use of impact metrics. Impact-focused, quantitative data were cited far more often in cases of impact success than they were in cases of impact underperformance. Specifically, 82% of the examples of impact success involved citing quantitative impact data, while only 24% of the underperformance examples involved quantitative impact-focused data.

“We really wanted to understand the pressures and the constraints and the challenges that these professionals are working under.”

Klein: That’s really curious. It sounds like folks are being much more specific with regard to quantitative indicators when they are telling you about a success story. I imagine that they probably had quantitative indicators for their underperformance stories as well, but they weren’t sharing them with you. I’m not sure that my interpretation is correct, but what do you make of this disparity in the kinds of evidence that people bring to bear?

Brown: Specifically, in this case, it’s the impact numbers. Our initial interpretation was that these are simply more top of mind for our interviewees when it comes to the success stories. But we really wanted to see whether impact metrics factor into cases of impact underperformance, so Lauren and I made a point of following up to ask, “What kind of data are used?”

To be more concrete, if they described impact underperformance in terms of business underperformance — just a lack of scale, a lack of market penetration, or poor financial performance — we actually followed up to ask about any more impact-focused data. Still, we tended not to get that kind of answer, which led us to conclude that, in many cases, at least based on our data, it seems impact metrics are not really used to measure or evaluate performance. They are used to convey accomplishments and to describe what portfolio companies are doing. The big assumption here is that a meaningful system of performance evaluation would be more sensitive to cases of underperformance, because performance evaluation is not just about showing business success.

Klein: There’s a GIIN survey that asks impact investors, “Do you set impact targets, and is this an important part of your measurement and evaluation?” Something like 78% of these folks say yes. Have you found, in talking to people, that this didn’t really seem to be the case, that these impact targets, if they had them, were not very rigorous?

Brown: That’s right. To clarify, we were interested in targets because they’re relevant to this question of impact underperformance. To set a target is explicitly or implicitly to set a threshold for success, and therefore to set a guideline for potential underperformance. For the same reason that we asked about examples of success and underperformance, we also asked about target-setting. We found exactly what you described, that target-setting appears to be pretty rare on the impact side, at least. And when it is used, it seems to be used quite loosely.

I’ll bring up some of the data points that we came across. These come not just from the interviews but also from an analysis of about 100 impact reports that we reviewed, both public and private performance reports. We found that only about 14% of the public impact reports even cited a quantitative impact target, and all of those conveyed success. This is important because this was not a random sample of impact funds. We reached out to funds that were likely to be more sophisticated about impact measurement. If 65% of impact investors are using quantitative impact targets, as the latest GIIN report suggests, then that should be evident in the reports produced by this sample of funds that we’ve collected.

When we spoke about this topic with our interviewees, we got statements like, “Yeah, it’s more of just a general gut check. It’s more of an impression. We’re just making sure that the portfolio companies are doing what they said they would. We don’t really monitor these targets carefully. It’s just kind of to set an expectation.” It doesn’t seem to be analogous, in short, to a commercial performance benchmark, like an IRR or a multiple.

“We heard very consistently from our interviewees that they face this expectation to do impact measurement, but they just don’t have the resourcing and the funding to do it.”

Klein: What about investors? I want to come back to this idea that the field is saying impact measurement is an important part of practice. I think fund managers probably feel a lot of social pressure. There’s a norm here that we need to say that we measure impact seriously. I’m wondering if investors hold them to this. Are they saying not only, “Show me the money,” but also, “Show me the impact”?

Brown: Based on what we heard, the investors in these funds do want to see that impact is being measured. But in terms of the technical specifics and the rigor of that impact measurement, our interviewees told us that the expectations are pretty low. We were told that investors don’t really ask questions about the impact reports. They seem to be pretty content with what’s reported out, and they’re just happy to know about the kinds of companies that are being capitalized, rather than all of the quantitative details of the specific social and environmental impact that these companies are producing. [For example,] an investor wants to know that solar companies are being invested in, but they don’t necessarily have lots of questions about the specific tonnage of greenhouse gas emissions being reduced and whether that’s aligned with standards and expectations or falling short.

There are some exceptions here. We were told that certain kinds of investors tend to be more attentive to the impact measurement specifics. In particular, development finance institutions (DFIs) and philanthropic foundations tend to be more “invested,” so to speak, in impact measurement, presumably because these institutions tend to be more likely to accept below-market-rate returns and are therefore more likely to differentiate impact performance from financial performance.

Klein: This is a really interesting set of findings and not much of what you’re describing surprises me. When I look at the rhetoric around impact measurement, I think about how difficult it is to achieve some of these standards, to quantify how much impact you’re having. What do you think is going on here? How do we make sense of the fact that the reality of impact measurement doesn’t measure up?

Brown: I’m glad that we’re getting into this part of the conversation because, at this point, listeners may be wondering if this is just one long indictment of impact measurement and impact investing. That’s really not the takeaway that Lauren and I had. We really wanted to understand the pressures and the constraints and the challenges that these professionals are working under. In that spirit, the way we interpret these findings is not that impact investors are cutting corners or that they’re lazy or that they’re just trying to suppress unpalatable information. Because as you noted, it’s not that surprising. This is really hard to do, even for scientists with all the credentials and budget and staff and time to do impact evaluation. It’s understandable that impact investors need to make certain compromises and think about impact measurement in a way that maybe doesn’t jibe with how we think about rigorous impact evaluation.

I think what’s happening here is that impact investing is a relatively new field, and field-builders have understandably used really ambitious language to grow the field and to advocate for the work that they’re doing. Investors hear that, but they also have businesses to run and deal pipelines to manage. In many cases, they’re screening potential investments carefully and thoughtfully and really trying to be clear on how they interpret impact. But they’re not trying to do randomized controlled trials, and they’re not trying to approach impact assessment in a way that would resonate with a more academic audience.

A lot of this seems to be more pre-deal when it comes to the impact side. Once companies or potential investees are deemed impactful, then a lot of the performance evaluation is actually business performance evaluation. A lot of it is based on financial performance, and impact metrics remain relevant as a way of describing what these companies are doing for stakeholders.

“I think that the field needs to get more comfortable with underperformance and how underperformance is relevant to impact measurement and management.”

Klein: What would you encourage impact investors to do, given the challenges of rigorous impact measurement?

Brown: I think that more resources need to be allocated to impact measurement. I don’t know if that necessarily means having a team of PhDs in every impact investment fund. I’m not sure that’s realistic. But we heard very consistently from our interviewees that they face this expectation to do impact measurement, but they just don’t have the resourcing and the funding to do it.

When we did hear that they had resourcing and funding, it tended to be from those DFIs and foundations that are giving grants for building out impact evaluation teams. But that seems to be pretty uncommon, and I think that there needs to be more of that for impact measurement to mature and reach its potential.

People need to put their money where their mouths are, for sure. There are some really interesting tools that are coming online for doing more robust, post-deal impact measurement and monitoring, and 60 Decibels is one example that I’ve cited before as a really exciting development in the field. I would be thrilled to see more adoption of those kinds of tools. But I do think there is a limit to this. Even with the power of a tool like 60 Decibels, it remains really, really difficult to do impact measurement and evaluation in a way that’s truly rigorous.

While I do think there needs to be more emphasis on post-capital-deployment impact measurement, I still think a lot of the most substantive impact assessment will happen before that capital is deployed — in the screening, deal selection, and due diligence processes: having those initial site visits, talking to company management and leadership, and really making sure that the case for designating a particular company as impactful is a solid and compelling one.

Klein: You’ve also talked about learning from failure and that perhaps the rhetoric around impact measurement and the push for documentation have gotten in the way of a real learning orientation in this space. I’d like for you to say more about that.

Brown: I think that the field needs to get more comfortable with underperformance and how underperformance is relevant to impact measurement and management. If impact measurement is going to be relevant to more than just marketing, if it’s going to be relevant for impact management, then underperformance has to be an important part of that equation, because the acid test of impact management is underperformance.

How do you detect it? How do you manage it? How do you improve? I think if we get to a point where impact reports more consistently and more frequently acknowledge cases of impact underperformance and explain how the fund is responding, that will be a major win for the field. That means that stakeholders, specifically the investors in these funds — limited partners, for example — need to make it very clear to fund managers that as long as there is sufficient explanation for the underperformance and sufficient reason to believe that it’s being managed, disclosing it will not be penalized. Fund managers need that psychological safety to disclose this information and to bake it more thoroughly into their investment management process.