Ask most people what “the gold standard” is and they’re likely to tell you it is a metaphor for “the best.” Indeed, when a New York Times opinion piece recently took the Obama administration to task for a certain environmental policy and the author called it “the bronze standard,” it was clearly no compliment.
In the social sector, we talk about the gold standard as a specific kind of evaluation that has been designed to determine the impact of a program or set of activities. A gold standard impact evaluation uses randomization to determine cause and effect. It is based on the science of clinical trials, and it is used to attribute change to a particular intervention. For instance, rather than just observing that students improved their reading skills, an impact evaluation allows you to know what caused that improvement: whether the improvement was the result of a teacher, a new curriculum, or perhaps a longer school day, or whether the change would have occurred anyway because of exogenous factors.
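For readers who want to see that logic in miniature, here is a minimal sketch in Python of a hypothetical reading-program trial. The program, the assumed 3-point effect, the score distribution, and the sample size are illustrative assumptions rather than figures from any real evaluation; the point is simply that random assignment lets a difference in average outcomes be attributed to the intervention rather than to exogenous factors.

```python
# Toy illustration of why randomization supports causal attribution.
# All numbers are made up for illustration; none come from a real evaluation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
n_students = 400

# Random assignment: each student has an equal chance of getting the new curriculum.
treated = rng.random(n_students) < 0.5

# Hypothetical reading scores: a shared baseline plus noise, with an assumed
# 3-point boost for students who received the program.
baseline = rng.normal(loc=70, scale=10, size=n_students)
scores = baseline + np.where(treated, 3.0, 0.0)

# Because assignment was random, the two groups are comparable on average,
# so the difference in group means estimates the program's effect.
effect = scores[treated].mean() - scores[~treated].mean()
t_stat, p_value = stats.ttest_ind(scores[treated], scores[~treated])

print(f"Estimated effect: {effect:.2f} points (p = {p_value:.3f})")
```

In a real trial the true effect is of course unknown; estimating it from the comparison of randomized groups is precisely what an impact evaluation is designed to do.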
These evaluations are enormously important because they can provide evidence about what works to improve people’s lives and, at least as important, about what does not. Many foundations are funding impact evaluations these days to inform national and international policy, funding decisions, and more effective practice. Seems like a sound approach, doesn’t it?
Unfortunately, it does not always work the way we plan.
Influencing policy?
First, relatively few policymakers rely on high-quality randomized studies to shape their perspectives and approaches. I recently spoke to a senior executive at a preeminent research organization dedicated to generating knowledge, largely through randomized studies, and communicating the findings to influence social policy. He explained that while evidence has been gaining ground in national policy debates, there are essentially four factors at play that inform and shape national policy: 1) ideology; 2) politics; 3) evidence; and 4) political horse trading. Evidence is only one of four factors — and it ranks third in the sequence.
When studies validate ideology and political orientations on both sides of the aisle, as they did on welfare reform – where the results highlighted the importance of work as well as the need for extra supplements to low-wage work – policymakers use the evidence. But when the evidence doesn’t validate ideology and reinforce strongly held beliefs, it is often ignored, even when well communicated and well substantiated. Nationally, this is improving with new Office of Management and Budget standards that promote programs with rigorous studies of effectiveness, but we still see policymakers choosing the evidence that suits their beliefs rather than the other way around.
Influencing practice?
It can be even harder to make the evidence central to decision making in practice. A seasoned evaluator who has worked with many foundations and their grantees recently complained to me that even in fields where there is strong evidence of what the best practice is, only about ten percent of nonprofit practitioners follow those practices. Perhaps the proportion is higher than this evaluator estimates, but it is surely true that many nonprofits don’t operate in ways that are consistent with what the evidence shows. Why would this be? Why not use evidence to shape practice?
There are many possible explanations, but the two I think are most important are:
- A misalignment or a lack of incentives. It may be expensive to revise practice to align with the evidence – including investing in developing skills, hiring new personnel, or even changing the basic activities an organization performs. There are often no explicit incentives or funding to make the necessary adjustments.
- Not knowing the evidence. Many – especially small – nonprofits (and their funders) do not stay current with the literature in their fields and, as a result, may not be aware of the latest studies showing what works and what does not. There has not been an easy way to stay current on the literature, and reading through reams of academic papers is not at the top of most practitioners’ or funders’ lists of priorities.
A further complication concerns an evaluator’s own belief system, which influences the interpretation of results for even highly rigorous experiments. When I asked an esteemed university-based evaluator who studies social service programs, “How often are your evaluation findings used?” he said most of the time they were used for “tweaking.” In most of his impact evaluation studies, he further explained, there is “no effect,” meaning there is no evidence that the service being provided is helping people in any significant way. He exercises great care in the way he interprets and communicates such findings so that the evaluation results do not unduly harm what he views as essential social service programs for poor people. The primary value in this calculation is that any program serving the needs of poor people is better than no program, even if the program itself is not proven effective.
Influencing philanthropy?
In philanthropy, the evaluation picture is a bit mixed.
On the one hand, a 2011 Patrizi Associates study found that in recent years, foundations have placed less emphasis on evaluation – spending less money on it and reporting that it has diminished influence on foundation decision making. On the other hand, a recent CEP study found that 90 percent of foundation CEOs report that their foundations conduct formal evaluations of their work, often using third-party evaluators to do so. Still, 65 percent of these same foundation CEOs report that it is challenging to have evaluations result in meaningful insights for their foundation. Thus, even those foundations historically oriented to evaluation find that effectively using evaluation results remains elusive.
While the challenge of using evaluation findings has perhaps led some foundations to eliminate dedicated evaluation positions, it appears to me that in the ebb and flow of foundation trends, more foundations are now creating evaluation functions as part of their operations, and they are increasingly likely to link those functions to strategy and organizational learning. It is essential to seize this opportunity and plan for evaluations that are designed purposefully to support learning and decision making, as well as to make better use of findings, in order to drive better results.
Many types of evaluations matter
The issue of “gold standard” evaluations in the social sector is controversial for other reasons. Many believe that crowning them as the preferred method has produced a kind of gold rush toward randomized controlled trials (RCTs), which are often narrowly focused, limited to variables that are more easily measured, and not always focused on what is most important. The fact is that there are many important forms of evaluation – performance evaluations, formative evaluations, developmental evaluations, cost-effectiveness studies, case studies, and more. While well-designed RCTs can offer strong evidence as to what programmatic models work on the ground, and can inform government decisions about resource allocation, they need to be blended with other types of studies to answer the how and why questions about effectiveness. RCTs also are not applicable to evaluating things like advocacy or field building – areas of growing interest and importance in the sector.
Different forms of measurement are valuable for answering different questions at different times. If we are trying to maximize the use of evaluation results, it is essential to match evaluation methods to questions that are important to answer – questions that, if answered, might make a difference in how one thinks about a given approach or set of activities.
It is time for a gold standard of data use.¹
More important than pursuing a single standard for measurement is the need to inculcate the expectation that all nonprofits and funders will use data to inform how they work and the decisions they make. Sometimes those data will come from randomized studies; sometimes they will come from performance evaluations and performance measurement to shape continual improvement and adaptation; sometimes they will come from case studies, a great vehicle for learning and improvement, especially in circumstances that require a good deal of professional judgment.
Part of increasing the use of data and evaluation involves right-sizing evaluation efforts, focusing on what is important to know and when, and anticipating how the results might be applied in practice. Paul Brest at the Hewlett Foundation has a saying: “Don’t kill what you cannot eat.” It is time to put our energy into consuming the information we take the time to gather, the reports we commission, and the studies we support. We should avoid asking for more data, more evaluation, and more analysis than we can actually make sense of in the time frames required for meaningful action. Rigorous methods are important, but the real gold standard is in use.
Hopeful signs ahead?
I invite readers to share what they believe are promising signs of more and better use of evaluation to inform decisions and adapt practice – in nonprofits, foundations, and even policy.
It is up to us…
Data don’t make decisions, people do. It is up to us to anticipate and meet our information needs, deploy ways to measure performance, commission evaluations with a clear purpose, and take the time to use the data. Only then will we meet the vaunted new expectation of a gold standard of data use.
¹In the interests of attribution: Jodi Nelson, my esteemed colleague from the Bill & Melinda Gates Foundation, was the first to introduce me to the phrase “gold standard of use.” At the time, it was so resonant that I immediately told her I would use it again.
Fay Twersky is a senior fellow at The William and Flora Hewlett Foundation and a member of CEP’s Advisory Board.