Nicole P. Marwell and Jennifer E. Mosley believe that there are many ways to measure the efficacy of social programs beyond randomized controlled trials. (From left: Photo courtesy Nicole P. Marwell; photo courtesy Jennifer E. Mosley; image courtesy Stanford Business Books)
Two scholars argue that the dominance of randomized controlled trials in social policy is harming nonprofits.
In the 1940s, medical researchers began using randomized controlled trials to assess the efficacy of health interventions—new treatments, preventive measures, and devices. Then as now, RCTs involved creating randomly assigned treatment and control groups, administering the potential remedy to only the first group, and comparing how the participants fared. This method, with its promise to tease apart cause and effect, had such seductive explanatory power that other fields began to take notice.
One of these was the social policy sector, Nicole P. Marwell and Jennifer E. Mosley write in their new book, Mismeasuring Impact: How Randomized Controlled Trials Threaten the Nonprofit Sector (Stanford Business Books, 2025). Advocates of conducting RCTs believed the trials could bring clarity to the messy work of helping people: By measuring the outcomes of participants and nonparticipants in social programs, nonprofits and their funders could ensure that public and philanthropic dollars were well spent. Does a job training program really result in more participants getting jobs? Identify a control group and a treatment group and find out.
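The comparison described above can be sketched in a few lines of Python. This is a toy illustration, not an example from the book: the applicant pool, the random split, and the 30 percent versus 40 percent job-placement rates are all made-up numbers chosen to show the logic of random assignment.

```python
import random

random.seed(42)  # fixed seed so the toy run is reproducible

# 10,000 hypothetical job-training applicants, randomly split in half
applicants = list(range(10_000))
random.shuffle(applicants)
treatment, control = applicants[:5_000], applicants[5_000:]

def got_job(in_program):
    # Made-up outcome model: the program lifts job placement
    # from a 30% baseline to 40%.
    return random.random() < (0.40 if in_program else 0.30)

treated_rate = sum(got_job(True) for _ in treatment) / len(treatment)
control_rate = sum(got_job(False) for _ in control) / len(control)
effect = treated_rate - control_rate  # estimated effect of the program
```

Because assignment is random, any systematic difference between the two groups' placement rates can, in principle, be attributed to the program itself; that attribution is the "seductive explanatory power" the method promises.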
But as Marwell and Mosley, both professors at the Crown Family School of Social Work, Policy, and Practice, write, this well-intentioned notion has grown so powerful that the organizations delivering social programs feel strong pressure to have their programs legitimized by RCTs. Gradually, but especially over the course of the 1990s and early 2000s, the RCT came to be seen not as one method of assessment among many but as “the only method that tells you whether or not the program works,” says Marwell—and an important way for organizations to attract funding. Mismeasuring Impact challenges this status quo, arguing that RCTs aren’t always the best tool for the job and calling for a more expansive approach to the evaluation of nonprofit social programs.
Questioning the role of RCTs in social policy took some courage. “There’s a lot of support for this methodology on this campus and in this city,” says Mosley. In fact, many of Marwell and Mosley’s colleagues have been leaders in the movement to use RCTs to assess social programs. “But in the Chicago way,” Mosley adds, “they are supportive of having lively debate on the topic.”
Marwell and Mosley started the research process for what became Mismeasuring Impact interested in the proliferation of RCTs in the nonprofit sector but without a particular point of view about their use. However, as the pair began speaking to nonprofit employees, funders, and even the evaluators who help organizations plan and administer RCTs, they were surprised to discover how many had developed misgivings about how the method was being used in practice.
Many of the concerns were methodological. Evaluators were especially troubled by insufficient sample sizes: Social programs are often expected to have small-scale effects or to address rare issues, such as youth involvement in gun violence. To statistically detect whether these kinds of programs are working, the organizations running them might need to enroll many hundreds of participants in an RCT. “But most youth programs don’t serve that many youths at one time,” Marwell and Mosley write in Mismeasuring Impact. “There may not even be that many youths in the neighborhood.”
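The sample-size worry the evaluators raise follows from standard statistical power arithmetic. A rough sketch, using Python's standard library (the 10 percent baseline and 15 percent treatment outcome below are hypothetical figures, not drawn from the book):

```python
from math import ceil
from statistics import NormalDist

def required_n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # desired power to detect the effect
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# A program that lifts a 10% baseline outcome to 15% -- a meaningful
# but small effect of the kind many social programs aim for.
n = required_n_per_arm(0.10, 0.15)  # 683 per arm with these inputs
```

With these hypothetical numbers, a credible trial needs roughly 680 participants in each arm, well over a thousand in total, which is exactly the scale most neighborhood youth programs cannot reach.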
RCTs are also plagued by so-called control group contamination: people assigned to a control group, denied access to an organization’s program, may seek help elsewhere. The trial then isn’t comparing the treatment to nothing; it’s comparing the treatment to whatever similar services control group members found on their own.
These implementation problems undercut a core claim made by RCT proponents: that RCTs provide the best evidence that a program either works or doesn’t. If an RCT finds that a program has no effect on outcomes, it could signal that the program is ineffective—but, Marwell says, it could also indicate that “you didn’t implement the RCT according to the very rigorous methodological standards that it requires.”
RCTs are also costly and time-consuming, placing a significant burden on the nonprofits conducting them, Marwell says: “Organizations do lots of things besides the program that’s being evaluated [and] the RCT evaluation that’s being conducted.”
Under normal circumstances, an organization that sees an emerging need or a gap in its current offerings can quickly adjust; nimbleness is a historic strength of the nonprofit sector. But, Marwell and Mosley explain, organizations conducting expensive multiyear RCTs are essentially frozen in amber—they can’t change their programs once the trial is underway.
Marwell and Mosley talked to many in the nonprofit sector who worried that the dominance of RCTs was pushing organizations to offer programs with easily measured outcomes. Yet many social needs don’t fit into the tidy cause-and-effect framework of the RCT. A homeless shelter, for example, provides vital services but may not on its own reduce homelessness. “If we continue down this road [where] RCTs are the only way to prove your legitimacy as a program,” Mosley says, “it does devalue those programs that are never going to be able to be part of an RCT. Is that really a world we want to live in?”
Of course “we do need to make sure our programs are effective,” Mosley says. Fortunately, she and Marwell point out in Mismeasuring Impact, there are many tools beyond RCTs to measure efficacy. Organizations can improve their data-gathering efforts to learn more about the people they serve and what happens to them over time. Surveying program participants more regularly can yield essential information about what helps and what doesn’t. “Plan-do-study-act” cycles—quick, small-scale experiments—can also help organizations improve their offerings in real time. “That’s also a lot closer to what you would see for-profit organizations doing,” Mosley says.
These flexible approaches allow organizations to “iterate and improve on a continuous basis,” Marwell adds. That’s good for all nonprofits, including ones that will never be able to run RCTs. And what’s good for nonprofits is good for the rest of society, she says: These organizations are “such a critical part of our social safety net and a critical part of the services that help people grow and thrive.”