
Evaluation Design and the Research Question at Hand: What are we trying to learn here, anyway?

January 3, 2018

Designing an evaluation pushes many providers outside of their comfort zone. Available data may leave you with more questions than answers on critical issues like referral and enrollment volume. Tradeoffs between evaluation precision and operational viability may be unclear. On top of this, many providers lack in-house evaluation expertise. While this challenge is not unique to Pay for Success (PFS), it’s particularly acute in these projects, where evaluation design is just one of a series of complex issues to negotiate among a large stakeholder group with competing priorities.

As I’ve engaged in these discussions, the distinction between a program evaluation and a systems evaluation has been an important one. This post will attempt to distinguish the two, explore each within the context of PFS, and offer some guidance for providers. Many service providers find PFS appealing because of the opportunity to conduct a rigorous evaluation and to access the resources to scale their work. My hope is that this post will help providers ensure that they are negotiating the evaluations they want, whether in PFS or other settings.

How do your operations and research question influence one another?

At the heart of all PFS initiatives is a need to get participants to and through a service intervention. How projects develop recruitment pipelines, and when those individuals are assigned to either a program or control group, represent fundamental project design decisions. In a Randomized Controlled Trial (RCT), selecting the point of randomization requires an understanding of the entire ecosystem in which the intervention operates. Your randomization point marks the spot in a process where you’ll split a group of individuals into a treatment and control group, and it will also influence what questions you are answering, and with how much precision. In addition to choosing a point that’s operationally, statistically, and ethically feasible, project partners need to ensure they’re comfortable with whether the point of randomization is setting up a program evaluation or a systems evaluation.

Systems or Programs: Upstream = Systems, Downstream = Programs

Program evaluations: If your goal is to learn as much as possible about the effects of the program intervention alone, you want to randomly assign someone as close as possible to the point at which they would receive program services. This ensures that the maximum number of people in the program group get at least some exposure to the intervention and gives you the most precise estimate of the effects of the intervention alone.

Systems evaluations: We also know that people don't enter child welfare, homelessness, or reentry services in a vacuum. They frequently interact with multiple public sector and non-profit partners in addition to the one being evaluated. If you are interested in learning about the broader effects of how beneficiaries are identified, assessed, and served by these actors, you may have random assignment occur before a person reaches the door of any single service provider. It is important to understand that more “upstream” evaluations are answering questions about the combined effects of collective activities. You may be able to disentangle program effects from system effects using statistical adjustments, but it can be difficult to attribute results to a particular program or step in the process.
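
To see why the randomization point matters so much, here is a toy simulation (all numbers are hypothetical, not drawn from any real PFS project). It assumes a simple pipeline in which only half of the people identified upstream ever reach the program door, and the program lifts a success rate by 20 percentage points for those it actually serves:

```python
import random

random.seed(7)

def estimate(randomize_upstream, n=20000, reach_rate=0.5,
             program_effect=0.20, base_rate=0.30):
    """Toy referral pipeline: everyone is identified upstream, only some
    reach the program door, and the program lifts outcomes for those served."""
    successes = {"treat": 0, "control": 0}
    counts = {"treat": 0, "control": 0}
    for _ in range(n):
        reaches_door = random.random() < reach_rate
        if not randomize_upstream and not reaches_door:
            continue  # program design: only people at the door are randomized
        arm = random.choice(["treat", "control"])
        served = arm == "treat" and reaches_door
        outcome = random.random() < base_rate + (program_effect if served else 0)
        counts[arm] += 1
        successes[arm] += outcome
    return (successes["treat"] / counts["treat"]
            - successes["control"] / counts["control"])

print(f"program design (randomize at the door): {estimate(False):+.3f}")  # ~ +0.20
print(f"systems design (randomize at referral): {estimate(True):+.3f}")   # ~ +0.10
```

The upstream (systems) design recovers roughly half of the program-level effect, because its treatment group includes people who never reached the door. That isn’t a flaw in the measurement; it’s an answer to a different, broader question about the whole pipeline.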

To take it from theory to practice, here are a few examples to show how the same intervention might fit into a program or a systems evaluation design depending on the recruitment and randomization practices:

[Chart: examples showing how the same intervention can sit within either a systems evaluation or a program evaluation design]

What are the implications of each?

A well-designed and well-executed program evaluation isolates the intervention and seeks to answer whether the services, on their own, are producing impact.

Many service providers benefit from program evaluations - it’s a clear tool to determine whether your theory of change delivers the outcomes for which it’s designed.

Systems evaluations often set up exactly the kind of partnership that providers want with institutional partners. They can incentivize genuine collaboration in which several stakeholders are working towards common goals. Further, this evaluation approach recognizes that a person has a variety of important interactions apart from a service provider that may affect their trajectory.

Each approach also has drawbacks. While powerful, program evaluations often require providers to take part in randomization, which could mean turning people away for reasons that can feel difficult to justify. This can be emotionally draining and, in some projects, randomization can last for years. Any organization preparing for a program evaluation should have strategies in place for supporting staff through the process and responding to frustrated participants and referral partners.

Systems evaluations have the potential to answer really important questions about demand for services, referral processes, and who the right target population might be for a service.

Yet isolating the impact of a single actor within that system can be more difficult; for a service provider searching for a simple “yes” or “no” regarding its effectiveness, a systems evaluation may not provide clarity.

Dynamics pushing PFS towards systems evaluations

Even though PFS has developed a strong narrative about evaluating specific providers or interventions, PFS projects have often looked more like systems evaluations. The success of a therapeutic curriculum on Rikers Island depended in part on the facility keeping control and treatment groups separate and distinct; FrontLine’s interventions for families in crisis require Cuyahoga County’s child welfare system, courts, and safety net agencies to better integrate practices and buy into FrontLine’s approach.[1] PFS stakeholders have often messaged the importance of shifting performance risk away from government onto providers, but in many of these projects provider performance is deeply connected to the behavior of multiple actors, including public sector agencies.

As a provider, there can be significant risk in using a systems-level approach for an evaluation that is positioned as an evaluation of your effectiveness as a provider. In the college access program systems example, success for the treatment group requires more than an effective provider - the school district would need strong data collection and storage practices; accurate contact information for students and families; and a good strategy for identifying first-generation college students that’s consistently implemented across schools, guidance counselors, and administrators. In many places those high-functioning systems are not a given. If the school district failed to identify the right referrals, was uncooperative with sharing information, or lost interest in the project midway through the evaluation, the provider would struggle to demonstrate impact for reasons outside of its control. The evaluation could also be muddied if guidance counselors, assuming that treatment group members were being served by the provider, offered them less college planning support than their counterparts in the control group.

Factors and Tradeoffs Pushing Projects Towards Systems or Program Evaluations

While a PFS project, or other evaluation, may begin with an intent to evaluate either a specific program or a group of linked activities, a variety of factors may result in the research question shifting. Several dynamics act together to push PFS evaluations towards systems or program evaluations.

Scale versus Match: Governments often seek providers that are as effective as possible for as many people as possible, while providers push to enroll those for whom they’re most likely to make an impact. PFS is often used as an opportunity to pursue both goals, pushing government to systematically refer eligible individuals and defining provider success based on everyone identified as needing services. It’s a GREAT shift in approach if the right population is identified and it’s well implemented. The evaluation, however, will be assessing the success of a set of actors working in a coordinated way rather than just a service provider doing business as usual. This may be a significant shift from how those actors worked together before PFS. Evaluations that rely on government playing an active, participatory role in identifying and referring participants can contain elements of a systems evaluation.

Optics & Ethics: Government and providers may love the rigor of RCTs, but often dislike the idea of denying beneficiaries a service that may be critical for their health, economic security, and general well-being. Project partners want to make the randomization process palatable for all involved, and that often means pushing the point of randomization upstream, and concealing it, to minimize the disappointment of individuals not receiving services. If you compare the program and systems designs described above, you can see many people’s aversion to turning a hopeful high school student away after they’ve already gone to the trouble of applying to a program. Systems evaluations often involve randomizing people before they even know about a program.[2]

Powering the Evaluation: When you design an evaluation, you need enough people in the treatment and control groups to “power” it, taking into account the effect size you want to detect and the variation you expect to see in the data. Sometimes providers need to take on new recruitment strategies or find other ways to identify enough eligible individuals for both treatment and control groups. Often the solution is to move further up the pipeline and partner with government to identify a large pool of potential participants and randomize them as they’re identified. Note that you can have it both ways - if a partner still wants a program evaluation but needs to recruit more people, they can partner with other systems to better identify all eligible people, but randomize downstream as in a traditional program evaluation.
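
For a sense of what “powering” an evaluation means in practice, here is a back-of-the-envelope sketch using a standard two-proportion sample size approximation. The 30% and 40% success rates, the 5% significance level, and the 80% power target are hypothetical placeholders, not figures from any real project:

```python
import math
from scipy.stats import norm

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Approximate sample size per arm needed to detect a difference
    between two proportions at the given significance level and power."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_treat - p_control) ** 2)

# Hypothetical: 30% baseline success rate, hoping to detect a lift to 40%.
print(n_per_arm(0.30, 0.40))  # ~354 people per arm
# Halve the detectable effect and the required sample roughly quadruples:
print(n_per_arm(0.30, 0.35))  # ~1,374 people per arm
```

This is why projects chasing smaller detectable effects get pushed towards larger recruitment pipelines, and often towards the upstream government partnerships described above.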

Cream-Skimming Anxiety: Governments have, rightfully, been taught to be wary of “cream skimming,” and their radar goes up when they hear about randomizing at the door because they think a provider is trying to home in on a highly motivated subset of a population. This is a valid concern when there’s no comparison group. However, experimental designs correct for this concern - projects will naturally have a harder time showing impact for motivated individuals who are already on the path to success, because their equally motivated counterparts are in the control group. In our experience, however, stakeholders may miss that nuance and push back against randomizing close to the point of intervention.

Guidance for Providers

  • Consider the long-term implications of the data: A rigorous evaluation showing positive impact can open doors for providers, regardless of whether it is a program or systems evaluation. While entering into either design comes with operational and reputational risks, providers should be especially aware of the complexity of systems evaluations.
  • If you have a systems evaluation, advocate for analysis of the people served: Evaluators use adjustments (Treatment on the Treated being a common one) that can approximate the impact of an intervention on its own, even in a systems evaluation; see the sketch after this list. These can be an effective layer of protection for service providers. It’s best to consult with an evaluator to ensure that you fully understand the adjustment and its likely effects.
  • Know your partners: Any multi-year project is going to encounter hurdles. If the success of a project hinges in part on people outside your organization (such as doctors, parole officers, social workers, teachers, government leaders, etc.) acting in a specific way, consider whether it’s reasonable to expect that they will fulfill what is asked of them. Have you worked with them previously? Are there brand new players in the partnership? Are you confident they’ll sustain performance over a multi-year period? If they don’t, do you have the relationships or the project governance structure to resolve issues?
  • Use caution if enrolling control group members: When you randomize to create treatment and control groups, you have to decide how to treat the control group. Control group members often arrive at service providers seeking help, and it’s really hard to turn people away, so many projects allow control group members to enroll. If you enroll them, however, the contrast between the two groups shrinks and program impacts become harder to detect.
  • Brand your project appropriately: If you’re a provider in a systems evaluation, you run the risk of a no-impact finding and the communications message being “well, now we know intervention X didn’t work.” Ensure that language describing the project reflects the research question at hand and the interconnected nature of the various agencies included in the evaluation design.
  • Do a gut check: An RCT often requires years of sustained effort. If the research question behind an evaluation (and the other benefits of PFS) isn’t compelling enough to warrant the investment, it might not be the right project for you.
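
To make the Treatment on the Treated idea concrete, here is a minimal sketch of one common version of the adjustment: a Bloom-style scaling of the intent-to-treat estimate by the difference in service take-up between the two groups. All numbers are hypothetical, and a real evaluator would attach standard errors and assumptions to this:

```python
def treatment_on_treated(itt_effect, takeup_treat, takeup_control):
    """Scale the intent-to-treat (ITT) estimate by the take-up
    differential to approximate the effect on people actually served."""
    return itt_effect / (takeup_treat - takeup_control)

# Hypothetical: a 4-point ITT effect, 70% of the treatment group actually
# served, and 10% of the control group enrolled anyway (the crossover the
# "use caution" bullet above warns about).
print(treatment_on_treated(0.04, 0.70, 0.10))  # ~0.067, a 6.7-point effect
```

Note how crossover by control group members shrinks the denominator: the more control group members you enroll, the noisier and more assumption-laden this back-out becomes.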

In this piece I’ve set out to help providers differentiate between program and systems evaluations, understand the implications of each, and anticipate how the concepts may play into PFS negotiations. At CEO we genuinely want to engage in both program and systems evaluations - we seek to understand the nuances of our model and how it performs within the broader ecosystems our participants navigate. Building understanding of both is critical to the field and, most importantly, to the people we serve.


Christine Kidd is the Director of Program Innovation at the Center for Employment Opportunities (CEO). CEO helps men and women coming home from incarceration to find and keep jobs. CEO has been involved in two PFS projects and multiple performance-based contracting efforts.

This blog was supported through funding awarded in 2014 by the Corporation for National and Community Service Social Innovation Fund.

The Corporation for National and Community Service is the federal agency for volunteering, service, and civic engagement. The agency engages millions of Americans in citizen service through its AmeriCorps, Senior Corps, and Volunteer Generation Fund programs, and leads the nation's volunteering and service efforts. For more information, visit NationalService.gov.

The Social Innovation Fund (SIF) was a program of the Corporation for National and Community Service that received funding from 2010 to 2016. Using public and private resources to find and grow community-based nonprofits with evidence of results, SIF intermediaries received funding to award subgrants that focus on overcoming challenges in economic opportunity, healthy futures, and youth development. Although CNCS made its last SIF intermediary awards in fiscal year 2016, SIF intermediaries will continue to administer their subgrant programs until their federal funding is exhausted.  

[1] For Rikers Island, see Kathy Gibbs’ recent commentary. For Cuyahoga County, see https://www.thirdsectorcap.org..., page 5, “Who are the Stakeholders, Roles and Resources from Cuyahoga?”

[2] The ethical questions regarding RCTs and our comfort experimenting with those served by PFS projects (low income individuals, people of color, incarcerated people) deserve attention and project partners should test the ethics of the research design. At CEO we’re aware of the legacy of unethical experiments; the communities we serve have been subject to experimentation beyond their control.