Tuesday, November 9, 2010

Villgro Blog: The Bad and Good News About Evaluation

In an earlier post, I critiqued the research methodologies of two surveys of social entrepreneurs, by Ashoka and Intellecap. Most of my suggestions for improvement were directed towards helping Ashoka and Intellecap better answer the question, “How representative are my results of a larger community (the universe) of social entrepreneurs?” However, at the end of the post I also mentioned the concept of causation. Causation is important when trying to answer a different question, which is, “How likely is it that my results were caused by a particular intervention, rather than by something else?”

From the 25th to the 28th of October I attended the Evaluation Conclave 2010 in Delhi, where I had the chance to explore these issues further. In an evaluation context, it’s often the word attribution, rather than causation, that is used. Attribution is an issue at many levels. I’ll try to illustrate this using a hypothetical example of a social enterprise.

This social enterprise produces insecticide-treated bed nets. Orders for the bed nets are placed through a microfinance institution, and the bed nets are delivered to customers through kirana stores (corner shops). The social enterprise requires funds for its social marketing campaigns, to cover costs until it reaches break-even, and to develop talent within the organization. It is able to find donors to meet each of these needs.

An evaluation is conducted of the social enterprise, and it is found that the incidence of malaria has decreased in the households in which customers have bought bed nets. The donor that funded the talent development program wants to know how its money contributed to the reduction in malaria. However, it is not possible to isolate talent development from the rest of the organization, as the people developed through this program contribute to social marketing, production and all other aspects of the enterprise.

What all three donors can agree on is that they want to know to what extent the reduction in malaria can be attributed to the bed nets. While it’s easy to assume that bed nets resulted in a reduction in malaria, this may not be the case. For example, the government could have embarked on a mosquito eradication program, and sprayed the villages in which the bed nets were also sold.

How do we find out the extent to which the reduction in malaria can be attributed to the bed nets? The strongest evaluation designs that answer this question should have two characteristics. The first is that they should include both a project group and a comparison (non-equivalent control) group. The second is that both groups should be “observed” at both the start and end of the project. If time and budget permit, both groups can also be “observed” during implementation of the project and after it has been operating for some time.

What is it about these evaluation designs that makes them appropriate for the social enterprise in the example?
• The project group is “observed” both at the start and at the end of the project. In our example, the project group is households that have bought bed nets. A figure for the incidence of malaria at the end of the project is meaningless if we do not know what the incidence of malaria was at the beginning. The only exception is if the incidence of malaria is 0. However, since this would require all the bed nets sold to be in perfect condition, to be used all the time, and by all members of the family, this is highly unlikely to be the case.
• There is a comparison (non-equivalent control) group, which is also “observed” at the start and end of the project. Even if the incidence of malaria has reduced in our project group, how do we know that this was due to the bed nets and not other factors? The answer is by selecting a comparison (non-equivalent control) group. This group should consist of households that did not buy the bed nets, but are as similar as possible to the households which bought bed nets. This will enable the evaluator to check whether malaria declined at a similar rate in the project group and the comparison group. If it did, then it is likely that malaria declined due to factors other than the bed nets.
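
To make the comparison in the two points above concrete, here is a minimal sketch in Python. It uses what is commonly called a difference-in-differences calculation, which is one simple way of comparing the change in the two groups; it is not taken from the workshop or the book. The class name, the function name and all of the figures are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class GroupObservation:
    """Malaria incidence (cases per 1,000 people) at the start and end of the project."""
    baseline: float
    endline: float

    @property
    def change(self) -> float:
        # Negative values mean that incidence fell over the project period.
        return self.endline - self.baseline


def difference_in_differences(project: GroupObservation,
                              comparison: GroupObservation) -> float:
    """Change in the project group minus change in the comparison group."""
    return project.change - comparison.change


# Hypothetical figures, not real data.
project = GroupObservation(baseline=120.0, endline=60.0)     # bought bed nets
comparison = GroupObservation(baseline=115.0, endline=95.0)  # did not buy bed nets

estimate = difference_in_differences(project, comparison)
print(estimate)  # -40.0
```

In this made-up case, incidence fell by 60 in the project group and by 20 in the comparison group, so only about 40 of the 60-point fall could plausibly be attributed to the bed nets. If the two groups had declined at the same rate, the estimate would be 0, which is the situation described above where other factors are the more likely explanation.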

The bad news about evaluation is that it can be quite complicated to successfully meet the criteria for a strong evaluation design. The good news is that between strong and weak evaluation designs, there is a range of options that are frequently adequate. These options are summarized in this overview of Real World Evaluation, a book by Michael Bamberger, Jim Rugh and Linda Mabry. At the Conclave, I was lucky enough to attend a two-day workshop facilitated by Jim Rugh. Based on what I learnt at the workshop, I’m going to propose my own design to evaluate the social enterprise in my example above.

If you remember, orders for the bed nets are placed through a microfinance institution (MFI). The social enterprise is still building its distribution network, and has only reached the kirana shops in one geographical area. However, it asks the MFI to aggregate orders from all of the geographical areas in which it has a presence. The customers who have ordered the bed nets, but whom the social enterprise’s distribution network has not reached as yet, will serve as the comparison group.

It is important for the evaluation design that the comparison group also consists of clients of the MFI. This is because it is likely that MFI clients are more enterprising than other members of their communities. It is possible that even without bed nets, they have devised other solutions to avoid getting bitten by mosquitoes. Therefore, in order to understand whether it is bed nets that have made the difference, the households where bed nets were bought must be compared to other enterprising households, and not to the average population.
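
As a rough sketch of how this split could be made from the MFI's order records, the Python below sorts customers into the two groups according to whether the distribution network has reached their area. The record layout, the function name and the sample orders are all hypothetical; a real MFI's data would look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """One bed net order placed through the MFI (hypothetical record layout)."""
    client_id: str
    area: str


def split_groups(orders, areas_reached):
    """Return (project_group, comparison_group) as lists of client IDs.

    Every customer here is an MFI client who ordered a bed net, so the two
    groups differ only in whether delivery has reached their area yet.
    """
    project, comparison = [], []
    for order in orders:
        if order.area in areas_reached:
            project.append(order.client_id)
        else:
            comparison.append(order.client_id)
    return project, comparison


# Invented sample data for illustration only.
orders = [
    Order("c1", "north"),
    Order("c2", "south"),
    Order("c3", "north"),
    Order("c4", "east"),
]
project_group, comparison_group = split_groups(orders, areas_reached={"north"})
print(project_group)     # ['c1', 'c3']
print(comparison_group)  # ['c2', 'c4']
```

Because both groups are drawn from the same list of orders, nobody from the average population ends up in the comparison group, which is the point of the design.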

The “observation” of the project and comparison groups at the start of the project can come in part from the MFI’s records, as the MFI is likely to have already collected a lot of data on its clients. Any additional data that is needed can be collected at the time that orders for the bed nets are placed, before the bed nets have been bought and have had the chance to have an effect.

The Real World Evaluation book uses terms such as during, upon completion of, and after the project. In the case of a social enterprise, it is more likely that the evaluation will be of an ongoing business than of a project that has a start, middle and end. However, as one of Paul Polak’s principles of Designing for the Other 90 Percent is that the design should pay for itself in the first year, one year may be a suitable duration to designate as the project period.

Of course, in this case I have chosen the business model of the social enterprise so that it lends itself to a strong evaluation design. Designing evaluations for actual social enterprises will be less easy. However, I hope my example demonstrates that while difficult, designing a strong, or at least adequate, evaluation is not impossible, and that practical solutions can be found to real world constraints.

Finally, there are some instances in which attribution may either not be possible or not necessary. As I discussed earlier, it may not be possible to attribute a reduction in malaria to a talent development program within a social enterprise. One example of where attribution may not be necessary is in the evaluation of a sector. Returning to our example, let us assume that an epidemic hits the Asian continent. The government is not prepared for this epidemic. Therefore, in the year the epidemic hits, while all of the government’s planned health programs are successful, the health of the population declines overall.

In such an event, the overall trend in the health of the population may not be discovered through evaluations of individual interventions or organizations. In a sectoral review, positing an entire nation as a project group and another nation as a comparison group may not be meaningful.

While the example of the epidemic may seem dramatic and unlikely, at the Conclave there was an interesting exchange between representatives of the Rockefeller Foundation and the Asian Development Bank on whether donors should, in addition to evaluating the interventions that they fund, evaluate whole sectors as well. However, sectoral evaluations are likely to be quite expensive and complicated, and therefore rare. For the vast majority of evaluations, issues of attribution and comparison will remain important.

1 comment:

  1. How do we measure the impact created by social enterprises which have failed to do a baseline survey or do not have a relevant control group?