Broadening the range of designs and methods for impact evaluations

Elliot Stern (Team Leader), Nicoletta Stame, John Mayne, Kim Forss, Rick Davies, Barbara Befani

Executive Summary

  1. This report covers a study commissioned by DFID entitled ‘Broadening the Range of Designs and Methods for Impact Evaluations’.
  2. Impact Evaluation (IE) aims to demonstrate that development programmes lead to development results, that the intervention as cause has an effect. Accountability for expenditure and development results is central to IE. At the same time, because policy makers often wish to replicate, generalise and scale up, they also need to accumulate lessons for the future. Explanatory analysis, by answering the ‘hows’ and ‘whys’ of programme effectiveness, is central to policy learning.
  3. IE must also fit with a contemporary development architecture that, since the Paris Declaration, is decentralised, works through partnership and expects developing countries to be in the lead. These normative principles have practical implications for IE. For example, working through partners leads to multi-stage, indirect causal chains that IE has to analyse; and using a country’s own systems can limit access to certain kinds of data.
  4. Up to now most investment in IE has gone into a narrow range of mainly experimental and statistical methods and designs that, according to the study’s Terms of Reference, DFID has found are only applicable to a small proportion of its current programme portfolio. This study is intended to broaden that range and open up complex and difficult-to-evaluate programmes to the possibility of IE.
  5. The study has considered existing IE practice, reviewed methodological literatures and assessed how state-of-the-art evaluation designs and methods might be applied given contemporary development programmes.
  6. Three elements (evaluation questions; appropriate designs and methods; and programme attributes) have to be reconciled when designing IE. Reviews of existing evaluations suggest that sometimes the methods chosen are unable to answer the evaluation questions posed, or the characteristics of development programmes are not taken into account when choosing designs or deciding on evaluation questions.
  7. Demonstrating that interventions cause development effects depends on theories and rules of causal inference that can support causal claims. Some of the most potentially useful approaches to causal inference are not generally known or applied in the evaluation of international development and aid. Understanding of multiple causality and configurations, and of theory-based evaluation that can analyse causal mechanisms, is particularly weak. There is greater understanding of counterfactual logics, the approach to causal inference that underpins experimental approaches to IE.
  8. Designs need to build on causal inference approaches, each of which has its strengths and weaknesses. This is one of the reasons that combining designs and methods, so-called ‘mixed methods’, is valuable. Combining methods has also become easier because the clear distinctions between quantitative and qualitative methods have become blurred, with quantitative methods that are non-statistical and new forms of within-case analysis made easier by computer-aided tools.
  9. On the basis of literature and practice, a basic classification of potential designs is outlined. Of the five design approaches identified (Experimental, Statistical, Theory-based, Case-based and Participatory), the study has, in line with its ToR, concentrated on the potential of the latter three.
  10. The study has concluded that most development interventions are ‘contributory causes’. They ‘work’ as part of a causal package in combination with other ‘helping factors’ such as stakeholder behaviour, related programmes and policies, institutional capacities, cultural factors or socio-economic trends. Designs and methods for IE need to be able to unpick these causal packages.
  11. This also has implications for the kind of evaluation questions that can usefully be asked. It is often more informative to ask: ‘Did the intervention make a difference?’ which allows space for combinations of causes, rather than ‘Did the intervention work?’ which expects an intervention to be a cause acting on its own.
  12. Development programmes that are difficult to evaluate, such as those concerned with governance, democracy strengthening or climate change mitigation, are often described as ‘complex’. In the case of humanitarian relief or state-building the term complexity can also be applied to the setting in which the programme is located.
  13. Instead of classifying programmes according to how complicated they are, attributes of programmes were identified on the basis of literature reviews and an analysis of a portfolio of current DFID programmes. These attributes included duration and time scale; nonlinearity and unpredictability; local customisation of programmes; indirect delivery through intermediate agents such as funds; and multiple interventions that influence each other, whether as sub-programmes within the same programme or in separate but overlapping programmes.
  14. Tailored evaluation strategies are needed to respond to these attributes. For example, careful analysis is needed to decide under what circumstances large multi-dimensional programmes, such as those characteristic of the governance area, can be broken down into sub-parts or have to be evaluated holistically.
  15. A reality that often has to be faced in IE is that there is a trade-off between the scope of a programme and the strength of causal inference. It is easier to make strong causal claims for narrowly defined interventions and more difficult to do so for broadly defined programmes. The temptation to break programmes down into sub-parts is therefore strong; however, this risks failing to evaluate synergies between programme parts and basing claims of success or failure on incomplete analysis.
  16. Similarly, when the effects of programmes are long-term and have unpredictable trajectories, designs for IE will need to take this into account. Results monitoring may need to be prioritised alongside a staged evaluation strategy able to respond to changes in implementation trajectories not anticipated at programme launch.
  17. Quality Assurance (QA) for IE is needed both to reassure policy makers that evaluation conclusions are defensible and to reinforce good practice within the evaluation community. Existing QA systems already in use by DFID cover many generic aspects of evaluation quality. We have therefore focussed on what else is needed to assure quality in relation to IE.
  18. Having reviewed the literature on quality in research and evaluation, the study concluded that a common framework could be applied across different designs and methods. Standards such as validity, reliability, rigour and transparency have therefore been incorporated into a three-part QA framework covering: the conduct of the evaluation; the technical quality of methods used; and normative aspects appropriate to IE in an international development setting.
  19. This has been a ‘proof of concept’ study and many of the newer designs and methods identified have not previously been applied in international development IE. Making these IE approaches usable in the development setting will require field-testing in a number of targeted programmes in cooperation with DFID’s decentralised offices and evaluation specialists.
  20. IEs should not be regarded as an everyday commission. Any IE that is thorough and rigorous will be costly in terms of time and money and will have to be justified. Criteria are therefore needed to decide when such an investment should be made. More generally IE raises questions of capacity both within development agencies such as DFID and among evaluators. For this reason ‘Next Steps’ are outlined that would enhance capacities needed to support the take-up of a broader range of designs and methods for IE in international development.
  21. It is important to recognise that even when IE is inappropriate, enhancing results and impacts can still be addressed through evaluation strategies other than IE. Results monitoring can make major contributions to accountability-driven evaluations. There will also be occasions when real-time, operational, action-research oriented and formative evaluations can all make serious contributions to filling gaps in evidence and understanding.

Follow the link to read the full report on DFID, where it was originally published.