Application of model quality checklist to TIMER Energy model
Introduction
Large, complex energy models present considerable challenges in development and testing, and uncertainty assessments of such models provide only partial guidance on the quality of their results. We have developed a model quality assistance checklist to aid in this assessment. The checklist provides diagnostic output in the form of a set of potential pitfalls for the model application. The checklist is applied here to an energy model for the problem of assessing energy use and greenhouse gas emissions. Use of the checklist suggests that results on this issue are contingent on a number of assumptions that are highly value-laden. When these assumptions are held fixed, the model is deemed capable of producing moderately robust results of relevance to climate policy over the longer term. Checklist responses also indicate that a number of details critical to policy choices or outcomes on this issue are not captured in the model, and model results should therefore be supplemented with alternative analyses.
Method
The goal of this checklist is to assist in the quality control process for environmental modelling. The point of the checklist is not that a model can be classified as ‘good’ or ‘bad’, but that there are ‘better’ and ‘worse’ forms of modelling practice. One should guard against poor practice because it is much more likely to produce poor or inappropriate model results. Further, model results are not ‘good’ or ‘bad’ in general (it is impossible to ‘validate’ a model in practice), but are ‘more’ or ‘less’ useful when applied to a particular problem. The checklist is thus intended to help guard against poor practice and to focus modelling on the utility of results for a particular problem. That is, it should provide insurance against pitfalls in process and irrelevance in application. The checklist is designed largely for internal use (within a modelling group) for self-assessment. It can be used as a self-elicitation by competent practitioners, to give form to their own judgements about the models they know intuitively. There are not always single best answers to the questions. What constitutes good practice in one domain may be in conflict with the requirements of good practice in another, and the resolution of such conflicts will often depend on the context.
For the purposes of this checklist we differentiate between ‘users’ and ‘stakeholders’ as follows. A ‘user’ is someone who exercises the model or who uses its output in some application; a user is necessarily aware of the existence of the model. A ‘stakeholder’ is someone who either participates in the policy process regarding the issue at hand, or who is affected by that process in some way. Stakeholders may or may not be aware of the existence of the model (or of the policy process, for that matter).
The checklist is arranged as follows. First there is a set of questions to probe whether quality assistance is likely to be relevant to the intended application; if quality is not at stake, a checklist such as this one serves little purpose. The checklist itself is fairly long, and many modellers will not have the time or need to complete it in full. For that reason, we have provided a set of screening questions at the front to allow the modeller to identify the parts of the checklist that are potentially most useful for their application. The first section of the checklist proper sets the context for use of the checklist by describing the model, the problem it is addressing here, and some of the issues at stake in the broader policy setting for this problem. The next section addresses ‘internal’ quality issues, meaning the processes for developing, testing, and running the model practiced within the modelling group. The following section addresses the interface between the modelling group and outside users of the model, examining issues such as the match between the information the model produces and the information its users require. A further section addresses issues that arise in translating model results to the broader policy domain, including how different stakeholder groups are brought into the discussion of those results. The final section provides an overall assessment of quality issues arising from use of the checklist.
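As a rough illustration of this arrangement, the sketch below encodes screening questions and checklist sections as simple data structures and routes a respondent to the potentially relevant sections. The section names and question wording are placeholders for the purpose of illustration, not the actual content of the checklist.

    # Illustrative sketch only: section names and screening questions below
    # are placeholders, not the actual wording of the quality checklist.

    CHECKLIST_SECTIONS = [
        "context",             # the model, the problem, and the policy setting
        "internal_quality",    # development, testing, and running practices
        "user_interface",      # match between model output and user needs
        "policy_translation",  # carrying results into the broader policy domain
        "overall_assessment",  # summary of quality issues
    ]

    # Each screening question points to the sections it makes relevant.
    SCREENING_QUESTIONS = {
        "Is the accuracy of model results in question?": ["internal_quality"],
        "Do results require interpretation or judgement?": ["user_interface"],
        "Is the public concerned about process or results?": ["policy_translation"],
    }

    def relevant_sections(answers):
        """Return the checklist sections suggested by affirmative screening answers.

        `answers` maps each screening question to True/False.  The context and
        overall-assessment sections are always included.
        """
        sections = {"context", "overall_assessment"}
        for question, targets in SCREENING_QUESTIONS.items():
            if answers.get(question, False):
                sections.update(targets)
        # Preserve the checklist's own ordering of sections.
        return [s for s in CHECKLIST_SECTIONS if s in sections]

    # Example: all three screening concerns answered affirmatively.
    print(relevant_sections({q: True for q in SCREENING_QUESTIONS}))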
The application of the checklist to the TIMER model was carried out in an extended interview with TIMER modeller Detlef van Vuuren, conducted by Risbey and van der Sluijs. The responses were shaped by dialogue with van Vuuren, but the following descriptions represent our interpretations of that dialogue. The first questions in the interview were aimed at quickly assessing the relevance and utility of the checklist for the given application of assessing long-term greenhouse gas emissions. These are cast in the checklist as a set of screening questions to determine whether quality issues are really at stake in use of the model; there is no point completing a detailed checklist for quality assistance if quality concerns are not relevant to the issue in question. Responses to the screening questions showed that there is some question as to the accuracy of model results, that some interpretation and judgement of results is required, and that the public is concerned about process and results regarding the model application.
Thus, quality considerations seem relevant to this application and use of the checklist is warranted. The screening section of the checklist also serves to quickly isolate the potentially most critical areas for quality assistance, so that users with limited time can be directed straight to the relevant parts of the checklist.
Results
The main tangible output from use of the checklist is a diagnosis of potential pitfalls in applying the model to the given problem. The following list of potential ‘pitfalls’ was generated in response to the TIMER checklist run. The list of pitfalls is generated via a preset algorithm on the basis of checks of the responses coded for each of the questions.
The algorithm checks for inconsistencies among responses and for responses that indicate potentially poor or inappropriate practice. The results generated from this step were then checked in consultation with the modeller. Some consultation on results is useful because it is difficult to generalize pitfalls: as noted above, there are not always single ‘best’ answers to the questions, and what constitutes good practice in one domain may conflict with the requirements of good practice in another, with the resolution depending on the context. Thus, the list of pitfalls should be viewed as a guide only (a sketch of this kind of rule-based checking is given after the list):
• Uncertainty in input values is only partially represented by the sensitivity runs carried out to date. Thus, the list of key parameters selected for this problem is not necessarily complete.
• Since uncertainties have not been propagated through the model from inputs to outputs, one cannot rigorously state what the final error bars are. It is important to be cautious of this fact in interpreting model results.
• Since alternative model structures have not been tested and have only indirectly been addressed through model intercomparison, the effects of structural uncertainty are partly unknown. More effort may need to be devoted to exploring effects of alternative model structures.
• Model results are sensitive to uncertainty in model structure formulation. This fact should be noted when presenting results.
• The key results are potentially very sensitive to uncertainty in parameter values. The non-robust nature of the energy system represented by the model should be signalled to users.
• There is a broad spread of possible output values in key model results. Some of the uncertainty may be irreducible, and high spread does not necessarily imply low quality. Nonetheless, the results should be checked against users’ needs to determine whether the spread is narrow enough to be useful.
• There is a lack of systematic processes for managing development of the model.
• It is difficult for outside groups to run the model because of specialized requirements of software and familiarity with a large, complex body of code. This means that model results are effectively not very reproducible by outsiders, increasing the likelihood of error and decreasing general acceptance of the results.
• The model could benefit from more involvement of stakeholders in using or inspecting the model. The reasons for relatively low stakeholder involvement should be ascertained if not already known.
• Users of model results in policy are at best partially aware of the implications of different value choices in the model. Better communication seems warranted in this regard.
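As mentioned above, pitfalls such as these are produced by a preset algorithm that checks the coded responses, after which the list is filtered in consultation with the modeller. The sketch below illustrates how such rule-based checking can be expressed; the question codes, response values, and rules are hypothetical and merely indicative of the approach, not the checklist's actual implementation.

    # Minimal sketch of rule-based pitfall generation.  The question codes,
    # response values, and rules are hypothetical, not the actual algorithm.

    # Coded responses from the checklist interview (hypothetical codes).
    responses = {
        "uncertainty_propagated_to_outputs": "no",
        "alternative_structures_tested": "no",
        "sensitivity_to_parameters": "high",
        "stakeholder_involvement": "low",
        "reproducible_by_outsiders": "no",
    }

    # Each rule pairs a predicate over the responses with a pitfall message.
    RULES = [
        (lambda r: r["uncertainty_propagated_to_outputs"] == "no",
         "Final error bars cannot be rigorously stated; interpret results with caution."),
        (lambda r: r["alternative_structures_tested"] == "no",
         "Effects of structural uncertainty are partly unknown."),
        (lambda r: r["sensitivity_to_parameters"] == "high",
         "Key results are potentially very sensitive to parameter uncertainty."),
        (lambda r: r["stakeholder_involvement"] == "low",
         "The model could benefit from more stakeholder involvement."),
        (lambda r: r["reproducible_by_outsiders"] == "no",
         "Model results are effectively not reproducible by outsiders."),
    ]

    def generate_pitfalls(responses, rules=RULES):
        """Return the pitfall messages whose predicates fire on the coded responses."""
        return [message for predicate, message in rules if predicate(responses)]

    for pitfall in generate_pitfalls(responses):
        print("-", pitfall)

Inconsistency checks between pairs of responses can be written in the same form, using predicates that look at more than one coded answer; the consultation step with the modeller then provides the context-specific judgement that such rules cannot capture.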
Conclusions
The list of potential pitfalls generated for the TIMER run through the checklist is intended to apply to use of TIMER results on energy scenarios and greenhouse gas emissions. It is clear from use of the checklist that results on this issue are contingent on a number of assumptions that are highly value-laden.
When these assumptions are held fixed, the model is deemed capable of producing moderately robust results of relevance to climate policy over the longer term. However, it is critical that the effects of value choices be communicated as clearly as possible in assessing model results. Checklist responses also indicate that a number of details critical to policy choices or outcomes on this issue are not captured in the model, and model results should therefore be supplemented with alternative analyses.
While these comments are made in reference to testing of the checklist on TIMER, they would apply broadly to other energy models as well. That is because other energy models must make the same assumptions and compromises as TIMER in approaching this problem. They may make different choices in how best to do this, but that does not weaken the force of many of the most critical assumptions or reduce the inherent value-loading of the analysis.
The checklist could be used at various stages in the development of a model and its application to a particular problem. In the energy model example given here, the checklist was employed after the initial development of the model and during the ongoing application of projecting greenhouse gas emissions. The diagnosis of pitfalls can help in further model development and in effectively connecting the model to the policy process, by avoiding the more obvious pitfalls in this process. The checklist could also be used proactively, prior to development of a model, to shape the process of model development itself. However, it should be kept in mind that the checklist is oriented toward the role of models only. It does not provide assistance for the various other tools and aspects that can be included in the environmental analysis process (e.g. the uncertainty guidance).
Finally, the case demonstrated that the model quality checklist has the potential to provide a useful diagnostic aid in the quality assessment process for complex environmental models.
Documentation of the case
Journal article:
Risbey, J., J. van der Sluijs, P. Kloprogge, J. Ravetz, S. Funtowicz, and S. Corral Quintana (2005): Application of a Checklist for Quality Assistance in Environmental Modelling to an Energy Model. Environmental Modeling & Assessment 10 (1), 63-79. DOI:10.1007/s10666-004-4267-z
Short version:
Risbey, J., J. van der Sluijs, P. Kloprogge, J. Ravetz, S. Funtowicz, and S. Corral Quintana (2002): Application of a Checklist for Quality Assistance in Environmental Modelling to an Energy Model. In: A. Jakeman and A. Rizzoli (eds.), Integrated Assessment and Decision Support, Proceedings of the IEMSS 2002 Conference, 24-27 June 2002, Lugano, Switzerland, pp. 25-30.
Checklist only:
Risbey, J., J.P. van der Sluijs, J. Ravetz, and P. Janssen: A Checklist for Quality Assistance in Environmental Modelling. NW&S report, Utrecht University.
Interactive version of the model quality checklist:
http://www.nusap.net/sections.php?op=viewarticle&artid=15