A defining risk of our time is the possibly growing capability of terrorist groups to fabricate and deliver weapons of mass destruction (WMDs). That risk is characterized by extreme possible consequences, including tens of thousands of fatalities and initiation of global conflict. Yet by some definitions of WMDs, we have not, as of this writing, observed even a single full-scenario event. There are three other interrelated aspects of that risk: 1) the essential terrorist-defender game aspect of the risk, where the terrorist may be intelligent and adaptive to defensive actions, and may make decisions based on poorly understood processes of radicalization and poorly understood foreign and domestic subcultures; 2) terrorist incentives to develop and launch WMD attacks may be changing due to “The Great Unraveling” of international processes (Cohen, 2014; Haas, 2014); 3) terrorist capabilities can include step function increases due to Internet information, random meetings of individuals, and random opportunities.
These considerations combine to create an almost overwhelming risk management challenge and an almost overwhelming risk assessment challenge for risk analysts. We pose that latter challenge as: How, in this context, do analysts apply all available data and analysis tools to generate the most effective risk management advice? That challenge has many facets.
In this chapter, we address one particular facet: How should we validate WMD terrorism risk assessment models (abbreviated here as TRAMs, and in this chapter always as applied to WMD terrorism risk) in the absence of observed events? We define WMD as excluding IEDs, even IEDs with effects multiplied through, e.g., attacks on infrastructure. We began to address the challenge in “Validation in the Absence of Observed Events,” recently published in Risk Analysis (Lathrop & Ezell, 2015). There we created a special definition of validation for models concerning not-yet-happened events. The basis of that definition rests on considering why we validate models. Typical validation involves some form of testing how well the model results correlate with observed events. Those tests are not based on a desire simply to correlate with observed events, but more fundamentally on a desire to test how well the model can advise decision makers in their decisions.
We find support for that reasoning in three sources: 1) ISO Standard 15288, on “Systems and software engineering – System life cycle processes,” which defines validation as “confirmation, through . . . objective evidence, that the requirements for a specified intended use . . . have been fulfilled” (ISO, 2015); 2) MIL-STD-3022 on Verification, Validation, and Accreditation (VV&A) for Models and Simulations, which states that “Validation [is the] process of determining the degree to which a model . . . and [its] associated data are accurate representations of the real world from the perspective of the intended use(s)” (Department of Defense, 2008); and 3) Sargent, who lists four key entities in his paradigm for model verification and validation (Sargent, 2013). Two of those entities apply to model validation: 1) “Conceptual model validation is defined as determining that the theories and assumptions underlying the conceptual model are correct and that the model representation of the problem entity is ‘reasonable’ for the intended purpose of the model”; and 2) “Operational validation is defined as determining that the model’s output behavior has a satisfactory range of accuracy for the model’s intended purpose over the domain of the model’s intended applicability.”
Note that all three definitions are based on intended use(s) and purpose. What is the intended use and purpose of TRAMs? To improve WMD terrorism risk management. Applying that reasoning to the challenge we present here leads to the particular definition of validation we will focus on in this chapter: Validation in the absence of observed events should test how well the model, including the process by which the model is built and operated and its results presented to decision makers, can improve risk management decisions.
Underlying that broader definition is a broader conceptualization of what a risk assessment is. In this context, a risk assessment is typically not like a voltmeter – it is not a “riskmeter.” In many cases we have studied, the results-format issues involved are ones of representing a complex, uncertain situation – that is, analytically reducing a large amount of information into summary renditions targeted at improving risk management decisions. That perspective frames the reasoning of this chapter.
We apply these points to specify four concepts of validation:
First: Validation should test how well the model performs the modeling, model quality, and engagement functions called for by the use and purpose of TRAMs to improve risk management decisions. In Sections 4.3 through 4.6, we present 13 tests of how well the model fulfills these functions.
Second: Validation should include tests of full disclosure. The 13 tests we specify are quite challenging. In fact, none of the 11 TRAMs in our experience would perform well on many of the tests. Again, invoking our central principle of support of decision makers, validation should include an evaluation of the degree to which a TRAM achieves full disclosure, defining that in the following terms: full disclosure of how and how much the TRAM falls short on the tests we specify here, and the risk management implications of each of those shortfalls.
Third: Validation is not a matter of valid/not valid. It is not a matter of being above some threshold of minimum validity. Petty, in his work on verification, validation, and accreditation, stipulates that it is largely incorrect to refer to complex models as “valid” or “validated.” More appropriately, validation is a qualified description of validity that is interpreted by its users as accurate enough to be useful within the parameters and context (bounds of validity) set forth in its design (Petty, 2010).
Fourth: Validation is not a matter of generating a “validity score.” It is a matter of testing the model to determine how well it performs the modeling, model quality, and engagement functions called for by the use and purpose of TRAMs to improve risk management decisions, followed by full disclosure as defined earlier. Others have suggested that we develop constructed scales for each of our 13 tests, to measure the degree of performance of a model on each test, as if they were attributes in a multi-attribute utility score. We respectfully disagree. Constructed scales are quite useful in many contexts, but in this context they would collapse several different considerations onto a single N-level scale within each of the consideration sets represented by each test. That would be an unnecessary and unhelpful loss of information: unnecessary because no decision maker is typically in the position of evaluating alternative TRAMs based on a validation “score.” Rather, a decision maker is in the position of wanting all applicable risk management advice available from a TRAM project. Again, citing our central principle of support for the decision maker, that decision maker would not be well served by a score of performance on each of the 13 tests. He or she would be well served by a report on how well the model performs on each test, its shortfalls with respect to each test, and the risk management implications of those shortfalls.
That last point bears restating, in terms summarizing the insights of the previous paragraphs: Once we decide that validation in this context should be based on how well the model can improve risk management decisions, that
leads to defining validation on two levels:
1) Performance: How well the model performs the modeling, model quality, and engagement functions called for by the use and purpose of TRAMs to improve risk management decisions.
2) Disclosure: How well any shortfalls in that performance, and the risk management implications of those shortfalls, are communicated to risk management decision makers.
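To make that two-level structure concrete, the following is a minimal sketch of how a per-test validation report, as opposed to a collapsed validity score, might be organized. This is purely our hypothetical illustration: the example test name, findings, and field choices are invented for exposition and are not drawn from the 13 tests.

```python
from dataclasses import dataclass

@dataclass
class TestFinding:
    """One validation test's result: a qualitative report, not a score."""
    test_name: str            # which validation test this finding addresses
    performance: str          # how well the model performs on this test
    shortfalls: list[str]     # where the model falls short on this test
    implications: list[str]   # risk management implication of each shortfall

def disclosure_report(findings: list[TestFinding]) -> str:
    """Render the two-level report: performance plus full disclosure."""
    lines = []
    for f in findings:
        lines.append(f"Test: {f.test_name}")
        lines.append(f"  Performance: {f.performance}")
        # Pair each shortfall with its risk management implication.
        for shortfall, implication in zip(f.shortfalls, f.implications):
            lines.append(f"  Shortfall: {shortfall}")
            lines.append(f"    Implication: {implication}")
    return "\n".join(lines)

# Hypothetical example finding, invented for illustration only.
example = TestFinding(
    test_name="Transparency of elicitation",
    performance="Adequate for expert inputs; weak for adversary adaptation.",
    shortfalls=["Adversary adaptation is modeled as static."],
    implications=["Risk of defensive actions being gamed is understated."],
)
print(disclosure_report([example]))
```

The design point is that each test's shortfalls stay paired with their risk management implications, preserving exactly the information a single N-level score would discard.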
As mentioned, in this chapter, we present 13 validation tests. The first four are from the Risk Analysis article (Lathrop & Ezell, 2015). Those tests are based on insights we have gained over 10 years of work with the 11 TRAMs we have developed, augmented, analyzed, compared, or reviewed, and on a systematic
examination of the strong and weak points of each TRAM, how those
strong and weak points matter for the usefulness of the decision advice available from each TRAM, and differences among the TRAMs. Table 4.1 lists those 11 TRAMs. It also lists a twelfth analysis, a survey, which we do not count as a model. All of the concepts presented in this chapter are based on a combined consideration of those 11 models. That combined consideration concept is central to this chapter, in that it allows us to make statements based on the strong
and weak points of the models that could not be made specifically about any one of the models, due to confidentiality and classification issues.
This chapter is written to be useful to four groups of people: those who may be commissioning TRAMs, who may be using TRAMs, who may be validating TRAMs, and who may be performing TRAMs. As should become clear in the course of this chapter, all four groups have a vested interest in maintaining a level of TRAM performance and disclosure specified by the concepts of validation presented here. This chapter ranges over a wide variety of validation
issues, but those issues are not chosen arbitrarily. They are based on our systematic study of the 11 TRAMs listed in Table 4.1. The model validation issues addressed here are, frankly, quite challenging.
We address them by introducing a conceptual framework in Section 4.2, then applying each of four perspectives, presented in Sections 4.3 through 4.6, to develop 13 validation tests. Section 4.7 applies those tests to BTRA as it is reported in the 2008 NAS study. In Section 4.8, we present a combined look at the third and fourth perspectives, evaluating how well the model engages with the decision-making process and with the terrorist-defender game. Section 4.9 recaps the findings of this chapter, then Section 4.10 presents paths forward, a discussion of the real-world practical considerations of implementing the validation processes presented here.