Validation in the absence of observed events: weapons of mass destruction (WMD) terrorism risk assessment models

– by John Lathrop, Ph.D. and Barry Ezell, Ph.D.

Introduction

The concepts we present here are based on ten years of experience involving nine specific Weapons of Mass Destruction Terrorism Risk Assessment Models (WMD-TRAMs) and the Lugar survey.

Each TRAM has one or more adversary choice components.  Each model has its own strengths, weaknesses, constraints, limitations, and assumptions.  The perspectives here arise not out of abstract analysis, but out of the perspective we have gained across those nine TRAMs and the survey – in particular the challenges and solutions we have observed across them, the differences among them, and the relative strengths and weaknesses of each, as revealed by comparing it to the others.

We maintain that the important issues of validating models of adversary behavior are all fully raised, and in fact better raised from a more fundamental perspective, in the arena of WMD TRAMs.  Two conditions for validation form the basis for our reasoning, stated here in a way focused on the relationship between model validation and adversary choice.

  1. Validation must involve how things work in the observable real world.  In this context that must mean focusing on the Risk Generating Process (RGP), as best it can be modeled, limited as it is by several “curtains” obscuring that RGP from us, among them uncertainty, adversary concealment and avoidance strategies, and otherwise unattainable data.

  2. Validation should focus on what we care about.  In this context that means the risk itself, and so advising risk management, and so those aspects of the RGP that can be accessed and modeled specifically to best advise that risk management.

By that reasoning, we focus here not on adversary choice, but on the particular challenges of validation of models that consider events that have not yet happened.  WMD terrorism risk assessment models provide a highly motivating arena for that focus since:

  • they are extremely important, in terms of both the stakes (potential consequences) and the defensive-system investment decisions they advise; and

  • the events of most concern, attacks with large-consequence WMDs, involve full-scenario events that have not yet happened.

Those two points can be unpacked into five interrelated points:

  • How well the model will advise the decision maker (DM) in risk management is not necessarily the same thing as how well the model predicts how the future will unfold.

  • To that end, the model should advise the DM on the nature of how that future will unfold.  We expand on “the nature of” with three very different statements:

  • The parts of that “nature” we should focus on are the parameters/features/natures/aspects of that future that shed light on the best strategies for risk management, as opposed to simplistic results such as probability distributions over targets.

  • A key aspect of that “nature” is how little we can know about how the future will unfold.  We discuss this aspect later in terms of the degree of overconfidence/underconfidence in predictions of the future, in probability distributions over such things as improvised nuclear devices, routes, and targets (see the sketch following this list).

  • We should focus on those aspects of that “nature” that can be used to advise the DM in making fundamental strategic tradeoffs to deal with that future, such as between defend-each-single-target systems and non-target-specific defensive strategies such as disincentivization, resilience, robustness, etc.  To explain: by disincentivization we mean reducing Red incentives in ways that are not target specific, e.g. generally facilitating a “Boston Strong” national reaction to any attack, so that any attack may prove an occasion for Blue to demonstrate its societal strength.  Note our adoption of the gaming terms “Red” (in this context the adversary, the terrorist) and “Blue” (the defender).
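To make the overconfidence point concrete, consider a minimal sketch in Python of one crude corrective device: linearly pooling an elicited probability distribution over targets with a uniform distribution, and comparing entropies before and after.  The target names, probabilities, and pooling weight below are our own illustrative assumptions, not drawn from any of the nine TRAMs; the point is only that a validation test can ask whether any such correction for overconfidence was applied.

    import math

    def shannon_entropy(p):
        # Entropy in bits; low entropy over targets signals a sharp,
        # possibly overconfident, prediction of Red's choice.
        return -sum(q * math.log2(q) for q in p.values() if q > 0)

    def broaden(p, weight=0.3):
        # Hedge against SME overconfidence by linearly pooling the
        # elicited distribution with a uniform one; 'weight' is the
        # mass moved toward "we know less than we think."
        n = len(p)
        return {k: (1 - weight) * v + weight / n for k, v in p.items()}

    # Hypothetical elicited distribution over attack targets.
    elicited = {"port": 0.70, "rail hub": 0.20, "stadium": 0.07, "other": 0.03}
    hedged = broaden(elicited)

    print(f"elicited entropy: {shannon_entropy(elicited):.2f} bits")
    print(f"hedged entropy:   {shannon_entropy(hedged):.2f} bits")

A defensive allocation tuned to the sharp elicited distribution can badly underweight the non-target-specific strategies that the flatter, hedged view favors.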

So we should not focus too tightly on figuring out imperfect ways to approximate, in this context, “validation” as classically defined.  We have noticed some phrases used to describe versions of validation for models of not-yet-happened events: “valid because people use it,” “justify why the model is valid based on our best knowledge or accepted wisdom,” “it looks right,” “it’s built right,” “it checks out with the data we have,” “triangulation,” “nothing works perfectly so do it all and combine it,” or “someone thinks it’s useful.”  We respect the thinking behind those phrases, but find that they do not provide a basis for standardized testing for validity.  So here we propose a slightly more specified approach to achieving the same ends those authors seek: we focus on why we care about validation, and so on how to evaluate models to investigate the degree to which they generate the best possible decision advice.

Four Validation Tests

Test 1:  Does the risk assessment model capture the initiation process?
First we should note that some risk assessment models are scoped such that initiation is out of scope; those assessments are of course not subject to this test.  This test can be summarized in terms of seven questions to be asked.

  • What external events affect initiation and how well are those events accounted for?

  • What are the key uncertainties that affect initiation and how well are they accounted for?

  • What information bases, including SME expertise, can be brought to bear to model the initiation process and how well are they brought to bear?

  • What are the pitfalls, for example what about the initiation process are SMEs apt to miss?

  • How can those pitfalls be addressed and how well are they addressed?

  • What has been done to explicitly address the problem of SME overconfidence?

  • What has been done to explicitly address the problem of cross-referencing the SME panels with other SME panels and other assessments?
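One way to operationalize this test is to encode the seven questions as a scored checklist that feeds the concise report we propose in the Summary below.  The following Python sketch is our own illustration, not a tool used with any existing TRAM; the 0–3 scale and the findings text are assumed placeholders.

    from dataclasses import dataclass

    @dataclass
    class Question:
        text: str      # one of the seven questions above, abbreviated
        score: int     # 0 = not addressed ... 3 = fully addressed (assumed scale)
        finding: str   # reviewer note on implications for risk management advice

    QUESTIONS = [
        "External events affecting the process accounted for?",
        "Key uncertainties accounted for?",
        "Information bases (incl. SME expertise) brought to bear?",
        "Pitfalls identified (what are SMEs apt to miss)?",
        "Pitfalls addressed?",
        "SME overconfidence explicitly addressed?",
        "SME panels cross-referenced with other panels and assessments?",
    ]

    def report(process_name, answers):
        # Concise per-question report for one test; the same checklist
        # serves Test 1 (initiation) and, reworded, Test 2 (shaping).
        print(f"Test of the {process_name} process:")
        for q in answers:
            print(f"  [{q.score}/3] {q.text} -- {q.finding}")

    # Hypothetical review of a model's initiation component:
    answers = [Question(t, 1, "partially covered; see reviewer notes")
               for t in QUESTIONS]
    report("initiation", answers)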

Test 2:  Does the risk assessment model capture the shaping process?
By the scenario shaping process we mean the sequence of probabilistic events that describes the step-by-step mechanics of the scenario as it unfolds.  Test 2 consists of the same seven questions listed for Test 1, reworded here to apply to the shaping process.

  • What external events affect the shaping process and how well are those events accounted for?

  • What are the key uncertainties that affect the shaping process and how well are they accounted for?

  • What information bases, including SME expertise, can be brought to bear to model the shaping process and how well are they brought to bear?

  • What are the pitfalls, for example what about the shaping process are SMEs apt to miss?

  • How can those pitfalls be addressed and how well are they addressed?

  • What has been done to explicitly address the problem of SME overconfidence?

  • What has been done to explicitly address the problem of cross-referencing the SME panels with other SME panels and other assessments?

Test 3:  Does the risk assessment model consider, capture, and account for unanticipated scenarios?
Unanticipated scenarios have special relevance for terrorism risk assessment.  In that arena we not only have Black Swans, we have “Deliberate Black Swans,” i.e. we must assume that Red is deliberately trying to develop attack scenarios “not on Blue’s list.”  By their very nature, unanticipated scenarios are hard to incorporate into modeling or validation of modeling.  But a model can and should be tested, based on asking the following two questions.

  • Does the risk assessment include, in its risk management advice, advice specifically addressing the risk management implications of unanticipated scenarios?

  • Has every possible means been brought to bear to characterize the risk management implications of unanticipated scenarios?  “Every possible means” can include Red Team SME panels, i.e. SME panels specifically charged with generating challenging, previously unanticipated scenarios, whether specific or generally characterized.

Test 4:  Are there alternative probabilistic causal chains other than the one(s) accounted for?
This test is another, more general, perspective on the issues examined by Test 3.  The assumption in most (actually, all that we have seen) risk assessment models is that the probabilistic causal chain of the model (or whatever equivalent structure the model employs) is the only way the real world works to generate risk.  But is it?  This test can be implemented by asking the following two questions, which are slight rewordings of the two questions of Test 3.

  • Does the risk assessment include, in its risk management advice, advice specifically addressing the risk management implications of alternative probabilistic causal chains?

  • Has every possible means been brought to bear to characterize the risk management implications of alternative probabilistic causal chains?  “Every possible means” can include Red Team SME panels, i.e. SME panels specifically charged with generating challenging alternative probabilistic causal chains, whether specific or generally characterized.

Summary

So … what is our proposed process of validation?  It is to submit the WMD TRAM to the above four tests, then report concisely how well or poorly it passes each test, along with the implications of any shortfalls for risk management advice.
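For concreteness, the reporting step might look like the following sketch; the verdicts and implication text are invented placeholders, not findings about any real TRAM.

    # Hypothetical four-test summary; verdicts and implications are
    # placeholders, not results from any real WMD-TRAM review.
    tests = [
        ("Test 1: initiation process captured?",        "partial",
         "initiation-stage uncertainties may be understated"),
        ("Test 2: shaping process captured?",           "pass",
         "none noted"),
        ("Test 3: unanticipated scenarios addressed?",  "shortfall",
         "advice may overweight defend-each-target strategies"),
        ("Test 4: alternative causal chains examined?", "shortfall",
         "single-chain structure assumed, without Red Team challenge"),
    ]

    print("WMD-TRAM validation report")
    for name, verdict, implication in tests:
        print(f"  {name}")
        print(f"    verdict: {verdict}")
        print(f"    implication for risk management advice: {implication}")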

To recap, these four tests are based on our experience.  We make no claim to more than that.  But we do state that they are our own Best Use of Available Data, that available data including our experience with nine WMD terrorism risk assessment models and a survey, to define a validation test regimen corresponding to our revised definition of validation presented earlier: to test how well the model can advise DMs in WMD terrorism risk management decisions.

One of the most important acute-impact risks facing our world over the next decades is WMD terrorism.  We as risk assessors, the authors and many of the readers of Risk Analysis among them, are and should be compelled to invest our best efforts in assisting our respective nations to best manage that risk.  A key part of that should be to develop a concept of model validation that can be applied to WMD terrorism risk assessment models, recognizing that that risk is characterized by high-consequence events (WMD attacks) that have not yet happened.  So we need a new concept of validation that is not based on correlation with observed events, but rather on testing the model against a key criterion: how well the model can advise decision makers in terrorism risk management decisions.