- January 22, 2021

ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I Robin Purshouse ACSE, University of Sheffield Autumn Semester 2020/21 1 Lecture overview This lecture starts our introduction to the calibration and validation of agent-based models. In the lecture we look at: • Calibration parameters, and where these come from; • The Bayesian framework for model calibration; • Bayesian methods for performing calibration of agent-based models. 2 Introduction 2.1 A simple agent based model with some parameters Consider a very simple agent-based model, where agents make decisions about a binary behaviour y[k] based on beliefs x1[k] and desires x2[k] in discrete time k. For agent i, the decision rule is: yi[k] = { 1 ωxi1[k] + (1− ω)xi2[k] > τ 0 otherwise. (1) where ω determines the relative weighting of beliefs and desires in the decision, and τ is a threshold that triggers the behaviour. We will also assume that the agents are randomly arranged on an undirected loop network, being connected to their nearest neighbours. The desire of agent i is then influenced by the behaviour of neighbouring agents according to: xi2[k + 1] = (1− ψi)xi2[k] + ψi 2 (yi−1[k] + yi+1[k]) (2) Robin Purshouse ACSE, University of Sheffield where ψi represents the individual susceptibility of agent i to influence by other agents. Our very simple model assumes that there is no influence on an agent’s beliefs. For completeness, we can write: xi1[k + 1] = x i 1[k] (3) In the model, we will assume that state variables xi1 and xi2, and constants ω, τ and ψi, are all defined on the range (0 1). We also need to define initial conditions, at k = 0, for the model, i.e. xi1[0] and xi2[0] for all i. 2.2 Calibration of the model 2.2.1 What is calibration? When we talk about calibrating (also known as training or estimating) a model, we are saying that we want the model to be representative of a real-world system over a defined period of time. This representativeness is defined empirically – i.e. by reference to observed data collected for the system over the time period. In your group project, the system is the population of Long House Valley, and the period is 800 A.D. to 1350 A.D. 2.2.2 Hyperparameters In order to make our model representative of the real-world system, we need to identify suitable values for xi1[0], xi2[0], ω, τ and ψi. Even in this very simple model, we already face one major issue: that we need to choose values of xi1[0], xi2[0] and ψi for every agent i in our model. To manage this issue, we use a parametric representation for each of these vari- ables, in which we describe the heterogeneity (i.e. differences) between agents using a functional form. This function will include constants of its own (known as hyperpa- rameters), and we will manipulate these hyperparameters to define the heterogeneity needed in the population of agents. In our model, all three agent-level variables (xi1[0], xi2[0] and ψi) are defined over the range (0 1), so we need to choose a functional form that produces values that are bounded within this range. An appropriate and flexible choice here is to use a beta probability density function, Beta(α, β), where α, β > 0. In this case, we introduce new pairs of parameters, α and β, for each of the variables that need to be described at the agent level; in total we introduce six new parameters into our model, but have escaped from having to calibrate each individual agent’s values. 2.2.3 Calibration parameters We are now ready to specify the set of calibration parameters for the model. For convenience, we will denote these parameters using vector notation as θ and define ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 2 Robin Purshouse ACSE, University of Sheffield Calibration Model Definition θ1 ω Weighting between beliefs and desires in agent decision-making θ2 τ Threshold used in agent decision-making θ3 αψ Hyperparameter used to define individual agents’ susceptibilities θ4 βψ Hyperparameter used to define individual agents’ susceptibilities θ5 αx1 Hyperparameter used to define individual agents’ initial beliefs θ6 βx1 Hyperparameter used to define individual agents’ initial beliefs θ7 αx2 Hyperparameter used to define individual agents’ initial desires θ8 βx2 Hyperparameter used to define individual agents’ initial desires Table 1: Calibration parameters for the simple agent-based model the vector according to Table 1. Our aim now is to find suitable values for these parameters through a process of model calibration. 3 Model calibration in a Bayesian framework 3.1 A probabilistic representation In the Bayesian framework, we represent all variables using probability distributions – where the probabilities reflect our subjective beliefs about the values of the variables. Note that this is quite different to non-Bayesian estimation methods, where values are assumed to take on single values (i.e. point estimates). An advantage of the Bayesian approach is that we also get a probability distribution for our model outputs, p(z). We can then use summary statistics of this distribution (e.g. expected value or quantiles) to report on how the model behaves in a way that accounts for the uncertainty in our modelling. Example results for such a probability-based approach are shown in Figure 1, for a recent published attempt to calibrate an agent-based model of earthworm ecological dynamics. Outputs for six different model runs are shown, corresponding to different experiments performed using a population of earthworms (the output relates to the average mass of the earthworms). Experimental observations are shown by the open circles; the black line corresponds to model parameters chosen from the scientific literature; the grey lines correspond to the calibrated parameter settings (with the most likely parameterisation shown in darker grey). Note that it can be quite difficult to achieve good calibrations of agent-based models, mostly due to issues with how well the underlying model structures actually represent the system of interest – we will return to this in later lectures. 3.2 Bayes’ theorem One of the most important concepts in probability theory is the notion of conditional probability, which states that the probability that two events A and B both occur is equal to the probability that A occurs given that B has occurred multiplied by the ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 3 Robin Purshouse ACSE, University of Sheffield 186 E. van der Vaart et al. / Ecological Modelling 312 (2015) 182–190 Fig. 4. Fits of the model to the empirical data. The thick black and grey lines are the result of averaging 100 runs each, using Johnston et al. (2014) literature values and ABC’s best values, respectively, where “ABC’s best values” refers to the parameters belonging to the best-fitting run. The open circles are the empirical data, and the semi-transparent grey lines are the ‘posterior predictive check’, i.e. the output of 100 new simulations using random samples from the accepted runs. Arrows mark the days when food was either added (↑) or removed (↓). (a) Experiment 1 (Gunadi et al., 2002); (d) Experiment 2 (Gunadi and Edwards, 2003); (b, e) Experiment 3 (Reinecke and Viljoen, 1990, ‘variable condition’); (c, f) Experiment 4 (Reinecke and Viljoen, 1990, ‘constant condition’). R2 is the proportion of variance explained, a measure of goodness of fit (Eq. (7)). fit well; the same holds for the maximum size achieved by the earthworms in Experiment 4 (Panel C). To check the accuracy of ABC’s estimates, we performed two kinds of quality control: Cross-validation (Csillery et al., 2012) and coverage (Prangle et al., 2013). Both methods use some subset of model outputs as “pseudo-data” and then use the remaining runs to do ABC. Because the parameter values that produced the “pseudo-data” are known, this makes it possible to check whether ABC is accurately estimating them. There are two main differences between cross-validation and coverage: Firstly; cross-validation uses randomly selected model outputs as “pseudo-data” while fol- lowing Prangle et al. (2013) coverage uses the 100 ‘best’ runs; i.e. those with the smallest error according to Eq. (5); secondly, cross- validation summarises ABC’s posteriors by a point estimate–in our case, the median accepted value–while coverage looks at the accu- racy of the posteriors as a whole. To carry out cross-validation, we set aside 100 random out- puts of the model, and then performed ABC using the remaining runs (see also Fig. S4). The medians of the accepted runs resulting from this procedure were mostly accurate, as shown in Fig. 5. In particular, all parameters with narrowed posteriors (most strongly Mb, Mm, and rB, but also E, IGm, Mp, and rm; Fig. 3) were also ade- quately estimated in this cross-validation. Conversely, four of the estimates of the parameters with posteriors that were not nar- rowed did not correlate with their true values at all (B0, Ef, Es and s). This means that ABC could not identify their values even when it was using ‘pseudo-empirical data’ generated by the model itself. That leaves three parameters (Ec, h and Mc) which were estimated reasonably well from the simulated data but not from the empirical data. These three parameters, Ec, h and Mc, were found to correlate significantly with other parameters in ABC’s accepted runs (Fig. 6). Ec and rm correlated with each other, while rm also correlated with Mc; these three parameters all regulate reproduction. Also, h, the saturation coefficient, correlated with IGm, the maximum ingestion rate; these parameters both affect energy intake. Thus, although the posteriors of Ec, h and Mc were not narrowed when considered independently, they were narrowed in relationship to each other. To estimate coverage, following Prangle et al. (2013), we took the model outputs of the 100 best runs as “pseudo-data” and then used the remaining runs to generate posterior parameter dis- tributions using ABC. For each posterior parameter distribution so obtained, we calculated the relative frequency, p, of accepted parameter values that were less than the value actually used in that run (see also Fig. S5). The histograms of these p-values are shown in Fig. 7. In most cases (B0, E, Ec, Ef, Es, Mc, Mp, rm, and s) the distributions are uniform, indicating that the corresponding poste- riors are accurately estimated; therefore, ‘coverage’ is said to hold. However, this does not mean that the empirical data was necessar- ily informative with respect to estimating these parameters; when the empirical data is uninformative, ABC will return the prior dis- tributions, and these will produce uniform coverage plots. In our analyses B0, Ef, Es and s had uniform coverage (Fig. 6), were not nar- rowed from their priors (Fig. 2), and could not be estimated from the data (Fig. 4). We conclude that the data are uninformative for these four parameters. In the coverage plots of h and IGm, there is an excess of p-values on the right-hand side in Fig. 7, indicating that ABC generally under- estimates these parameters. The remaining non-uniform coverage plots (Mb, Mm, and rB) show deficits of p-values in the tails of the his- tograms, indicating that the estimated posterior distributions are broader than they should be. In other words, testing based on these estimated posteriors will be conservative. Overall, we conclude that h and IGm were underestimated, that the credible intervals of the other eight parameters for which ABC provided information were either accurately or conservatively estimated, and that the Figure 1: Calibrated model outputs (grey and black lines) and empirical observations (white circles) for an agent-based mod l of earthworm ecology taken from va d r Vaart et al.’s Calibration and evaluation of individual-based models using Approximate Bayesian Computation, 2015) ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibratio and validation I 4 Robin Purshouse ACSE, University of Sheffield probability that B occurs: P (A ∩B) = P (A | B)P (B) (4) Since the ordering of events is unimportant in determining the joint probability, we can also write: P (A ∩B) = P (B | A)P (A) (5) Bringing Equation 4 and Equation 5 together, we have: P (A | B) = P (B | A)P (A) P (B) (6) which is known as Bayes’ theorem. Rather than thinking about events A and B we can also think about model pa- rameters θ and observed data x. We can then write Equation 6 in terms of density functions as: f(θ | x) = f(x | θ)f(θ) f(x) (7) Equation 7 lies at the heart of Bayesian model calibration. Examining each of its terms: • f(θ) is known as the prior probability distribution of the model parameters θ. It captures all of the knowledge available about θ without taking account of the observed data x. Note that f(θ) is a joint distribution across all the elements of θ, i.e. θ1, θ2, etc. Since this distribution is usually very difficult to specify, we often assume that many of our parameters are conditionally independent, e.g. f(θ1 | θ2) = f(θ1), unless there is strong evidence to think otherwise. • f(x | θ) is known as the likelihood function, which expresses the probability of generating data x from a model with parameters θ. The likelihood function is usually looked at from the opposite perspective – as the plausibility of the parameters θ given that the data x has been observed, which is written L(θ | x). • f(x) is known as the normalising constant and can also be written as the marginal distribution of the data x when the numerator of Equation 7 is integrated across the set, Θ, of all possible values of θ – i.e. f(x) = ∫ Θ f(x | θ)f(θ) dθ. • f(θ | x) is known as the posterior probability distribution of the model parame- ters θ given our observations x. So if we want to use empirical data to calibrate our model parameters then this is the term we are interested in computing! Once we have obtained our posterior distribution for the model parameters, we can now use them in our model to obtain the probability distribution of the calibrated model’s outputs z: p(z | x) = ∫ Θ p(z | θ) f(θ | x) dθ (8) ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 5 Robin Purshouse ACSE, University of Sheffield 3.3 Working with Bayes’ theorem Despite its elegance, Equation 7 turns out not to be particularly straightforward to use for calibrating simulations like ABMs, principally because the likelihood for a simu- lation cannot be written as a density function. Before looking at methods designed specifically for simulations, we will first look at how Bayesian statisticians would deal with more straightforward models based directly on density functions. 3.3.1 Conjugate priors In an ideal world, it would be possible to derive an analytical expression for the left- hand side of Equation 7. One way to do this is to specify a conjugate prior that can be combined with a likelihood function to produce a posterior that is of the same type of density function as the prior. Such an approach also avoids having to compute the normalising constant in the equation, since we are already guaranteed that the posterior will integrate to unity (as required for the latter to be a proper probability density function). The potential downside to this approach is that the conjugate prior may not permit an adequate description of our prior beliefs. As an illustrative case, consider the situation in which we specify our model as an exponential distribution (which would be suitable if, for example, we were modelling times between events). The exponential density function, defining the probability of x given parameter θ, is given by: f(x | θ) = θe−θx (9) which given a set of n data observations, X, has the corresponding likelihood: L(θ | X) = θne−θ ∑n i=1 xi (10) Now consider a prior for parameter θ based on the Gamma density function: f(θ | α, β) = 1 Γ(α) βαθα−1e−βθ (11) where Γ(.) is the Gamma function, which helps ensure that the density function integrates to unity. Using Bayes’ theorem in its proportional form (i.e. without the denominator) we have: f(θ | X) ∝ L(θ | X)f(θ | α, β) ∝ θne−θ ∑n i=1 xi 1 Γ(α) βαθα−1e−βθ ∝ 1 Γ(α) βαθα+n−1e−θ(β+ ∑n i=1 xi) (12) So our posterior distribution for θ takes the form of a new Gamma distribution with parameters (α + n) and (β + ∑n i=1 xi). The normalising constant is not yet properly ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 6 Robin Purshouse ACSE, University of Sheffield represented, but we know what such a constant needs to look like for a Gamma distribution – it will need to be ( (β + ∑n i=1 xi) (α+n) ) / (Γ(α+ n)). 3.3.2 Markov chain Monte Carlo Often the restrictions posed by conjugate priors are too severe for our modelling work and we need to use combinations of prior and likelihood that cannot produce an ana- lytical description of the posterior. In this case, we can use computational approaches (known as Markov chain Monte Carlo – MCMC) to develop a purely sample-based representation of the posterior. The most well-known MCMC technique is the Metropolis-Hastings algorithm. In this approach, we define a Markov chain that generates a sequence of states, θ[1], θ[2], . . . using the following transition rule: 1. Generate a candidate value θ? from a proposal density q(θ? | θ[k]). 2. Compute the acceptance ratio a(θ?, θ[k]) using: a(θ?, θ[k]) = min ( 1, f(θ? | X) f(θ[k] | X) q(θ | θ?) q(θ? | θ) ) (13) 3. Sample u from U(0, 1). 4. Update the Markov chain according to: θ[k + 1] = { θ? u < a(θ?, θ[k]) θ otherwise. (14) 5. Return to 1. Given a suitable choice of proposal density, the Markov chain will eventually (after a burn-in period) start to generate samples from the posterior distribution. Note that if we choose a symmetric distribution (such as a normal or a uniform distribution) then q(θ, θ?) = q(θ?, θ) and these terms cancel in Equation 13. Noting that the denominator in Equation 7 will also get cancelled (since it does not involve consideration of θ), we can re-write the acceptance ratio equation as: a(θ?, θ[k]) = min ( 1, f(X | θ?)f(θ?) f(X | θ[k])f(θ[k]) ) (15) Choice of proposal density can substantially affect the speed with which the Metropolis- Hastings algorithm will converge on a representation of the posterior distribution. As- suming we are prepared to wait long enough, it is sufficient to choose a proposal density that is able to reach all the possible values of θ – i.e. Θ – from any point on the chain with non-zero probability. Typically, we choose a distribution that is easy to sample from, such as a normal distribution. ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 7 Robin Purshouse ACSE, University of Sheffield It is important to note that successive samples from the posterior generated by Metropolis-Hastings are not independent; therefore shuffling is required in applica- tions where the sequencing of samples from the posterior is important (e.g. most discrete-event simulations). 4 Bayesian methods for calibration of agent-based models The methods described in the preceding section cannot be directly applied to calibra- tion of simulation models because the likelihood function f(X | θ) is not directly com- putable. Here, we introduce a family of methods, known as Approximate Bayesian Computation (ABC), that have been designed expressly for simulation calibration. We will focus on three methods in particular: • Rejection sampling; • Markov chain Monte Carlo; • History matching. In all ABC approaches, we consider the ABM to be a data generating process. We sample from our prior distributions for θ, run the model with these values, and store the data that gets generated (the model outputs). We then compare the outputs from the model to equivalent empirical observations. Returning to the simple ABM from Section 2, at each time step k the model pro- duces a behaviour yi[k] for each individual agent i in the model. When calibrating ABMs we tend to focus on population summary measures (such as central tenden- cies and dispersions) rather than trying to capture the complete distribution of agent behaviours. In this case, we will compute the prevalence of behaviour y in the popu- lation: zˆ[k] = 1 N N∑ i=1 yi[k] (16) where N is the number of agents in the model. We will now assume that empirical data is available that also calculates the preva- lence of the behaviour over time – we will denote these observations as z[k]. 4.1 Rejection sampling 4.1.1 Sampling from the priors In the rejection sampling approach, a large number of samples from θ are drawn, e.g. 106 samples. Different approaches can be used to perform this sampling – here we focus on a popular approach called Latin Hypercube sampling (LHS). In this type of sampling, a decision is first made on how many samples to take (e.g. M = 106). Each parameter θ is divided into M equally probable intervals. Considering ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 8 Robin Purshouse ACSE, University of Sheffield the resulting hypergrid in θ-space, the aim of the procedure is to take exactly one sample along each row and column of the grid. The samples are chosen iteratively by remembering which rows and columns have been sampled from already and limiting future selections to the remaining non-sampled options. 4.1.2 Calculating distances between modelled and observed data All ABC algorithms replace the likelihood function with an estimated likelihood based on the difference between data generated by the model and equivalent data observed from the real system, d(zˆ, z). Since an ABM essentially generates time series data across K observations in time, popular distance measures are root-mean-square- error (RMSE) and mean-absolute-error (MAE), shown in Equation 17 and Equation 18 respectively. dRSME(zˆ, z) = √√√√ 1 K K∑ k=1 (zˆ[k]− z[k])2 (17) dMAE(zˆ, z) = 1 K K∑ k=1 |zˆ[k]− z[k]| (18) In the case where z is a vector of outputs, it is common for the different outputs to be measured on different scales. In order to compute a distance measure that combines outputs, it is then usual practice to normalise the observations in terms of units of standard deviation. 4.1.3 The algorithm 1. Sample from the priors s = 1 . . .M times and run the ABM to generate M sets of model outputs over K time periods. 2. Calculate the distance metric d(zˆs, z) for each set of outputs. 3. Accept θs if d(zˆs, z) ≤ The size of the rejection threshold reflects a trade-off between computability and accuracy. As → ∞, we tend to represent the prior; as → 0 we tend to represent the posterior. An alternative to specifying directly is to take a proportion of samples (e.g. 0.01%) that have the lowest distance measures. 4.2 Markov chain Monte Carlo Rejection sampling can be very inefficient, over-exploring areas of parameter space that have a very low estimated likelihood. This is a particular problem if the sim- ulation is slow and computational resources are scarce (a common problem when running ABMs!). An alternative approach is to sample sequentially and make use of an adapted Metropolis-Hastings algorithm (note that k here refers to the internal time steps of the algorithm, rather than time steps of the model): ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 9 Robin Purshouse ACSE, University of Sheffield 1. Generate a sample θ? from a proposal density q(θ? | θ[k]). 2. Run the model with θ? and generate output zˆ. 3. Calculate the distance metric d(zˆ, z) 4. Compute the acceptance ratio a(θ?,θ[k]) using: a(θ?,θ[k]) = { min ( 1, f(θ ?) f(θ[k]) q(θ|θ?) q(θ?|θ) ) d(zˆ, z) ≤ 0 otherwise. (19) 5. Sample u from U(0, 1). 6. Update the Markov chain according to: θ[k + 1] = { θ? u < a(θ?,θ[k]) θ otherwise. (20) 7. Return to 1. Like the original algorithm, this adapted Metropolis-Hastings method will eventually converge to the posterior distribution if the proposal density is defined appropriately – in this case the distribution will only be an approximation, with accuracy determined by the rejection threshold . 4.3 History matching A related approach to both of the above methods is history matching. This approach attempts to progressively narrow down on the posterior distribution in a series of chunky steps. The distance measure is defined using a metric known as implausi- bility, which is computed on each component of z (these could be different outputs, the same output at different points in time, or both). Considering component j: Ij(θ s) = |zˆj(θs)− zj |√ Var(zˆj(θs)) (21) If zj is observed with error, j , we can also include the observation error variance in the denominator: Ij(θ s) = |zˆj(θs)− zj |√ Var(zˆj(θs)) + Var(j) (22) A wave of the algorithm is then conducted using: 1. Sample from the priors s = 1 . . .M times and run the ABM to generate M sets of model outputs over K time periods. ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 10 Robin Purshouse ACSE, University of Sheffield how much information is in the data regarding the simulator’s parameters and gives guidance as to what fresh data might be needed to improve our system understanding. Furthermore, it is unaffected by multiple modes or correlation ridges in the posterior, which are typical manifestations of identifiability issues and plague other calibration methods, such as those based on MCMC. In models of the complexity and the dimensionality such as the one studied here, we would argue against reporting a single best estimate of the parameters, as there is always likely to be some uncertainty, that will be missed out by a single best estimate. Additionally, lack of identifiability does not imply that a history matched model is not useful. When using the history matched model to make a prediction, we would run it at a range of inputs Fig. 8. Simulator output (male and female HIV prevalence) in waves 1, 4, 7 and 10. The black lines show the average observed HIV prevalence with 95% credible ranges. doi:10.1371/journal.pcbi.1003968.g008 Fig. 9. Convergence of the simulator’s output to the empirical data with successive waves of history matching. Each of the 18 panels shows the range of the target data (horizontal region) and the simulator’s output in waves 1 (red), 4 (yellow), 7 (blue) and 10 (green) (left to right along the x-axis). doi:10.1371/journal.pcbi.1003968.g009 History Matching of Complex Infectious Disease Models PLOS Computational Biology | www.ploscompbiol.org 15 January 2015 | Volume 11 | Issue 1 | e1003968 Figure 2: Waves of history matching results for an agent-based model of HIV epidemi- ology (taken from Andrianakis et al.’s Models Using Emulation: A Tutorial and Case Study on HIV in Uganda, 2015) 2. Compute the implausibility for each component Ij(θs) for each sample using Equation 21. 3. Reject θs as implausible if Ij(θs) ≥ 3 for any component j. History matching is done in sequential waves. Once a wave has been com- pleted, we identify the non-implausible regions of the previous sample space. We then re-perform history matching in this parameter sub-space, identify the new non- implausible region, and continue in this fashion. Note that history matching, on its own, does not give us a posterior distribution over the parameters. Once regions of concentrated non-implausible parameterisa- tions have been identified, these can be used to trigger a MCMC scheme such as that described in Section 4.2 that can more accurately describe the true posterior for θ. Example results from a reported history matching process of an agent-based model are shown in Figure 2. The results from different waves are shown by the different colours (see electronic version of these notes). Observational data (includ- ing non-implausible bands) are shown in black. In the early stages of the calibration process, many parameterisations of the model (as sampled from the prior f(θ)) are highly implausible. Over ten waves of history matching, the remaining sub-space con- tains a high density of non-implausible parameterisations. ACS6132 Agent-based modelling and multi-agent systems Lecture 8: Model calibration and validation I 11 欢迎咨询51作业君