Main Examination period 2020 – January – Semester A

MTH6102: Bayesian Statistical Methods

Duration: 2 hours

Apart from this page, you are not permitted to read the contents of this question paper until instructed to do so by an invigilator.

You should attempt ALL questions. Marks available are shown next to the questions.

Only non-programmable calculators that have been approved from the college list of non-programmable calculators are permitted in this examination. Please state on your answer book the name and type of machine used.

Complete all rough work in the answer book and cross through any work that is not to be assessed.

Possession of unauthorised material at any time when under examination conditions is an assessment offence and can lead to expulsion from QMUL. Check now to ensure you do not have any unauthorised notes, mobile phones, smartwatches or unauthorised electronic devices on your person. If you do, raise your hand and give them to an invigilator immediately. It is also an offence to have any writing of any kind on your person, including on your body. If you are found to have hidden unauthorised material elsewhere, including toilets and cloakrooms, it will be treated as being found in your possession. Unauthorised material found on your mobile phone or other electronic device will be considered the same as being in possession of paper notes. A mobile phone that causes a disruption in the exam is also an assessment offence.

Exam papers must not be removed from the examination room.

Examiners: J. Griffin, L. Pettit

© Queen Mary University of London (2020)

Question 1 [12 marks]. A box contains $m = 5$ balls, of which $r$ are red and the rest black. The unknown quantity is $r$. Our prior distribution is that each value $r = 0, 1, \dots, m$ has equal probability. We are told that twice, a ball was taken out and immediately replaced, and both times the ball was red.

(a) Write down the likelihood for the observed data.
What is the maximum likelihood estimate for $r$? [4]

(b) Derive the normalized posterior distribution for $r$. What is the posterior mean for $r$? [5]

(c) Find the posterior predictive probability that, if another ball is taken from the box, it is black. [3]

Question 2 [34 marks]. A biased coin with probability $q$ of landing heads is repeatedly tossed until the first head is seen. The number of tails $X$ before the first head is modelled as a geometric distribution with probability mass function $P(X = x) = q(1-q)^x$. The experiment was repeated $n$ times, and $x_1, x_2, \dots, x_n$ tails were observed.

(a) Write down the likelihood for $q$. Show that the maximum likelihood estimate for $q$ is $\hat{q} = \frac{n}{n+S}$, where $S = \sum_{i=1}^{n} x_i$. [6]

(b) Find the Fisher information and hence the asymptotic variance for $\hat{q}$. [5]

(c) A $\text{Beta}(\alpha_0, \beta_0)$ distribution is chosen as the prior distribution for $q$. Show that the posterior distribution is $\text{Beta}(\alpha_1, \beta_1)$, where you should determine $\alpha_1$ and $\beta_1$. [6]

(d) We have $n = 5$ and observed data $x_1, \dots, x_n = 4, 2, 5, 6, 3$.

(i) What is the maximum likelihood estimate $\hat{q}$? [3]

(ii) Find an approximate 95% confidence interval for $q$. [4]

(iii) Before seeing the data, our probability distribution for $q$ has mean 0.4 and standard deviation 0.2. Find values of $\alpha_0$ and $\beta_0$ corresponding to this belief. What is then the posterior distribution for $q$? What is the posterior mean? [8]

(iv) Comment on the posterior mean compared to the maximum likelihood estimate and the prior mean for this example. No further calculations or formulae are needed here. [2]

Question 3 [26 marks]. We want to estimate a single unknown parameter $\theta$ in a certain model. Assume that in R we have defined a function log_post to calculate the log of the unnormalized posterior density as a function of $\theta$. This function and the data $y$ being analysed are not shown in the code extract below. The posterior density is $p(\theta \mid y)$.
Consider the following R code:

    nb = 1000
    nm = 10000
    theta = vector(length=nm)
    s = 0.4
    theta0 = 2
    log_post0 = log_post(theta0)
    for(i in 1:(nb+nm)){
      theta1 = rnorm(1, mean=theta0, sd=s)
      log_post1 = log_post(theta1)
      if(log(runif(1)) < log_post1-log_post0){
        theta0 = theta1
        log_post0 = log_post1
      }
      if(i>nb) theta[i-nb] = theta0
    }
    stheta = sort(theta)
    stheta[nm/2]
    stheta[nm*0.025]
    stheta[nm*0.975]

Except where stated, an explanation in words is all that is needed for this question.

(a) What is the name of the algorithm that the code is carrying out? [3]

(b) Explain what the command theta1 = rnorm(1, mean=theta0, sd=s) is doing in the context of the algorithm. [4]

(c) Explain what the command if(log(runif(1)) < log_post1-log_post0) is doing in the context of the algorithm. In your answer, include a formula involving $p(\theta \mid y)$ that the code is implementing. [5]

(d) What are the effects on the behaviour of the algorithm of making the variable called s smaller? What are the effects of making it larger? [4]

(e) What is the purpose of the variable called nb? [2]

(f) When the code has run, what will the vector theta contain? [2]

(g) In statistical terms, what will the command stheta[nm/2] output? [2]

(h) In statistical terms, what will the last two lines of code output? [4]

Question 4 [17 marks]. The observed data $y = \{y_{ij} : i = 1, \dots, n,\ j = 1, \dots, m_i\}$ are the recorded counts of a disease in district $j$ within county $i$. The population of each district is $N_{ij}$. The following hierarchical model is considered reasonable:

$$y_{ij} \sim \text{Poisson}(\lambda_i N_{ij}), \quad j = 1, \dots, m_i,$$
$$\lambda_i \sim \text{Gamma}(\alpha, \beta), \quad i = 1, \dots, n.$$

Here $\alpha$ and $\beta$ are unknown parameters, which are given a prior distribution $p(\alpha, \beta)$. Suppose that we have generated a sample of size $M$ from the joint posterior distribution $p(\alpha, \beta, \lambda_1, \dots, \lambda_n \mid y)$.

(a) How would we obtain a sample from the marginal posterior distribution $p(\alpha, \beta \mid y)$ using the joint posterior sample?
How would we estimate the posterior mean for $\alpha/\beta$? [5]

(b) Explain how to generate a sample from the posterior predictive distribution of the disease count for a district not in our dataset with population $P$, in each of the following two cases: the county containing the district is in our dataset, or the county is not in our dataset. In the latter case, how would we estimate the posterior predictive probability that the disease count in this district will be zero? [8]

(c) Give two reasons why, in general, we might want to use a hierarchical model instead of a single-level model. [4]

Question 5 [11 marks]. Two models $M_1$ and $M_2$ are under consideration, with corresponding parameters $\theta$ and $\psi$. Here $\theta$ is a single parameter with unbounded range. For the prior distribution $p(\theta \mid M_1)$, we assign a normal distribution $N(0, \sigma^2)$ with an extremely large value of $\sigma$, so that the prior is practically flat over the range supported by the likelihood. We also assign a prior distribution $p(\psi \mid M_2)$. The observed data are $y$.

(a) State the formula for the Bayes factor $B_{12}$ for comparing the models, in which large values of $B_{12}$ favour model $M_1$. [5]

(b) For inference conditional upon model $M_1$, what is the effect on the posterior mean for $\theta$ if we replace $\sigma$ with $1000\sigma$ in $p(\theta \mid M_1)$? [3]

(c) What is the effect on $B_{12}$ if we replace $\sigma$ with $1000\sigma$ in $p(\theta \mid M_1)$? [3]

End of Paper – An appendix of 1 page follows.

Appendix: common distributions

For each distribution, $x$ is the random quantity and the other symbols are parameters.

Discrete distributions

| Distribution | Probability mass function | Range of parameters and variates | Mean | Variance |
|---|---|---|---|---|
| Binomial | $\binom{n}{x} q^x (1-q)^{n-x}$ | $0 \le q \le 1$; $x = 0, 1, \dots, n$ | $nq$ | $nq(1-q)$ |
| Poisson | $\frac{\lambda^x e^{-\lambda}}{x!}$ | $\lambda > 0$; $x = 0, 1, 2, \dots$ | $\lambda$ | $\lambda$ |
| Geometric | $q(1-q)^x$ | $0 < q \le 1$; $x = 0, 1, 2, \dots$ | $\frac{1-q}{q}$ | $\frac{1-q}{q^2}$ |
| Negative binomial | $\binom{r+x-1}{x} q^r (1-q)^x$ | $0 < q \le 1$, $r > 0$; $x = 0, 1, 2, \dots$ | $\frac{r(1-q)}{q}$ | $\frac{r(1-q)}{q^2}$ |

Continuous distributions

| Distribution | Probability density function | Range of parameters and variates | Mean | Variance |
|---|---|---|---|---|
| Uniform | $\frac{1}{b-a}$ | $-\infty < a < b < \infty$; $a < x < b$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Normal $N(\mu, \sigma^2)$ | $\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $-\infty < \mu < \infty$, $\sigma > 0$; $-\infty < x < \infty$ | $\mu$ | $\sigma^2$ |
| Normal $\text{No}(\mu, \tau)$ (precision $\tau$) | $\frac{\sqrt{\tau}}{\sqrt{2\pi}} \exp\left(-\frac{\tau(x-\mu)^2}{2}\right)$ | $-\infty < \mu < \infty$, $\tau > 0$; $-\infty < x < \infty$ | $\mu$ | $\tau^{-1}$ |
| Exponential | $\lambda e^{-\lambda x}$ | $\lambda > 0$; $x > 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Gamma | $\frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$ | $\alpha > 0$, $\beta > 0$; $x > 0$ | $\frac{\alpha}{\beta}$ | $\frac{\alpha}{\beta^2}$ |
| Beta | $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}$ | $\alpha > 0$, $\beta > 0$; $0 < x < 1$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ |

The 95th and 97.5th percentiles of the standard $N(0,1)$ distribution are 1.64 and 1.96, respectively.

End of Appendix.
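The Beta mean and variance in the appendix can be inverted to find a Beta distribution with a prescribed mean $m$ and standard deviation $s$. Since $m = \alpha/(\alpha+\beta)$ and $m(1-m) = \alpha\beta/(\alpha+\beta)^2$, the variance simplifies to $m(1-m)/(\alpha+\beta+1)$. Solving gives $\alpha + \beta = m(1-m)/s^2 - 1$, $\alpha = m(\alpha+\beta)$ and $\beta = (1-m)(\alpha+\beta)$, and a solution exists only when $s^2 < m(1-m)$. The R sketch below is illustrative only: the function names beta_from_moments and beta_moments are hypothetical helpers that are not part of the paper, and the target values are arbitrary.

```r
# Hypothetical helper (not from the paper): choose Beta(alpha, beta) so that
# its mean and standard deviation match given targets, by inverting the
# appendix formulas mean = alpha/(alpha+beta) and
# variance = mean*(1-mean)/(alpha+beta+1).
beta_from_moments <- function(mean, sd) {
  v <- sd^2
  # A Beta variance is always strictly less than mean*(1-mean)
  if (mean <= 0 || mean >= 1 || sd <= 0 || v >= mean * (1 - mean)) {
    stop("no Beta distribution has this mean and standard deviation")
  }
  ab <- mean * (1 - mean) / v - 1  # this is alpha + beta
  c(alpha = mean * ab, beta = (1 - mean) * ab)
}

# Hypothetical check: evaluate the appendix Beta mean and standard deviation
beta_moments <- function(alpha, beta) {
  ab <- alpha + beta
  c(mean = alpha / ab, sd = sqrt(alpha * beta / (ab^2 * (ab + 1))))
}

p <- beta_from_moments(mean = 0.5, sd = 0.1)
print(p)
print(beta_moments(p[["alpha"]], p[["beta"]]))
```

With mean 0.5 and standard deviation 0.1 this gives $\alpha + \beta = 0.25/0.01 - 1 = 24$, that is $\text{Beta}(12, 12)$ up to floating-point rounding; plugging the result back into the appendix formulas recovers the targets.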