- June 30, 2020

Chapter 6. Models for Multi-categorical Responses: Multivariate Extensions of GLM MAST90139 Statistical Modelling for Data Science Slides Guoqi Qian SCHOOL OF MATHEMATICS AND STATISTICS THE UNIVERSITY OF MELBOURNE Multivariate GLM (Ch6) Guoqi Qian 1 / 143 Outline 1 §6.1 Four examples §6.1.1 Example 1. Caesarian birth study §6.1.2 Example 2. Breathing test results §6.1.3 Example 3. Visual impairment study §6.1.4 Example 4. Ohio children respiratory infection 2 §6.2 Multicategorical response models §6.2.1 Multinomial distribution and Data §6.2.2 The multi-categorical logit model §6.2.3 Multivariate generalised linear models (MGLM) 3 §6.3 Models for nominal responses §6.3.1 Principle of maximum random utility §6.3.2 Modelling explanatory variables: choice of design matrix 4 §6.4 Models for ordinal responses §6.4.1 Cumulative (or threshold) models §6.4.2 Extended version of cumulative models §6.4.3 Link functions and design matrices for cumulative models §6.4.4 Sequential models and other approaches 5 §6.5 Statistical inference for MGLM (optional) 6 §6.6 Multivariate models for correlated responses §6.6.1 Conditional models (optional) §6.6.2 Margin models §6.6.3 Marginal models for longitudinal data Multivariate GLM (Ch6) Guoqi Qian 2 / 143 Chapter 6 Outline In this chapter we consider the following multivariate extensions of GLM: 1 One response variable with k > 2 nominal categories; 2 One response variable with k > 2 ordinal categories; 3 Multiple correlated binary response variables. Cases 1 and 2 refer to the polychotomous response which is often modelled by a multinomial distribution. Case 3 is often seen in situations involving repeated measurements or longitudinal data. We first present four examples which will be extensively analysed in this chapter. Multivariate GLM (Ch6) Guoqi Qian 3 / 143 Example 1. Caesarian birth study (1) Table 1: Data on classification of 251 births by caesarian section Caesarian planned Caesarian not planned Infection Infection I II non I II non Antibiotics Risk-factors 0 1 17 4 7 87 No risk-factors 0 0 2 0 0 0 No antibiotics Risk-factors 11 17 30 10 13 3 No risk-factors 4 4 32 0 0 9 Response variable — Infection, with 3 levels (I, II, non) Three dichotomous covariates (factors) 1 CP: caesarian planned or not; 2 RF: risk factors (e.g. diabetes, over-weight, etc.); 3 AB: antibiotics given as a prophylaxis? Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 4 / 143 Example 1. Caesarian birth study (2) Aim: Analysing the effects of the covariates on the risk of infection. Note:Response variable is also called dependent variable or outcome variable. Note: Covariate is also called independent variable, explanatory variable, or predictor. The response variable Infection is categorical having > 2 levels. The level values are nominal, but may also be treated as ordinal which makes the analysis even more difficult. Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 5 / 143 Example 1. Caesarian birth study (3) The data in Table 1 are grouped data. They can be converted to ungrouped data. Individual CP RF AB Infection 1 1 1 1 II 2 1 1 1 non … … … … … 18 1 1 1 non 19 1 0 1 non 20 1 0 1 non 21 1 1 0 I … … … … … 251 0 0 0 non CP=1 if yes and 0 if no; RF=1 if yes and 0 if no; AB=1 if yes and 0 if no. Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 6 / 143 Example 2. Breathing test results Forthofer & Lehnen (1981) considered the effect of age & smoking on breathing test results (BTRs) for workers in industrial plants in Texas. The response variable is BTR, ordinal with K = 3 levels. Age (2 levels) and smoking status (3 levels) are predictors. Table 2: BTS of Houston industrial workers Breathing test results Age Smoking status Normal Borderline Abnormal < 40 Never smoked 577 27 7 Former smoker 192 20 3 Current smoker 682 46 11 40 ∼ 59 Never smoked 164 4 0 Former smoker 145 15 7 Current smoker 245 47 27 Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 7 / 143 Example 3. Visual impairment study Table 3: Visual impairment data, from Liang, Zeger & Qaqish (1992) White Black Visual Age impairment 40-50 51-60 61-70 70+ 40-50 51-60 61-70 70+ Total Left eye Yes 15 24 42 139 29 38 50 85 422 No 617 557 789 673 750 574 473 344 4777 Right eye Yes 19 25 48 146 31 37 49 93 448 No 613 556 783 666 748 575 474 336 4751 Vector binary response variable (y1, y2), where y1 = 1 if left-eye impaired, 0 otherwise; y2 = 1 if right-eye impaired, 0 otherwise. Covariates: Age (yrs.), Race (W or B). Aim: find the effect of race and age on visual impairment. Complication: y1 and y2 are correlated. Methods: multivariate models for correlated responses; conditional models; asymmetric models, marginal models, GEE, etc.. Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 8 / 143 Example 4. Respiratory infection in Ohio children (1) Table 4: Presence and absence of respiratory infection Mother did not smoke Mother smoked Age of child Frequency Age of child Frequency 7 8 9 10 7 8 9 10 0 0 0 0 237 0 0 0 0 118 0 0 0 1 10 0 0 0 1 6 0 0 1 0 15 0 0 1 0 8 0 0 1 1 4 0 0 1 1 2 0 1 0 0 16 0 1 0 0 11 0 1 0 1 2 0 1 0 1 1 0 1 1 0 7 0 1 1 0 6 0 1 1 1 3 0 1 1 1 4 1 0 0 0 24 1 0 0 0 7 1 0 0 1 3 1 0 0 1 3 1 0 1 0 3 1 0 1 0 3 1 0 1 1 2 1 0 1 1 1 1 1 0 0 6 1 1 0 0 4 1 1 0 1 2 1 1 0 1 2 1 1 1 0 5 1 1 1 0 4 1 1 1 1 1 1 1 1 1 7 Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 9 / 143 Example 4. Respiratory infection in Ohio children (2) Data in Table 4 are grouped ones. They can be equivalently expressed as the following ungrouped ones: individual mother smoking status Infection History 1 0 0 0 0 0 ... ... ... ... ... ... 237 0 0 0 0 0 238 0 0 0 0 1 ... ... ... ... ... ... 247 0 0 0 0 1 ... ... ... ... ... ... 531 1 1 1 1 1 ... ... ... ... ... ... 537 1 1 1 1 1 Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 10 / 143 Example 4. Respiratory infection in Ohio children (3) Data come from the Harvard Study of Air Pollution and Health (Laird, Beck & Ware, 1984). 537 children were exam annually from age 7 to 10 on the presence or absence of R.I. Response variable: presence/absence of R.I. Repeated measurements of the response variable for each child, regarded as a short time series, may be referred to as longitudinal data. Responses of one child may be correlated. Covariate: mother’s smoking status (regular, non-regular) at the beginning of the study. Aim: Analysing the effect of mother’s smoking on children’s R.I. Methods: Multivariate models for correlated responses, generalised estimating equations (GEEs), generalised linear mixed effects models (GLMM). For examples 1 to 4, statistical modelling techniques and models are needed for performing data analysis and drawing conclusions. Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 11 / 143 §6.2.1 Multinomial distribution (1) Multinomial distribution is an important one for multi-categorical response variables. Response variable Y has k levels, labelled 1, 2, · · · , k, i.e. Y ∈ {1, 2, · · · , k}. Y can be represented by y = (y1, · · · , yq)T , q = k − 1, with binary dummy variable yr = { 1 if Y = r 0 otherwise , r = 1, · · · , q. Hence Y = r ⇐⇒ y = (0, · · · , 0, 1︸ ︷︷ ︸ r , 0, · · · , 0)T and Pr(Y = r) = Pr(yr = 1 and yj = 0 for j 6= r). Suppose we have m independent observations of Y : Y1, · · · ,Ym. They can be represented by y1, · · · , ym, with yi = (yi1, · · · , yiq)T , i = 1, · · · ,m; with yir = I (Yi = r). Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 12 / 143 §6.2.1 Multinomial distribution (2) Suppose pir = Pr(Yi = r), r = 1, · · · , k, remain constants for all i = 1, · · · ,m. Note k∑ r=1 pir = 1 and pi = (pi1, · · · , piq)T , q = k − 1. Let y˜ = m∑ i=1 yi = ( m∑ i=1 yi1, · · · , m∑ i=1 yiq )T denoted = (y˜1, · · · , y˜q)T which is a vector of frequencies of Yi = r , r = 1, · · · , q. Then y˜ follows a multinomial distribution M(m,pi) with pmf Pr (y˜ = (m1, · · · ,mq)) = m! m1! · · ·mq!(m−m1−· · ·−mq)!pi m1 1 · · ·pimqq (1−pi1−· · ·−piq)m−m1−···−mq It can be shown that E (y˜) = (mpi1, · · · ,mpiq)T = mpi. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 13 / 143 §6.2.1 Multinomial distribution (3) It can also be shown that Cov(y˜) = m [ diag(pi)− pipiT ] = mpi1(1−pi1) −mpi1pi2 · · · −mpi1piq −mpi2pi1 mpi2(1−pi2) · · · −mpi2piq ... ... . . . ... −mpiqpi1 −mpiqpi2 · · · mpiq(1−piq) Let y¯ = y˜/m be the vector of relevant frequencies. It can be shown that E (y¯) = (pi1, · · · , piq)T = pi and Cov(y¯) = 1 m [ diag(pi)− pipiT ] = 1 m2 Cov(y˜). Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 14 / 143 §6.2.1 Data (1) Data under analysis are usually of two forms: Ungrouped data Response variable obs. Explanatory variables obs. Unit 1 y11 · · · y1q ... · · · ... yi1 · · · yiq ... · · · ... yn1 · · · ynq = yT1 ... yTi ... yTn n×q , x11 · · · x1L ... · · · ... xi1 · · · xiL ... · · · ... xn1 · · · xnL = xT1 ... xTi ... xTn n×L ... Unit i ... Unit n Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 15 / 143 §6.2.1 Data (2) Grouped data Suppose y (1) i , · · · , y(ni )i are response observations with fixed covariate observations xi = (xi1, · · · , xiL)T . Let y¯i = 1 ni ∑ni j=1 y (j) i be the mean vector over the group-i units, giving the relative frequencies of (Yi = 1, · · · ,Yi = q). Let y˜i = ∑ni j=1 y (j) i be the total vector over the group-i units, giving the frequencies of (Yi = 1, · · · ,Yi = q). Then the grouped data have the following form Group Size Response variable obs. Explanatory variables obs. 1 n1 y∗11 · · · y∗1q ... · · · ... y∗i1 · · · y∗iq ... · · · ... y∗g1 · · · y∗gq = y∗T1 ... y∗Ti ... y∗Tg g×q , x11 · · · x1L ... · · · ... xi1 · · · xiL ... · · · ... xg1 · · · xgL = xT1 ... xTi ... xTg g×L ... ... i ni ... ... g ng where y∗i is either the group mean y¯i or the group total y˜i , i = 1, · · · , g . Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 16 / 143 §6.2.2 The multi-categorical logit model (1) In the multinomial response case pii = µi = E (yi |xi ) is a q × 1 vector. pii = (pii1, · · · , piiq)T . The structure part of the GLM has the form pii = h(Ziβ), where h(·) is a q × 1 vector-valued response function (also called inverse link function); Zi is a q × p design matrix constructed from xi ; β is a p × 1 unknown vector parameter. The q × 1 vector linear predictor for unit i is ηi = Ziβ. And [ ηT1 , , · · · ,ηTn ]T = vec(η1, · · · ,ηn) is the nq × 1 vector linear predictor for all n units. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 17 / 143 §6.2.2 The multi-categorical logit model (2) A multi-categorical logit model is expressed as piir = P(Yi = r) = exp ( βr0 + zTi βr ) 1 + ∑q s=1 exp ( βs0 + zTi βs ) , r = 1, 2, · · · , q where zi is an (a− 1)× 1 vector from xi . Equivalently, log piir piik = log P(Yi = r) P(Yi = k) = βr0 + z T i βr , r = 1, 2, · · · , q. Note piik = 1− ∑q s=1 piis for all i = 1, · · · , n. We see for this model, h = (h1, · · · , hq)T is a q × 1 vector response function, where for each unit i , hr = hr (ηi ) = hr (Ziβ) = exp(ηir ) 1+ ∑q s=1 exp(ηis) ; r =1, 2, · · ·, q; i =1, · · ·, n where ηir = βr0 + z T i βr and ηi = (ηi1, · · · , ηiq)T . Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 18 / 143 §6.2.2 The multi-categorical logit model (3) We also see for this model, the design matrix for each unit i is Zi = 1 zTi 0 0 T · · · 0 0T 0 0T 1 zTi · · · 0 0T ... ... ... ... . . . ... ... 0 0T 0 0T · · · 1 zTi q×(aq) β = (β10,β T 1 , · · · , βq0,βTq )T is an (aq)× 1 vector parameter. η1... ηn = [ηT1 , · · · ,ηTn ]T = [η11, · · · , η1q, · · · , ηn1, · · · , ηnq]T = [ (Z1β) T , · · · , (Znβ)T ]T = Z1β... Znβ = Z1... Zn (nq)×(aq) β Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 19 / 143 §6.2.2 The multi-categorical logit model (4) Note that the vector link function of this multi-categorical logit model is g = (g1, · · · , gq)T , where for each unit i , gr = gr (pii1, · · · , piiq) = log piir piik = log piiir 1− (pii1 + · · ·+ piiq) ; r = 1, · · · , q and i = 1, · · · , n. It shows that a multivariate GLM with multinomial responses is characterised by two specifications: the vector response function h (or the vector link function g = h−1); the design matrix that depends on the covariates and the model. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 20 / 143 §6.2.2 Example: Caesarian birth study (1) Consider the data on infection following Caesarian birth given in §6.1. Now we want to distinguish between the two different types of infection. Therefore the response variable Y has 3 possible outcomes: 1 for type I infection, 2 for type II infection, and 3 for no infection. Represent Yi by 2 dummy variables (yi1, yi2), where yi1 = I (the ith Caesarian was followed by infection I) yi2 = I (the ith Caesarian was followed by infection II) There are 3 binary covariates: NoPlan, Antib, RiskF, coded by 3 dummy variables: NoPlan = I (the Caesarian has not been planned) Antib = I (antibiotics were given as a prophylaxis) RiskF = I (risk factors were present) Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 21 / 143 §6.2.2 Example: Caesarian birth study (2) The data can be condensed in the grouped data (group total) form. Size Response variable obs. Explanatory variables obs. (y˜i1, y˜i2) NoPlan, Antib, RiskF Group 1 n1 = 40 4 4 11 17 0 0 0 1 0 0 10 13 4 7 0 0 0 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 1 1 1 Group 2 n2 = 58 Group 3 n3 = 2 Group 4 n4 = 18 Group 5 n5 = 9 Group 6 n6 = 26 Group 7 n7 = 98 Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 22 / 143 §6.2.2 Example: Caesarian birth study (3) We fit the following multi-categorical logit model to the data log pii1 pii3 = log P(infection type I) P(no infection) = β10 + β1NNoPlan + β1AAntib + β1RRiskF log pii2 pii3 = log P(infection type II) P(no infection) = β20 + β2NNoPlan + β2AAntib + β2RRiskF Equivalently, P(infection type j) P(no infection) = eβj0eβjNNoPlaneβjAAntibeβjRRiskF, j = I, II. Thus, the exponential of the parameter gives the factorial contribution to the odds if the corresponding explanatory variable takes value 1 instead of 0. Alternatively, the exponential of the parameter may be seen as odds ratio between odds for value 1 and odds for value 0. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 23 / 143 §6.2.2 Example: Caesarian birth study (4) For example, for NoPlan one obtains eβjN = P(infection type j |NoPlan=1,Antib,RiskF) P(no infection|NoPlan=1,Antib,RiskF) P(infection type j |NoPlan=0,Antib,RiskF) P(no infection|NoPlan=0,Antib,RiskF) , j = I, II. where Antib and RiskF can take any fixed values. Note the vector-valued link function for this model is g(pii1, pii2) = ( log pii1 1− pii1 − pii2 , log pii2 1− pii1 − pii2 )T Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 24 / 143 §6.2.2 Example: Caesarian birth study (5) The estimates of the parameters and their exponential are given in the following table Table 5: Parameter estimates in Caesarian birth study β exp(β) β exp(β) constant -2.621 0.072 constant -2.560 0.077 NoPlan (I) 1.174 3.235 NoPlan (II) 0.996 2.707 Antib (I) -3.520 0.030 Antib (II) -3.087 0.046 RiskF (I) 1.829 6.228 RiskF (II) 2.195 8.980 NoPlan of 1 vs. 0 increases the type I infection odds by a factor of 3.235. Taking Antib decreases the type I infection OR to 0.030 (from 1). RiskF increases the type I infection odds ratio to 6.228. Similar things can be said of the type II infection odds ratio. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 25 / 143 §6.2.2 Example: Caesarian birth study (6) Caes.dat is.data.frame(Caes.dat) #TRUE Caes.dat size noInf Inf1 Inf2 NoPlan Antib RiskF 1 40 32 4 4 0 0 0 2 58 30 11 17 0 0 1 3 2 2 0 0 0 1 0 4 18 17 0 1 0 1 1 5 9 9 0 0 1 0 0 6 26 3 10 13 1 0 1 7 98 87 4 7 1 1 1 library(nnet) ##package nnet is used for fitting multinomial log-linear model ##or multi-categorical logit model. Caes1=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib+RiskF,data=Caes.dat) # weights: 15 (8 variable) initial value 275.751684 iter 10 value 161.068578 final value 160.937147 converged Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 26 / 143 §6.2.2 Example: Caesarian birth study (7) Caes1=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib+RiskF,data=Caes.dat) summary(Caes1) Call: multinom(formula=as.matrix(Caes.dat[,2:4]) ~ NoPlan + Antib + RiskF, data=Caes.dat) Coefficients: (Intercept) NoPlan Antib RiskF Inf1 -2.621012 1.1742513 -3.520249 1.829241 Inf2 -2.559917 0.9959762 -3.087160 2.195458 Std. Errors: (Intercept) NoPlan Antib RiskF Inf1 0.5567209 0.5213014 0.6717416 0.6023322 Inf2 0.5462995 0.4813634 0.5498675 0.5869601 Residual Deviance: 321.8743 AIC: 337.8743 attributes(Caes1) $names [1] "n" "nunits" "nconn" "conn" "nsunits" [6] "decay" "entropy" "softmax" "censored" "value" [11] "wts" "convergence" "fitted.values" "residuals" "call" [16] "terms" "weights" "deviance" "rank" "lab" [21] "coefnames" "vcoefnames" "xlevels" "edf" "AIC" $class [1] "multinom" "nnet" Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 27 / 143 §6.2.2 Example: Caesarian birth study (8) Caes1$fitted ##fitted category probabilities noInf Inf1 Inf2 1 0.8695347 0.063240568 0.067224753 2 0.4656329 0.210951192 0.323415876 3 0.9943521 0.002140051 0.003507889 4 0.9568456 0.012827892 0.030326540 5 0.6922135 0.162899513 0.144886971 6 0.2300766 0.337273035 0.432650355 7 0.8855925 0.038416539 0.075990954 Caes1$residuals #resid(Caes1) does the same thing, = category means - fitted category probabilities. noInf Inf1 Inf2 1 -0.069534679 0.036759432 0.032775247 2 0.051608447 -0.021296019 -0.030312428 3 0.005647940 -0.002140051 -0.003507889 4 -0.012401124 -0.012827892 0.025229016 5 0.307786484 -0.162899513 -0.144886971 6 -0.114691994 0.047342349 0.067349645 7 0.002162595 0.002399788 -0.004562382 Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 28 / 143 §6.2.2 Example: Caesarian birth study (9) Caes1$deviance #residual deviance of the model, also given by deviance(Caes1) [1] 321.8743 Caes1$edf [1] 8 predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="probs") noInf Inf1 Inf2 0.983753294 0.006850796 0.009395909 predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="class") [1] noInf Levels: noInf Inf1 Inf2 predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="probs", se.fit=TRUE) noInf Inf1 Inf2 0.983753294 0.006850796 0.009395909 Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 29 / 143 §6.2.2 Example: Caesarian birth study (10) step(Caes1) Start: AIC=337.87 as.matrix(Caes.dat[, 2:4]) ~ NoPlan + Antib + RiskF trying - NoPlan # weights: 12 (6 variable) ................................................................... converged Df AIC 8 337.8743 - NoPlan 6 340.8649 - RiskF 6 358.3103 - Antib 6 397.4663 ###The model ~ NoPlan + Antib + RiskF has the smallest AIC and is the best model. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 30 / 143 §6.2.2 Example: Caesarian birth study (11) Caes2=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib,data=Caes.dat) summary(Caes2) Coefficients: (Intercept) NoPlan Antib Inf1 -1.431678 1.255595 -2.962218 Inf2 -1.058278 1.086508 -2.484319 Std. Errors: (Intercept) NoPlan Antib Inf1 0.2870129 0.4954738 0.6408664 Inf2 0.2470012 0.4475669 0.5136334 Residual Deviance: 346.3103 AIC: 358.3103 Caes2$edf [1] 6 Caes2$deviance-Caes1$deviance [1] 24.43605 1-pchisq(24.43605,2) [1] 4.940594e-06 ## Hence RiskF has significant effect on the response variable. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 31 / 143 §6.2.3 Extending Multicategory logit model to MGLM (1) Consider a multivariate q × 1 response variable yi for unit i . Let µi = E (yi |xi ), i = 1, · · · , n. Distributional assumption: The yi ’s given respective xi ’s are independent. Each yi has a distribution belonging to a simple (multivariate) exponential family with pdf f (yi |θi , φ, ωi ) = exp { yTi θi − b(θi ) φ · ωi + c(yi , φ, ωi ) } Structural assumption: µi ’s depends on the linear predictor ηi = Ziβ in the form µi = h(ηi ) = h(Ziβ) = [h1(Ziβ), · · · , hq(Ziβ)]T , i = 1, · · · , n where the response function h : S → M is defined on S ⊂ Rq, taking values in M ⊂ Rq; Zi is a q × p design matrix for unit i ; β = (β1, · · · , βp)T is an unknown parameter vector from B ⊂ Rp. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 32 / 143 §6.2.3 Extending Multicategory logit model to MGLM (2) For the case of a multi-categorical response, one has to consider the multinomial distribution which may be embedded into the framework of a simple multivariate exponential family. For yi = (yi1, · · · , yiq)T ∼ Multinomial(ni ,pii ), the pmf of y¯i = n−1i yi has the form f (y¯i |θi , φ, ωi ) = exp { y¯Ti θi − b(θi ) φ · ωi + c(y¯i , φ, ωi ) } , i = 1, · · · , g where the natural parameter θi is given by θi = [ log pii1 piik , · · · , log piiq piik ]T ; piik = 1− q∑ j=1 piij ; and b(θi )=− log(1−pii1−· · ·−piiq) = − log piik = log ( 1+eθi1 +· · ·+eθiq); c(yi , φ, ωi ) = log ni ! yi1!···yiq!(ni−yi1−···−yiq)! ; and ωi = ni . Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 33 / 143 §6.3.1 Principle of maximum random utility (1) Models for response variables with unordered categories may be motivated from a consideration of latent variables. In probabilistic choice theory, it is often assumed that an unobserved utility Ur is associated with category r of the response variable Y . Let Ur be a latent variable associated with category r . Now assume Ur = ur + εr , with ur being a fixed value and ε1, · · · , εk i.i.d. having continuous cdf F . Following the principle of maximum random utility Y = r ⇔ Ur = max{U1, · · · ,Uk}, r = 1, · · · , k. Now it follows that P(Y = r) = P(Ur − U1 ≥ 0, · · · ,Ur − Uk ≥ 0) = P(ε1 ≤ ur − u1 + εr , · · · , εk ≤ ur − uk + εr ) = ∫ +∞ −∞ ∏ s 6=r F (ur−us +ε) · f (ε)dε, where f =F ′ is the pdf of εr . Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 34 / 143 §6.3.1 Principle of maximum random utility (2) Different distributional assumption for εr ’s yields different models. If εr ’s are i.i.d. N(0, 1), one gets independent probit model for Y . (This is true if k = 2. But I am not able to prove it when k > 2.) Try the case k = 3. Then Ui ∼ N(ui , 1), i = 1, 2, 3; and U1,U2,U3 are independent. Let W1 = U1 − U2 and W2 = U1 − U3. Then[ W1 W2 ] ∼ N ([ u1 − u2 u1 − u3 ] , [ 2 1 1 2 ]) , with joint pdf f (w1,w2) = 1 2 √ 3pi exp { −1 6 [ w1−(u1−u2) w2−(u1−u3) ]T[ 2 −1 −1 2 ][ w1−(u1−u2) w2−(u1−u3) ]} = 1 2 √ 3pi e{− 13 [(w1−(u1−u2))2−(w1−(u1−u2))(w2−(u1−u3))+(w2−(u1−u3))2]} Then e.g. P(Y =1)=P(U1−U2 ≥ 0,U1−U3 ≥ 0)=P(W1 ≥ 0,W2 ≥ 0) = ∫∞ 0 ∫∞ 0 f (w1,w2)dw1dw2 = ∫∞ 0 1 4 √ pi e −1 4 (w2−(u1−u3))2 [ erf ( 1 2 √ 3 [w2−(u1−u3)+2(u1−u2)] ) +1 ] dw2 where erf(x) = 2√ pi ∫ x 0 e−t 2 dt = 2[Φ( √ 2x)− 12 ]. This does not seem to result in a probit model. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 35 / 143 §6.3.1 Principle of maximum random utility (3) If ε1, · · · , εk i.i.d. following the extreme-value distribution with cdf F (x) = exp{−e−x} and pdf f (x) = exp{−e−x} exp(−x), then we get the multinomial logit model P(Y = r) = eur∑k s=1 e us = eur−uk 1+ ∑q s=1 e us−uk = e u˜r 1+ ∑q s=1 e u˜s , with u˜r = ur − uk . We see a specific cdf F determines the link or response function of the model. Proof. P(Y = r) = ∫ +∞ −∞ ∏ s 6=r exp{−e−ur+us−ε} · exp{−e−ε} · exp{−ε}dε = ∫ +∞ −∞ e− ∑ s 6=r e us−ur−ε−e−εe−εdε = ∫ +∞ −∞ e−[ ∑ s 6=r e us−ur +1]e−εe−εdε = e−[ ∑ s 6=r e us−ur +1]e−ε∑ s 6=r e us−ur + 1 ∣∣∣∣∣ +∞ −∞ = 1∑ s 6=r e us−ur + 1 = eur∑k s=1 e us . Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 36 / 143 §6.3.2 Choice of design matrix (1) Explanatory variables and their coefficients in a linear predictor function ηi = Ziβ are the terms influencing the response variable. Assume the response variable has multiple categories. Explanatory variables can be classified into 3 types: 1 global: the ones whose values depend on individuals but not the response categories; 2 alternative-specific: the ones whose values depend on the response categories but not the individuals; 3 neither (1) or (2); or both (1) and (2). The coefficient parameters (which always do not depend on individuals) are classified into 1 global: whose values don’t depend on response categories; 2 category-specific: depend on categories. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 37 / 143 §6.3.2 Choice of design matrix (2) Example. Suppose the mean utility for individual i is uir , satisfying uir = αr0 + z T i αr , r = 1, · · · , k; i = 1, · · · , n. Then zi is global and αr is category-specific. The multinomial logit model becomes P(Yi = r |zi ) = e βr0+zTi βr 1 + ∑q s=1 e βs0+zTi βs where βr0 + zTi βr = u˜ir = uir − uik = (αr0 − αk0) + zTi (αr −αk). Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 38 / 143 §6.3.2 Choice of design matrix (3) Example (continued 1) The associated GLM is pii1… piiq = P(Yi = 1|zi )… P(Yi = q|zi ) = eηi1 1+ ∑q s=1 e ηis … e ηiq 1+ ∑q s=1 e ηis denoted= h(ηi ) where the linear predictor has the form ηi = Ziβ, with the design matrix for individual i being Zi = 1 zTi 0 0 T · · · 0 0T 0 0T 1 zTi · · · 0 0T … … … … . . . … … 0 0T 0 0T · · · 1 zTi q×p and β = (β10,β T 1 , · · · , βq0,βTq )T is a p × 1 parameter vector. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 39 / 143 §6.3.2 Choice of design matrix (4) Example (continued 2) If uir also depends on some alternative-specific explanatory variables, then uir may be assumed of the form uir = αr0 + z T i αr + w T r γ, r = 1, · · · , k; i = 1, · · · , n where wr is an alternative-specific explanatory vector variable and γ is a global parameter. Then u˜ir = uir − uik = βr0 + zTi βr + (wr −wk)Tγ. The resultant multinomial logit model will have the following design matrix for individual i Zi = 1 zTi 0 0 T · · · 0 0T (w1 −wk)T 0 0T 1 zTi · · · 0 0T (w2 −wk)T … … … … . . . … … … 0 0T 0 0T · · · 1 zTi (wq −wk)T q×p and β = (β10,β T 1 , · · · , βq0,βTq ,γT )T is a p × 1 parameter vector. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 40 / 143 §6.4 Models for ordinal responses Response variable having k > 2 categories may be of ordinal nature. Example: Breathing test results Ordinal variables may come from quite different mechanisms. Two distinguished types are: grouped continuous and assessed ordered categorical variables. Grouped continuous — merely a categorized continuous variable Assessed ordered — obtained after an assessor’s judgment. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 41 / 143 §6.4.1 Cumulative models: the threshold approach Suppose there is an unobserved continuous variable U for the response variable Y having k categories. Suppose Y is determined by U in the following way Y = r ⇔ θr−1 < U ≤ θr ; r = 1, · · · , k where −∞ = θ0 < θ1 < · · · < θk = +∞ unknown. That means Y is a coarser (categorized) version of U determined by the thresholds θ1, · · · , θk−1. Suppose U depends on a p × 1 vector explanatory variable x in the following way U = −xTγ + ε where γ = (γ1, · · · , γp)T is an unknown vector parameter and ε is random with cdf F . It follows that P(Y ≤ r |x) = P(U ≤ θr ) = F (θr + xTγ). This is called a cumulative model or threshold model with cdf F . Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 42 / 143 §6.4.1 Cumulative logistic (or proportional odds) model (1) If choose F (x) = [ 1 + e−x ]−1 which is the cdf of a logistic distribution, then P(Y ≤ r |x) = e θr+xTγ 1 + eθr+x Tγ , r = 1, · · · , q = k − 1. (1) It is equivalent to log P(Y ≤ r |x) P(Y > r |x) = θr + x Tγ or (2) P(Y ≤ r |x) P(Y > r |x) = exp{θr + x Tγ} (3) The model given by (1), (2) or (3) is called the cumulative logistic model. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 43 / 143 §6.4.1 Cumulative logistic (or proportional odds) model (2) The cumulative logistic model is also called a proportional odds model due to following interpretation: Suppose there are 2 populations (groups) identified by x1 and x2 respectively. Then P(Y ≤ r |x1) P(Y > r |x1) = e θr+xT1 γ and P(Y ≤ r |x2) P(Y > r |x2) = e θr+xT2 γ . It follows that cumulative odds ratio = P(Y ≤ r |x1)/P(Y > r |x1) P(Y ≤ r |x2)/P(Y > r |x2) = e (x1−x2)Tγ which is the same for all r = 1, · · · , q. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 44 / 143 §6.4.1 Grouped Cox (or proportional hazards) model If choose F as the cdf of the extreme minimal-value distribution, i.e. F (x) = 1− exp{−ex}, then P(Y ≤ r |x) = 1− exp{−eθr+xTγ} (4) ⇐⇒ log [− log P(Y > r |x)]︸ ︷︷ ︸ complementary log-log link = θr + x Tγ, r = 1, · · · , q Model (4) is referred to as the grouped Cox model or proportional hazards model due to the following interpretation: For 2 populations (groups) identified by x1 and x2 respectively log P(Y > r |x1) = log P(Y > r |x2) · e(x1−x2)Tγ , r = 1, · · · , q Differentiating w.r.t. r on both sides results in λ(r |x1) = λ(r |x2) · e(x1−x2)Tγ , ⇔ λ(r |x1) λ(r |x2) = e (x1−x2)Tγ , which is the same for all r = 1, · · · , q. Here λ(r |·) = − d dr logP(Y > r |·) is the hazard function. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 45 / 143 §6.4.1 Cumulative model based on extreme maximal-value CDF If choose F as the cdf of the extreme maximal-value distribution, i.e. F (x) = exp{−e−x}, then P(Y ≤ r |x) = exp{−e−(θr+xTγ)} (5) ⇐⇒ log [− log P(Y ≤ r |x)]︸ ︷︷ ︸ log-log link = −(θr + xTγ), r = 1, · · · , q. (6) Model (5) or (6) is referred to as the extreme maximal-value distribution model. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 46 / 143 S6.4.2 Extended version of cumulative models (1) Recall that an ordinal categorical response Y with k categories is determined by a latent variable U as Y = r ⇔ θr−1and U = −xTγ + ε, with ε having cdf F . Then a cumulative model or threshold model with cdf F is P(Y ≤ r |x) = P(U ≤ θr ) = F (θr + xTγ). (7) One may assume a linear form for thresholds θ1, · · · , θq, θr = βr0 + w Tβr where w is a vector explanatory variable and βr = (βr1, · · · , βrm)T is a category-specific parameter vector. Then an extended cumulative model is obtained: P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ). (8) Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 47 / 143 S6.4.2 Extended version of cumulative models (2) Remarks 1 Model (8) becomes unidentifiable if wi = xi for each individual i = 1, · · · , n; i.e., it will not be possible to separate γ from each βr if wi = xi . 2 So wi and xi must not be the same. 3 βr can be global, while γ can be replaced by a category-specific parameter. 4 wi are called threshold variables, while xi are called shift variables. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 48 / 143 §6.4.3 Link functions for cumulative models For a cumulative GLM, the link function g =(g1, · · · , gq), as a vector function of (pi1 = P(Y =1), · · · , piq = P(Y =q)), is given by gr (pi1, · · · , piq) = F−1(pi1 + · · ·+ pir ), r = 1, · · · , q, where F−1(·) is the inverse function for F that specifies the cumulative model pi1 + · · ·+ pir = P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ). F being the N(0, 1) cdf gives the probit link gr (pi1, · · · , piq) = Φ−1(pi1 + · · ·+ pir ), r = 1, · · · , q. For proportional odds model, the link is the cumulative logit gr (pi1, · · · , piq) = log pi1 + · · ·+ pir 1− (pi1 + · · ·+ pir ) , r = 1, · · · , q. For proportional hazards model, the link is the complement log-log gr (pi1, · · · , piq) = log{− log(1− (pi1 + · · ·+ pir ))}, r = 1, · · · , q. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 49 / 143 §6.4.3 Design matrices for cumulative models (1) For the simple cumulative model P(Yi ≤ r |xi ) = F (θr + xTi γ), the design matrix Zi in the linear predictor ηi = Ziβ is given by Zi = 1 xTi 1 xTi . . . … 1 xTi q×(q+dim(xi )) and β = (θ1, · · · , θq,γT )T . Here xi should not contain a constant component. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 50 / 143 §6.4.3 Design matrices for cumulative models (2) For the extended cumulative model P(Yi ≤ r |xi ,wi ) = F (βr0 + wTi βr + xTi γ), the design matrix Zi in the linear predictor ηi = Ziβ is given by Zi = 1 wTi x T i 1 wTi x T i . . . . . . … 1 wTi x T i q×(q(1+dim(wi ))+dim(xi )) and β = (β10,β T 1 , · · · , βq0,βTq ,γT )T . Here neither wi nor xi should contain a constant component. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 51 / 143 §6.4.3 Alternative formulation/parameterisation in cumulative models (1) The threshold parameters θ1, · · · , θq in cumulative models P(Y ≤ r |x) = F (θr + xTγ) are restricted by θ1 < θ2 < · · · < θq The corresponding restriction for the extended cumulative model is β10 + w Tβ1 < β20 + w Tβ2 < · · · < βq0 + wTβq which is more complex. If such constraints are not explicitly accounted for in estimation, the iterative estimation procedure may fail due to fitting inadmissible parameters. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 52 / 143 §6.4.3 Alternative formulation/parameterisation in cumulative models (2) For the simple cumulative model, the problem can be avoided by using an alternative formulation or reparameterisation: α1 = θ1, αr = log(θr − θr−1), r = 2, · · · , q ⇐⇒ θ1 = α1, θr = θ1 + r∑ i=2 eαi , r = 2, · · · , q. The new parameters α1, · · · , αq are unconstrained. The cumulative model under the new formulation has a linear structure of the following form F−1 (P(Y = 1|x)) = α1 + xTγ log [ F−1 (P(Y ≤ r |x))− F−1 (P(Y ≤ r − 1|x))] = αr , r = 2, · · · , q. The link function corresponding to the new formulation can be determined accordingly. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 53 / 143 §6.4.3 Alternative formulation/parameterisation in cumulative models (3) For the special case of the cumulative logistic model, one obtains the following link function (log-logit link) g1(pi1, · · · , piq) = log ( pi1 1− pi1 ) gr (pi1, · · · , piq) = log [ log { pi1 + · · ·+ pir 1− pi1 − · · · − pir } − log { pi1 + · · ·+ pir−1 1− pi1 − · · · − pir−1 }] , q = 2, · · · , q. If this alternative link function is used, the design matrix Zi has to be adapted. Now Zi = 1 xTi 1 0 . . . ... 1 0 q×(q+dim(xi )) and the parameter vector becomes β = (α1, · · · , αq,γT )T . Here xi should not contain a constant component. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 54 / 143 §6.4.3 Computing packages and an example The polr() function in the R MASS package can be used to fit cumulative models (but not extended cumulative models). More generally, the ordinal package in R can be used for fitting ordinal regression models. Example 2 in §6.1. Breathing test results. The response variable is BTR, ordinal with k = 3 levels. Age (2 levels) and Smoking status (3 levels) are predictors. The aim is to see how Age and Smoking are associated with BTR. Command polr in R MASS package is used in the analysis. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 55 / 143 §6.4.3 Breathing testing example (1) Usage of polr: polr(formula, data, weights, start, ..., subset, na.action, contrasts = NULL, Hess = FALSE, model = TRUE, method = c("logistic", "probit", "loglog", "cloglog", "cauchit")) Note the estimates of γ are those from polr multiplied by −1 due to the model used in polr being P(Y ≤ r |x) = F (θr − xTγ). > library(MASS); BTR.dat BTR Age Smoking Freq 1 1Normal <40 1Never 577 2 2Border <40 1Never 27 3 3Abnorm <40 1Never 7 4 1Normal <40 2Former 192 5 2Border <40 2Former 20 6 3Abnorm <40 2Former 3 7 1Normal <40 3Current 682 8 2Border <40 3Current 46 9 3Abnorm <40 3Current 11 10 1Normal 40to59 1Never 164 11 2Border 40to59 1Never 4 12 3Abnorm 40to59 1Never 0 13 1Normal 40to59 2Former 145 14 2Border 40to59 2Former 15 15 3Abnorm 40to59 2Former 7 16 1Normal 40to59 3Current 245 17 2Border 40to59 3Current 47 18 3Abnorm 40to59 3Current 27 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 56 / 143 §6.4.3 Breathing testing example (2) > is.factor(BTR.dat$BTR) #TRUE; > is.ordered(BTR.dat$BTR) #FALSE > is.factor(BTR.dat$Age) #TRUE; > is.factor(BTR.dat$Smoking) #TRUE > is.factor(BTR.dat$Freq) #FALSE; > BTR.dat$BTR=as.ordered(BTR.dat$BTR) #Set BTR as an ordinal factor > help(polr) > BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq, Hess=T,method=”logistic”) > summary(BTR.logit) Coefficients: Value Std. Error t value Age40to59 -0.8857 0.5359 -1.653 Smoking2Former 0.6973 0.2822 2.471 Smoking3Current 0.3472 0.2238 1.551 Age40to59:Smoking2Former 1.1458 0.6228 1.840 Age40to59:Smoking3Current 2.2007 0.5689 3.868 Intercepts: Value Std. Error t value 1Normal|2Border 2.8334 0.1764 16.0603 2Border|3Abnorm 4.3078 0.2130 20.2210 Residual Deviance: 1564.968 AIC: 1578.968 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 57 / 143 §6.4.3 Breathing testing example (3) BTR.logit1=polr(BTR~Age+Smoking,data=BTR.dat, weights=Freq,Hess=T,method=”logistic”) summary(BTR.logit1) Coefficients: Value Std. Error t value Age40to59 0.7772 0.1482 5.243 Smoking2Former 0.7815 0.2333 3.350 Smoking3Current 0.9607 0.1918 5.010 Intercepts: Value Std. Error t value 1Normal|2Border 3.1927 0.1749 18.2544 2Border|3Abnorm 4.6543 0.2125 21.9076 Residual Deviance: 1589.744 AIC: 1599.744 anova(BTR.logit, BTR.logit1) Likelihood ratio tests of ordinal regression models Response: BTR Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi) 1 Age + Smoking 2214 1589.744 2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168e-06 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 58 / 143 §6.4.3 Breathing testing example (4) > BTR.logit$fitted 1Normal 2Border 3Abnorm 1 0.9444565 0.04225961 0.013283873 2 0.9444565 0.04225961 0.013283873 3 0.9444565 0.04225961 0.013283873 4 0.8943685 0.07930628 0.026325236 5 0.8943685 0.07930628 0.026325236 6 0.8943685 0.07930628 0.026325236 7 0.9231700 0.05813458 0.018695369 8 0.9231700 0.05813458 0.018695369 9 0.9231700 0.05813458 0.018695369 10 0.9763207 0.01815786 0.005521451 11 0.9763207 0.01815786 0.005521451 12 0.9763207 0.01815786 0.005521451 13 0.8671630 0.09895796 0.033879045 14 0.8671630 0.09895796 0.033879045 15 0.8671630 0.09895796 0.033879045 16 0.7633793 0.17036521 0.066255462 17 0.7633793 0.17036521 0.066255462 18 0.7633793 0.17036521 0.066255462 > BTR.logit$residual #does not exist. > BTR.logit$lp #linear predictor values 1 2 3 4 5 6 7 8 9 0.0000000 0.0000000 0.0000000 0.6972823 0.6972823 0.6972823 0.3472245 0.3472245 0.3472245 10 11 12 13 14 15 16 17 18 -0.8857462 -0.8857462 -0.8857462 0.9573393 0.9573393 0.9573393 1.6621466 1.6621466 1.6621466 > attributes(BTR.logit) [1] “coefficients” “zeta” “deviance” “fitted.values” “lev” “terms” [7] “df.residual” “edf” “n” “nobs” “call” “method” [13] “convergence” “niter” “lp” “Hessian” “model” “contrasts” [19] “xlevels” $class [1] “polr” Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 59 / 143 §6.4.3 Breathing testing example (5) On the previous 3 slides we fit two cumulative logistic models to BTR using Age and Smoking: the first (BTR.logit includes the Age:Smokings interaction-effect terms as well as the main-effect terms; while the second (BTR.logit1 includes main-effect terms only. Dummy coding (i.e. contr.treatment coding in R) is used for representing Age and Smoking: Age40to59 = { 1, if age is 40 ∼ 59 0 if age < 40 Smoking2Former = { 1, former smoker 0 else , Smoking3Current = { 1, current smoker 0 else Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 60 / 143 §6.4.3 Breathing testing example (6) The first model can be written as log P(BTR = Normal) P(BTR > Normal) = θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current +γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current log P(BTR ≤ Bordering) P(BTR > Bordering) = θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current +γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current MLEs of the parameters in the first model are θˆ1 = 2.833, θˆ2 = 4.308, γˆ1 = 0.886, γˆ2 = −0.697, γˆ3 = −0.347, γˆ4 = −1.146, γˆ5 = −2.201 with their standard errors given in the R output. Estimated values of the γ parameters can be interpreted as log cumulative odds ratios. For example, the odds of having a normal test result (or normal or bordering result) for a worker aged 40 to 59 who never smoked is estimated to be e−γˆ2−γˆ4 = e0.697+1.146 = 6.32 times of that for a worker aged 40 to 59 who is a former worker. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 61 / 143 §6.4.3 Breathing testing example (7) The second model can be written as log P(BTR = Normal) P(BTR > Normal) = θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current log P(BTR ≤ Bordering) P(BTR > Bordering) = θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current MLEs of the parameters in the first model are θˆ1 = 3.193, θˆ2 = 4.654, γˆ1 = −0.777, γˆ2 = −0.782, γˆ3 = −0.961 with their standard errors given in the R output. “H0 : γ4 = γ5 = 0 vs. H1 : not H0” is the hypothesis about the Age:Smoking interaction effects on BTR. A χ2 analysis of deviance test comparing models BTR.logit and textttBTR.logit1 gives an LR statistic of 22.776 and p-value of 4.168× 10−6. Therefore, there is very strong evidence to reject H0. We conclude that Age and Smoking have very strong interaction effects on BTR. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 62 / 143 §6.4.3 Breathing testing example (8) Linear predictor of the first model is η1 = θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current +γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current η2 = θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current +γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current However, the linear predictor returned from polr is just xTγ, i.e. γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current +γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current thus has just 6 different values knowing that there are 6 combinations of Age and Smoking. Knowing these tricks, there should be no difficulty to reveal how the fitted values returned by BTR.logit$fitted are obtained. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 63 / 143 §6.4.3 Breathing testing example (9) Effect coding (i.e. contr.sum coding in R, −1 for last category) may also be used for representing Age and Smoking: Age1 = { 1, if age < 40 −1, if age is 40 ∼ 59 Smoking1 = 1, never smoked0, former smoker−1, current smoker , Smoking2 = 0, never smoked1, former smoker−1, current smoker With this coding, the first model can be written as log P(BTR = Normal) P(BTR > Normal) = θ1 + γ1Age1 + γ2Smoking1 + γ3Smoking2 +γ4Age1 · Smoking1 + γ5Age1 · Smoking2 log P(BTR ≤ Bordering) P(BTR > Bordering) = θ2 + γ1Age1 + γ2Smoking1 + γ3Smoking2 +γ4Age1 · Smoking1 + γ5Age1 · Smoking2 Although the parameter estimates will be different now, other results from the model will be the same as that using dummy coding. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 64 / 143 §6.4.3 Breathing testing example (10) > options(contrasts = c(“contr.sum”, “contr.poly”)) > BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq, Hess=T,method=”logistic”) > summary(BTR.logit) Coefficients: Value Std. Error t value Age1 -0.11487 0.1086 -1.0580 Smoking1 -0.90596 0.1890 -4.7933 Smoking2 0.36428 0.1421 2.5644 Age1:Smoking1 0.55777 0.1890 2.9512 Age1:Smoking2 -0.01517 0.1421 -0.1068 Intercepts: Value Std. Error t value 1Normal|2Border 2.3704 0.1086 21.8209 2Border|3Abnorm 3.8447 0.1599 24.0486 Residual Deviance: 1564.968 AIC: 1578.968 > BTR.logit1=polr(BTR~Age+Smoking, data=BTR.dat, weights=Freq, Hess=T,method=”logistic”) > anova(BTR.logit, BTR.logit1) Likelihood ratio tests of ordinal regression models Response: BTR Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi) 1 Age + Smoking 2214 1589.744 2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168618e-06 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 65 / 143 §6.4.3 Breathing testing example (11) > attributes(BTR.logit) $names [1] “coefficients” “zeta” “deviance” “fitted.values” “lev” “terms” “df.residual” [8] “edf” “n” “nobs” “call” “method” “convergence” “niter” [15] “lp” “Hessian” “model” “contrasts” “xlevels” > BTR.logit$fitted 1Normal 2Border 3Abnorm 1 0.9444565 0.04225911 0.013284361 2 0.9444565 0.04225911 0.013284361 3 0.9444565 0.04225911 0.013284361 4 0.8943660 0.07930714 0.026326874 5 0.8943660 0.07930714 0.026326874 6 0.8943660 0.07930714 0.026326874 7 0.9231672 0.05813603 0.018696801 8 0.9231672 0.05813603 0.018696801 9 0.9231672 0.05813603 0.018696801 10 0.9763222 0.01815653 0.005521309 11 0.9763222 0.01815653 0.005521309 12 0.9763222 0.01815653 0.005521309 13 0.8671585 0.09895993 0.033881540 14 0.8671585 0.09895993 0.033881540 15 0.8671585 0.09895993 0.033881540 16 0.7633676 0.17037058 0.066261785 17 0.7633676 0.17037058 0.066261785 18 0.7633676 0.17037058 0.066261785 > BTR.logit$lp #linear predictor values 1 2 3 4 5 6 7 8 9 10 -0.4630592 -0.4630592 -0.4630592 0.2342498 0.2342498 0.2342498 -0.1157938 -0.1157938 -0.1157938 -1.3488684 11 12 13 14 15 16 17 18 -1.3488684 -1.3488684 0.4943191 0.4943191 0.4943191 1.1991525 1.1991525 1.1991525 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 66 / 143 §6.4.3 Breathing testing example (12) We also fit a grouped Cox or proportional hazards model. > BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq, Hess=T, method=”cloglog”) > summary(BTR.ph) Coefficients: Value Std. Error t value Age40to59 -0.2865 0.14264 -2.008 Smoking2Former 0.2125 0.10054 2.114 Smoking3Current 0.1088 0.07301 1.490 Age40to59:Smoking2Former 0.4333 0.18941 2.288 Age40to59:Smoking3Current 0.8379 0.16357 5.122 Intercepts: Value Std. Error t value 1Normal|2Border 1.0477 0.0558 18.7612 2Border|3Abnorm 1.5528 0.0629 24.6916 Residual Deviance: 1559.949 AIC: 1573.949 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 67 / 143 §6.4.3 Breathing testing example (13) We further fit a cumulative model with the extreme maximal-value distribution. > BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq, Hess=T,method=”loglog”) > summary(BTR.extm) Coefficients: Value Std. Error t value Age40to59 -0.8672 0.5286 -1.640 Smoking2Former 0.6755 0.2700 2.501 Smoking3Current 0.3368 0.2167 1.554 Age40to59:Smoking2Former 1.1003 0.6070 1.813 Age40to59:Smoking3Current 2.0724 0.5573 3.719 Intercepts: Value Std. Error t value 1Normal|2Border 2.8614 0.1715 16.6838 2Border|3Abnorm 4.2761 0.2080 20.5605 Residual Deviance: 1566.335 AIC: 1580.335 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 68 / 143 §6.4.3 Breathing testing example (14) We fit a grouped Cox or proportional hazards model using the effect coding. > options(contrasts = c(“contr.sum”, “contr.poly”)) > BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq, Hess=T, method=”cloglog”) > summary(BTR.ph) Coefficients: Value Std. Error t value Age1 -0.06862 0.03421 -2.00575 Smoking1 -0.31898 0.05366 -5.94408 Smoking2 0.11022 0.04961 2.22168 Age1:Smoking1 0.21188 0.05361 3.95205 Age1:Smoking2 -0.00481 0.04960 -0.09699 Intercepts: Value Std. Error t value 1Normal|2Border 0.8720 0.0353 24.7072 2Border|3Abnorm 1.3771 0.0441 31.2213 Residual Deviance: 1559.949 AIC: 1573.949 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 69 / 143 §6.4.3 Breathing testing example (15) We finally fit a cumulative model with the extreme maximal-value distribution using the effect coding. > options(contrasts = c(“contr.sum”, “contr.poly”)) > BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq, Hess=T,method=”loglog”) > summary(BTR.extm) Coefficients: Value Std. Error t value Age1 -0.09523 0.1053 -0.904 Smoking1 -0.86618 0.1854 -4.671 Smoking2 0.35943 0.1361 2.642 Age1:Smoking1 0.52877 0.1854 2.852 Age1:Smoking2 -0.02136 0.1361 -0.157 Intercepts: Value Std. Error t value 1Normal|2Border 2.4288 0.1054 23.0536 2Border|3Abnorm 3.8434 0.1573 24.4373 Residual Deviance: 1566.335 AIC: 1580.335 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 70 / 143 §6.4.4 Motivating example for sequential models In many applications the ordering of the response categories is due to a sequential mechanism. The categories are ordered since they can be reached only successively. Example: Tonsil size. Children have been classified according to their tonsil size and whether or not they are carriers of Streptococcus pyogenes. Table 6: Tonsil size and Streptococcus pyogenes (Holmes & Williams, 1954) Present but not enlarged Enlarged Greatly enlarged Carriers 19 29 24 Noncarriers 497 560 269 It may be assumed that tonsil size always starts in the normal state “present but not enlarged” (category 1). If the tonsils grow abnormally, they may become “enlarged” (category 2); if the process does not stop, tonsils may become “greatly enlarged” (category 3). But in order to get greatly enlarged tonsils, they first have to be enlarged for the duration of the intermediate state “enlarged.” Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 71 / 143 §6.4.4 Sequential models (1) Let latent variables Ur , r = 1, · · · , q with q = k − 1 have the linear form Ur = −xTγ + εr , where εr has cdf F . We assume the ordinal response variable Y is determined by Ur ’s in the following sequential way: Y = 1 ⇔ U1 ≤ θ1 Y = 2 given Y ≥ 2 ⇔ U2 ≤ θ2; and in general Y = r given Y ≥ r ⇔ Ur ≤ θr ; or equivalently Y > r given Y ≥ r ⇔ Ur > θr , r = 1, · · · , q (9) The sequential mechanism (9) models the transition from category r to category r + 1 given that category r is reached. The main difference to the threshold approach used in the cumulative model is the conditional modelling of the transitions. The sequential mechanism assumes a binary decision in each step. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 72 / 143 §6.4.4 Sequential models (2) The sequential response mechanism (9) combined with the linear form of the latent variable Ur = −xTγ + εr , where εr has cdf F , leads to the sequential model with cdf F P(Y = r |Y ≥ r , x) = P(Ur ≤ θr ) = F (θr + xTγ), r = 1, · · · , k, (10) where θk = +∞. It follows that P(Y = r |x) = P(Y = r |Y ≥ r , x) · P(Y ≥ r |x) = F (θr + x Tγ) · P(Y > r − 1|Y ≥ r − 1, x) · P(Y > r − 2|Y ≥ r − 2, x) · · ·P(Y > 1|Y ≥ 1, x) = F (θr + x Tγ)[1− F (θr−1 + xTγ)] · · · [1− F (θ1 + xTγ)] = F (θr + x Tγ) r−1∏ i=1 [1− F (θi + xTγ)], r = 1, · · · , k; Note ∏0i=1[·] = 0 Model (10) is also called the continuation ratio model. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 73 / 143 §6.4.4 Sequential models (3) Note that there is no need to impose any order restriction on θ1, · · · , θq in (10). Now specify a cdf F will give a specific sequential model. 1 Choosing F (x) = (1 + e−x)−1 gives the sequential logit model P(Y = r |Y ≥ r , x) = exp(θr + x Tγ) 1 + exp(θr + xTγ) , r = 1, · · · , k which is equivalent to log { P(Y = r |x) P(Y > r |x) } = θr + x Tγ. 2 Choosing F (x) = 1− exp(−ex), the sequential model has the form P(Y = r |Y ≥ r , x) = 1− exp ( −eθr+xTγ ) , r = 1, · · · , k (11) which is equivalent to log [ − log { P(Y > r |x) P(Y ≥ r |x) }] = θr + x Tγ, r = 1, · · · , k. (12) Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 74 / 143 §6.4.4 Sequential models (4) It is worth to note that (12) is equivalent to the cumulative model with F (x) = 1− exp(−ex). This means (12) is a special parametric form of the grouped Cox model. This equivalence follows by the reparameterisation: θr = log{e θ˜r − e θ˜r−1}, r = 1, · · · , k − 1. Note θ˜0 = −∞ and θ1 = θ˜1. Further θ˜r = log ( r∑ i=1 eθi ) . Substituting θr = log{e θ˜r − e θ˜r−1} into (12), one obtains − logP(Y > r |x) + logP(Y > r − 1|x) = e θ˜r+xTγ − e θ˜r−1+xTγ , r = 1, · · · , q ⇒ − logP(Y > r |x) = e θ˜r+xTγ , r = 1, · · · , q ⇒ P(Y ≤ r |x) = 1− exp ( −e θ˜r+xTγ ) which is a grouped Cox model. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 75 / 143 §6.4.4 Sequential models (5) 3 Choosing exponential cdf F (x) = 1− e−x , the exponential sequential model becomes P(Y = r |Y ≥ r , x) = 1− e−(θr+xTγ) ⇔ − log { P(Y>r |x) P(Y≥r |x) } = θr + xTγ, r = 1, · · · , q. Generalised sequential models: If assuming θr = δr0 + zTδr , where δr = (δr1, · · · , δrm)T is a category-specific vector parameter, then one gets the Generalised sequential model : P(Y = r |Y ≥ r , x, z) = F (δr0 + zTδr + xTγ) (13) Remark: The effect of z is non-homogeneous over the categories, while the effect of x is homogeneous. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 76 / 143 §6.4.4 Link functions and design matrices in sequential models Response and link functions may be derived directly from model (10). The link function g = (g1, · · · , gq)T is given by gr (pi1, · · · , piq) = F−1 ( pir 1− pi1 − · · · − pir−1 ) , r = 1, · · · , q and the response function h = (h1, · · · , hq)T has the form hr (η1, · · · , ηq) = F (ηr ) r−1∏ i=1 (1− F (ηi )), r = 1, · · · , q. For the sequential logit model, with r = 1, · · · , q gr (pi1, · · · , piq) = log pir 1−pi1−· · ·−pir−1 and hr (η1, · · · , ηq) = e ηr r−1∏ i=1 (1 + eηi )−1. In regard to the design matrices there is no difference between sequential and cumulative models. Thus the design matrices from §3.3.3 apply. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 77 / 143 §6.4.4 Strict stochastic ordering (optional) Strict stochastic ordering is a concept generalizing the proportional odds for simple cumulative model (7) ∆c(x1, x2) = F −1 (P(Y ≤ r |x1))− F−1 (P(Y ≤ r |x2)) . If ∆c(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds. Then for general cumulative model (8) where w = x is set and xTγ is omitted, one can test strict stochastic ordering by testing β1 = β2 = · · · = βq. The concept can be generalized to the sequential model (10) as well. Let ∆s(x1, x2) = F −1 (P(Y = r |Y ≥ r , x1))−F−1 (P(Y = r |Y ≥ r , x2)) . If ∆s(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds for the sequential model. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 78 / 143 §6.4.4 Two-step models (optional) (1) It may be more appropriate to divide the ordered categories into sets of categories with very homogeneous responses in the first step, then model the categories within each set separately in the second step. This is the two-step modelling approach. Example: Rheumatoid arthritis. Mehta, Patel & Tsiatis (1984) analysed data of patients with acute rheumatoid arthritis. A new agent was compared with an active control, and each patient was evaluated on a 5-point assessment scale. Table 7: Clinical trial of a new agent and an active control Global assessment Drug Much improved Improved No change Worse Much worse New agent 24 37 21 19 6 Active control 11 51 22 21 7 Global assessment may be subdivided into“improvement”, “no change” and “worse” first. Then “improvement” is split up into “much improved” and “improved” and the “worse” category is split into “worse” and “much worse.” Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 79 / 143 §6.4.4 Two-step models (optional) (2) Specifically, let the categories 1, · · · , k be subdivided into t basic sets S1, · · · ,St , where Sj = {mj−1 + 1, · · · ,mj}, m0 = 0 and mt = k. In Step 1: Y ∈ Sj ⇔ θj−1 < U0 ≤ θj , j = 1, · · · , t. Assume U0 = −xTγ0 + ε. In Step 2: Y = R | Y ∈ Sj ⇔ θj ,r−1 < Uj ≤ θjr . Assume Uj = −xTγ j + εj . Also ε, εj ’s are iid with cdf F . Then the two-step model becomes P(Y ∈ Tj |x) = F (θj + xTγ0), P(Y ≤ r |Y ∈ Sj , x) = F (θjr + xTγ j) (14) where Tj = S1 ∪ · · · ∪ Sj , θ1 < · · · < θt−1, θt =∞; θj ,mj−1+1 < · · · < θj ,mj−1 , θj ,mj =∞, j = 1, · · · , t. Since cumulative mechanisms are used in both steps, (14) is called a two-step cumulative model. Two-step sequential model can be similarly defined. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 80 / 143 §6.4.4 Alternative approaches (optional) 1 Anderson (1984). P(Y = r |x) = e βr0−φrβT x 1 + ∑q i=1 e βi0−φiβT x , r = 1, · · · , q; where 1 = φ1 > · · · > φk = 0. 2 Williams and Grizzle (1972). k∑ r=1 srP(Y = r |x) = xTβ where s1, · · · , sk are some given scores for the categories. 3 Adjacent categories logit model (Agresti 1984). log P(Y = r |x) P(Y = r − 1|x) = x Tβr ⇔ P(Y = r |Y ∈ {r , r + 1}, x) = F (xTβr ) where F is the logistic cdf. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 81 / 143 §6.5 Statistical inference for MGLM overview 1 MLE 2 Testing of linear hypotheses 3 Goodness of fit (model adequacy) statistics 4 Power-divergence statistics Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 82 / 143 §6.5.1 Maximum likelihood estimation in MGLM (1) An extension of the exponential family is the multivariate exponential family, having the pdf f (yi |θi , φ, ωi ) = exp { yTi θi − b(θi ) φ · ωi + c(yi , φ, ωi ) } For q × 1 vector observations y1, · · · , yn which are suppose to be independent of each other, let µi = E (yi |xi ), i = 1, · · · , n, with xi being the covariate vector. The structural assumption of MGLM is that µi ’s depend on the linear predictor ηi = Ziβ in the form µi = h(ηi ) = h(Ziβ) = [h1(Ziβ), · · · , h1(Ziβ)]T , i = 1, · · · , n where the response function h : S → M is defined on S ⊂ Rq, taking values in M ⊂ Rq; Zi is a q × p design matrix for unit i ; β = (β1, · · · , βp)T is an unknown parameter vector from B ⊂ Rp. Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 83 / 143 §6.5.1 Maximum likelihood estimation in MGLM (2) The MLE of β may be derived in analogy to the one-dimensional case. The log-likelihood kernel for observation yi is `i (µi ) = yTi θi − b(θi ) φ · ωi , where θi = θ(µi ) (15) Using the relation µi = h(ηi ) = h(Ziβ), it can be found that the score function is s(β) = ∂` ∂β = n∑ i=1 si (β), with si (β) = Z T i Di (β)Σ −1 i (β)[yi − µi (β)], (16) where Di (β) = ∂hT (ηi ) ∂ηi is the derivative matrix of hT (ηi ) evaluated at ηi = Ziβ, which is also called the Jacobian (matrix). And Σi (β) = cov(yi ). Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 84 / 143 §6.5.1 Maximum likelihood estimation in MGLM (3) An alternative form for si (β) is si (β) = Z T i Wi (β) ∂g(µi ) ∂µTi [yi − µi (β)], where Wi (β) = Di (β)Σ −1 i (β)Di (β) = { ∂g(µi ) ∂µTi Σi (β) ∂gT (µi ) ∂µi }−1 (17) which approximately equals (cov(g(yi )) −1. The expected Fisher information matrix is F (β) = cov(s(β)) = n∑ i=1 ZTi Wi (β)Zi . Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 85 / 143 §6.5.1 Maximum likelihood estimation in MGLM (4) In matrix form the above may be rewritten as s(β) = ZTD(β)Σ−1(β)[y− µ(β)] F (β) = ZTW (β)Z . where y = (yT1 , · · · , yTn )T , µ(β) = (µ1(β)T , · · · ,µn(β)T )T , Σ(β) = diag (Σi (β)), W (β) = diag (Wi (β)), D(β) = diag (Di (β)), all being block diagonal, and the total design matrix Z = [ ZT1 , · · · ,ZTn ]T . In the situation of grouped data, yi is the mean vector over ni observations and Σi (β) is replaced by n −1 i Σi (β). Under regularity conditions, MLE βˆ a∼ N(β,F−1(βˆ)). Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 86 / 143 §6.5.1 Numerical computing in MGLM This is the same as in the univariate case, except using the multivariate versions. 1 Working or pseudo observation vector y˜(β) = (y˜1(β) T , · · · , y˜n(β)T )T , where y˜i (β) = Ziβ + ( D−1i (β) )T [yi − µi (β)] is an approximation for g(µi (β)). 2 The IRWLS estimate is βˆ (k+1) = ( ZTW (βˆ (k) )Z )−1 ZTW (βˆ (k) )y˜(β(k)) which is equivalent to the Fisher scoring iteration βˆ (k+1) = βˆ (k) + ( ZTW (βˆ (k) )Z )−1 s(β(k)). Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 87 / 143 §6.5.2 Testing linear hypotheses in MGLM The linear hypotheses H0 : Cβ = ξ vs. H1 : Cβ 6= ξ can be tested in the same way as in the univariate GLM, except replacing the score and Fisher information by their multivariate versions. Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 88 / 143 §6.5.2 Goodness-of-fit statistics in MGLM (1) Goodness of fit (or model adequacy) of the MGLM can be assessed based on the Pearson statistics and the deviance (the data need be grouped as much as possible). General Pearson Statistics: χ2 = g∑ i=1 (yi − µˆi )TΣ−1i (βˆ)(yi − µˆi ) In the case of multi-categorical response variable with multinomial distribution niyi ∼ M(ni ,pii ), we have χ2 = g∑ i=1 χ2p(yi , pˆii ) = g∑ i=1 ni k∑ j=1 (yij − pˆiij)2 pˆiij with yik = 1− yi1 − · · · − yiq and pˆiik = 1− pˆii1 − · · · − pˆiiq. Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 89 / 143 §6.5.2 Goodness-of-fit statistics in MGLM (2) Deviance or likelihood ratio (LR) statistics: D = −2 g∑ i=1 {`i (pˆii )− `i (yi )} . For multinomial data, D = 2 g∑ i=1 χ2D(yi , pˆii ) = 2 g∑ i=1 ni k∑ j=1 yij log yij pˆiij Under regularity conditions, χ2 a∼ χ2(g(k − 1)− p) and D a∼ χ2(g(k − 1)− p) where p is the number of estimated parameters. For sparse data with small ni , one should use alternative asymtotics. Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 90 / 143 §6.5.3 Power-divergence family (1) For categorical data a more general single-parameter family of goodness-of-fit statistics is the power-divergence family, introduced by Cressie and Read (1984, JRSSB). The power-divergence statistic with parameter λ ∈ R is given by Sλ = g∑ i=1 SDλ(yi , pˆii ) where the sum of deviations over observations at yi is SDλ(yi , pˆii ) = 2ni λ(λ+ 1) k∑ j=1 yij [( yij pˆiij )λ − 1 ] , −∞ < λ Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 91 / 143 §6.5.3 Power-divergence family (2) 1 At λ = 1, S1 = g∑ i=1 ni k∑ j=1 (yij − pˆiij)2 pˆiij , the usual Pearson statistic. 2 At λ→ 0, S0 = 2 g∑ i=1 ni k∑ j=1 yij log yij pˆiij , the likelihood ratio statistic. 3 At λ→ −1, S−1 = 2 g∑ i=1 ni k∑ j=1 pˆiij log pˆiij yij , Kullback’s minimum discrimination information statistic. 4 At λ = −2, S−2 = g∑ i=1 ni k∑ j=1 (yij − pˆiij)2 yij , Neyman’s minimum modified χ2-statistic. 5 At λ = −12 , S− 12 =4 g∑ i=1 k∑ j=1 ( √ niyij − √ ni pˆiij) 2, Freeman-Tukey statistic. In general, λ ∈ [−1, 2] is recommended. Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 92 / 143 §6.5.3 Asymptotic properties under classical “fixed cells” assumptions Classical assumptions in asymptotic theory for grouped data imply 1 a fixed number of groups (g is fixed) 2 increasing sample sizes ni →∞, i = 1, · · · , g , such that nin → λi for fixed proportions λi > 0, i = 1, · · · , g . 3 a fixed “number of cells” k in each group 4 a fixed number of parameters. If these assumptions hold and in addition, the model under consideration is correct, then Sλ a∼ χ2(g(k − 1)− p), p is the # of estimated parameters. And under local alternative hypothesis, the limit distribution of Sλ is non-central χ2. Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 93 / 143 §6.5.3 Asymptotic properties under sparseness and “increasing-cells” assumptions If several explanatory variables are considered, the number of observations for a fixed explanatory variable x is often small and the usual asymptotic machinery will fail. Under such sparseness conditions, it may be assumed that with increasing sample size n→∞ the number of groups (values of the explanatory variables) is also increasing with g →∞, resulting in the “increasing-cells” setting. Now Sλ is no longer χ 2-distributed in the limit. Rather Sλ has an asymptotic normal distribution under H0 that the model holds. Details are not pursued here. Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 94 / 143 §6.6 Multivariate models for correlated responses So far we have been assuming there is only one response variable, possibly multi-categorical, being under consideration. Now we consider the situations where a vector of correlated or clustered response variables is observed, together with covariates, for each unit in the sample. These situations often happen in longitudinal studies, repeated measurements studies, and grouped (clustered) studies, etc. When the response variables are approximately Gaussian, multivariate linear models are commonly used for the underlying data analysis, which has been well established. The situations become more difficult when the response variables are discrete, categorical or mixed discrete/categorical/continuous. Three main approaches will be introduced here: 1 conditional models 2 marginal models 3 random effects models Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 95 / 143 §6.6.1 Conditional models: Asymmetric models (optional)1 Asymmetric models In many applications, the components of a response vector are ordered in a way that some components are prior to the other components, e.g. if they refer to events that take place earlier. Consider the simplest case of a response vector Y = (Y1,Y2), with categorical components Y1 ∈ {1, · · · , k1} and Y2 ∈ {1, · · · , k2}. Let Y2 refer to events that may be considered conditional on Y1. An example: response years in school (Y1) and the statement about happiness (Y2). Then Y1 may be modelled to be dependent on the explanatory variables x, using a model introduced in §3.2 and §3.3. And Y2 may be modelled based on Y1 and x, also using a model introduced in §3.2 and §3.3. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 96 / 143 §6.6.1 Conditional models: Asymmetric models (optional)2 Asymmetric models In general, consider m categorical responses Y1, · · · ,Ym, where Yj depends on Y1, · · · ,Yj−1 but not on Yj+1, · · · ,Ym. Models that make use of this dependence structure are based on the decomposition P(Y1, · · · ,Ym|x) = P(Y1|x) · P(Y2|Y1, x) · · ·P(Ym|Y1, · · · ,Ym−1, x) (19) Simple models arise if each component in (19) is specified by a GLM: P(Yj = r |Y1, · · · ,Yj−1, x) = hj(Zjβ) (20) where Zj = Z (Y1, · · · ,Yj−1, x) is a function of previous components Y1, · · · ,Yj−1 and the explanatory variables x. Models of form (20) are sometimes called data-driven ones. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 97 / 143 §6.6.1 Conditional models: Asymmetric models (optional)3 Markov-type transition models follow from the additional assumption P(Yj = r |Y1, · · · ,Yj−1, x) = P(Yj = r |Yj−1, x). One such example for binary responses is log P(y1 = 1|x) P(y1 = 0|x) = β01 + z T 1 β1 log P(yj = 1|y1, · · · , yj−1, x) P(yj = 0|y1, · · · , yj−1, x) = β0j + z T j βj + yj−1γj , j = 2, · · · , k. It is called a regressive logistic model when log P(yj = 1|y1, · · · , yj−1, x) P(yj = 0|y1, · · · , yj−1, x) = β0 + z T j β + γ1y1 + · · ·+ γj−1yj−1. Models of the type (20) may be embedded into the GLM framework, meaning some built-in functions in R (e.g. glm()) may be used to do the computing. In general, however, you need to write your own program to do the analysis. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 98 / 143 §6.6.1 Conditional models: Asymmetric models (optional)4 Example: Reported happiness. Clogg (1982) investigated the association between gender (x), years in school (Y1), and reported happiness (Y2). The data are given in the following. Since x and Y1 are prior to the statement about happiness, Y2 is modelled conditionally on Y1 and x . Table 8: Cross classification of gender, reported happiness, and years of schooling Years of school completed Gender Reported happiness < 12 12 13-16 ≥ 17 Male Not too happy 40 21 14 3 Pretty happy 131 116 112 27 Very happy 82 61 55 27 Female Not too happy 62 26 12 3 Pretty happy 155 156 95 15 Very happy 87 127 76 15 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 99 / 143 §6.6.1 Conditional models: Symmetric models (optional)1 Symmetric models Suppose the response vector Y = (y1, · · · , ym). If there is no natural ordering among y1, · · · , ym, or if one does not want to use this ordering, models that treat response components in a symmetric way are more sensible. An example is Visual Impairment Study seen in Chapter 1. Now focus on the case where all y1, · · · , ym are binary. (General cases can be handled similarly.) Symmetric conditional models can be developed by specifying a conditional distribution to be used: P(yj = 1|yk , k 6= j ; xj), j = 1, · · · ,m (21) For binary responses such conditional distributions uniquely determine the joint distribution. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 100 / 143 §6.6.1 Example: Visual impairment study Table 9: Visual impairment data, from Liang, Zeger & Qaqish (1992) White Black Visual Age impairment 40-50 51-60 61-70 70+ 40-50 51-60 61-70 70+ Total Left eye Yes 15 24 42 139 29 38 50 85 422 No 617 557 789 673 750 574 473 344 4777 Right eye Yes 19 25 48 146 31 37 49 93 448 No 613 556 783 666 748 575 474 336 4751 Vector binary response variable (y1, y2), where y1 = 1 if left-eye impaired, 0 otherwise; y2 = 1 if right-eye impaired, 0 otherwise. Covariates: Age (yrs.), Race (W or B). Aim: find the effect of race and age on visual impairment. Complication: y1 and y2 are correlated. Methods: multivariate models for correlated responses; conditional models; asymmetric models, marginal models, GEE, etc.. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 101 / 143 §6.6.1 Conditional models: Symmetric models (optional)2 Symmetric models Symmetric logistic models are a natural choice for (21): pi = P(yj = 1|yk , k 6= j ; xj) = h(α(wj ;θ) + xTj βj), j = 1, · · · ,m (22) where h(t) = et 1 + et is the logistic cdf; and α(·) is some function of a parameter θ and wj = ∑ k 6=j yk . When k = 2, pi = P(yj = 1|yk , k 6= j ; xj) = h(θ0+θ1yk +xTj βj), j , k = 1, 2. (23) Full likelihood estimation procedure for (22) or (23) may be computationally cumbersome, because of the normalising constant involved in the joint pmf. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 102 / 143 §6.6.1 Conditional models: Symmetric models (optional)3 Instead, a quasi-likelihood approach (Conolly and Liang, 1988) may be used, with an independent working quasi-likelihood and quasi-score function for each cluster i(= 1, · · · , n): Li (β,θ) = m∏ j=1 pi yij ij (1− piij)1−yij Si (β,θ) = m∑ j=1 ∂piij ∂(βT ,θT )T σ−2ij (yij − piij(β,θ)) where yi = (yi1, · · · , yij , · · · , yim)T are the responses in cluster i , piij(β,θ) = P(yij = 1|·) is defined by (22), and σ2ij = piij(β,θ)(1− piij(β,θ)). Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 103 / 143 §6.6.1 Conditional models: Symmetric models (optional)4 Denoting Mi = diag { ∂pii1 ∂(βT ,θT )T , · · · , ∂piim ∂(βT ,θT )T } , Σi = diag{σ2i1, · · · , σ2im} and pi = (pii1, · · · , piim)T , we can rewrite Si (β,θ) in matrix form Si (β,θ) = MiΣ −1 i (yi − pii ) which is a multivariate extension of the quasi-score. Roots (βˆ, θˆ) of the resulting generalised estimating equation (GEE) S(β,θ) = n∑ i=1 Si (β,θ) = 0 are consistent & asymptotically normal under regularity assumptions: (βˆ T , θˆ T )T a∼ N((βT ,θT )T , Fˆ−1Vˆ Fˆ−1) with Fˆ = ∑n i=1 Mˆi Σˆ −1 i Mˆi and Vˆ = ∑n i=1 Sˆi Sˆ T i . Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 104 / 143 §6.6.1 Two drawbacks of the conditional models 1 They measure the effect of x on a binary component yj conditional on incorporating into the effects of other yk , k 6= j . Thus the model is not able to provide prediction based on x alone. 2 Interpretation of the effects depends on the dimension of y. Both drawbacks can be avoided by using a marginal modelling approach. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 105 / 143 §6.6.2 Marginal models In many situations the primary interest is to analyse the marginal mean of the response given the covariates. The association between the responses is often of secondary interest. An example is the visual impairment study. Marginal models were first proposed by Liang & Zeger (1986) and Zeger & Liang (1986) in the context of longitudinal data with many short time series. Their modelling and estimation approach is based on GEE and has subsequently been modified and further developed. Focus in this subsection is on modelling and estimating marginal means of correlated and categorical response data by the first-order generalised estimating equations (GEE1). Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 106 / 143 §6.6.2 Marginal models for correlated univariate responses (1) In marginal models, the effects of covariates on responses and the association between responses is modelled separately. Let yi = (yi1, · · · , yimi )T be the vector of responses, and xTi = (x T i1, · · · , xTimi ) be the vector of covariates from cluster i , i = 1, · · · , n. Here the cluster size mi may vary with the cluster index i . Response observations within the same cluster are correlated. Response observations between different clusters are assumed independent of each other. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 107 / 143 §6.6.2 Marginal models for correlated univariate responses (2) Specification of a marginal model is as following 1 The marginal means of yij , j = 1, · · · ,mi , are assumed correctly specified by common univariate response models: µij(β) = E (yij |xij) = h(zTij β) (24) where h(·) is a response function, e.g. a logit function, and zij is an appropriate design vector. 2 The marginal variance of each yij is specified as a function of µij : σ2ij = var(yij |xij) = v(µij)φ (25) where v(·) is the variance function of known form. 3 The correlation between yij and yik is a function of µij = µij(β), µik = µik(β) and perhaps of additional association parameters α: corr(yij , yik) = c(µij , µik ,α) (26) with the function c(·, ·, ·) being of known form. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 108 / 143 §6.6.2 Marginal models for correlated univariate responses (3) Remarks: 1 Parameters (β,α) are the same for each cluster. Hence marginal models are appropriate for analysing population-averaged effects. 2 Marginal effects β can be consistently estimated even if the correlation function is incorrectly specified. However, the efficiency of the estimator βˆ can be compromised. Since the primary scientific objective is often the regression relationship, it is important to correctly specify the marginal mean structure (24), while c(µij , µik ,α) only needs to be a working correlation for the association between yij and yik . Together with (25) a working covariance matrix cov(yi ) = Σi (β,α) can be obtained, where an additional dispersion parameter φ is involved but suppressed for presentation simplicity. Two main approaches for specifying the working correlations or covariances have been considered in the literature. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 109 / 143 §6.6.2 Marginal models for correlated univariate responses (4) First approach: The variance structure (25) is supplemented by a working correlation matrix Ri (α), so that the working covariance matrix is of the form Σi (β,α) = C 1/2 i (β)Ri (α)C 1/2 i (β), where Ci (β) = diag [var(yij |xij)] = diag{σ2i1, · · · , σ2imi}. Common choices for Ri (α): 1 working independence model: Ri (α) = I , the identity matrix. 2 equicorrelation (or exchangeable) model: corr(yij , yik) = α for all j 6= k, i.e. α reduces to be a scalar, and Ri (α) = 1 α · · · α α 1 · · · α ... ... . . . ... α α · · · 1 . 3 Ri (α) completely unspecified, except being positive definite, i.e. αjk = corr(yij , yik), j < k. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 110 / 143 §6.6.2 Marginal models for correlated univariate responses (5) Second approach: For binary responses the odds ratio is an alternative measure of association that is easier to interpret and has some desirable properties. The odds ratio for yij , yik , j , k = 1, · · · ,m, j 6= k, is defined as γijk = P(yij = 1, yik = 1) · P(yij = 0, yik = 0) P(yij = 1, yik = 0) · P(yij = 0, yik = 1) From this definition, the probability P(yij = yik = 1), denoted as pii11, can be shown to satisfy pii11 = E (yijyik) = { 1−(piij+piik )(1−γijk )−s(piij ,piik ,γijk) 2(γijk−1) if γijk 6= 1 piijpiik if γijk = 1 (27) with s(piij , piik , γijk) = ( [1−(piij +piik)(1−γijk)]2 − 4(γijk−1)γijkpiijpiik )1/2 . It can be seen that the covariance matrix of yi = (yi1 · · · , yimi )T can be expressed as a function of (piij , piik , γijk). Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 111 / 143 §6.6.2 Marginal models for correlated univariate responses (6) One advantage of the odds ratio is that it is unconstrained, being able to take any positive values. In contrast, corr(yij , yik) = pii11 − piijpiik√ piij(1− piij)piik(1− piik) , the correlation between yij and yik is constrained (i.e. its possible values may all fall into a subset of [−1, 1]), because the involved pii11 is constrained by max(0, piij + piik − 1) ≤ pii11 ≤ min(piij , piik). Note pii11 = P(yij = 1) + P(yik = 1)−P(yij = 1 or yik = 1) ≥ piij +piik − 1. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 112 / 143 §6.6.2 Marginal models for correlated univariate responses (7) To reduce the number of unknown parameters involved in the odds ratio, we assume γijk = γijk(α), a function of a vector parameter α. Common choices of γijk(α): 1 γijk = γ, for all i , j , k. 2 log γijk = α Tωijk . Using the relation (27), cov(yi ) = Σi (β,α) is also a function of β and α. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 113 / 143 §6.6.2 Marginal models for correlated univariate responses (8) Some examples of marginal models: I Continuous responses: µij(β) = E (yij |xij) = zTij β; var(yij |xij) = φ = σ2; corr(yij , yik) = αjk . II Binary responses: µij(β) = piij(β) = P(yij = 1|xij), log piij(β) 1− piij(β) = z T ij β; var(yij |xij) = piij(β)(1− piij(β)); corr(yij , yik) = 0 (independence struc.) or γijk = α (equal odds ratio). III Count data: logµij(β) = log E (yij |xij) = zTij β; var(yij |xij) = µij(β)φ; corr(yij , yik) = α (equicorrelation). Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 114 / 143 §6.6.2 GEE approach in statistical inference (1) For cluster i = 1, · · · , n, let yi = (yi1, · · · , yimi )T , xi = (xTi1, · · · , xTimi )T , µi (β) = (µi1(β1), · · · , µimi (βmi ))T and Σi (β,α) be defined as before. In linear models for Gaussian responses, there is no problem to perform maximum likelihood estimation, since specification of means and covariances determines the likelihood function. This is not the case with non-Gaussian data. Therefore a generalised estimating approach is proposed. Keeping the association parameter α and φ, if present, fixed for the moment, the generalised estimating equation (GEE) for effect β is Sβ(β,α) = n∑ i=1 ZTi Di (β)Σ −1 i (β,α)(yi − µi (β)) = 0, (28) with mi -row design matrix Zi = (zi1, · · · , zimi )T and diagonal matrices Di (β) = diag{Dij(β)}, Dij(β) = ∂h∂ηij evaluated at ηij = zTij β. The middle-part of (28) is a multivariate quasi-score function. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 115 / 143 §6.6.2 GEE approach in statistical inference (2) A GEE estimate βˆ is computed by iterating a modified Fisher scoring algorithm w.r.t. β and estimation of α (and φ): (i) Given current estimates αˆ (and φˆ), the GEE (28) for βˆ is solved by the iterations βˆ (k+1) = βˆ (k) + (Fˆ (k))−1Sˆβ(βˆ (k) , αˆ), k = 0, 1, 2, · · · , with Fˆ (k) = n∑ i=1 ZTi Di (βˆ (k) )Σ−1i (βˆ (k) , αˆ)Di (βˆ (k) )Zi being the observed quasi-information matrix. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 116 / 143 §6.6.2 GEE approach in statistical inference (3) (ii) If α is unknown, it can be consistently estimated by a method of moments based on Pearson residuals rˆij = yij − µˆij√ v(µˆij) . The dispersion parameter φ is estimated consistently by φˆ = 1 N − p n∑ i=1 mi∑ j=1 rˆ 2ij , with N = ∑n i=1 mi and p = dim(β). Estimation of α depends on the choice of Ri (α). For exchangeable correlation matrix Ri (α) with dim(α) = 1, αˆ = [ φˆ { n∑ i=1 1 2 mi (mi − 1)− p }]−1 n∑ i=1 ∑ k>j rˆik rˆij . An unspecified working correlation matrix R can be estimated by Rˆ = 1 nφˆ n∑ i=1 C − 12 i (yi − µˆi )(yi − µˆi )TC − 12 i if all cluster sizes mi = m and m << n. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 117 / 143 §6.6.2 GEE approach in statistical inference (4) Cycling between Fisher scoring steps for β and consistent estimation of α and φ leads to a consistent estimation of β. Alternatively, α (and possible φ) can be estimated by simultaneously solving an additional estimating equation. Details are not pursued here. Under regularity conditions, the GEE estimator βˆ satisfies βˆ a∼ N(β,F−1VF−1) with F = ∑n i=1 Z T i DiΣ −1 i DiZi and V = ∑n i=1 Z T i DiΣ −1 i SiΣ −1 i DiZi , where Si is the true covariance matrix of yi . cov(βˆ) is approximated by the “sandwich matrix”: Aˆ = Fˆ−1 { n∑ i=1 ZTi Dˆi Σˆ −1 i (yi − µˆi )(yi − µˆi )T Σˆ−1i DˆiZi } Fˆ−1 cov(Sβ(β,α)) = V and Sβ(β,α) a∼ N(0,V ). Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 118 / 143 §6.6.2 Visual impairment example (1) We use GEE approach to fit a marginal model to the visual impairment data. R packages gee with command gee() and geepack with command geeglm() can be used. (There are a few problems with both packages). library(gee); library(geepack) VI.dat ID Yes No Age Race 1 1 15 617 40to50 White 2 1 19 613 40to50 White 3 2 24 557 51to60 White 4 2 25 556 51to60 White 5 3 42 789 61to70 White 6 3 48 783 61to70 White 7 4 139 673 70+ White 8 4 146 666 70+ White 9 5 29 750 40to50 Black 10 5 31 748 40to50 Black 11 6 38 574 51to60 Black 12 6 37 575 51to60 Black 13 7 50 473 61to70 Black 14 7 49 474 61to70 Black 15 8 85 344 70+ Black 16 8 93 336 70+ Black Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 119 / 143 §6.6.2 Visual impairment example (2) VI1edata = VI.dat, family = binomial, corstr = "exchangeable") summary(VI1e) GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA gee S-function, version 4.13 modified 98/01/27 (1998) Model: Link: Logit Variance to Mean Relation: Binomial Correlation Structure: Exchangeable Call: gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat, family = binomial, corstr = "exchangeable") Summary of Residuals: Min 1Q Median 3Q Max 15.0 28.0 39.9 58.6 145.8 Coefficients: Estimate Naive S.E. Naive z Robust S.E. Robust z (Intercept) -3.225 0.1125 -28.66 0.0254 -127.22 Age51to60 0.478 0.1451 3.30 0.0167 28.65 Age61to70 0.837 0.1348 6.21 0.0905 9.26 Age70+ 1.973 0.1227 16.08 0.0531 37.13 RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28 Estimated Scale Parameter: 0.571 Number of Iterations: 1 Working Correlation [,1] [,2] [1,] 1.000 0.886 [2,] 0.886 1.000 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 120 / 143 §6.6.2 Visual impairment example (3) Call: gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat, family = binomial, corstr = "exchangeable") Summary of Residuals: ###Residuals calculation is not correct in gee(). Min 1Q Median 3Q Max 15.0 28.0 39.9 58.6 145.8 Coefficients: Estimate Naive S.E. Naive z Robust S.E. Robust z (Intercept) -3.225 0.1125 -28.66 0.0254 -127.22 Age51to60 0.478 0.1451 3.30 0.0167 28.65 Age61to70 0.837 0.1348 6.21 0.0905 9.26 Age70+ 1.973 0.1227 16.08 0.0531 37.13 RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28 Estimated Scale Parameter: 0.571 Number of Iterations: 1 Working Correlation [,1] [,2] [1,] 1.000 0.886 [2,] 0.886 1.000 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 121 / 143 §6.6.2 Visual impairment example (4) total=VI.dat$Yes+VI.dat$No pearson.res1e= (VI.dat$Yes-total*VI1e$fitted)/sqrt(total*(VI1e$fitted)*(1-VI1e$fitted)) sum(pearson.res1e^2)/11 [1] 0.5708722 #estimate of \phi based on Pearson residuals > VI1e$scale [1] 0.5708722 (VI1e$scale*3)^(-1)*sum(pearson.res1e[2*(1:8)-1]*pearson.res1e[2*(1:8)]) [1] 0.8861748 > VI1e$working.correlation [,1] [,2] [1,] 1.0000000 0.8861748 [2,] 0.8861748 1.0000000 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 122 / 143 §6.6.2 Visual impairment example (5) VI3 data=VI.dat, id=ID, corstr = “exchangeable”, std.err=”san.se”) summary(VI3) Call: geeglm(formula = cbind(Yes, No) ~ Age + Race, family = binomial(link = “logit”), data = VI.dat, id = ID, corstr = “exchangeable”, std.err = “san.se”) Coefficients: Estimate Std.err Wald Pr(>|W|) (Intercept) -3.2253 0.0254 16184.7 < 2e-16 *** Age51to60 0.4783 0.0167 820.7 < 2e-16 *** Age61to70 0.8374 0.0905 85.7 < 2e-16 *** Age70+ 1.9728 0.0531 1378.3 < 2e-16 *** RaceWhite -0.3498 0.0662 27.9 1.3e-07 *** --- Estimated Scale Parameters: #IScale seems defined differently in gee() and geeglm() Estimate Std.err (Intercept) 0.000604 0.000323 Correlation: Structure = exchangeable Link = identity Estimated Correlation Parameters: Estimate Std.err #The estimation here is about half of that in gee(). alpha 0.441 0.274 Number of clusters: 8 Maximum cluster size: 2 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 123 / 143 §6.6.2 Visual impairment example (6) pearson.res3=(VI.dat$Yes/total-VI3$fitted)/sqrt((VI3$fitted)*(1-VI3$fitted)/total) phi.est=sum(pearson.res3^2)/11 #=0.5709 sum((summary(VI3)$deviance.resid)^2)/11 [1] 0.577 #very close to 0.5709. #Using deviance residuals or Pearson residuals give similar results. VI3$geese$gamma #How is gamma defined? (Intercept) 0.000604 (phi.est*3)^(-1)*sum(pearson.res3[2*(1:8)-1]*pearson.res3[2*(1:8)]) [1] 0.8862 #estimated alpha (0.577*3)^(-1)*sum(summary(VI3)$devi[2*(1:8)-1]*summary(VI3)$devi[2*(1:8)]) [1] 0.872 VI3$geese$alpha alpha 0.4415 #approximately half of 0.8862. Then How is alpha defined here? Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 124 / 143 §6.6.2 Marginal models for correlated responses having k categories (1) Suppose categorical responses Yij , j = 1, · · · ,mi , are observed in cluster i , i = 1, · · · , n. For simplicity, each Yij has the same k categories and is coded by a dummy vector yij = (yij1, · · · , yijq)T , q = k − 1 Let yTi = (y T i1, · · · , yTimi ) be observations of yij in cluster i ; xTi = (x T i1, · · · , xTimi ) be the corresponding covariate observations. For data involving categorical responses, a marginal categorical response model can be defined for each response variable, and then supplemented by a working association model to relate to the responses with each other with in each cluster. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 125 / 143 §6.6.2 Marginal models for correlated responses having k categories (2) Here we focus on ordinal responses. Other types of categorical responses can be handled similarly. (i) The vector of marginal means or categorical probabilities of Yij is assumed being correctly specified by an ordinal response model : piij(β) = (piij1(β), · · · , piijq(β))T = h(Zijβ) with piijr = P(Yij ≤ r |xij) = ∑r `=1 P(yij` = 1|xij), and the response function h(·) and design matrix Zij . (ii) The marginal covariance function of yij is given by Σij = cov(yij |xij) = diag(piij)− piijpiTij i.e. the covariance matrix of a multinomial random variable. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 126 / 143 §6.6.2 Marginal models for correlated responses having k categories (3) (iii) Association between Yij and Yik is modelled by a working correlation matrix Ri or by global cross-ratios. For example, the working matrix of exchangeable correlations is Ri (α) = I Q · · · Q QT I · · · Q ... ... . . . ... QT QT · · · I where Qq×q contains α to be estimated by a method of moments. Global cross-ratios (GCR), for each pair of categories ` and m of Yij and Yik , are defined as γijk (`,m) = P(Yij ≤ `,Yik ≤ m)P(Yij > `,Yik > m) P(Yij > `,Yik ≤ m)P(Yij ≤ `,Yik > m) GCR can be modelled log-linearly, i.e. log(γijk(`,m)) = α`m or by a regression model including covariate effects. For the model specified by (i), (ii) and (iii), the involved regression and association parameters can be estimated by a multivariate extension of the GEE approach discussed before. Details not pursued here. R packages multgee, geepack and repolr may be used to fit the above model. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 127 / 143 §6.6.2 Likelihood-based inference for marginal models The GEE approach is not likelihood-based as it does not require specification of the joint distribution of multivariate response vector yi , i = 1, · · · , n. Difficulty with the likelihood-based inference is due to the difficulty in formulating this joint distribution. So in general it is difficult to perform likelihood-based inference for marginal models. But there are new developments: Constructing joint multivariate distribution by copulas Vector GLM Bayesian inference. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 128 / 143 §6.6.3 Marginal models for longitudinal data (1) In many longitudinal studies, data consist of a small or moderate number of repeated observations for many subjects; and the main objective is to analyse the effects of covariates on a response variable, without conditioning on previous responses. In this setting it is more natural to view the data as a cross section of individual time series yi = (y T i1, · · · , yTiTi )T , xi = (xTi1, · · · , xTiTi )T , i = 1, · · · , n. It is also often assume that, given the covariates, individual time series are mutually independent. Marginal models introduced in §3.5.2 can be used for these cases. Suppose yit = (yit1, · · · , yitq)T are q = k − 1 dummy variables describing a multi-categorical response Yit with k categories. For simplicity, assume the number of categories is the same for all Yit responses. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 129 / 143 §6.6.3 Marginal models for longitudinal data (2) Marginal GEE models for longitudinal data are then defined as (i) The marginal mean is correctly specified by a response model µit(β) = E (yit |xit) = h(ηit), ηit = Zitβ where ηit is q-dimensional with components ηitr = z T itrβ, z T itr is the rth row of Zit . (ii) The association structure for yi = (y T i1, · · · , yTiTi )T is specified by a “working” covariance matrix Σi (β,α), depending on regression parameters β, association parameters α, and possibly an additional nuisance parameter φ. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 130 / 143 §6.6.3 Marginal models for longitudinal data (3) Σi (β,α) can be determined by a variance function for yit together with a “working” correlation matrix Ri (α) or some odds ratio model γ(α). Common choices for Ri (α): 1 stationary correlations between yis and yit (Ri (α))st = α(|t − s|) (29) 2 its special form α(|t − s|) = α|t−s|. AR(1) correlation. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 131 / 143 Statistical inference in marginal GEE models (1) Estimation of β using GEEs is carried out in complete analogy to §3.5.2. Denote E (yi |xi ) = µi (β) = (µTi1, · · · ,µTiTi )T , ZTi = (Zi1, · · · ,ZiTi ), and (block-) diagnol matrices Di (β) = diag{Dit(β)}, with Dit(β) = ∂hT ∂ηit |ηit= Zitβ. The GEE for β is sβ(β,α) = n∑ i=1 ZTi Di (β)Σ −1 i (β,α)(yi − µi (β)) = 0 (30) Given current estimates αˆ and φˆ, the GEE (30) for βˆ is solved by βˆ (k+1) = βˆ (k) + ( Fˆ (k) )−1 sˆ(k), k = 1, 2, · · · with the “working” Fisher information matrix Fˆ (k) = ∑n i=1 Z T i Di (βˆ (k) )Σ−1i (βˆ (k) , αˆ)DTi (βˆ (k) )Zi and sˆ(k) = sβ(βˆ (k) , αˆ). Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 132 / 143 Statistical inference in marginal GEE models (2) Given a current estimate of β, α and φ can be estimated from Pearson residuals rˆit = yit − µˆit (v(µˆit))1/2 by the method of moments. 1 φˆ = 1 N − p n∑ i=1 Ti∑ t=1 rˆ 2it , N = n∑ i=1 Ti 2 Estimation of α depends on the choice of Ri (α). For the exchangeable correlation model αˆ = { φˆ [ n∑ i=1 1 2 Ti (Ti − 1)− p ]}−1 n∑ i=1 ∑ t>s rˆit rˆis 3 A totally unspecified R = R(α) can be estimated if T << n. 4 In particular for binary and categorical observations, estimation of α via GEE2 is often preferrable. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 133 / 143 Statistical inference in marginal GEE models (3) Consistency and asymptotic normality results for βˆ can be obtained along the lines of asymptotic theory of quasi-MLEs for independent observations if Ti , i = 1, · · · , n are fixed and n→∞. If αˆ is √ n-consistent given β and φ; and φˆ is √ n-consistent given β, then βˆ is √ n-consistent and asymptotically normal, i.e. βˆ a∼ N(β,F−1VF−1) with F = n∑ i=1 ZTi DiΣ −1 i D T i Zi , V = n∑ i=1 ZTi DiΣ −1 i Cov(yi )Σ −1 i D T i Zi . Asymptotic covariance matrix A = F−1VF−1 can be consistently estimated by replacing β, α and φ by their respective consistent estimates and Cov(yi ) by (yi − µˆi )(yi − µˆi )T , i.e. by sandwich matrix Aˆ = Ĉov(βˆ) a≈ Fˆ−1 { n∑ i=1 ZTi Dˆi Σˆ −1 i (yi−µˆi )(yi−µˆi )T Σˆ−1i DˆTi Zi } Fˆ−1 (31) Based on the preceeding asymptotic result, confidence intervals for β and Wald and score tests may be constucted; cf. §2.3.1. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 134 / 143 Example 6.4 Ohio children (1) The ohio data concern 537 children from Steubenville, Ohio and were taken as part of a study on the effects of air pollution. Children were in the study for four years from age seven to ten. The response is whether they wheezed or not. The variables are resp: an indicator of wheeze status (1=yes, 0=no) id: an identifier for the child, taking values from 0 to 536 age: 7 yrs = −2; 8 yrs = −1; 9 yrs =0; 10 yrs =1 smoke: mother’s smoking status at start of the study (1=smoker, 0=nonsmoker) Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 135 / 143 Example 6.4 Ohio children (2) Some analysis has been done to the data in R, producing the following output. > head(ohio) resp id age smoke 1 0 0 -2 0 2 0 0 -1 0 3 0 0 0 0 4 0 0 1 0 5 0 1 -2 0 6 0 1 -1 0 > tail(ohio) resp id age smoke 2143 1 535 0 1 2144 1 535 1 1 2145 1 536 -2 1 2146 1 536 -1 1 2147 1 536 0 1 2148 1 536 1 1 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 136 / 143 Example 6.4 Ohio children (3) > str(ohio) ‘data.frame’: 2148 obs. of 4 variables: $ resp : int 0 0 0 0 0 0 0 0 0 0 … $ id : int 0 0 0 0 1 1 1 1 2 2 … $ age : int -2 -1 0 1 -2 -1 0 1 -2 -1 … $ smoke: int 0 0 0 0 0 0 0 0 0 0 … > library(geepack} > fit.exch data=ohio, id=id, corstr = “exchangeable”, std.err=”san.se”) > summary(fit.exch) Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 137 / 143 Example 6.4 Ohio children (4) > summary(fit.exch) Call: geeglm(formula = resp ~ age + smoke, family = binomial(link = “logit”), data = ohio, id = id, corstr = “exchangeable”, std.err = “san.se”) Coefficients: Estimate Std.err Wald Pr(>|W|) (Intercept) -1.880 0.114 272.597 < 2e-16 *** age -0.113 0.044 6.684 0.00973 ** smoke 0.265 0.178 2.224 0.13588 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Estimated Scale Parameters: Estimate Std.err (Intercept) 0.9985 0.1116 Correlation: Structure = exchangeable Link = identity Estimated Correlation Parameters: Estimate Std.err alpha 0.3543 0.06244 Number of clusters: 537 Maximum cluster size: 4 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 138 / 143 Example 6.4 Ohio children (5) Use the above output to answer the following questions. 1 Let yit be the response value resp of child i at age t. Write down the model involved in the analysis, including the mean, variance and correlation coefficient of yit ’s. Give the estimates of the parameters appearing in the model. 2 Write down the model’s design matrix for data where id=536. 3 Estimate the odds ratio of wheezing for a child for every one year older in age. Also calculate an approximate standard error of your odds ratio estimate. 4 Estimate the odds ratio of wheezing for a 10-year old child with a smoking mother versus a 9-year old child with nonsmoking mother. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 139 / 143 Example 6.4 Ohio children (6) 1 Let yit be the response value resp of child i at age t. Write down the model involved in the analysis, including the mean, variance and correlation coefficient of yit ’s. Give the estimates of the parameters appearing in the model. Let piit = P(yit = 1|age, smoke), t = 1, · · · , 4; i = 1, · · · , 537. Then the model is specified by the following log pˆiit 1− pˆiit = βˆ0 + βˆ1 · age + βˆ2 · smoke = −1.880− 0.113× age + 0.265× smoke V̂ar(yit) = φˆpˆiit(1− pˆiit) = 0.9985e −1.880−0.113·age+0.265·smoke 1 + e−1.880−0.113·age+0.265·smoke Ĉorr(yit , yis) = αˆ = 0.3543 for t 6= s Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 140 / 143 Example 6.4 Ohio children (7) 2 Write down the model’s design matrix for data where id=536. Z (id = 536) = 1 −2 1 1 −1 1 1 0 1 1 1 1 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 141 / 143 Example 6.4 Ohio children (8) 3 Estimate the odds ratio of wheezing for a child for every one year older in age. Also calculate an approximate standard error of your odds ratio estimate. ÔR = eβˆ1 = e−0.113 = 0.893 s.e.(ÔR) ≈ eβˆ1 · s.e.(βˆ1) = e−0.113 × 0.044 = 0.039. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 142 / 143 Example 6.4 Ohio children (9) 4 Estimate the odds ratio of wheezing for a 10-year old child with a smoking mother versus a 9-year old child with nonsmoking mother. For a 10-year old child with a smoking mother log pˆiit 1− pˆiit = βˆ0 + βˆ1 · 1 + βˆ2 · 1 = −1.880− 0.113 + 0.265 For a 9-year old child with a non-smoking mother log pˆiit 1− pˆiit = βˆ0 + βˆ1 · 0 + βˆ2 · 0 = −1.880 Therefore ÔR = eβˆ0+βˆ1+βˆ2 eβˆ0 = eβˆ1+βˆ2 = e−0.113+0.265 = e0.152 = 1.164. Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 143 / 143