- May 26, 2020

Module Code: MATH5745M01

Q1.
(a) Suppose that A is a square n × n matrix. What do you understand by the phrase “matrix A is not of full rank”? Outline some properties of a not full rank matrix. How would you check whether the matrix A is not of full rank? [8 marks]
(b) Suppose that B is a symmetric matrix such that B > 0. Explain carefully what the notation “B > 0” tells you about the matrix B. Outline some properties of matrix B. [8 marks]
(c) Suppose you are told that a square n × n matrix C is an orthogonal matrix. What does this tell you about matrix C? Outline some properties of matrix C. [6 marks]
(d) The spectral decomposition theorem states: “Any symmetric (n × n) matrix S can be written as S = GDG′ where G is the matrix of standardized eigenvectors of S and D is a diagonal matrix of eigenvalues of S.” Discuss the importance of the spectral decomposition theorem to Multivariate Statistics. [4 marks]
(e) Why is the multivariate normal distribution important in Multivariate Analysis? Discuss its usefulness and its limitations. [8 marks]
(f) The figure below shows the results of a cluster analysis of some data for 28 countries based upon the values of two variables at the end of April 2020. Explain to a non-statistician what the plot shows (you do not need to know what the data is) and also explain the methodology used to construct the plot. Discuss whether alternative methods might be suitable for producing a more informative cluster analysis plot. [11 marks]

[Figure: dendrogram titled “Data 30/4/2020: Average linkage - Euclidean distance”; vertical axis “Distance”, from 0 to 2500. Leaf labels as extracted: Brazil, Poland, Japan, S. Korea, Malaysia, Iran, Romania, Russia, Finland, Netherlands, Sweden, Portugal, UK, France, Germany, Israel, Canada, Norway, Austria, Denmark, Spain, Iceland, Belgium, Ireland, Singapore, Italy, USA, Switzerland.]

Q2.
(a) The figure shows the contours of probability density functions from three bivariate normal distributions with variables x1 and x2. They have the same mean vector µ = (5, 5)′, but different structures of covariance matrix Σ.

[Figure: three contour plots, panels (a), (b) and (c), each with x1 on the horizontal axis and x2 on the vertical axis, both from 0 to 10. Contour labels: (a) 0.01 to 0.07; (b) 0.01 to 0.08; (c) 0.05, 0.1, 0.15, 0.2, 0.3.]

Answer the following questions, and explain briefly your reasoning.
(i) Which figure has the highest correlation between x1 and x2? [2 marks]
(ii) Which figure shows independence between x1 and x2? [2 marks]
(iii) Which figure has the lowest generalised population covariance |Σ|? [2 marks]
(iv) What can you say about the relationship between x1 and x2 in each of figures (a), (b) and (c) above? [3 marks]
(v) If you are told that x2 = 6, what can you then say about the distribution of x1 in each of figures (a), (b) and (c) above? [6 marks] (There is no need to try to estimate the mean and variance exactly.)

(b) An anthropologist is interested in the physical characteristics of adult males in an isolated tribe. He measures the total body height (x1), arm length (x2), and head circumference (x3) of 20 adult males. The measurements are all in centimeters. The sample mean vector x̄ and sample covariance matrix S of the measurements are given by

\[
\bar{x} = \begin{pmatrix} 154.6 \\ 61.7 \\ 53.4 \end{pmatrix}, \qquad
S = \begin{pmatrix} 7.095 & 2.768 & 2.695 \\ 2.768 & 5.589 & 3.495 \\ 2.695 & 3.495 & 2.673 \end{pmatrix}.
\]

(i) Let X be the data matrix of size 20 × 3 for the above example. Suppose the anthropologist wishes to have the measurements of the data to be in inches instead of centimeters. Using matrix operations, how would you define a new data matrix Y with the measurements in inches? How would you obtain the sample covariance matrix S(Y) of the data matrix Y if given S?
Do you expect the elements of S(Y) to be larger or smaller in magnitude than those of S? Explain your reasoning. [4 marks] (Note that there is no need to calculate the elements of S(Y). Also note that one inch is equal to 2.54 centimeters.)
(ii) For the given sample mean vector x̄ and the given sample covariance matrix S above, what can you deduce about the three variables x1, x2 and x3 and their inter-relationship? [4 marks]
(iii) Let R be the sample correlation matrix of X. How would you obtain R using matrix operations if given S? What can you say when comparing R to the sample correlation matrix R(Y) based upon the data matrix Y? Explain your reasoning. [4 marks]

(c) The anthropologist is interested in the null hypothesis H0 : Σ = Σ0, where Σ is the population covariance matrix of X and

\[
\Sigma_0 = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]

(i) Looking at the structure of Σ0, what is being hypothesised by the anthropologist with regard to the dependencies between variables? Do you think this is a plausible hypothesis? Explain your answer. [3 marks]
(ii) Suppose V is the maximum likelihood estimate of the covariance matrix of X. How do you calculate V if given S? [1 mark]
(iii) From the above data, we have $\log_e |\Sigma_0^{-1} V| = -2.64038$ and $\operatorname{trace}(\Sigma_0^{-1} V) = 2.91783$. Test the null hypothesis H0 at the 5% significance level. What do you conclude? [4 marks]
(iv) If the null hypothesis were of the form Σ0 = vI, where I is a (3 × 3) identity matrix, and given the values in (c)(iii) above, write down your new test statistic U in terms of v. What value of v makes U a minimum? [5 marks]

Q3.
Morphology is the branch of biology that deals with the form (structure) of living organisms. An expert measures the length (in cm) and weight (in hundreds of grams) of 20 adult birds from the same species, but from two different sub-species (10 birds in each sub-species).
The data can be seen in the following figure, where the points are marked differently to distinguish observations from sub-species 1 and sub-species 2.

[Figure: scatter plot of Weight (100 gr) against Length (cm), with length from 40 to 48 and weight from 4 to 14; legend: Sub-species 1, Sub-species 2.]

The sample mean for Variety 1 is ȳ1 = (45.4, 8.01)′ and for Variety 2 it is ȳ2 = (43.0, 10.06)′. The pooled sample covariance matrix $S_p$ and its inverse $S_p^{-1}$ are given by:

\[
S_p = \begin{pmatrix} 3.578 & 2.053 \\ 2.053 & 2.002 \end{pmatrix}, \qquad
S_p^{-1} = \begin{pmatrix} 0.6795 & -0.6970 \\ -0.6970 & 1.2145 \end{pmatrix}.
\]

(a) Describe the principle of linear discriminant analysis. [3 marks]
(b) Looking solely at the horizontal axis (Length) or the vertical axis (Weight) in the above figure, can you identify a clear separation between the two varieties? Explain briefly your reasoning. [2 marks]
(c) Find the discriminant function from the above data. What can you say about the discriminant function line? Calculate the standardised coefficients of the discriminant function. [5 marks]
(d) Suppose the expert found two new observations. The first one is a bird with length 44 cm and weight 8.7 (in units of 100 grams). The second one is a bird with length 47 cm and weight 9.0 (in units of 100 grams).
(i) Before performing any calculations, identify the sub-species to which each new observation should be classified. Explain briefly your reasoning. [3 marks]
(ii) Now write down the discriminant rule. Based on this rule, to which sub-species should each new observation be classified? [4 marks]
(e) Consider the following two plots, each showing data from two different groups.

[Figure: two scatter plots of x2 against x1. Panel (a): x1 from 42 to 48, x2 from 4 to 12. Panel (b): x1 from 0.0 to 3.0, x2 from 4.0 to 5.2.]

For each of the above two figures, answer these questions:
(i) Do you see a group separation in the figure?
(ii) Is linear discriminant analysis suitable for separating the groups? Explain your reasoning.
[4 marks]
(f) Discuss how linear discriminant analysis is similar to or is different from the aims of cluster analysis. [4 marks]

Q4.
Ten students in the School of Mathematics took the same set of modules in Semester 1, denoted M1, M2, M3, and M4. Their module marks are shown in the following table, where the students are labelled by A-J.

    Student   M1   M2   M3   M4
    A         60   57   65   56
    B         65   63   63   48
    C         58   56   64   57
    D         67   58   60   66
    E         65   49   60   50
    F         52   52   54   47
    G         57   50   60   54
    H         55   60   62   53
    I         73   63   66   61
    J         67   55   59   51

The sample mean vector ȳ and covariance matrix S of the data are given by

\[
\bar{y} = \begin{pmatrix} 61.9 \\ 56.3 \\ 61.3 \\ 54.3 \end{pmatrix}, \qquad
S = \begin{pmatrix}
42.544 & 13.589 & 10.144 & 17.033 \\
13.589 & 24.456 & 10.789 & 9.678 \\
10.144 & 10.789 & 12.233 & 9.456 \\
17.033 & 9.678 & 9.456 & 35.122
\end{pmatrix}.
\]

(a) Explain the idea of principal component analysis. [3 marks]
(b) Consider the following edited output of analysis in R, where dat contains the dataset above and “??” replaces a real number.

    > eigen(cov(dat))
    $values
    [1] ?? 21.594106 18.779765 5.446926

    $vectors
              [,1]        [,2]       [,3]        [,4]
    [1,] 0.6785561 -0.55556959  0.4795013 -0.03134488
    [2,] 0.3995959 -0.17495326 -0.7956614 -0.42028241
    [3,] 0.2901608  0.01717662 -0.3320533  0.89735850
    [4,] 0.5437751  0.81267383  0.1635297 -0.13087369

(i) What is the trace of S? What does this trace represent? [2 marks]
(ii) What is the value that is replaced by ?? in the above R output? [1 mark]
(iii) Calculate the cumulative proportion of variability of the principal components. Giving reasons, suggest how many principal components should be considered. [6 marks]
(iv) Explain briefly your assessment of the loadings of the principal components you selected in part (b)(iii) above. What proportion of total variability in the data do your chosen principal components represent?
[4 marks]
(c) The following figure shows the first two principal components from the above data, where the points have been replaced by the student labels. Comment on the patterns that you see in the figure, in light of your answer in part (b)(iv) above and the data table above. [4 marks]

[Figure: scatter plot of z2 (vertical axis, from −10 to 10) against z1 (horizontal axis, from −15 to 15), with the points shown as the student labels A to J.]

(d) Suppose, hypothetically, the sample covariance matrix that you observe is of the form

\[
S^{\star} = \begin{pmatrix}
250 & 10 & 10 & 10 \\
10 & 10 & 8 & 8 \\
10 & 8 & 10 & 8 \\
10 & 8 & 8 & 10
\end{pmatrix}.
\]

What potential problem could arise in the interpretation of the results of the analysis using S⋆? What solution do you recommend to deal with the problem? Explain your reasoning. Would your solution give the same results as using S⋆? [3 marks]
(e) Explain briefly the differences and similarities between the aims of principal component analysis and those of factor analysis. [2 marks]

Q5.
An expert in modern language conducted an experiment in which the time (in milliseconds) of pronunciation of two different syllables, denoted x and y, was measured in three different contexts from 12 unrelated individuals. The expert believes that both syllables are highly correlated in terms of the time to pronounce them across different contexts. Some of the data are shown in the following table:

    x1   x2   x3   y1   y2   y3
    28   29   37   30   32   42
    28   24   28   26   34   35
    28   29   29   33   32   27
    27   25   25   25   24   29
    …    …    …    …    …    …

The overall sample covariance matrix S can be partitioned as follows:

\[
S = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}.
\]

The diagonal elements of S are given by (4.99, 12.73, 14.82, 12.45, 13.12, 27.42). The corresponding sample correlation matrix Rxy between the x variables and the y variables is given by

    Rxy      y1      y2      y3
    x1    0.633   0.600   0.193
    x2    0.390   0.442  −0.039
    x3    0.438   0.416   0.451

(a) Explain the aim of canonical correlation analysis. Why is it done?
[2 marks]
(b) Looking at the correlation matrix Rxy (and without doing any calculation), is the expert’s belief likely? Explain your reasoning. [2 marks]
(c) Consider the following matrix $M_x = S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$, with eigenvalues and eigenvectors (as an output from R) satisfying:

    > eigen(mx)
    $values
    [1] 0.721597936 0.231768804 0.002894897

    $vectors
               [,1]       [,2]       [,3]
    [1,] -0.9084010 -0.3187745 -0.8398968
    [2,] -0.1633495 -0.6918638  0.4608839
    [3,] -0.3848694  0.6478482  0.2866345

(i) What is the largest canonical correlation, denoted r1, between x’s and y’s? [1 mark]
(ii) What can be inferred when you compare r1 to the elements of Rxy? What could be the reason? Explain your answer. [2 marks]
(d) Consider now the matrix $M_y = S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$ with eigenvectors (as output from R) satisfying:

    $vectors
               [,1]       [,2]        [,3]
    [1,] -0.8744107  0.2947793 -0.72690136
    [2,] -0.3866843 -0.7338849  0.68363101
    [3,] -0.2930549  0.6119789  0.06529217

What are the eigenvalues of My? Explain briefly your reasoning. [2 marks]
(e) (i) Which variables substantially contribute to the largest canonical correlation? Explain briefly your reasoning. [5 marks]
(ii) Are the pair of canonical covariates you have considered in part (e)(i) above independent? Explain briefly your answer. [2 marks]
(f) Test whether all correlations between x’s and y’s are zero at the 5% significance level. Is the expert’s belief justified? [5 marks]
(g) If there was just one y variable and just one x variable, what would the matrices Mx and My correspond to? What do the eigenvectors of Mx and My now equal? Explain your answer. [4 marks]

Normal Distribution Function Tables

The first table gives

\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^{2}}\,dt
\]

and this corresponds to the shaded area in the figure to the right.
Φ(x) is the probability that a random variable, normally distributed with zero mean and unit variance, will be less than or equal to x. When x < 0 use Φ(x) = 1 − Φ(−x), as the normal distribution with mean zero is symmetric about zero. To interpolate, use the formula

\[
\Phi(x) \approx \Phi(x_1) + \frac{x - x_1}{x_2 - x_1}\,\bigl(\Phi(x_2) - \Phi(x_1)\bigr)
\]

[Figure: standard normal density for x from −3 to 3 (vertical axis from 0.0 to 0.4).]

Table 1

    x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)
    0.00  0.5000  0.50  0.6915  1.00  0.8413  1.50  0.9332  2.00  0.9772  2.50  0.9938
    0.05  0.5199  0.55  0.7088  1.05  0.8531  1.55  0.9394  2.05  0.9798  2.55  0.9946
    0.10  0.5398  0.60  0.7257  1.10  0.8643  1.60  0.9452  2.10  0.9821  2.60  0.9953
    0.15  0.5596  0.65  0.7422  1.15  0.8749  1.65  0.9505  2.15  0.9842  2.65  0.9960
    0.20  0.5793  0.70  0.7580  1.20  0.8849  1.70  0.9554  2.20  0.9861  2.70  0.9965
    0.25  0.5987  0.75  0.7734  1.25  0.8944  1.75  0.9599  2.25  0.9878  2.75  0.9970
    0.30  0.6179  0.80  0.7881  1.30  0.9032  1.80  0.9641  2.30  0.9893  2.80  0.9974
    0.35  0.6368  0.85  0.8023  1.35  0.9115  1.85  0.9678  2.35  0.9906  2.85  0.9978
    0.40  0.6554  0.90  0.8159  1.40  0.9192  1.90  0.9713  2.40  0.9918  2.90  0.9981
    0.45  0.6736  0.95  0.8289  1.45  0.9265  1.95  0.9744  2.45  0.9929  2.95  0.9984
    0.50  0.6915  1.00  0.8413  1.50  0.9332  2.00  0.9772  2.50  0.9938  3.00  0.9987

The inverse function Φ⁻¹(p) is tabulated below for various values of p.

Table 2

    p        0.900   0.950   0.975   0.990   0.995   0.999   0.9995
    Φ⁻¹(p)   1.2816  1.6449  1.9600  2.3263  2.5758  3.0902  3.2905

Percentage Points of the χ²-Distribution

This table gives the percentage points χ²ν(P) for various values of P and degrees of freedom ν, as indicated by the figure to the right. If X is a variable distributed as χ² with ν degrees of freedom, P/100 is the probability that X ≥ χ²ν(P). For ν > 100, √(2X) is approximately normally distributed with mean √(2ν − 1) and unit variance.
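The large-ν approximation stated above can be rearranged into approximate percentage points: if √(2X) is roughly N(√(2ν − 1), 1), then χ²ν(P) ≈ ½(z + √(2ν − 1))², where z = Φ⁻¹(1 − P/100). The following Python sketch evaluates this with the standard library only. The function name `chi2_point_approx` is illustrative and not part of the paper, and the comparison with the tabulated ν = 80 row is only a rough check, since that row lies below the stated range ν > 100.

```python
from math import sqrt
from statistics import NormalDist


def chi2_point_approx(nu: int, p_percent: float) -> float:
    """Approximate upper P% point of chi-squared with nu degrees of freedom.

    Assumes sqrt(2X) ~ N(sqrt(2*nu - 1), 1), which gives
    chi2_nu(P) ~ 0.5 * (z + sqrt(2*nu - 1))**2 with z = Phi^{-1}(1 - P/100).
    """
    z = NormalDist().inv_cdf(1.0 - p_percent / 100.0)
    return 0.5 * (z + sqrt(2.0 * nu - 1.0)) ** 2


if __name__ == "__main__":
    # nu = 80 is the largest tabulated row; the table gives 101.879 for P = 5.
    print(round(chi2_point_approx(80, 5), 3))
    # A value beyond the table, where the approximation is meant to be used.
    print(round(chi2_point_approx(120, 5), 3))
```

For degrees of freedom covered by the table, the tabulated values should be preferred; the approximation is intended only for ν beyond the table.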
[Figure: sketch of the χ² density marking the point χ²ν(P) and the upper-tail area P/100.]

Percentage points P

    ν       10        5      2.5        1      0.5      0.1     0.05
    1     2.706    3.841    5.024    6.635    7.879   10.828   12.116
    2     4.605    5.991    7.378    9.210   10.597   13.816   15.202
    3     6.251    7.815    9.348   11.345   12.838   16.266   17.730
    4     7.779    9.488   11.143   13.277   14.860   18.467   19.997
    5     9.236   11.070   12.833   15.086   16.750   20.515   22.105
    6    10.645   12.592   14.449   16.812   18.548   22.458   24.103
    7    12.017   14.067   16.013   18.475   20.278   24.322   26.018
    8    13.362   15.507   17.535   20.090   21.955   26.124   27.868
    9    14.684   16.919   19.023   21.666   23.589   27.877   29.666
    10   15.987   18.307   20.483   23.209   25.188   29.588   31.420
    11   17.275   19.675   21.920   24.725   26.757   31.264   33.137
    12   18.549   21.026   23.337   26.217   28.300   32.909   34.821
    13   19.812   22.362   24.736   27.688   29.819   34.528   36.478
    14   21.064   23.685   26.119   29.141   31.319   36.123   38.109
    15   22.307   24.996   27.488   30.578   32.801   37.697   39.719
    16   23.542   26.296   28.845   32.000   34.267   39.252   41.308
    17   24.769   27.587   30.191   33.409   35.718   40.790   42.879
    18   25.989   28.869   31.526   34.805   37.156   42.312   44.434
    19   27.204   30.144   32.852   36.191   38.582   43.820   45.973
    20   28.412   31.410   34.170   37.566   39.997   45.315   47.498
    25   34.382   37.652   40.646   44.314   46.928   52.620   54.947
    30   40.256   43.773   46.979   50.892   53.672   59.703   62.162
    40   51.805   55.758   59.342   63.691   66.766   73.402   76.095
    50   63.167   67.505   71.420   76.154   79.490   86.661   89.561
    80   96.578  101.879  106.629  112.329  116.321  124.839  128.261

Percentage Points of the t-Distribution

This table gives the percentage points tν(P) for various values of P and degrees of freedom ν, as indicated by the figure to the right. The lower percentage points are given by symmetry as −tν(P), and the probability that |t| ≥ tν(P) is 2P/100. The limiting distribution of t as ν → ∞ is the normal distribution with zero mean and unit variance.
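Two statements in the paragraph above lend themselves to a quick numerical check: the limiting (ν → ∞) percentage points are standard normal quantiles, and the two-sided probability is twice the upper-tail probability. The Python sketch below uses only the standard library; the names `P_VALUES`, `t_limit_point` and `two_sided_probability` are illustrative and not part of the paper.

```python
from statistics import NormalDist

# Upper-tail percentages used as column headings in the t table.
P_VALUES = [10, 5, 2.5, 1, 0.5, 0.1, 0.05]


def t_limit_point(p_percent: float) -> float:
    """Limiting (nu -> infinity) upper P% point: the standard normal quantile."""
    return NormalDist().inv_cdf(1.0 - p_percent / 100.0)


def two_sided_probability(p_percent: float) -> float:
    """Probability that |t| >= t_nu(P), which by symmetry equals 2P/100."""
    return 2.0 * p_percent / 100.0


if __name__ == "__main__":
    for p in P_VALUES:
        # Prints P, the limiting percentage point and the two-sided probability.
        print(p, round(t_limit_point(p), 3), two_sided_probability(p))
```

The printed limiting points can be compared with the ∞ row of the t table.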
[Figure: sketch of the t density marking the point tν(P) and the upper-tail area P/100.]

Percentage points P

    ν        10       5     2.5       1     0.5      0.1     0.05
    1     3.078   6.314  12.706  31.821  63.657  318.309  636.619
    2     1.886   2.920   4.303   6.965   9.925   22.327   31.599
    3     1.638   2.353   3.182   4.541   5.841   10.215   12.924
    4     1.533   2.132   2.776   3.747   4.604    7.173    8.610
    5     1.476   2.015   2.571   3.365   4.032    5.893    6.869
    6     1.440   1.943   2.447   3.143   3.707    5.208    5.959
    7     1.415   1.895   2.365   2.998   3.499    4.785    5.408
    8     1.397   1.860   2.306   2.896   3.355    4.501    5.041
    9     1.383   1.833   2.262   2.821   3.250    4.297    4.781
    10    1.372   1.812   2.228   2.764   3.169    4.144    4.587
    11    1.363   1.796   2.201   2.718   3.106    4.025    4.437
    12    1.356   1.782   2.179   2.681   3.055    3.930    4.318
    13    1.350   1.771   2.160   2.650   3.012    3.852    4.221
    14    1.345   1.761   2.145   2.624   2.977    3.787    4.140
    15    1.341   1.753   2.131   2.602   2.947    3.733    4.073
    16    1.337   1.746   2.120   2.583   2.921    3.686    4.015
    18    1.330   1.734   2.101   2.552   2.878    3.610    3.922
    21    1.323   1.721   2.080   2.518   2.831    3.527    3.819
    25    1.316   1.708   2.060   2.485   2.787    3.450    3.725
    30    1.310   1.697   2.042   2.457   2.750    3.385    3.646
    40    1.303   1.684   2.021   2.423   2.704    3.307    3.551
    50    1.299   1.676   2.009   2.403   2.678    3.261    3.496
    70    1.294   1.667   1.994   2.381   2.648    3.211    3.435
    100   1.290   1.660   1.984   2.364   2.626    3.174    3.390
    ∞     1.282   1.645   1.960   2.326   2.576    3.090    3.291

5 Percent Points of the F-Distribution

This table gives the percentage points Fν1,ν2(P) for P = 0.05 and degrees of freedom ν1, ν2, as indicated by the figure to the right.
The lower percentage points, that is the values F′ν1,ν2(P) such that the probability that F ≤ F′ν1,ν2(P) is equal to P/100, may be found using the formula

\[
F'_{\nu_1,\nu_2}(P) = \frac{1}{F_{\nu_2,\nu_1}(P)}
\]

[Figure: sketch of the F density marking the point F(P) and the upper-tail area P/100.]

Columns are indexed by ν1, rows by ν2.

    ν2 \ ν1      1       2       3       4       5       6      12      24       ∞
    2       18.513  19.000  19.164  19.247  19.296  19.330  19.413  19.454  19.496
    3       10.128   9.552   9.277   9.117   9.013   8.941   8.745   8.639   8.526
    4        7.709   6.944   6.591   6.388   6.256   6.163   5.912   5.774   5.628
    5        6.608   5.786   5.409   5.192   5.050   4.950   4.678   4.527   4.365
    6        5.987   5.143   4.757   4.534   4.387   4.284   4.000   3.841   3.669
    7        5.591   4.737   4.347   4.120   3.972   3.866   3.575   3.410   3.230
    8        5.318   4.459   4.066   3.838   3.687   3.581   3.284   3.115   2.928
    9        5.117   4.256   3.863   3.633   3.482   3.374   3.073   2.900   2.707
    10       4.965   4.103   3.708   3.478   3.326   3.217   2.913   2.737   2.538
    11       4.844   3.982   3.587   3.357   3.204   3.095   2.788   2.609   2.404
    12       4.747   3.885   3.490   3.259   3.106   2.996   2.687   2.505   2.296
    13       4.667   3.806   3.411   3.179   3.025   2.915   2.604   2.420   2.206
    14       4.600   3.739   3.344   3.112   2.958   2.848   2.534   2.349   2.131
    15       4.543   3.682   3.287   3.056   2.901   2.790   2.475   2.288   2.066
    16       4.494   3.634   3.239   3.007   2.852   2.741   2.425   2.235   2.010
    17       4.451   3.592   3.197   2.965   2.810   2.699   2.381   2.190   1.960
    18       4.414   3.555   3.160   2.928   2.773   2.661   2.342   2.150   1.917
    19       4.381   3.522   3.127   2.895   2.740   2.628   2.308   2.114   1.878
    20       4.351   3.493   3.098   2.866   2.711   2.599   2.278   2.082   1.843
    25       4.242   3.385   2.991   2.759   2.603   2.490   2.165   1.964   1.711
    30       4.171   3.316   2.922   2.690   2.534   2.421   2.092   1.887   1.622
    40       4.085   3.232   2.839   2.606   2.449   2.336   2.003   1.793   1.509
    50       4.034   3.183   2.790   2.557   2.400   2.286   1.952   1.737   1.438
    100      3.936   3.087   2.696   2.463   2.305   2.191   1.850   1.627   1.283
    ∞        3.841   2.996   2.605   2.372   2.214   2.099   1.752   1.517   1.002

End.