- June 9, 2020

STAT7002 Examination 2018

Answer ALL questions. Section A carries 40% of the total marks and Section B carries 60% of the total marks. The relative weights attached to each question are as follows: A1 (9), A2 (8), A3 (8), A4 (9), A5 (6); B1 (20), B2 (20), B3 (20). The numbers in square brackets indicate the relative weight attached to each part question.

An appendix containing some formulae from the STAT7002 course is provided at the end of this examination paper.

Section A

A1. A travel company uses the following questions on a questionnaire given to tourists as part of a visit to London. For each question, briefly identify a potential problem that might lead to bias. Explain your reasoning.

(a) ‘The Phantom of the Opera has played continuously at Her Majesty’s Theatre since 1986, winning over 70 major theatre awards and receiving much critical acclaim. Did you see it during your stay?’ [3]

(b) ‘Do you think that public transport in London is easy to use and reasonably priced? Please circle a response.’ YES NO [3]

(c) ‘In the box below, please write down the amount that you spent during your stay in London (not including money spent on accommodation and travel).’ £ ________ [3]

A2. A British town contains 25000 eligible voters. Before an election, a political researcher performs a simple random sample of 400 eligible voters. Each sampled voter is asked if they intend to vote for the Labour party candidate, with two possible responses: ‘Yes’ or ‘No’. Of the sampled voters, 118 answered ‘Yes’ to this question.

(a) Define the term simple random sample. [2]

(b) Assuming that there was no non-response, calculate an estimate and an associated 95% confidence interval for the proportion of Labour voters in the town. You should define any notation that you introduce and show your working clearly. [6]

A3. A property investor owns four houses in London.
The values of the houses are shown in the table below, with letters (A–D) to denote the different houses.

House   Value (£)
A       499,950
B       525,000
C       610,000
D       774,950

(a) A prospective buyer wishes to view two of the property investor’s houses. The buyer will choose which houses to view using a simple random sample. Assuming this sampling approach, derive the sampling distribution of the sample mean house value. You should define any notation or terms that you introduce. [5]

(b) Using your answer to (a), calculate the expectation of the sample mean. [3]

A4. The descriptions (a)–(c) outline sampling schemes. For each of (a)–(c), identify the type of sampling scheme used and describe a potential problem with the proposed sampling scheme. Justify your answers.

(a) A researcher wants to know about the experience of passengers who use the London Underground. A questionnaire is devised and the researcher stands outside Holborn Station between 0900 and 1100 on a given Monday morning, asking potential respondents who pass by to participate in a survey. [3]

(b) An investigative reporter is interested in finding out about the living conditions of illegal workers. The reporter knows three illegal workers, who agree to participate in a study. These illegal workers are asked to invite any other illegal workers, whom they know, to participate in the same study. [3]

(c) A high school contains 1200 pupils aged 11–16. The list of pupils in the school is ordered by date of birth (youngest to oldest) and every tenth pupil on the ordered list is selected to participate in a school sports event, until an overall sample size of 50 pupils is reached. [3]

A5. Over a 30-month period, 60 obese males participated in a weight loss study. The weight of each study participant was recorded at several time points. To show the change in mean weight of the participants over time, the study research team produced the following visual display of their data.

[Figure: plot showing five points. One axis has tick marks at 0, 50, 100, 150, 200, 250 and 300; the other has tick marks at 0, 3, 12, 24 and 30. No axis labels or title are recoverable.]

(a) Identify two, distinct, problematic features of this visual display. Justify your answer. [2]

(b) Identify the scale type used for each of the following study variables. You should justify your answer in each case.
(i) A participant’s weight (in pounds).
(ii) The number of visits to the gym that a participant makes. [4]

Section B

B1. Researchers from a town’s council want to measure residents’ attitudes concerning the living environment in their town. Below are two statements that the researchers aim to present to a sample of residents as part of a questionnaire.

‘There is too much litter around the town centre.’
‘Public spaces and gardens within our town are well maintained.’

(a) Using these statements as examples, describe how a Likert Scale could be constructed in this questionnaire to measure the attitude of residents concerning the living environment in the town. Your answer should include a description of polarity and a definition of the polarity of each of the above statements. [9]

(b) Describe how Likert Scale responses for a single item (such as either of those above) could be summarised and presented for a sample of residents who complete the questionnaire. [3]

(c) Explain what is meant by the reliability of a measurement instrument and describe how the reliability of a questionnaire, in which several responses are used to measure the same attitude with a Likert scale, may be assessed. [4]

The council decide that they will sample 200 of the town’s residents; the target population for the study is all adult residents of the town (20000 people). Sampling will be done by randomly selecting e-mail addresses of people who have paid Council Tax using the council’s online payment system, with a link to an online questionnaire sent to each selected e-mail address.

(d) Is this proposed sampling scheme satisfactory? Justify your answer.
[4]

B2. Suppose that $Y_1, \ldots, Y_N$ are binary variables in a population of size $N \in \mathbb{N}$ with $N > 2$. The population proportion is given by
\[ P = \frac{1}{N} \sum_{i=1}^{N} Y_i. \]
A researcher wants to draw a simple random sample of size $n$ from this population (where $n < N$), in which the sampled variables are denoted $y_1, \ldots, y_n$.

(a) Denoting $\hat{P}$ as the sample mean of the $n$ sampled binary variables, show that
\[ \mathrm{Var}(\hat{P}) = \frac{P(1-P)(N-n)}{n(N-1)}. \]
You may use the following without proof:
\[ \mathrm{Cov}(y_j, y_k) = -\frac{P(1-P)}{N-1} \quad \text{for } j \neq k. \]
[6]

(b) The researcher wants to sample enough binary variables so that the standard error of $\hat{P}$ is less than some pre-specified positive constant $c$. Show that the number of sampled variables, $n$, should satisfy
\[ n > \left[ \frac{4(N-1)c^2}{N} + \frac{1}{N} \right]^{-1}. \]
[5]

A high school, in which the number of registered pupils is 900, wishes to perform a simple random sample of pupils. Each sampled pupil will be sent a postal questionnaire on various aspects of school life. One of the questions will ask ‘Overall, are you satisfied with the standard of teaching at school?’ with respondents given two answer options of ‘Yes’ or ‘No’. It is assumed that the proportion of pupils who would not answer this question is 10%.

(c) Calculate the number of pupils that should be sampled so that the proportion of pupils who are satisfied with the school’s standard of teaching can be estimated with a standard error no larger than 0.03. You should show your working clearly and define carefully any assumptions that you make. [5]

(d) The school’s headteacher assumes that pupils who are not satisfied with the standard of teaching at the school are less likely to answer the question on the standard of teaching than other pupils in the school, leading to missing data for some of the answers to this question. Describe this missing data assumption, using words and appropriate mathematical notation. [4]

B3.
A town contains two medical centres (labelled A and B). Centre A has 2000 registered adult patients and Centre B has 3000 registered adult patients. A medical researcher carries out a stratified random sample of adult patients from these medical centres, with stratification done by the centre at which a patient is registered. A total of 400 adult patients are sampled (150 registered at Centre A and 250 registered at Centre B) and the body mass index (BMI) is recorded for each sampled patient. For patients sampled from Centre A, the sample mean and sample standard deviation BMI are 25.2 kg/m² and 3.8 kg/m², respectively. For patients sampled from Centre B, the sample mean and sample standard deviation BMI are 28.1 kg/m² and 4.1 kg/m², respectively.

(a) Define the term stratified random sample. [3]

(b) Calculate an estimate of the mean BMI of adults in the town and an associated 95% confidence interval. You should show your working and define any notation or terms that you introduce. [8]

Another researcher plans to sample 400 of the town’s households at random and collect data on the BMI of adult occupants of each sampled home.

(c) Identify the sampling approach that this researcher has proposed. Justify your answer. [3]

(d) Assuming that this sampling approach is used, write down an appropriate statistical model for BMI that accounts for variability between adults within households and for variability between households. You should define any notation or terms that you introduce. [6]

STAT7002 Social Statistics: Some formulae

Below are some formulae from the STAT7002 course notes. Note that these formulae are just copied, there is no properly introduced notation and no explanation regarding each formula. The same symbol may mean different things in different formulae and may not necessarily apply to any examination question where the same symbol is used.
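There is no guarantee that any of these formulae is needed in the examination. In addition, there is no guarantee that all formulae required for this examination are listed below.

\[ \mathrm{ese}(\hat{\mu}) = \sqrt{\frac{s^2(1-f)}{n}}, \qquad \mathrm{ese}(\hat{T}) = N\sqrt{\frac{s^2(1-f)}{n}}, \qquad \mathrm{ese}(\hat{P}) = \sqrt{\frac{pq(1-f)}{n-1}}. \]

\[ \mathrm{ese}(\hat{T}) = \sqrt{\sum_i N_i^2 s_i^2 (1-f_i)/n_i}, \qquad \mathrm{ese}(\hat{\mu}) = \sqrt{\sum_i W_i^2 s_i^2 (1-f_i)/n_i}. \]

\[ \mathrm{ese}(\hat{P}) = \sqrt{\sum_i W_i^2 (1-f_i) p_i q_i/(n_i-1)}, \qquad f = \frac{n}{N}. \]

\[ \frac{1}{n} \le \left(\frac{k}{S}\right)^2 + \frac{1}{N}, \qquad \frac{1}{n} \le \frac{4(N-1)k^2}{N} + \frac{1}{N}, \qquad \frac{1}{n} \le \left(\frac{k}{CV}\right)^2 + \frac{1}{N} \]

\[ \alpha = \frac{k\bar{r}}{1+(k-1)\bar{r}}, \qquad n_i = \frac{N_i}{N}\,n, \qquad n_i = W_i S_i/\sqrt{\lambda c_i}, \qquad n_i = \left(\frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}\right) n \]

\[ \hat{\mu} = \sum_{i=1}^{k} W_i \bar{y}_i, \qquad \sqrt{\lambda} = \frac{\sum_{i=1}^{k} \sqrt{c_i}\, W_i S_i}{C - c_0}, \qquad \sqrt{\lambda} = \frac{V + \sum_{i=1}^{k} W_i^2 S_i^2/N_i}{\sum_{i=1}^{k} \sqrt{c_i}\, W_i S_i} \]

\[ \mathrm{var}(\hat{\mu}) = \sum_{i=1}^{k} W_i^2 S_i^2 (1-f_i)/n_i, \qquad \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2} \]

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_Y^2}\right), \qquad \mathrm{var}(\hat{\mu}) = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2, \]

\[ S_i^2 = P_i Q_i N_i/(N_i - 1) \approx P_i Q_i, \qquad \hat{T} = \sum_j N_j \bar{y}_j, \qquad X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]

\[ \mathrm{deff} = 1 + \rho(\bar{m} - 1), \qquad \hat{\mu}_{cl} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i}, \qquad \hat{\mu}_{IPW} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i n_i} \]

End of Paper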
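
The appendix lists its formulae without defining notation, so the following Python sketch shows one possible reading of two of them: ese(P̂) for a simple random sample and ese(µ̂) for a stratified sample. It assumes that f = n/N is the sampling fraction, that the stratum weight is W_i = N_i/N, and that a 95% interval uses the normal multiplier 1.96; none of these is stated in the appendix. The function names are hypothetical, and the inputs are the figures quoted in questions A2 and B3. It illustrates the arithmetic only and should not be read as the examiners' intended working.

```python
import math

Z_95 = 1.96  # assumed normal multiplier for a 95% interval


def ese_proportion(p, n, N):
    """Appendix form sqrt(p*q*(1 - f)/(n - 1)), reading f as n/N."""
    q = 1.0 - p
    f = n / N  # assumed meaning of f: sampling fraction
    return math.sqrt(p * q * (1.0 - f) / (n - 1))


def stratified_mean(N_i, n_i, ybar_i, s_i):
    """Appendix forms mu-hat = sum(W_i * ybar_i) and
    ese = sqrt(sum(W_i^2 * s_i^2 * (1 - f_i) / n_i)),
    reading W_i as N_i / N and f_i as n_i / N_i (both assumptions)."""
    N = sum(N_i)
    W = [Ni / N for Ni in N_i]
    estimate = sum(w * y for w, y in zip(W, ybar_i))
    variance = sum(
        w ** 2 * s ** 2 * (1.0 - n / Ni) / n
        for w, s, n, Ni in zip(W, s_i, n_i, N_i)
    )
    return estimate, math.sqrt(variance)


def interval(estimate, se, z=Z_95):
    """Symmetric interval: estimate plus or minus z standard errors."""
    return estimate - z * se, estimate + z * se


# Figures from question A2: 118 'Yes' answers among 400 of 25000 voters.
p_hat = 118 / 400
se_p = ese_proportion(p_hat, n=400, N=25000)
print(p_hat, se_p, interval(p_hat, se_p))

# Figures from question B3: Centres A and B.
mu_hat, se_mu = stratified_mean(
    N_i=[2000, 3000], n_i=[150, 250], ybar_i=[25.2, 28.1], s_i=[3.8, 4.1]
)
print(mu_hat, se_mu, interval(mu_hat, se_mu))
```

Both functions keep the finite population correction (1 - f) explicit, so its effect can be seen by changing N while holding the sample fixed.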