Tutoring case: STA261H1

May 15, 2020

Bismillahir Rahmanir Rahim

STA261H1: Probability and Statistics II
Assignment, Winter 2020

Submission deadline: Mar 30, 11:59 pm (late submissions will not be accepted)

Instructions on completing the assignment

The numerical calculations involved in this assignment are simple, and you are (hopefully) already familiar with them. The goal of this work is to help you “see” some of the theorems and concepts we have learned or used in this course using empirical data. The calculations are mostly repetitive in nature! I suggest using R (or any other programming language that you are comfortable with).

Instructions on creating documents for submission

• Please create 4 separate PDFs (one for each question).
• I recommend using R Markdown (if you are familiar with it). If you are not, you can write your answers in Microsoft Word and save them as PDFs at the end. PDF is the only acceptable file format.
• We will use Crowdmark for submission and grading. You will have to upload 4 separate documents as your answers to the four questions. Crowdmark links for uploading your documents will be emailed to you later this week.

Academic Integrity

Each student will work alone. If you need clarification on any of these questions, you are allowed to ask questions on Piazza. Do not ask anyone for solutions. Do not share your code or answers on any platform.

Question 1 [10 points]. Suppose you have a population of size 5 [i.e. $N = 5$]. You measure some quantity ($X$), and the corresponding numbers are: 11, 12, 13, 14, 15

a) Calculate the population mean ($\mu$).
b) Calculate the population variance ($\sigma^2$) using the formula $\sigma^2 = \frac{\sum_{j=1}^{N} (X_j - \mu)^2}{N}$.
c) Imagine you are taking samples (of size $n = 3$) from this population with replacement. Write down every possible way that you could have a sample of size 3 with replacement from this population.
   (Hint: there will be 5*5*5 = 125 possible combinations.)
   Help: if you are struggling with figuring out the combinations, try this code in R: expand.grid( c(11:15), c(11:15), c(11:15) )
d) For each of these samples of size 3, calculate the sample mean and record it (either as a new object in R or as a new column if you are using Excel). Let's call this new column X bar. You should have 125 values in this column.
e) You should have noticed that the values in the X bar column are repetitive. For example, 11.3333333 will show up 3 times. Construct a frequency table based on the column X bar [i.e. write down how many times each value showed up]. Now, using the frequencies (also known as counts), calculate the proportion of each of those repeated values. [For example, the proportion of 11.3333333 will be 3/125.]
f) Plot these proportions against the values and connect the points with a smooth (non-linear) curve. Does the shape of this plot look like any known distribution? Name the distribution.
g) Using the table of proportions or otherwise, calculate the mean of these 125 numbers (the values under X bar) and compare it to your answer to 1(a).
h) Using the table of proportions or otherwise, calculate the variance of these 125 numbers. Use the population variance formula (i.e. divide by 125, not 124). What is the relationship of this answer to your answer to 1(b)?
i) Which theorem did you demonstrate empirically in parts (f), (g) and (h)?

(No output is needed for parts (c) and (d) of this question.)

Question 2 [5 points]. This question continues from Question 1(c). For each of these samples of size 3, calculate the sample variance using the following two formulas:

$S^2 = \frac{1}{n-1} \sum (X_i - \bar{X})^2$ and $\hat{\sigma}^2 = \frac{1}{n} \sum (X_i - \bar{X})^2$

Assume the population variance is $\sigma^2 = 2$. (You should get 125 values of $S^2$ and 125 values of $\hat{\sigma}^2$.)

a. By calculating $\mathrm{Bias}[S^2]$ and $\mathrm{Bias}[\hat{\sigma}^2]$ numerically from the 125 values, check the unbiasedness of these two estimators.
b. By calculating all three components separately, check the following identity: $\mathrm{MSE}[\hat{\sigma}^2] = \mathrm{var}[\hat{\sigma}^2] + (\mathrm{Bias}[\hat{\sigma}^2])^2$

Question 3 [5 points]. Even though the sample size $n$ needs to be large to apply the central limit theorem, let's apply it anyway. Suppose you know that the population variance is $\sigma^2 = 2$.

a. For each of these 125 cases, calculate a 95% confidence interval, and then calculate the proportion of the intervals that include $\mu = 13$.
b. Suppose someone observes only one of these 125 combinations: (13, 14, 15). If that person is testing the null hypothesis $H_0 : \mu = 13$ at significance level $\alpha = 0.05$, calculate the p-value that the person will get from this observed set of three numbers using the central limit theorem.
c. Calculate the p-value numerically using the 125 $\bar{X}$ values that you calculated in part 1(d) (do not use the CLT here).
d. Why do you see a difference between your calculations in part (b) and part (c)?

Question 4 [10 points]. In week 3, we demonstrated R code that replicates the sampling distribution of $\bar{X}$. Here is the code that was used in the lecture. Simply change the distribution and number of samples on line 2 of this code to do this question. Produce the density of $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$

a) when $n = 2$, $X \sim \mathrm{Unif}[0, 1]$
b) when $n = 5$, $X \sim \mathrm{Unif}[0, 1]$
c) when $n = 5$, $X \sim \chi^2_{df=2}$
d) when $n = 30$, $X \sim \chi^2_{df=2}$
e) when $n = 5$, $X \sim \chi^2_{df=50}$
f) The CLT says that for large $n$, $\bar{X}$ converges (in distribution) to a Normal distribution. By comparing your graphs from parts (a) to (e), can you comment on how large $n$ has to be in order for $\bar{X}$ to converge to a Normal distribution? What role does the skewness of the original distributions ($\mathrm{Unif}[0, 1]$, $\chi^2_{df=2}$ and $\chi^2_{df=50}$) play here?
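For Questions 1 and 2, the enumeration of all 125 samples can also be done outside R. The following is a minimal sketch in Python (the assignment allows any language), not a reference solution from the course. The variable and helper names (`xbar`, `proportions`, `sum_sq_dev`, and so on) are illustrative assumptions. It keeps every value as an exact fraction so that repeated sample means such as 34/3 compare equal when the frequency table is built.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [11, 12, 13, 14, 15]
N = len(population)
n = 3  # sample size

# Population mean and variance (dividing by N, as in Question 1(b)).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All 5 * 5 * 5 = 125 ordered samples drawn with replacement
# (a Python counterpart of R's expand.grid call).
samples = list(product(population, repeat=n))

# Sample mean of each sample, stored exactly.
xbar = [Fraction(sum(s), n) for s in samples]

# Frequency table and proportions for the distinct values of X bar.
counts = Counter(xbar)
proportions = {v: Fraction(c, len(xbar)) for v, c in sorted(counts.items())}

# Mean and variance of the 125 sample means (dividing by 125, not 124).
mean_xbar = sum(v * p for v, p in proportions.items())
var_xbar = sum((v - mean_xbar) ** 2 * p for v, p in proportions.items())


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


# The two variance estimators from Question 2, one value per sample.
s2 = [sum_sq_dev(s) / (n - 1) for s in samples]
sigma_hat2 = [sum_sq_dev(s) / n for s in samples]

# Bias = average of the estimator over all 125 samples minus sigma^2.
bias_s2 = sum(s2) / len(s2) - sigma2
bias_sigma_hat2 = sum(sigma_hat2) / len(sigma_hat2) - sigma2

print("mu =", mu, "sigma^2 =", sigma2)
print("mean of X bar =", mean_xbar, "variance of X bar =", var_xbar)
print("Bias[S^2] =", bias_s2, "Bias[sigma_hat^2] =", bias_sigma_hat2)
```

Using `Fraction` rather than floating point is a design choice for this sketch: with floats, values such as 11.3333333 can differ in the last digit depending on the order of addition, which would split one row of the frequency table into several.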
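For Question 4, the lecture code is not reproduced in this post, so the following is a stand-in sketch in Python and not that code. It simulates the sample mean using only the standard library and reports the sample skewness of the simulated means as a rough numeric proxy for how Normal the density looks. The helper names, the number of replications, and the seed are all assumptions, and a density plot would need a plotting library in addition to this.

```python
import random
import statistics


def simulate_xbar(draw, n, reps=10000, seed=261):
    """Simulate `reps` sample means, each from n independent draws."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]


def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


def uniform01(rng):
    """One draw from Unif[0, 1]."""
    return rng.random()


def chi_square(df):
    """Return a sampler for a chi-square with df degrees of freedom."""
    # A chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return lambda rng: rng.gammavariate(df / 2, 2)


# The five settings listed in parts (a) to (e).
cases = {
    "a) n=2, Unif[0,1]": (uniform01, 2),
    "b) n=5, Unif[0,1]": (uniform01, 5),
    "c) n=5, chi-square df=2": (chi_square(2), 5),
    "d) n=30, chi-square df=2": (chi_square(2), 30),
    "e) n=5, chi-square df=50": (chi_square(50), 5),
}

results = {}
for label, (draw, n) in cases.items():
    xbar = simulate_xbar(draw, n)
    results[label] = (statistics.fmean(xbar), skewness(xbar))
    print(f"{label}: mean={results[label][0]:.3f}, skewness={results[label][1]:.3f}")
```

A skewness near zero is consistent with a symmetric, Normal-looking density but does not prove it, so these numbers are best read alongside the density plots the question asks for.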

Author admin
