- February 12, 2021

STA305/1004 – Week 4 (adapted from N. Taback)

Finding Power, Intro to Causal Inference

Week 4 Outline

- Finding Power
- Replication and Power: Case study on Power Poses
- Power and Sample size formulae: Two-sample proportions
- Power via simulation
- Introduction to causal inference:
  - The fundamental problem
  - The assignment mechanism: Weight gain study

Replication and Power: Case Study on Power Poses

(Carney et al. (2010), "Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance", Psychological Science, 21(10), 1363-1368)

Can power poses significantly change outcomes in your life?

Study methods (Carney et al. (2010)):

- Randomly assigned 42 participants to the high-power-pose or the low-power-pose condition.
- Participants believed that the study was about the science of physiological recordings and was focused on how placement of electrocardiography electrodes above and below the heart could influence data collection.
- Participants' bodies were posed by an experimenter into high-power or low-power poses. Each participant held two poses for 1 min each.
- Participants' risk taking was measured with a gambling task; feelings of power were measured with self-reports.
- Saliva samples, which were used to test cortisol and testosterone levels, were taken before and approximately 17 min after the power-pose manipulation.

Study results (Carney et al. (2010)):

As hypothesized, high-power poses caused an increase in testosterone compared with low-power poses, which caused a decrease in testosterone, F(1, 39) = 4.29, p < .05; r = .34. Also as hypothesized, high-power poses caused a decrease in cortisol compared with low-power poses, which caused an increase in cortisol, F(1, 38) = 7.45, p < .02; r = .43.

Replication of Carney et al. (2010)

- The study was replicated by Ranehill et al. (2015).
- An initial power analysis based on the effect sizes in Carney et al.
  (power = 0.8, α = .05) indicated that a sample size of 100 participants would be suitable.

library(pwr)
pwr.t.test(d = 0.6, power = 0.8)

     Two-sample t test power calculation

              n = 44.58577
              d = 0.6
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

- The Ranehill et al. study used a sample of 200 participants to increase reliability.
- This study found none of the significant differences found in Carney et al.'s study.
- The replication study obtained very precise estimates of the effects.
- What happened?

- Sampling theory predicts that the variation between samples is proportional to 1/√n.
- In small samples we can expect variability.
- Many researchers often expect that these samples will be more similar than sampling theory predicts.

Study replication

Suppose that you have run an experiment on 20 subjects, and have obtained a significant result from a two-sided z-test (H0 : µ = 0 vs. H1 : µ ≠ 0) which confirms your theory (z = 2.23, p < 0.05, two-tailed). The researcher is planning to run the same experiment on an additional 10 subjects. What is the probability that the results will be significant at the 5% level by a one-tailed test (H1 : µ > 0), separately for this group?

Week 4 Outline

- Power and Sample size formulae: Two-sample proportions

Comparing Proportions for Binary Outcomes

- In many clinical trials, the primary endpoint is dichotomous, for example, whether a patient has responded to the treatment, or whether a patient has experienced toxicity.
- Consider a two-arm randomized trial with binary outcomes. Let p1 denote the response rate of the experimental drug, p2 that of the standard drug, and the difference θ = p1 − p2.
Comparing Proportions for Binary Outcomes

Let Yik be the binary outcome for subject i in arm k; that is,

  Yik = 1 with probability pk, and Yik = 0 with probability 1 − pk,

for i = 1, ..., nk and k = 1, 2. The sum of independent and identically distributed Bernoulli random variables has a binomial distribution,

  ∑i Yik ∼ Bin(nk, pk), k = 1, 2.

(Yin, pg. 173-174)

The sample proportion for group k is

  p̂k = Ȳk = (1/nk) ∑i Yik, k = 1, 2,

and E(Ȳk) = pk and Var(Ȳk) = pk(1 − pk)/nk.

The goal of the clinical trial is to determine if there is a difference between the two groups using a binary endpoint. That is, we want to test H0 : θ = 0 versus H1 : θ ≠ 0. The test statistic (assuming that H0 is true) is

  T = (p̂1 − p̂2) / √(p1(1 − p1)/n1 + p2(1 − p2)/n2) ∼ N(0, 1).

The test rejects at level α if and only if |T| ≥ z_{α/2}. Using the same argument as the case with continuous endpoints, and ignoring terms smaller than α/2, we can solve for β:

  β ≈ Φ( z_{α/2} − |θ1| / √(p1(1 − p1)/n1 + p2(1 − p2)/n2) ).

We can use this formula to solve for the sample size. If n1 = r · n2, then

  n2 = ((z_{α/2} + z_β)² / θ²) · (p1(1 − p1)/r + p2(1 − p2)).

Comparing Proportions for Binary Outcomes

- The built-in R function power.prop.test() can be used to calculate sample size or power.
- For example, suppose that the standard treatment for a disease has a response rate of 20%, and an experimental treatment is anticipated to have a response rate of 25%.
- The researchers want both arms to have an equal number of subjects. How many patients should be enrolled if the study will conduct a two-sided test at the 5% level with 80% power?
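The sample-size formula above translates directly into a few lines of R. This is a sketch under the stated normal approximation; the helper name n2_two_props is hypothetical, and the 20% vs 25% response rates are those used in the power.prop.test() example.

```r
# Sketch of the two-proportion sample-size formula derived above.
# n2_two_props is a hypothetical helper, not part of any package.
n2_two_props <- function(p1, p2, alpha = 0.05, power = 0.80, r = 1) {
  theta <- p1 - p2                 # difference in response rates
  z_a   <- qnorm(1 - alpha / 2)    # z_{alpha/2}
  z_b   <- qnorm(power)            # z_{beta}
  (z_a + z_b)^2 / theta^2 * (p1 * (1 - p1) / r + p2 * (1 - p2))
}

# Control-arm size with equal allocation (r = 1):
n2_two_props(p1 = 0.25, p2 = 0.20)  # about 1091
```

This lands slightly below the n = 1093.739 reported by power.prop.test(), because the built-in function uses a slightly different normal approximation.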
power.prop.test(p1 = 0.2, p2 = 0.25, power = 0.8)

     Two-sample comparison of proportions power calculation

              n = 1093.739
             p1 = 0.2
             p2 = 0.25
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Week 4 Outline

- Power via simulation

Calculating Power by Simulation

- If the test statistic and the distribution of the test statistic are known then the power of the test can be calculated via simulation.
- Consider a two-sample t-test with 30 subjects per group, where the standard deviation of the clinical outcome is known to be 1.
- What is the power of the test of H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 = 0.5, at the 5% significance level?
- The power is the proportion of times that the test correctly rejects the null hypothesis in repeated sampling.

We can simulate a single study using the rnorm() command. Let's assume that n1 = n2 = 30, µ1 = 3.5, µ2 = 3, σ = 1, α = 0.05.

set.seed(2301)
t.test(rnorm(30, mean = 3.5, sd = 1), rnorm(30, mean = 3, sd = 1), var.equal = T)

     Two Sample t-test

data:  rnorm(30, mean = 3.5, sd = 1) and rnorm(30, mean = 3, sd = 1)
t = 2.1462, df = 58, p-value = 0.03605
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.03458122 0.99248595
sample estimates:
mean of x mean of y
 3.339362  2.825828

Should you reject H0?

- Suppose that 10 studies are simulated.
- What proportion of these 10 studies will reject the null hypothesis at the 5% level?
- To investigate how many times the two-sample t-test will reject at the 5% level, the replicate() command will be used to generate 10 studies and calculate the p-value in each study.
- It will still be assumed that n1 = n2 = 30, µ1 = 3.5, µ2 = 3, σ = 1, α = 0.05.
set.seed(2301)
pvals <- replicate(10, t.test(rnorm(30, mean = 3.5, sd = 1),
                              rnorm(30, mean = 3, sd = 1),
                              var.equal = T)$p.value)
pvals # print out 10 p-values

 [1] 0.03604893 0.15477655 0.01777959 0.40851999 0.34580930 0.11131007
 [7] 0.14788381 0.00317709 0.09452230 0.39173723

# power is the proportion of times the test rejects at the 5% level
sum(pvals <= 0.05)/10

[1] 0.3

But, since we only simulated 10 studies, the estimate of power will have a large standard error. So let's try simulating 10,000 studies so that we can obtain a more precise estimate of power.

set.seed(2301)
pvals <- replicate(10000, t.test(rnorm(30, mean = 3.5, sd = 1),
                                 rnorm(30, mean = 3, sd = 1),
                                 var.equal = T)$p.value)
sum(pvals <= 0.05)/10000

[1] 0.4881

This is much closer to the theoretical power obtained from power.t.test().

power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)

     Two-sample t test power calculation

              n = 30
          delta = 0.5
             sd = 1
      sig.level = 0.05
          power = 0.477841
    alternative = two.sided

NOTE: n is number in *each* group

Calculating Power by Simulation

- The built-in R functions power.t.test() and power.prop.test() don't have an option for calculating power when there is unequal allocation of subjects between groups.
- These built-in functions don't have an option to investigate power if other assumptions don't hold (e.g., normality).
- One option is to simulate power for the scenarios that are of interest. Another option is to write your own function using the formula derived above.

- Suppose the standard treatment for a disease has a response rate of 20%, and an experimental treatment is anticipated to have a response rate of 25%.
- The researchers want both arms to have an equal number of subjects.
- A power calculation above revealed that the study will require 1094 patients per arm for 80% power.
- What would happen to the power if the researchers put 1500 patients in the experimental arm and 500 patients in the control arm?
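One way to handle unequal allocation without simulating is to code the β formula from earlier directly. The function below is a hypothetical sketch under the same normal approximation (the name power_two_props is mine, not a built-in).

```r
# Sketch of an analytic power calculation for two proportions with
# (possibly) unequal arm sizes, based on the beta formula above.
power_two_props <- function(p1, p2, n1, n2, alpha = 0.05) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE of p1hat - p2hat
  1 - pnorm(qnorm(1 - alpha / 2) - abs(p1 - p2) / se)  # power = 1 - beta
}

power_two_props(p1 = 0.25, p2 = 0.20, n1 = 1500, n2 = 500)  # about 0.66
```

This is roughly in line with, though not identical to, a simulation-based answer using prop.test(), which pools the two proportions when computing its test statistic.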
Calculating Power by Simulation

- The number of subjects in the experimental arm that have a positive response to treatment will be an observation from a Bin(1500, 0.28) distribution.
- The number of subjects that have a positive response to the standard treatment will be an observation from a Bin(500, 0.2) distribution.
- We can obtain simulated responses from these distributions using the rbinom() command in R.

set.seed(2301)
rbinom(1, 1500, 0.28)

[1] 403

rbinom(1, 500, 0.20)

[1] 89

- The p-value for this simulated study can be obtained using prop.test().

set.seed(2301)
prop.test(x = c(rbinom(1, 1500, 0.28), rbinom(1, 500, 0.20)),
          n = c(1500, 500), correct = F)

     2-sample test for equality of proportions without continuity correction

data:  c(rbinom(1, 1500, 0.28), rbinom(1, 500, 0.2)) out of c(1500, 500)
X-squared = 16.62, df = 1, p-value = 4.568e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 0.05032654 0.13100680
sample estimates:
   prop 1    prop 2
0.2686667 0.1780000

- A power simulation repeats this process a large number of times.
- In the example below we simulate 10,000 hypothetical studies to calculate power.

set.seed(2301)
pvals <- replicate(10000,
                   prop.test(x = c(rbinom(n = 1, size = 1500, prob = 0.25),
                                   rbinom(n = 1, size = 500, prob = 0.20)),
                             n = c(1500, 500), correct = F)$p.value)
sum(pvals <= 0.05)/10000

[1] 0.6231

If the researchers decide to have a 3:1 allocation ratio of patients in the treatment to control arm then the power will be _____?

Week 4 Outline

- Introduction to Causal Inference

Introduction to causal inference – Bob's headache

- Suppose Bob, at a particular point in time, is contemplating whether or not to take an aspirin for a headache.
- There are two treatment levels: taking an aspirin, and not taking an aspirin.
- If Bob takes the aspirin, his headache may be gone, or it may remain, say, an hour later; we denote this outcome, which can be either "Headache" or "No Headache," by Y(Aspirin).
- Similarly, if Bob does not take the aspirin, his headache may remain an hour later, or it may not; we denote this potential outcome by Y(No Aspirin), which also can be either "Headache" or "No Headache."
- There are therefore two potential outcomes, Y(Aspirin) and Y(No Aspirin), one for each level of the treatment. The causal effect of the treatment involves the comparison of these two potential outcomes.

Because in this example each potential outcome can take on only two values, the unit-level causal effect – the comparison of these two outcomes for the same unit – involves one of four (two by two) possibilities:

1. Headache gone only with aspirin: Y(Aspirin) = No Headache, Y(No Aspirin) = Headache
2. No effect of aspirin, with a headache in both cases: Y(Aspirin) = Headache, Y(No Aspirin) = Headache
3. No effect of aspirin, with the headache gone in both cases: Y(Aspirin) = No Headache, Y(No Aspirin) = No Headache
4. Headache gone only without aspirin: Y(Aspirin) = Headache, Y(No Aspirin) = No Headache

There are two important aspects of this definition of a causal effect.

1. The definition of the causal effect depends on the potential outcomes, but it does not depend on which outcome is actually observed.
2. The causal effect is the comparison of potential outcomes, for the same unit, at the same moment in time post-treatment.

- The causal effect is not defined in terms of comparisons of outcomes at different times, as in a before-and-after comparison of my headache before and after deciding to take or not to take the aspirin.

The fundamental problem of causal inference

"The fundamental problem of causal inference" (Holland, 1986, p. 947) is the problem that at most one of the potential outcomes can be realized and thus observed.
- If the action you take is Aspirin, you observe Y(Aspirin) and will never know the value of Y(No Aspirin) because you cannot go back in time.
- Similarly, if your action is No Aspirin, you observe Y(No Aspirin) but cannot know the value of Y(Aspirin).
- In general, therefore, even though the unit-level causal effect (the comparison of the two potential outcomes) may be well defined, by definition we cannot learn its value from just the single realized potential outcome.

The outcomes that would be observed under control and treatment conditions are often called counterfactuals or potential outcomes.

- If Bob took aspirin for his headache then he would be assigned to the treatment condition, so Ti = 1.
- Then Y(Aspirin) is observed and Y(No Aspirin) is the unobserved counterfactual outcome: it represents what would have happened to Bob if he had not taken aspirin.
- Conversely, if Bob had not taken aspirin then Y(No Aspirin) is observed and Y(Aspirin) is counterfactual.
- In either case, a simple treatment effect for Bob can be defined as

    treatment effect for Bob = Y(Aspirin) − Y(No Aspirin).

- The problem is that we can only observe one outcome.

The assignment mechanism

- Assignment mechanism: the process for deciding which units receive treatment and which receive control.
- Ignorable assignment mechanism: the assignment of treatment or control for all units is independent of the unobserved potential outcomes ("nonignorable" means not ignorable).
- Unconfounded assignment mechanism: the assignment of treatment or control for all units is independent of all potential outcomes, observed or unobserved ("confounded" means not unconfounded).

The assignment mechanism

- Suppose that a doctor prescribes surgery (labeled 1) or drug (labeled 0) for a certain condition.
- The doctor knows enough about the potential outcomes of the patients to assign each patient the treatment that is more beneficial to that patient.
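The consequences of this kind of outcome-dependent assignment can be sketched in a few lines of R, using the survival times from the table below (a hypothetical illustration; the variable names are mine):

```r
# Potential outcomes (years of post-treatment survival) for the four patients.
y0  <- c(1, 6, 1, 8)            # outcome under drug (treatment 0)
y1  <- c(7, 5, 5, 7)            # outcome under surgery (treatment 1)
trt <- ifelse(y1 >= y0, 1, 0)   # doctor assigns whichever treatment is better

mean(y1 - y0)  # true average causal effect of surgery vs drug: 2

# Only one potential outcome per patient is actually observed.
y_obs <- ifelse(trt == 1, y1, y0)
mean(y_obs[trt == 1]) - mean(y_obs[trt == 0])  # observed difference: 6 - 7 = -1
```

The observed comparison points in the opposite direction from the true average effect, which is exactly the invalid conclusion discussed in this example.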
  unit        Yi(0)  Yi(1)  Yi(1) − Yi(0)
  patient #1    1      7          6
  patient #2    6      5         -1
  patient #3    1      5          4
  patient #4    8      7         -1
  Average       4      6          2

Y is years of post-treatment survival.

The assignment mechanism

- Patients 1 and 3 will receive surgery and patients 2 and 4 will receive drug treatment.
- The observed treatments and outcomes are in this table (a "?" marks the unobserved potential outcome).

  unit        Ti   Yi_obs   Yi(1)   Yi(0)
  patient #1   1      7       7       ?
  patient #2   0      6       ?       6
  patient #3   1      5       5       ?
  patient #4   0      8       ?       8

  Average Drug (Ti = 0):    7
  Average Surgery (Ti = 1): 6

- This shows that we can reach invalid conclusions if we look at the observed values of potential outcomes without considering how the treatments were assigned.
- The assignment mechanism depended on the potential outcomes and was therefore nonignorable (implying that it was confounded).

The observed difference in means is entirely misleading in this situation. The biggest problem when using the difference of sample means here is that we have effectively pretended that we had an unconfounded treatment assignment when in fact we did not. This example demonstrates the importance of finding a statistic that is appropriate for the actual assignment mechanism.

The assignment mechanism

Is the treatment assignment ignorable?

- The doctor knows enough about the potential outcomes of the patients to assign each patient the treatment that is more beneficial to that patient.
- Now suppose instead that the doctor prescribes surgery (labeled 1) or drug (labeled 0) for a certain condition by tossing a biased coin that depends on Yi(0) and Yi(1), where Y is years of post-treatment survival.
- If Yi(1) ≥ Yi(0) then P(Ti = 1 | Yi(0), Yi(1)) = 0.8.
- If Yi(1) < Yi(0) then P(Ti = 1 | Yi(0), Yi(1)) = 0.3.

  unit        Yi(0)  Yi(1)   p1    p0
  patient #1    1      7     0.8   0.2
  patient #2    6      5     0.3   0.7
  patient #3    1      5     0.8   0.2
  patient #4    8      7     0.3   0.7

where p1 = P(Ti = 1 | Yi(0), Yi(1)) and p0 = P(Ti = 0 | Yi(0), Yi(1)).

Weight gain study

From Holland and Rubin (1983).
"A large university is interested in investigating the effects on the students of the diet provided in the university dining halls and any sex differences in these effects. Various types of data are gathered. In particular, the weight of each student at the time of his [or her] arrival in September and his [or her] weight the following June are recorded."

- The average weight for males was 180 in both September and June. Thus, the average weight gain for males was zero.
- The average weight for females was 130 in both September and June. Thus, the average weight gain for females was zero.
- Question: What is the differential causal effect of the diet on male weights and on female weights?
- Statistician 1: Look at gain scores: no effect of diet on weight for either males or females, and no evidence of a differential effect between the two sexes, because no group shows any systematic change.
- Statistician 2: Compare June weight for males and females with the same weight in September: on average, for a given September weight, men weigh more in June than women. Thus, the new diet leads to more weight gain for men.
- Is Statistician 1 correct? Statistician 2? Neither? Both?

Weight gain study

Questions:

1. What are the units?
2. What are the treatments?
3. What is the assignment mechanism?
4. Is the assignment mechanism useful for causal inference?
5. Would it have helped if all males received the dining hall diet and all females received the control diet?
6. Is Statistician 1 or Statistician 2 correct?

Getting around the fundamental problem by using close substitutes

- Are there situations where you can measure both Y0i and Y1i on the same unit?
- Drink tea one night and milk another night, then measure the amount of sleep. What has been assumed?
- Divide a piece of plastic into two parts, then expose each piece to a corrosive chemical. What has been assumed?
- Measure the effect of a new diet by comparing your weight before the diet and your weight after.
  What has been assumed?

- There are strong assumptions implicit in these types of strategies.

Getting around the fundamental problem by using randomization and experimentation

- The "statistical" idea is to use the outcomes observed on a sample of units to learn about the distribution of outcomes in the population.
- The basic idea is that since we cannot compare treatment and control outcomes for the same units, we try to compare them on similar units.
- Similarity can be attained by using randomization to decide which units are assigned to the treatment group and which units are assigned to the control group.

- It is not always possible to achieve close similarity between the treated and control groups in a causal study.
- In observational studies, units often end up treated or not based on characteristics that are predictive of the outcome of interest (for example, men enter a job training program because they have low earnings, and future earnings is the outcome of interest).
- Randomized experiments can be impractical or unethical.
- When treatment and control groups are not similar, modeling or other forms of statistical adjustment can be used to fill in the gap.

Fisherian Randomization Test

- The randomization test is related to a stochastic proof by contradiction that assesses the plausibility of the null hypothesis of no treatment effect.
- The null hypothesis is Y0i = Y1i for all units.
- Under the null hypothesis, all potential outcomes are known from Y_obs, since Y_obs = Y1 = Y0.
- Under the null hypothesis, the observed value of any statistic, such as ȳ1 − ȳ0, is known for all possible treatment assignments.
- The randomization distribution of ȳ1 − ȳ0 can then be obtained.
- Unless the data suggest that the null hypothesis of no treatment effect is false, it is difficult to claim evidence that the treatments are different.
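The steps above can be sketched in R with made-up data (all outcome values and the assignment vector below are hypothetical, purely for illustration):

```r
# Sketch of a Fisher randomization test for the sharp null Y0i = Y1i.
set.seed(1)
y   <- c(5.2, 4.8, 6.1, 5.9, 4.5, 5.0, 6.3, 5.7)  # hypothetical outcomes
trt <- c(1, 0, 1, 1, 0, 0, 1, 0)                   # hypothetical assignment

obs_diff <- mean(y[trt == 1]) - mean(y[trt == 0])

# Under the sharp null the outcomes are fixed; only the labels vary,
# so we can re-randomize the treatment assignment many times.
rand_diffs <- replicate(10000, {
  trt_star <- sample(trt)  # a random reassignment of treatment labels
  mean(y[trt_star == 1]) - mean(y[trt_star == 0])
})

# Two-sided randomization p-value: how extreme is the observed statistic
# relative to its randomization distribution?
p_rand <- mean(abs(rand_diffs) >= abs(obs_diff))
p_rand
```

A small p_rand would make the sharp null of no treatment effect implausible; otherwise there is little evidence that the treatments differ.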