Lesson 3: Inferential Statistics

This lesson reviews inferential statistics. By the end of this lesson, you will be able to:

Describe the basic steps of hypothesis testing:
1. Define null and alternative hypotheses
2. Define alpha
3. Select appropriate test
4. Run analysis
5. Draw conclusion
Define type I and type II error
Define power and its typical value (0.80)
Define power analysis
Describe the importance of sample size

Terms that appear frequently throughout this lesson are defined below:

Term	Definition
Null hypothesis	The opposite of the hypothesis proposed, typically that there is no difference, relationship, or effect
Alternative hypothesis	The proposed hypothesis or idea, typically that there is a difference, relationship, or effect
Alpha	Probability of incorrectly rejecting the null hypothesis
Type I error	Detecting a difference when there isn’t one
P-value	Probability of obtaining a statistic at least as extreme as the one observed if the null hypothesis is true
Beta	Probability of incorrectly failing to reject the null hypothesis
Type II error	Failing to detect a difference when there is one
Power	Ability of a test to detect a difference when a difference exists
Power analysis	Method for determining how large a sample size must be to detect a difference if in fact a difference exists

Inferential statistics allows us to make inferences about the population based on the sample that we have studied. Inferential statistics relies on the use of hypothesis testing, which follows these basic steps:

Hypothesis testing

1. Define the hypotheses. The null hypothesis is the opposite of the hypothesis proposed, typically the idea that a difference does not exist. The alternative hypothesis is the idea being researched, typically the idea that a difference does exist. Rejecting the null means that your statistical test has found a significant difference between the groups being researched.
2. Define alpha, or the probability of incorrectly rejecting the null hypothesis. Most statistical analyses use an alpha level of 0.05, which means that there is a 1 in 20 chance that you will find a difference when there isn’t one. Detecting a difference when there isn’t one is referred to as Type I error. Failing to detect a difference when there is one is referred to as Type II error; its probability is the beta level, or the probability of incorrectly failing to reject the null.
3. Select a test. Based on the measurement scale of the data and other research design considerations, an appropriate statistical test should be selected. Basic statistical tests will be reviewed later in this module.
4. Run the analysis. The test will provide a p-value, or the probability of obtaining a statistic at least as extreme as the one observed if the null hypothesis is true.
5. Draw a conclusion. The p-value is compared to alpha. A statistically significant result (p-value ≤ alpha) means it was unlikely that the difference detected by the statistical test occurred by chance alone, enabling us to conclude that the independent variable is related to the dependent variable. Failure to find statistical significance (p-value > alpha) means that our observed data can be explained by chance.
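
The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test chosen (a two-sided one-sample z-test with a known population standard deviation), the function name one_sample_z_test, and all of the numbers are hypothetical and do not come from this lesson.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: define the hypotheses (illustrative values, not from the lesson).
#   H0: the population mean equals 100
#   HA: the population mean differs from 100
null_mean = 100.0

# Step 2: define alpha.
alpha = 0.05

# Step 3: select a test. Here: a two-sided one-sample z-test,
# which assumes the population standard deviation is known.
def one_sample_z_test(sample_mean, null_mean, pop_sd, n):
    """Return (z, p_value) for a two-sided one-sample z-test."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Step 4: run the analysis on hypothetical study results.
z, p_value = one_sample_z_test(sample_mean=103.0, null_mean=null_mean,
                               pop_sd=10.0, n=64)

# Step 5: draw a conclusion by comparing the p-value to alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

With these made-up inputs the test gives z = 2.40 and a p-value of about 0.016, which is below alpha, so the null hypothesis is rejected.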

	Decision: Fail to Reject Null Hypothesis	Decision: Reject Null Hypothesis
Null Hypothesis = True	CORRECT!	Type I error
Null Hypothesis = False	Type II error	CORRECT!
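
The two error cells in the table can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the lesson: it repeatedly draws hypothetical studies from a normal population (standard deviation 10, 25 subjects each), applies a two-sided one-sample z-test at alpha = 0.05, and counts how often each kind of wrong decision occurs. The names rejects_null and rejection_rate are made up for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
NULL_MEAN = 100.0   # value claimed by the null hypothesis
POP_SD = 10.0       # assumed known population standard deviation
N = 25              # sample size per simulated study
TRIALS = 4000       # number of simulated studies per scenario

def rejects_null(sample):
    """Two-sided one-sample z-test: True if H0 is rejected at ALPHA."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - NULL_MEAN) / (POP_SD / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= ALPHA

def rejection_rate(true_mean, rng):
    """Fraction of simulated studies that reject H0 when the truth is true_mean."""
    rejections = 0
    for _ in range(TRIALS):
        sample = [rng.gauss(true_mean, POP_SD) for _ in range(N)]
        rejections += rejects_null(sample)
    return rejections / TRIALS

rng = random.Random(42)  # fixed seed so the run is repeatable

# Null hypothesis true: every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=100.0, rng=rng)

# Null hypothesis false (true mean is 105): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=105.0, rng=rng)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I error rate should land close to alpha (0.05), while the Type II error rate depends on the sample size and on how far the true mean is from the null value.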

Power to detect a difference

Power is the probability that a statistical test will correctly reject the null hypothesis when the alternative hypothesis is true (power = 1 − beta). In other words, power is the ability of a test to detect a difference when a difference actually exists. Most researchers set power at 0.8, which means that 80% of the time, we will find a statistically significant result if a difference actually exists.
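
To make the definition concrete, the sketch below computes power directly for one simple case. It assumes a two-sided one-sample z-test with a known standard deviation; the function name z_test_power and the input values (a true difference of 5 and a standard deviation of 10) are hypothetical choices for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(difference, pop_sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a true mean shift of `difference`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = difference / (pop_sd / sqrt(n))      # true difference in standard errors
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

powers = {n: z_test_power(difference=5.0, pop_sd=10.0, n=n) for n in (10, 20, 32, 50)}
for n, power in powers.items():
    print(f"n = {n:>3}  power = {power:.3f}")
```

In this setup power rises as the sample grows and passes the conventional 0.80 at around n = 32. When the true difference is zero, the same function returns alpha, because every rejection is then a Type I error.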

Power is directly related to sample size and a power analysis can be used to determine how large a sample must be to detect a difference if in fact a difference exists.

n = 2δ²(Z_β + Z_(α/2))² / difference²

where
n = sample size in each group (assumes equal-sized groups)
δ = standard deviation of the outcome variable
Z_β = standard normal value corresponding to the desired power (typically 0.84 for 80% power)
Z_(α/2) = standard normal value corresponding to the level of statistical significance (typically 1.96 for a two-sided alpha of 0.05)
difference = effect size (the difference in means)
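
As a worked example, the formula above can be evaluated in Python. This is a minimal sketch: the function name sample_size_per_group and the example inputs (a standard deviation of 10 and differences of 5 and 2.5) are assumptions made for illustration, and the result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means with equal-sized groups."""
    std_normal = NormalDist()
    z_alpha_half = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)                # about 0.84 for 80% power
    n = 2 * sd**2 * (z_beta + z_alpha_half)**2 / difference**2
    return ceil(n)  # round up to a whole subject

# Hypothetical inputs: SD of 10, hoping to detect a 5-point difference.
print(sample_size_per_group(sd=10.0, difference=5.0))   # 63

# Halving the difference to detect roughly quadruples the required sample.
print(sample_size_per_group(sd=10.0, difference=2.5))   # 252
```

Because the difference is squared in the denominator, smaller effects require much larger samples to reach the same power.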

Beware! Small sample! In a small sample, it is harder to detect a statistically significant difference; finding no difference may simply mean that the study did not have enough power to show a relationship even if there was one.

Beware! Large sample! It is easier to detect a statistical difference with a large sample, but there may be no practical or clinical significance in the difference. As seen in the table below, a difference of 10 with a sample size of 4 is just as statistically significant as a difference of 0.2 for a sample size of 10,000.

Sample Size	Sample Mean	Population Mean	P value
4	110.0	100.0	0.05
25	104.0	100.0	0.05
64	102.5	100.0	0.05
100	102.0	100.0	0.05
400	101.0	100.0	0.05
2,500	100.4	100.0	0.05
10,000	100.2	100.0	0.05

Table adapted from Norman GR, Streiner DL. PDQ Statistics. BC Decker, INC: Philadelphia; 1986.
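
The pattern in the table can be checked with a short calculation. The table does not state the standard deviation or the test used, so the sketch below assumes a standard deviation of 10 and a two-sided one-sample z-test; both are assumptions made for illustration, chosen because they are consistent with the rows shown.

```python
from math import sqrt
from statistics import NormalDist

ASSUMED_SD = 10.0        # assumption: not stated in the table
POPULATION_MEAN = 100.0

# (sample size, sample mean) pairs copied from the table above
rows = [(4, 110.0), (25, 104.0), (64, 102.5), (100, 102.0),
        (400, 101.0), (2500, 100.4), (10000, 100.2)]

results = []
for n, sample_mean in rows:
    difference = sample_mean - POPULATION_MEAN
    z = difference / (ASSUMED_SD / sqrt(n))        # difference in standard errors
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    results.append((n, difference, z, p_value))
    print(f"n = {n:>6}  difference = {difference:5.1f}  z = {z:.2f}  p = {p_value:.3f}")
```

Under these assumptions every row gives z = 2.00 and a p-value of about 0.046, which rounds to 0.05: the difference shrinks by the same factor that the standard error does, so the statistical result is unchanged while the practical size of the difference is very different.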

Beware! Very small p-values! Smaller p-values do not indicate the importance or magnitude of the difference. In other words, a result with p-value = 0.0001 is NOT a bigger or more important finding than one with p-value = 0.048. Small p-values only suggest that the observed statistic was less likely to have happened by chance. Statistics calculated from large samples are more resistant to chance, because more of the population you are interested in understanding is represented in the sample. As a result, it is less likely that any difference detected was due to chance, which produces smaller p-values.

Again, a smaller p-value does not mean the detected difference is bigger or more important.
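
A short calculation shows the same point from the other direction: hold the difference fixed and let only the sample size grow. The sketch assumes a two-sided one-sample z-test, a standard deviation of 10, and a fixed difference of 1 point; these values and the function name two_sided_p are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(difference, pop_sd, n):
    """Two-sided p-value of a one-sample z-test for the given difference."""
    z = difference / (pop_sd / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

DIFFERENCE = 1.0    # the same 1-point difference in every scenario
ASSUMED_SD = 10.0   # assumed standard deviation

p_values = {n: two_sided_p(DIFFERENCE, ASSUMED_SD, n) for n in (100, 400, 1600)}
for n, p in p_values.items():
    print(f"n = {n:>5}  difference = {DIFFERENCE}  p = {p:.5f}")
```

The difference is identical in all three scenarios, yet the p-value falls from well above 0.05 to far below it as the sample grows. Only the sample size changed, not the size of the effect.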

Hypotheses

In the literature, the null hypothesis is often implied rather than clearly stated. Any table with p-values and any results written with a claim of significance are associated with a hypothesis, even if that hypothesis is not stated. When hypotheses are stated, they are sometimes accompanied by the symbols H₀ for the null and H_A for the alternative.

Example 1

Classical hypothesis testing was used to determine whether the period-2 values were significantly different from those from period 1 by testing whether the study mean Ln(R) was significantly different from 0. The hypotheses were:

H₀: µ_Ln(R) = 0; alpha = 0.05
H_A: µ_Ln(R) not equal to 0 [Miyazawa, et al., 2002]

Example 2

Given the lack of empirical evidence linking student perceptions to actual outcomes in the general health sciences literature, we operated under a null hypothesis of no relationship (correlation) between any student perception indicators and the actual outcomes as measured by PCOA scores. [Naughton and Friesner, 2012]

Power

The following statements demonstrate the use of a power calculation to determine the sample size needed to detect an effect or difference:

Example 1

We needed 45 patients to have 95% power to reject the null hypothesis that the mean serum digoxin concentration was within 10% of the mean predicted digoxin concentration. Patients were recruited from two general practices and had been taking digoxin for at least four months. Exclusion criteria were dementia, low adherence to digoxin, and use of other medications known to interact to a clinically important extent with digoxin. [Kroese, et al., 2005]

Example 2

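Power analysis calculation with an alpha of 0.05 and beta of 0.8, indicated that a minimum sample size of 85 was necessary to find significance associated with a 10-point difference in examination performance. Ninety-five students consented to participate in the study. [McLaughlin, et al., 2014]

Note: the 0.8 that this example reports as beta most likely refers to power (1 − beta) as defined in this lesson, which would correspond to a beta of 0.2.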