Lesson 4: Measurement Considerations

This lesson reviews some measurement considerations, including the difference between parametric and nonparametric data. By the end of this lesson, you will be able to:

Define and differentiate between bias, validity, reliability, sensitivity, and specificity
Contrast correlated and independent variables
Identify when to use parametric or nonparametric statistics

Terms that appear frequently throughout this lesson are defined below:

Term	Definition
Data quality
Bias	Systematic error introduced by selecting or encouraging one outcome over others
Validity	Extent to which a measure actually represents what it claims to measure
Reliability	Degree to which results are stable and consistent
Sensitivity	Proportion of positives that are correctly identified
Specificity	Proportion of negatives that are correctly identified
Data relationships
Correlated	A statistical relationship existing between two variables or datasets that reflects a dependence between the two
Independent	The occurrence of one variable does not influence the probability of another variable
Data parametrics
Parametric	Data with an underlying normal distribution
Nonparametric	Data for which the probability distribution is unknown or known not to be normal

I. Data quality

A number of quality standards should be considered when collecting, analyzing, and reporting data:

Bias – Bias is any error that is introduced into a study by promoting or selecting one outcome over others. It can occur at any phase of research, including study design, data collection, analysis, and publication (see figure below). When conducting or reading about research, you must consider the degree to which bias interfered with the study and influenced the conclusions.

Borrowed from: Pannucci CJ, Wilkins EG. Identifying and avoiding bias in research. Plast Reconstr Surg. 2010;126(2):619-625.
Validity – Validity is the extent to which a measure actually represents what it claims to measure. You will see many types of validity talked about in the literature. Common types of validity include:
- External validity – The extent to which a finding can be generalized beyond the research
- Internal validity – The extent to which the research itself upholds the highest standards of quality and limits possible confounders
- Construct validity – How well a test or experiment measures a specific construct; convergent validity means constructs that should be related are related; divergent validity means that constructs that should not be related are not related
- Content validity – The extent to which an instrument or test represents all elements of a given construct
- Face validity – The extent to which an instrument, project, or measure appears to measure what it is intended to measure
Reliability – Reliability is the consistency of a measure. Factors that can impact consistency may include maturation, fatigue, motivation, distraction, test conditions or context, and test or instrument quality. Common types of reliability include:
- Inter-rater reliability – The extent of agreement between two or more raters
- Test-retest reliability – The degree to which test or instrument scores are consistent from one point in time to the next (the test taker and test conditions must be the same at both points in time)
- Internal consistency reliability – The consistency of responses across items on a single instrument or test
Adapted from experiment-resources.com
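As an illustration of inter-rater reliability, the sketch below computes simple percent agreement between two raters. It also computes Cohen's kappa, a commonly used agreement statistic that corrects for chance agreement; kappa is not named in this lesson and is included here only as an assumption about how agreement might be quantified. The ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1 = condition present, 0 = absent) for ten items
rater_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.2f}")  # 0.80
print(f"Cohen's kappa: {cohens_kappa(rater_a, rater_b):.2f}")  # 0.58
```

Kappa is lower than percent agreement here because some of the raw agreement would be expected by chance.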
Sensitivity (also called the true positive rate) – The proportion of actual positives that are correctly identified as such. For example, the proportion of sick people who are correctly identified as having the condition
Specificity (also called the true negative rate) – The proportion of actual negatives that are correctly identified as such. For example, the proportion of healthy people who are correctly identified as not having the condition

	Disease Positive	Disease Negative
Test Positive	a	b
Test Negative	c	d

In the table above, sensitivity = a/(a+c) or the proportion of disease cases accurately determined to be positive by the test. Specificity = d/(b+d) or the proportion of non-disease cases accurately determined to be negative by the test.
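The two formulas can be sketched in a few lines of Python. The function names and the cell counts below are hypothetical and chosen only for illustration; a, b, c, and d correspond to the cells of the 2x2 table.

```python
def sensitivity(a: int, c: int) -> float:
    """True positive rate: a / (a + c)."""
    if a + c == 0:
        raise ValueError("no disease-positive cases in the table")
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """True negative rate: d / (b + d)."""
    if b + d == 0:
        raise ValueError("no disease-negative cases in the table")
    return d / (b + d)


# Hypothetical counts: a = true positives, b = false positives,
# c = false negatives, d = true negatives
a, b, c, d = 90, 50, 10, 850

print(f"Sensitivity: {sensitivity(a, c):.1%}")  # 90.0%
print(f"Specificity: {specificity(b, d):.1%}")  # 94.4%
```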

II. Data relationships

Correlated – A statistical relationship existing between two variables or datasets that reflects a dependence between the two. For example, human height is positively correlated with shoe size.
Independent (also called statistically independent or probabilistically independent) – The occurrence of one variable does not influence the probability of another variable. For example, shoe size is statistically independent of the volume of ice cream consumed.
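One way to see the difference is to compute a Pearson correlation coefficient on simulated data. The sketch below assumes Pearson's r as the measure of association (the lesson does not name one), and all of the numbers are made up: shoe size is generated from height plus noise, while ice cream consumption is generated separately.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Hypothetical data: shoe size depends on height; ice cream does not
heights = [rng.gauss(170, 10) for _ in range(1000)]
shoe_sizes = [0.25 * h + rng.gauss(0, 1) for h in heights]
ice_cream = [rng.gauss(5, 2) for _ in range(1000)]

r_height_shoe = pearson_r(heights, shoe_sizes)
r_shoe_ice = pearson_r(shoe_sizes, ice_cream)

print(f"height vs shoe size: r = {r_height_shoe:.2f}")
print(f"shoe size vs ice cream: r = {r_shoe_ice:.2f}")
```

The first coefficient should be close to 1 and the second close to 0. Note that a correlation near zero is consistent with independence but does not prove it, since Pearson's r only captures linear relationships.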

III. Data Parametrics

Parametric – Data that follows an approximately normal distribution is considered parametric. When data is determined to meet the assumptions of normality, parametric statistics can be used. Common approaches to determining whether or not data is approximately normal include:
- Visual examination of the data
- Evaluation of the skew (i.e., how far left or right the data distribution leans) and kurtosis (i.e., how high or low the peak of the data distribution is); common convention allows for some skew and kurtosis since data rarely follow an exactly normal distribution
Nonparametric – Data with an unknown distribution or a distribution known not to be normal (e.g., multiple peaks, skew or kurtosis outside of an acceptable range) is considered nonparametric. Nonparametric statistics make no assumptions about the probability distributions of the variables being examined. When data fails to meet the assumptions of normality or the sample size is small (n<30, for example), nonparametric statistical tests should be used.
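The skew and kurtosis check can be sketched as follows. This is a minimal illustration, not a definitive procedure: the cutoffs of 1.0 for skew and excess kurtosis are hypothetical placeholders for whatever "acceptable range" a given field uses, the function names are invented for this example, and the minimum sample size of 30 follows the example given above.

```python
import random


def skewness(data):
    """Sample skewness (third standardized moment); 0 for symmetric data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


def suggest_test_family(data, max_abs_skew=1.0, max_abs_kurtosis=1.0, min_n=30):
    """Return 'parametric' or 'nonparametric' using hypothetical cutoffs."""
    if len(data) < min_n:
        return "nonparametric"  # small sample, per the n<30 example
    if abs(skewness(data)) > max_abs_skew:
        return "nonparametric"
    if abs(excess_kurtosis(data)) > max_abs_kurtosis:
        return "nonparametric"
    return "parametric"


rng = random.Random(1)  # fixed seed so the simulation is repeatable
bell_shaped = [rng.gauss(200, 30) for _ in range(500)]    # roughly normal
right_skewed = [rng.expovariate(1.0) for _ in range(500)]  # long right tail

print(suggest_test_family(bell_shaped))
print(suggest_test_family(right_skewed))
```

A numeric check like this complements, rather than replaces, visual examination of the data.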

Example 1: Sensitivity and Specificity

Colorectal Cancer Screening: Guaiac-Based FOBT vs. FIT

	Guaiac-Based FOBT		Fecal Immunochemical Test (100-ng/mL cut point)
	Sensitivity	Specificity	Sensitivity	Specificity
Advanced adenomas	13.6%	92.4%	33.9%	90.6%
Cancer	30.8%	92.4%	92.3%	90.1%
Advanced colorectal neoplasias	16.7%	92.9%	44.4%	82.1%

Sensitivity was much higher, but specificity was slightly lower with FIT than with guaiac-based FOBT.

Seven hundred seventy consecutive average-risk patients from four centers who were undergoing screening colonoscopy also provided stool samples. Findings suggest that the FIT provides a higher sensitivity for detecting advanced colorectal neoplasias and cancer than the guaiac-based FOBT, and has an acceptable specificity.

Example 2: Correlated vs Independent

Figure A is a scatter plot of two correlated variables: the sensitivity of CEPH LCLs to rituximab versus their sensitivity to ofatumumab. Figure B is a scatter plot of rituximab sensitivity versus CD20 gene expression, which appear to be statistically independent.

Example 3: Data Distribution

Approximately normal data distributions for cholesterol levels. Notice that the distribution for underweight individuals is skewed slightly (the peak is to the left of the center line) and demonstrates higher kurtosis (the peak is higher than those for overweight and normal-weight individuals).

For more information

Example 1: Park et al. Am J Gastroenterol. 2010 Sep;105(9):2017-25.
Example 2: Small GW et al. Peer J. 2013; 1:e31.
Cholesterol graphic borrowed from SAS/GRAPH(R) 9.2: Graph Template Language User’s Guide, Second Edition.
