
This lesson reviews the basic characteristics of data. At the end of the lesson, you will be able to:

  1. Explain what a variable is and differentiate between independent and dependent variables
  2. Determine the measurement scale of a variable
  3. Describe continuous, discrete, and dichotomous variables

Terms that appear frequently throughout this lesson are defined below:

| Term | Definition |
| --- | --- |
| Dependent variable | The output, outcome, or effect of interest |
| Independent variable | An input, which may be varied or simply observed by the researcher; sometimes called an experimental or predictor variable |
| Measurement scale | |
| Nominal | Named categories |
| Dichotomous | A nominal variable that contains only two categories or levels (e.g., yes/no, male/female, community/hospital) |
| Ordinal | Ordered categories |
| Interval | Values on the scale have unique meanings, can be rank ordered, and are equally spaced |
| Ratio | Values on the scale have unique meanings, can be rank ordered, and are equally spaced, and the scale has a true (absolute) zero, where zero means none of the quantity is present |
| Continuous | Variables that can take on any value in a given range; typically interval or ratio variables and sometimes ordinal variables |
| Discrete | Variables that have a finite number of possible values; typically nominal or ordinal variables |

| Variable type | Categorical | Categorical | Quantitative | Quantitative |
| --- | --- | --- | --- | --- |
| Level | Nominal | Ordinal | Interval | Ratio |
| Defining feature | Distinct categories | Ordered categories | Meaningful distances | Absolute zero |
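
The Level and Defining Feature rows above describe a cumulative hierarchy: each scale keeps every property of the scales to its left and adds one more. The Python sketch below encodes that idea; the `Scale` enum and the operation labels are names chosen here for illustration, not part of any statistics library.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Measurement scales, ordered from least to most information."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The operation each scale adds on top of the scales below it
# (illustrative labels following the Defining Feature row).
_ADDED_OPERATION = {
    Scale.NOMINAL: "equality (same / different category)",
    Scale.ORDINAL: "ordering (greater / less than)",
    Scale.INTERVAL: "differences (addition / subtraction)",
    Scale.RATIO: "ratios (multiplication / division)",
}


def meaningful_operations(scale: Scale) -> list[str]:
    """Return every operation that is meaningful on the given scale.

    A scale inherits all operations of the less informative scales.
    """
    return [_ADDED_OPERATION[s] for s in Scale if s <= scale]


for scale in Scale:
    print(scale.name, "->", meaningful_operations(scale))
```

Because `IntEnum` members compare as integers, a question such as "is this variable at least interval?" can be written as `scale >= Scale.INTERVAL`.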

In biostatistics, an independent variable (also called predictor or experimental) is a variable that is observed or manipulated in order to determine its relationship with the dependent variable (also called outcome, output, or effect of interest).
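
To make the distinction concrete, the Python sketch below groups a small hypothetical dataset by its independent variable (a drug dose set by the researcher) and summarizes the dependent variable (systolic blood pressure) at each level. The data and names are invented for illustration, and comparing group means is only a first look at a relationship, not a formal statistical test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data, for illustration only: (dose_mg, systolic_bp_mmHg).
# Dose is the independent variable, set by the researcher;
# systolic blood pressure is the dependent variable, the outcome measured.
records = [(0, 150), (0, 146), (10, 140), (10, 138), (20, 131), (20, 129)]


def mean_outcome_by_predictor(rows):
    """Group outcomes by predictor level and return the mean outcome per level."""
    groups = defaultdict(list)
    for predictor, outcome in rows:
        groups[predictor].append(outcome)
    return {level: mean(outcomes) for level, outcomes in sorted(groups.items())}


print(mean_outcome_by_predictor(records))
```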

  1. Discrete variables (also called categorical) have a finite number of possible responses, which are typically categories:
    1. Nominal variables fall into two or more named categories. For example, city is a nominal variable. Possible categories could include Atlanta, Boston, Cleveland, Dallas, and Phoenix.
      1. Dichotomous variables (also called binary) are nominal variables that fall into only two categories. For example, a coin toss can be categorized as heads or tails.
    2. Ordinal variables have clearly ordered categories. For example, the answer to a survey question may be categorized as low, medium, or high; highest level of education may be categorized as some high school, completed high school, some college, completed bachelor's degree, some graduate school, or completed graduate degree.
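  2. Continuous variables can take on any value in a certain range:
    1. Interval variables have meaningful, equal intervals between measurements. For example, the difference between temperatures of 70 and 80 degrees is the same as the difference between 80 and 90 degrees.
    2. Ratio variables have all the properties of interval variables plus a true zero point, where zero means none of the quantity is present, so ratios are meaningful (80 kg is twice as heavy as 40 kg). Weight, height, and enzyme activity are examples. Fahrenheit and Celsius temperatures include the value 0, but it does not mean an absence of heat, so they are interval rather than ratio variables.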
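
The lesson's temperature example can be pushed one step further to show why an interval variable is not a ratio variable. The Python sketch below assumes the temperatures are in degrees Fahrenheit (the lesson does not state the unit) and converts them to kelvin, a scale with an absolute zero; the pound conversion factor is approximate.

```python
def fahrenheit_to_kelvin(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to kelvin, a scale whose zero is absolute."""
    return (fahrenheit - 32) * 5 / 9 + 273.15


# Equal intervals: a 10-degree step is the same size anywhere on the scale.
print(80 - 70 == 90 - 80)  # True

# Ratios on an interval scale depend on where the arbitrary zero sits.
ratio_fahrenheit = 80 / 40
ratio_kelvin = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)
print(ratio_fahrenheit)        # 2.0
print(round(ratio_kelvin, 2))  # 1.08

# Weight has a true zero, so the ratio survives a change of units.
KG_TO_LB = 2.20462  # approximate pounds per kilogram
ratio_kg = 80 / 40
ratio_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)
print(ratio_kg == ratio_lb)  # True
```

Because the zero of the Fahrenheit scale is arbitrary, "80 degrees is twice as hot as 40 degrees" is not a meaningful statement, whereas "80 kg is twice 40 kg" holds in any unit of weight.
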
  BEWARE! Sometimes, ordinal data is treated as continuous data. A common example of this is the Likert agreement scale (i.e., strongly agree to strongly disagree). While some researchers argue that the scale can be treated as continuous under certain conditions, others argue that it should never be treated as a continuous variable.
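
One way to respect the ordinal nature of Likert data is to summarize it with statistics that rely only on order, such as the median. The Python sketch below assumes a five-point agreement scale and uses hypothetical responses; `statistics.median_low` is chosen so that the result is always one of the actual categories rather than an average of two codes.

```python
from enum import IntEnum
from statistics import median_low


class Agreement(IntEnum):
    """A five-point Likert agreement scale; the integer codes only fix the order."""
    STRONGLY_DISAGREE = 1
    DISAGREE = 2
    NEUTRAL = 3
    AGREE = 4
    STRONGLY_AGREE = 5


# Hypothetical survey responses.
responses = [
    Agreement.AGREE, Agreement.NEUTRAL, Agreement.STRONGLY_AGREE,
    Agreement.DISAGREE, Agreement.AGREE, Agreement.AGREE,
]

# The median uses ordering only, so it is valid for ordinal data
# and median_low always returns one of the observed categories.
middle = median_low(responses)
print(middle.name)  # AGREE

# The mean treats the codes as equally spaced numbers; whether that is
# acceptable for Likert items is exactly the point under debate.
print(sum(responses) / len(responses))
```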

  BEWARE! Generally, the amount of information captured by data increases as you move from nominal → ordinal → interval → ratio. Restructuring continuous data into categorical data is akin to throwing data away. For example, recoding an exam score measured on a 100-point scale to Pass/Fail for analysis reduces the amount of information in the data.
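
The information loss from dichotomizing can be seen directly. The Python sketch below recodes a handful of hypothetical exam scores to Pass/Fail; the pass mark of 70 is an assumption made for illustration.

```python
# Hypothetical exam scores on a 100-point scale.
scores = [52, 68, 71, 85, 100]

PASS_MARK = 70  # assumed cut-off, for illustration only


def to_pass_fail(score: int) -> str:
    """Recode a 0-100 exam score into a dichotomous Pass/Fail label."""
    return "Pass" if score >= PASS_MARK else "Fail"


recoded = [to_pass_fail(s) for s in scores]
print(recoded)  # ['Fail', 'Fail', 'Pass', 'Pass', 'Pass']

# Five distinct values shrink to two: a 71 and a 100 become
# indistinguishable, and so do a 52 and a 68.
print(len(set(scores)), "->", len(set(recoded)))  # 5 -> 2
```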

The following table presents adherence data broken down by patient characteristics. Multiple types of data are used. Consider the following questions:

1. What is the dependent variable? What are the independent variables?

2. What scale of measurement is used for each variable?

Patient Characteristics by Category of Adherenceᵃ

| Characteristics | Adherent to Three Classes | Adherent to Two Classes | Adherent to One Class | Nonadherent to Any Classes | P Value |
| --- | --- | --- | --- | --- | --- |
| Number of patients | 201,459 | 134,694 | 77,696 | 79,760 | |
| Age, y (mean ± SD) | 71.9 (8.9) | 71.4 (9.4) | 71.1 (9.7) | 69.9 (10.5) | < .001 |
| Age < 65, % | 12.7 | 15.2 | 16.7 | 20.9 | |
| Age 65-74, % | 49.1 | 47.0 | 46.3 | 45.4 | |
| Age ≥ 75, % | 38.2 | 37.8 | 37.1 | 33.8 | |
| Female, % | 55.6 | 59.4 | 61.1 | 59.8 | < .001 |
| Race, % | | | | | < .001 |
| White | 68.6 | 66.0 | 62.6 | 57.9 | |
| Black | 12.8 | 15.0 | 17.2 | 20.8 | |
| Hispanic | 6.5 | 8.5 | 10.0 | 11.0 | |
| Other | 12.0 | 10.6 | 10.2 | 10.4 | |
| CCI (mean ± SD) | 0.73 (1.54) | 1.04 (1.83) | 1.22 (1.99) | 1.30 (2.07) | < .001 |

CCI, Deyo-adapted Charlson Comorbidity Index.
ᵃ Adherence was defined as proportion of days covered ≥ 80%.
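
Each patient falls into exactly one age category and one race category, so within each adherence group the category percentages should add up to about 100%, allowing for rounding. The Python check below uses values transcribed from the table; the dictionary layout is simply a convenient choice for this sketch.

```python
# Percentages transcribed from the table, one entry per adherence group
# (three classes, two classes, one class, nonadherent).
age_pct = {
    "< 65": [12.7, 15.2, 16.7, 20.9],
    "65-74": [49.1, 47.0, 46.3, 45.4],
    ">= 75": [38.2, 37.8, 37.1, 33.8],
}
race_pct = {
    "White": [68.6, 66.0, 62.6, 57.9],
    "Black": [12.8, 15.0, 17.2, 20.8],
    "Hispanic": [6.5, 8.5, 10.0, 11.0],
    "Other": [12.0, 10.6, 10.2, 10.4],
}


def column_totals(table: dict[str, list[float]]) -> list[float]:
    """Sum each group's percentages across all categories of one variable."""
    return [round(sum(column), 1) for column in zip(*table.values())]


# Categories of a nominal or ordinal variable should be mutually exclusive
# and exhaustive, so each column should total about 100% (up to rounding).
print(column_totals(age_pct))   # [100.0, 100.0, 100.1, 100.1]
print(column_totals(race_pct))  # [99.9, 100.1, 100.0, 100.1]
```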

Dependent variable: category of adherence

Independent variables: patient characteristics (i.e., age, gender, race, CCI)

Nominal: gender, race

Ordinal: age, category of adherence

Ratio: number of patients, age, CCI

Notice that age is represented on two scales: (1) as a continuous (ratio) variable, summarized by mean ± SD, and (2) as a categorical (ordinal) variable, summarized as the percentage of patients in the < 65, 65-74, and ≥ 75 categories.
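
The categorical version of age can be derived from the continuous one. The Python sketch below applies the table's cut points to hypothetical ages; treating "65-74" as 65 ≤ age < 75 for ages that are not whole years is an assumption, since the table reports only the category labels. The ASCII label ">= 75" stands in for "≥ 75".

```python
from collections import Counter


def age_category(age_years: float) -> str:
    """Map a continuous age in years onto the table's three ordered categories."""
    if age_years < 65:
        return "< 65"
    if age_years < 75:
        return "65-74"
    return ">= 75"


# Hypothetical ages, for illustration only.
ages = [58.2, 66.0, 71.5, 74.9, 75.0, 83.4]
categories = [age_category(a) for a in ages]
print(categories)
print(Counter(categories))
```

Once ages are binned, 66.0 and 74.9 fall in the same category, which is the information loss described earlier in this lesson.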