Lesson 1: Data Characteristics

This lesson reviews the basic characteristics of data. At the end of the lesson, you will be able to:

Explain what a variable is and differentiate between independent and dependent variables
Determine the measurement scale of a variable
Describe continuous, discrete, and dichotomous variables

Terms that appear frequently throughout this lesson are defined below:

Term	Definition
Dependent variable	The output, outcome, or effect of interest
Independent variable	An input, which may be varied or simply observed by the researcher. Sometimes called an experimental or predictor variable
Measurement Scale
Nominal	Named category
Dichotomous	A nominal variable that contains only two categories or levels (e.g., yes/no, male/female, community/hospital)
Ordinal	Ordered categories
Interval	Each value on the scale has a unique meaning and can be rank ordered, and the values are equally spaced
Ratio	Each value on the scale has a unique meaning and can be rank ordered, the values are equally spaced, and the scale has a true zero point
Continuous	Variables that can take on any value in a given range; typically interval or ratio variables and sometimes ordinal variables
Discrete	Variables that have a finite number of possible values; typically nominal or ordinal variables

	Variable
	Categorical		Quantitative
Level	Nominal	Ordinal	Interval	Ratio
Defining Feature	Distinct Categories	Ordered Categories	Meaningful Distances	Absolute Zero

In biostatistics, an independent variable (also called a predictor or experimental variable) is a variable that is observed or manipulated in order to determine its relationship with the dependent variable (also called the outcome, output, or effect of interest).

Discrete variables (also called categorical) have a finite number of possible responses, which are typically categories:
1. Nominal variables fall into two or more named categories. For example, city is a nominal variable. Possible categories could include Atlanta, Boston, Cleveland, Dallas, and Phoenix.
  1. Dichotomous variables (also called binary) are nominal variables that fall into only two categories. For example, a coin toss can be categorized as heads or tails.
2. Ordinal variables fall into clearly ordered categories. For example, the answer to a survey question may be categorized as low, medium, or high; highest level of education may be categorized as some high school, completed high school, some college, completed bachelor's degree, some graduate school, or completed graduate degree.
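The difference between nominal and ordinal treatment can be sketched in a few lines of Python. This is only an illustration: the responses are made up, and the names LEVELS and RANK are hypothetical helpers, not part of any library.

```python
# Hypothetical survey responses measured on an ordinal scale.
LEVELS = ["low", "medium", "high"]  # the order of the levels is part of the variable
RANK = {level: i for i, level in enumerate(LEVELS)}

responses = ["high", "low", "medium", "low", "high"]

# Sorting the labels alphabetically ignores the ordering (nominal treatment).
alphabetical = sorted(responses)

# Sorting by rank respects the ordinal structure.
by_rank = sorted(responses, key=RANK.get)

# With an odd number of responses, the median category is the middle one.
median_response = by_rank[len(by_rank) // 2]

print(alphabetical)
print(by_rank)
print(median_response)
```

A median category is meaningful for an ordinal variable because the categories can be ranked; for a nominal variable such as city, no ranking exists and only counts per category make sense.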

Continuous variables can take on any value in a certain range:
1. Interval variables have meaningful intervals between measurements. The difference between a temperature of 70 and 80 degrees, for example, is the same difference as between 80 and 90 degrees.
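2. Ratio variables have meaningful intervals and also a true zero point, where a value of 0 indicates the absence of the quantity being measured. As a result, ratios between values are meaningful (e.g., 80 kg is twice as heavy as 40 kg). Weight, height, and enzyme activity are examples.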
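One way to see the difference between interval and ratio variables is to check what happens to a ratio when the unit of measurement changes. The sketch below assumes the temperatures in the example are in degrees Fahrenheit (the lesson does not state the unit) and uses an approximate kilogram-to-pound factor; the weights are made up.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

KG_TO_LB = 2.20462  # approximate conversion factor

# Interval scale: equal differences mean the same thing anywhere on the scale.
diff_low = 80 - 70
diff_high = 90 - 80

# The ratio of two temperatures depends on the unit, because neither
# 0 degrees Fahrenheit nor 0 degrees Celsius means "no temperature".
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: weight has a true zero, so the ratio survives a unit change.
ratio_kg = 80 / 40
ratio_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)

print(diff_low, diff_high)
print(ratio_f, round(ratio_c, 2))
print(ratio_kg, round(ratio_lb, 2))
```

The temperature ratio changes with the unit while the weight ratio does not, which is why "twice as heavy" is a meaningful statement and "twice as hot" on these scales is not.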

BEWARE! Sometimes, ordinal data is treated as continuous data. A common example of this is the Likert agreement scale (i.e., strongly agree to strongly disagree). While some researchers argue that the scale can be treated as continuous under certain conditions, others argue that it should never be treated as a continuous variable.
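A small sketch may help show why this is debated. The responses and both coding schemes below are hypothetical. The second coding keeps the same order of categories but spaces them unequally, which is allowed for an ordinal variable.

```python
from statistics import mean, median

LABELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Two order-preserving ways to code the same categories (both hypothetical).
equal_codes = {label: i + 1 for i, label in enumerate(LABELS)}  # 1, 2, 3, 4, 5
stretched_codes = dict(zip(LABELS, [1, 2, 3, 4, 10]))           # unequal spacing

group_a = ["neutral", "neutral", "neutral"]
group_b = ["strongly disagree", "strongly disagree", "strongly agree"]

def coded(responses, codes):
    """Replace each label with its numeric code."""
    return [codes[r] for r in responses]

def median_label(responses, codes):
    """Median category; assumes an odd number of responses."""
    inverse = {value: label for label, value in codes.items()}
    return inverse[median(coded(responses, codes))]

# Treating the codes as continuous: the comparison of means depends on the coding.
mean_a_equal = mean(coded(group_a, equal_codes))
mean_b_equal = mean(coded(group_b, equal_codes))
mean_a_stretched = mean(coded(group_a, stretched_codes))
mean_b_stretched = mean(coded(group_b, stretched_codes))

print(mean_a_equal > mean_b_equal)          # group A higher under equal spacing
print(mean_a_stretched > mean_b_stretched)  # group B higher under stretched spacing
print(median_label(group_a, equal_codes), "|", median_label(group_b, equal_codes))
```

Comparing means is only safe if the distances between categories can be taken as equal, which is exactly the point of disagreement. The median category does not change under either coding.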

BEWARE! Generally, the amount of information captured by data increases as you move from: nominal → ordinal → interval → ratio. Restructuring continuous data into categorical data is akin to throwing data away. For example, recoding an exam score measured on a 100 point scale to Pass/Fail for the purposes of analysis reduces the amount of information in the data.
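The exam score example can be sketched as follows. The scores and the passing threshold of 60 are hypothetical values chosen for illustration.

```python
PASSING_SCORE = 60  # hypothetical threshold

# Hypothetical exam scores on a 100-point scale.
scores = [59, 60, 61, 95, 100]

# Recode each score to a dichotomous Pass/Fail variable.
pass_fail = ["Pass" if score >= PASSING_SCORE else "Fail" for score in scores]

# Count how many distinct values each version of the variable can distinguish.
distinct_scores = len(set(scores))
distinct_outcomes = len(set(pass_fail))

print(pass_fail)
print(distinct_scores, distinct_outcomes)
```

After recoding, a score of 60 and a score of 100 are indistinguishable, and the original scores cannot be recovered from the Pass/Fail values.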

The following table presents adherence data broken down by patient characteristics. Multiple types of data are used. Consider the following questions:

1. What is the dependent variable? What are the independent variables?

2. What scale of measurement is used for each variable?

Patient Characteristics by Category of Adherence^a

Characteristics	Adherent to Three Classes	Adherent to Two Classes	Adherent to One Class	Nonadherent to Any Classes	P Value
Number of patients	201,459	134,694	77,696	79,760	—
Age, y (mean ± SD)	71.9 (8.9)	71.4 (9.4)	71.1 (9.7)	69.9 (10.5)	< .001
< 65, %	12.7	15.2	16.7	20.9
65-74, %	49.1	47.0	46.3	45.4
≥ 75, %	38.2	37.8	37.1	33.8
Female, %	55.6	59.4	61.1	59.8	< .001
Race, %					< .001
White	68.6	66.0	62.6	57.9
Black	12.8	15.0	17.2	20.8
Hispanic	6.5	8.5	10.0	11.0
Other	12.0	10.6	10.2	10.4
CCI (mean ± SD)	0.73 (1.54)	1.04 (1.83)	1.22 (1.99)	1.30 (2.07)	< .001

CCI, Deyo-adapted Charlson Comorbidity Index.
^a Adherence was defined as proportion of days covered ≥ 80%.

Dependent variable: category of adherence

Independent variables: patient characteristics (i.e., age, gender, race, and CCI)

Nominal: gender, race

Ordinal: age (as age categories), category of adherence

Ratio: number of patients, age (in years), CCI

Notice that age is represented on two scales: 1) as a continuous variable (summarized as mean ± SD) and 2) as a categorical variable (the percentage of patients aged < 65, 65-74, and ≥ 75).
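The two representations of age can be produced from the same raw data. The sketch below uses a small list of made-up ages, not the study data, and a hypothetical helper named age_group that applies the table's cut points.

```python
from collections import Counter
from statistics import mean, stdev

def age_group(age):
    """Assign an age in years to one of the table's three categories."""
    if age < 65:
        return "< 65"
    if age < 75:
        return "65-74"
    return "≥ 75"

# Hypothetical patient ages in years.
ages = [58, 66, 70, 74, 75, 81, 64, 90]

# 1) Age as a continuous (ratio) variable: summarize with mean and SD.
mean_age = mean(ages)
sd_age = stdev(ages)  # sample standard deviation

# 2) Age as a categorical (ordinal) variable: summarize with percentages.
counts = Counter(age_group(age) for age in ages)
percentages = {group: 100 * n / len(ages) for group, n in counts.items()}

print(f"{mean_age:.1f} ({sd_age:.1f})")
print(percentages)
```

The categorical version can always be derived from the continuous one, but not the other way around, which is the information loss described earlier in the lesson.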

For more information

Table 1: Yang Y, et al. Medication nonadherence and the risks of hospitalization, emergency department visits, and death among Medicare Part D enrollees with diabetes. Drug Benefit Trends. 2009; 21(330): 8.