What is an instrumental variable?
An instrumental variable (IV) is a tool used in regression analysis to solve the problem of endogeneity — when an explanatory variable is correlated with the error term.
A good instrument must satisfy two key conditions:
- Relevance: It must be correlated with the endogenous explanatory variable, and strongly enough to shift it noticeably; weak instruments give unreliable estimates.
- Exogeneity: It must not be correlated with the error term in the regression model, and it must affect the dependent variable only through the endogenous regressor (the exclusion restriction).
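A short derivation shows why each condition is needed. Take the simple model with outcome Y, one endogenous regressor X, error u and instrument Z (this notation is introduced here for illustration), and take the covariance of both sides with Z:

$$
\begin{aligned}
Y &= \beta_0 + \beta_1 X + u \\
\operatorname{Cov}(Z, Y) &= \beta_1 \operatorname{Cov}(Z, X) + \operatorname{Cov}(Z, u) \\
\beta_1 &= \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)} \qquad \text{if } \operatorname{Cov}(Z, u) = 0 \text{ and } \operatorname{Cov}(Z, X) \neq 0
\end{aligned}
$$

Exogeneity removes the Cov(Z, u) term, and relevance keeps the denominator away from zero. Replacing the population covariances with sample covariances gives the simple IV estimator; when Cov(Z, X) is close to zero (a weak instrument), small sampling errors in the numerator get blown up.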
Example: Education and Income
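You suspect that Education is endogenous when explaining Income: unobserved factors such as ability or motivation may raise both schooling and earnings, so Education is correlated with the error term. A potential instrumental variable could be: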
- Distance to the nearest college:
  - Relevant: People who live closer are more likely to attend college.
  - Exogenous: Distance doesn’t directly affect income (assuming people don’t choose where to live based on future earnings).
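The example can be checked on simulated data. The sketch below uses a purely hypothetical data-generating process (every coefficient is made up): an unobserved "ability" raises both education and income, distance lowers education but has no direct effect on income, and the true effect of one extra year of education is 2.0. It compares the naive OLS slope with the simple IV estimator Cov(Distance, Income) / Cov(Distance, Education). The helper names `cov` and `simulate` are illustrative, not from any library.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (hypothetical helper)."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate(n=100_000, seed=0):
    """Hypothetical data-generating process; all coefficients are assumptions."""
    rng = random.Random(seed)
    distance, education, income = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)  # unobserved; omitted from the regression, so it sits in the error
        d = rng.uniform(0, 100)    # km to nearest college, drawn independently of ability
        e = 14 - 0.04 * d + ability + rng.gauss(0, 1)
        y = 20 + 2.0 * e + 3.0 * ability + rng.gauss(0, 1)  # true effect of education = 2.0
        distance.append(d)
        education.append(e)
        income.append(y)
    return distance, education, income


distance, education, income = simulate()

# Naive OLS slope: biased upward because ability raises both education and income.
ols = cov(education, income) / cov(education, education)

# Simple IV estimator: Cov(Z, Y) / Cov(Z, X) with Z = distance.
iv = cov(distance, income) / cov(distance, education)

print(f"OLS estimate: {ols:.2f}")
print(f"IV estimate:  {iv:.2f}")
```

Under these assumed parameters the population OLS slope works out to 2 + 3 / 3.33 = 2.9, so the OLS estimate should land near 2.9 while the IV estimate should land close to the true 2.0.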
How IV is used (Two-Stage Least Squares — 2SLS):
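1. First stage: Regress the endogenous variable (e.g., Education) on the instrument (e.g., Distance) and keep the fitted values:
$$\text{Education}_i = \pi_0 + \pi_1\,\text{Distance}_i + v_i, \qquad \widehat{\text{Education}}_i = \hat{\pi}_0 + \hat{\pi}_1\,\text{Distance}_i$$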
2. Second stage: Regress the dependent variable (e.g., Income) on the predicted values of education from the first stage:
$$\text{Income}_i = \beta_0 + \beta_1\,\widehat{\text{Education}}_i + e_i$$
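The second-stage coefficient on predicted education is a consistent estimate of the causal effect of education on income, provided the instrument is both relevant and exogenous. Its standard errors should come from a dedicated 2SLS routine, though: plain OLS on the second stage reports incorrect ones, because its residuals are computed with predicted rather than actual education.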
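The two stages can be sketched by hand with two simple OLS regressions. This is a minimal illustration on hypothetical simulated data (the data-generating process, the seed, and the helper `ols_fit` are assumptions made for the example, with a true education effect of 2.0); it only produces the point estimate, not valid 2SLS standard errors.

```python
import random


def ols_fit(x, y):
    """Intercept and slope of a simple OLS regression of y on x (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical simulated data; all coefficients are made up for illustration.
rng = random.Random(1)
n = 100_000
distance, education, income = [], [], []
for _ in range(n):
    ability = rng.gauss(0, 1)                      # unobserved confounder
    d = rng.uniform(0, 100)                        # instrument: distance in km
    e = 14 - 0.04 * d + ability + rng.gauss(0, 1)  # endogenous regressor
    y = 20 + 2.0 * e + 3.0 * ability + rng.gauss(0, 1)
    distance.append(d)
    education.append(e)
    income.append(y)

# First stage: regress Education on Distance and keep the fitted values.
pi0, pi1 = ols_fit(distance, education)
education_hat = [pi0 + pi1 * d for d in distance]

# Second stage: regress Income on the fitted Education.
beta0, beta1 = ols_fit(education_hat, income)

print(f"first-stage slope (pi1): {pi1:.3f}")
print(f"2SLS estimate (beta1):   {beta1:.3f}")
```

With a single instrument and a single endogenous regressor, this second-stage slope equals the ratio Cov(Distance, Income) / Cov(Distance, Education) exactly, so hand-rolled 2SLS and the simple IV estimator agree. In practice a library 2SLS routine is preferable because it also computes correct standard errors.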