What is an instrumental variable?

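An instrumental variable (IV) is a variable used in regression analysis to address endogeneity, which arises when an explanatory variable is correlated with the error term (for example, because of omitted variables, measurement error, or reverse causality). When that happens, ordinary least squares (OLS) gives biased and inconsistent estimates.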

A good instrument must satisfy two key conditions:

  1. Relevance: It must be strongly correlated with the endogenous explanatory variable.
  2. Exogeneity: It must be uncorrelated with the error term in the regression model, meaning it affects the dependent variable only through the endogenous regressor (this is often called the exclusion restriction).

Example: Education and Income

Suppose you suspect that Education is endogenous when explaining Income, for instance because unobserved ability affects both. A potential instrumental variable could be:

  • Distance to the nearest college:
    • Relevant: People who live closer are more likely to attend college.
    • Exogenous: Distance doesn’t directly affect income (assuming people don’t choose where to live based on future earnings).
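
Relevance, unlike exogeneity, can be checked against data. A minimal sketch, assuming simulated data: every number below (sample size, coefficients, the unobserved "ability" term) is made up for illustration, and first_stage_f is a hypothetical helper, not a library function. It computes the first-stage F-statistic for the instrument; a common rule of thumb treats values above roughly 10 as a sign that the instrument is not weak.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process (all coefficients are assumptions).
distance = rng.uniform(0, 50, n)   # instrument: distance to nearest college
ability = rng.normal(0, 1, n)      # unobserved confounder
education = 14 - 0.05 * distance + 0.8 * ability + rng.normal(0, 1, n)


def first_stage_f(x, z):
    """F-statistic for H0: slope = 0 when regressing x on one instrument z."""
    Z = np.column_stack([np.ones_like(z), z])       # intercept + instrument
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)    # OLS coefficients
    resid = x - Z @ coef
    sigma2 = resid @ resid / (len(x) - 2)           # residual variance
    var_slope = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
    return coef[1] ** 2 / var_slope                 # squared t-stat = F


f_stat = first_stage_f(education, distance)
corr = np.corrcoef(distance, education)[0, 1]
print(f"first-stage F = {f_stat:.1f}, corr(distance, education) = {corr:.2f}")
```

With one instrument and one endogenous regressor, exogeneity cannot be tested this way; it has to be argued from how the instrument arises.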

How IV is used (Two-Stage Least Squares — 2SLS):

  1. First stage: Regress the endogenous variable (e.g., Education) on the instrument (e.g., Distance):

     Education = π₀ + π₁ * Distance + v

  2. Second stage: Regress the dependent variable (e.g., Income) on the predicted values of Education from the first stage:

     Income = β₀ + β₁ * Predicted_Education + ε

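If the instrument is valid, β₁ from the second stage is a consistent estimate of the causal effect of Education on Income. Note that the standard errors reported by a manually run second stage are not correct, because they ignore that Predicted_Education is itself estimated; dedicated 2SLS routines adjust for this.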
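
The two stages can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a production estimator: the data are simulated, the true effect (true_beta1 = 2.0) and all other coefficients are invented, and ols is a hypothetical helper. An unobserved "ability" variable drives both Education and Income, so plain OLS overstates the effect while 2SLS lands close to the assumed true value.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
true_beta1 = 2.0  # assumed causal effect of Education on Income

# Hypothetical data-generating process (all coefficients are assumptions).
distance = rng.uniform(0, 50, n)   # instrument
ability = rng.normal(0, 1, n)      # unobserved; affects education AND income
education = 14 - 0.05 * distance + 0.8 * ability + rng.normal(0, 1, n)
income = 10 + true_beta1 * education + 3.0 * ability + rng.normal(0, 1, n)


def ols(y, x):
    """Return (intercept, slope) from regressing y on a single regressor x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Naive OLS: biased because education is correlated with the omitted ability.
beta_ols = ols(income, education)[1]

# First stage: Education = pi0 + pi1 * Distance + v
pi0, pi1 = ols(education, distance)
predicted_education = pi0 + pi1 * distance

# Second stage: Income = beta0 + beta1 * Predicted_Education + eps
beta_2sls = ols(income, predicted_education)[1]

print(f"true effect: {true_beta1:.2f}")
print(f"OLS estimate:  {beta_ols:.2f}")
print(f"2SLS estimate: {beta_2sls:.2f}")
```

The sketch reports point estimates only; for inference, a dedicated IV routine that computes proper 2SLS standard errors is the safer choice.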
