2SLS and 3SLS

Two-Stage Least Squares

G7 neither needs nor has a special command for ordinary two-stage least squares. Consider the following system of two equations, in which y1 and y2 are endogenous and x1, x2, and x3 are exogenous:

y1 = f(y2, x1, x3)
y2 = g(y1, x2, x3)

To obtain the 2SLS estimates, first regress each endogenous variable on all of the exogenous variables and save the predicted values:

r y1 = x1, x2, x3
rename predic z1
r y2 = x1, x2, x3
rename predic z2

The z’s are now the second-stage regressors, so we do:

r y1 = z2, x1, x3
r y2 = z1, x2, x3

or, if we are building a model and want the independent variables in the second-stage regression to carry the correct names, we can replace the last two lines by

rename z1 y1
rename z2 y2
r y1 = y2, x1, x3
r y2 = y1, x2, x3
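
For readers who want to check the arithmetic outside G7, the two-stage procedure above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic, the structural coefficients (0.5, -0.4, and so on) are invented, the helper name ols is hypothetical, and every regression includes an intercept. It is not G7's own implementation.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares fit with an intercept; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, X @ beta

# Synthetic data; every number here is made up for illustration.
rng = np.random.default_rng(0)
n = 5000
x1, x2, x3 = rng.normal(size=(3, n))
e1, e2 = rng.normal(size=(2, n))

# Hypothetical structural system:
#   y1 =  0.5*y2 + 1.0*x1 + 0.5*x3 + e1
#   y2 = -0.4*y1 + 1.0*x2 - 0.5*x3 + e2
a1, a2 = 0.5, -0.4
u1 = 1.0 * x1 + 0.5 * x3 + e1
u2 = 1.0 * x2 - 0.5 * x3 + e2
det = 1.0 - a1 * a2
y1 = (u1 + a1 * u2) / det   # the two equations solved jointly for y1
y2 = (u2 + a2 * u1) / det   # ... and for y2

# First stage: regress each endogenous variable on all exogenous variables
# and keep the predicted values (the z's of the G7 listing).
_, z1 = ols(y1, x1, x2, x3)
_, z2 = ols(y2, x1, x2, x3)

# Second stage: replace each right-hand-side endogenous variable by its z.
b1_2sls, _ = ols(y1, z2, x1, x3)
b2_2sls, _ = ols(y2, z1, x2, x3)

# Plain OLS on the first equation, for comparison.
b1_ols, _ = ols(y1, y2, x1, x3)

print("true coefficient on y2:", a1)
print("OLS estimate:          ", round(b1_ols[1], 3))
print("2SLS estimate:         ", round(b1_2sls[1], 3))
```

In this sketch y2 is correlated with e1, so the OLS coefficient on y2 is pulled away from its true value, while the second-stage coefficient on z2 should land close to it in a sample of this size.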

Systemic Two-Stage Least Squares

The bane of two-stage least squares has been that a system usually contains so many exogenous variables and lagged values of endogenous variables that the first-stage regressions fit very closely, leaving no substantial difference between the OLS and the 2SLS estimates. This is especially true if nonlinearities in the model are allowed for by including powers and cross products of the exogenous variables. G7 and Build offer a new way to deal with this problem.

The procedure is simple. Estimate the equations of the model by OLS or another single-equation method. Build the model with these equations and simulate it over the historical period. The simulated values of the endogenous variables are not influenced by the errors in the structural equations, so they may be used as regressors in the second stage to obtain estimates free of simultaneous-equation bias.

There are two variants of this procedure. The first does the simulation using simulated values for the lagged endogenous variables; the second uses the actual values of these variables.

To perform the first variant, run the simulation without any special command and copy the bank of simulation results to the G7 workspace. Then start G7, select the ‘b’ option on the opening menu, and do “bank bws” to put the actual values in the assigned bank. Give the command “second on” so that f commands are ignored, and re-estimate the equations with the same add files as originally used. With “second on”, the dependent variable of each regression is taken from the assigned bank; all other variables come from the workspace, which contains the simulated values. To turn off the second feature, the command is “second off”.

For the second variant, give the command “first” to the Run program before giving it the “run” command. Run will now store the simulated values in the workspace as before, but it will use actual values of all lagged variables. You may now proceed as before to start G7 and give the “second on” command. You will then be using simulated values of all independent variables. Alternatively, you can choose to use actual values of all lagged variables. To do so, proceed as before, but place an “a.” in front of the name of every lagged endogenous variable in an r command. The “a.” in front of a variable name causes G7 to look for the variable only in the assigned bank, where it will find the actual value.

The command “second off” restores G7 to its normal condition, in which it processes f commands.

Three-Stage Least Squares

G7 performs three-stage least squares by combining the two-stage procedure explained above with the seemingly unrelated regression procedure described in the help topic for SUR. Consider again the system of equations:

y1 = f(y2, x1, x3)
y2 = g(y1, x2, x3)

Obtain the 2SLS instrumental variables by:

r y1 = x1, x2, x3
rename predic z1
r y2 = x1, x2, x3
rename predic z2

Insert the second-stage regressors (the z’s) into the system and estimate with SUR:

title Two-stage and Three-stage Least Squares Estimates
sur
r y1 = z2, x1, x3
r y2 = z1, x2, x3
do

The first set of regression results gives the 2SLS estimates; the second set gives the 3SLS estimates. To graph the results, use the same statements as for sur.
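
The combination of a first stage with a SUR step can also be sketched in Python with NumPy. The sketch below is an assumption-laden illustration, not a description of G7's internals: the data are synthetic, the coefficients and the error correlation of 0.6 are invented, an intercept is included in every equation, and the cross-equation covariance is estimated from the residuals of the second-stage regressions on the z's. Whether G7 estimates that covariance in the same way is not stated in this topic.

```python
import numpy as np

# Synthetic data with correlated structural errors (all values hypothetical).
rng = np.random.default_rng(1)
n = 5000
x1, x2, x3 = rng.normal(size=(3, n))
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)

# Hypothetical structural system:
#   y1 =  0.5*y2 + 1.0*x1 + 0.5*x3 + e1
#   y2 = -0.4*y1 + 1.0*x2 - 0.5*x3 + e2
a1, a2 = 0.5, -0.4
u1 = 1.0 * x1 + 0.5 * x3 + e[:, 0]
u2 = 1.0 * x2 - 0.5 * x3 + e[:, 1]
det = 1.0 - a1 * a2
y1 = (u1 + a1 * u2) / det
y2 = (u2 + a2 * u1) / det

# First stage: predicted values of y1 and y2 from all exogenous variables.
ones = np.ones(n)
X = np.column_stack([ones, x1, x2, x3])
z1 = X @ np.linalg.lstsq(X, y1, rcond=None)[0]
z2 = X @ np.linalg.lstsq(X, y2, rcond=None)[0]

# Second stage, equation by equation: these are the 2SLS estimates.
W = [np.column_stack([ones, z2, x1, x3]),
     np.column_stack([ones, z1, x2, x3])]
ys = [y1, y2]
b_eq = [np.linalg.lstsq(W[i], ys[i], rcond=None)[0] for i in range(2)]
b_2sls = np.concatenate(b_eq)

# SUR step: estimate the cross-equation residual covariance S, then do
# GLS on the stacked system with weight matrix inv(S) kron I.
resid = np.column_stack([ys[i] - W[i] @ b_eq[i] for i in range(2)])
S = resid.T @ resid / n
S_inv = np.linalg.inv(S)

# Build the normal equations block by block to avoid an n-by-n matrix.
A = np.block([[S_inv[i, j] * (W[i].T @ W[j]) for j in range(2)]
              for i in range(2)])
c = np.concatenate([sum(S_inv[i, j] * (W[i].T @ ys[j]) for j in range(2))
                    for i in range(2)])
b_3sls = np.linalg.solve(A, c)

print("2SLS:", np.round(b_2sls, 3))
print("3SLS:", np.round(b_3sls, 3))
```

In this particular system each equation excludes exactly one exogenous variable, so both equations are exactly identified, and in that case the system estimate reproduces the equation-by-equation one up to rounding. The two sets of results differ only when at least one equation is overidentified.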