Ordinary Regression

Here are some commands related to running an ordinary regression:

(ti)tle <the title you want for your regression or graph>

Supplies the title for the display of your regression results.

(lim)its <date1> <date2> [date3]

Specifies the starting date for the fit, <date1>, the ending date for the fit, <date2>, and the ending date for the test or forecast period, <date3>. If <date3> is omitted, then it is assumed to be the same as <date2>.

mode <test | forecast>
mode <t | f>

Determines what is to be done between <date2> and <date3>. The default mode is “test.” In that case, the values predicted by the equation for the period from <date2> to <date3> are compared to the actual data, and an SEE and MAPE are computed for the test period. For “forecast” mode to work, the series for all independent variables, except lagged values of the dependent variable, must extend to <date3>. Then, in forecast mode, two forecasts are made for the period from <date2> to <date3>. The first is simply the predicted value; the second is the predicted value plus the “rho adjustment,” which takes account of the error in the last period of the fit.
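For example, the following sequence titles a regression, fits it through 2005.4, and tests it over the following two years. The dates and series names here are purely illustrative:

ti Personal Consumption Equation
lim 1980.1 2005.4 2007.4
mode test
r c$ = yd$, yd$[1]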
cmtype <yes | no>

If “yes”, the simple correlation matrix and the standard deviation of each variable will be displayed in the results window before the regression results. After the regression, the variance-covariance matrix of the regression coefficients is displayed, as is the matrix of derivatives of the regression coefficients with respect to one another. For each coefficient, this matrix shows how all of the other coefficients would change if the given coefficient were forcibly changed by a (con)straint command. The default value for cmtype is “no.”

See also the Regression Tests topic.

OLS Regression Command

r <y> = <x1>, <x2>, <x3>, …, <xn>
r <y> = ! <x1>, <x2>, <x3>, …, <xn>
Both forms regress the dependent variable <y> on the independent variables <x1>, …, <xn>. The first form automatically supplies a constant term; the second does not. The total number of variables, including the constant, must not exceed 500. Independent variables may be expressions, as on the right side of an f command. The dependent variable must be a single variable. If the line ends with a ‘,’, the command continues on the next line.
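Because independent variables may be expressions, and a trailing ‘,’ continues the command, a regression like the following (the series names are illustrative) is legal:

r v$ = gdp$, gdp$ - gdp$[1],
   rtb[4]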

If lagged values of the dependent variable occur among the explanatory variables, they will be supplied automatically from previously calculated values when forecasting. Regression coefficients and other values are stored in the “rcoef” variable in the workspace bank, the dependent variable is stored as series “depvar”, predicted values are stored as the series “predic”, and residuals are stored in series “resid.”

The series “rcoef” is created by the regression command. The series begins in the first period of the workspace bank; this period is set as the base period in the G.CFG file and is displayed by the wsinfo command. The following table shows the data stored in the “rcoef” series for a regression on k independent variables.

ORDER OF THE RCOEF TIME SERIES

Number of coefficients (k)
Coefficient 1
…
Coefficient k
Mexval 1
…
Mexval k
Std. Error of Estimate (SEE)
R-Squared
Rho [(2-DW)/2]
Adjusted R-Squared
Mean Absolute Percentage Error (MAPE)/100
Number of observations (N)

Use “gr *” to graph the regression results. When you have the graph of the regression and want to look back at the regression results, move to the regression results window by navigating through the open windows of G7.

Use “gr rcoef :3 8” to graph regression coefficients 3 through 8. This is especially useful with distributed lags.

Use “gr resid” to graph the residuals. Use “gr lever” to graph the “leverage” series, which shows the derivative of the predicted value of each observation with respect to the observed value of that observation. It is useful for spotting outlying observations.
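A typical sequence after estimating an equation, then, is to review it graphically with the commands just described:

gr *
gr resid
gr lever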

Terms Appearing on the Regression Display

These are characteristics of the entire regression; they appear at the top of the display.

SEE:

Standard error of estimate, or the square root of the average of the squares of the misses or “residuals” of the equation.

RSQ:

(pronounced r-square) Coefficient of multiple determination.

RHO:

(Greek rho) Autocorrelation coefficient of the residuals.

Obser:

Number of observations.

SEE+1:

The SEE for forecasts one period ahead using rho adjustment.

RBSQ:

Coefficient of multiple determination adjusted for degrees of freedom.

DW:

Durbin-Watson statistic; contains the same information as RHO.

DH:

Durbin H statistic; replaces DW if a lagged value of the dependent variable is present.

DoFree:

(Degrees of freedom) = Obser - number of independent variables.

From:

To:

The dates covered by the regression.

MAPE:

Mean Absolute Percentage Error.

JarqBer:

A test of normality due to Jarque and Bera.

Next are values relating to individual variables. In Regression | Settings you may specify which of these are to be displayed. The default display is:

Name:

Code name of the variable.

Reg-coef:

Regression coefficient for this variable.

Mexval:

(Marginal explanatory value) The percentage increase in SEE if this variable is left out of the regression. A factual alternative to the t statistic.

T-value:

Student’s t values.

Elas:

Elasticity of the dependent variable with respect to this variable, evaluated at the means of both.

NorRes:

Normalized residuals. The ratio of the sum of squared residuals after the introduction of this variable to the sum of squared residuals after all variables have been introduced. A factual alternative to F statistics.

Mean:

The mean value of this variable.

Beta:

What the regression coefficient would be if both the independent and dependent variables were scaled so that they had unit standard deviations.

Fstat:

The Fisher F statistic for testing the significance of this and the following independent variables.

Additionally, if you give the command cmtype y, G7 will display the correlation matrix of the original variables and their standard deviations, the matrix of variances and covariances of the regression coefficients, and a matrix of the derivatives of one regression coefficient with respect to another. The derivatives with respect to variable 3, for example, show the rate of change of each other coefficient if the coefficient on variable 3 is changed by a constraint. They show how the estimate of one coefficient depends on the estimates of the others. You can turn the display of the t-statistics on or off with showt y or showt n.
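For example, the following sequence turns on both displays before a regression and turns the matrix displays back off afterward. The series names here are illustrative:

cmtype y
showt y
r c$ = yd$, yd$[1]
cmtype n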

Recursive OLS Regression

limit <date1> <date2> <date3>
recur <y> = <x1>, <x2>, <x3>, …, <xn>
recur <y> = ! <x1>, <x2>, <x3>, …, <xn>
The indicated regression will be done from <date2> to <date3>, then from <date2>-1 (the period before <date2>) to <date3>, then from <date2>-2 to <date3>, and so on back to <date1> to <date3>. The regression coefficients are made into time series and stored in the workspace: “b1” is the series for the first regression coefficient; “b2”, for the second; and so on. Each regression's coefficients are stored at the date corresponding to the first date of that regression. Thus, “b1{date1}” is “b1” for the regression over the period <date1> to <date3>. Similarly, the standard errors of the regression coefficients are stored in “s1”, “s2”, “s3”, etc.

If a zip command precedes the recur command, no regressions will be displayed; the zip command will be turned off at the end of the recur command, for you surely will want to view graphs after the recursive regression, and they cannot be drawn with zip on.

Example:

zip
limit 1980.1 2005.1 2010.2
recur gnp$ = g$, v$, fe$, fi$
gdates 1980.1 2005.1
fadd BOUNDS.ADD (1 - 5)

where “BOUNDS.ADD” is the file

ti Estimate and 2-Sigma Limits for Coefficient %1
f upper = b%1 + 2*s%1
f lower = b%1 - 2*s%1
gr b%1 upper lower

Linear Interpolation Variables
intvar <prefix> <date1> <date2> [<date3> <date4> … <dateN>]

The intvar command makes up linear interpolation functions between the given dates. Each linear interpolation function begins at 0 and remains at 0 until its particular time interval comes; then it rises by 1 each period until the end of its interval. After that, it remains constant at whatever value it has reached. For example, if the prefix is “tax” and we have 6 dates, 6 linear interpolation functions will be created, with names “tax1”, “tax2”, …, “tax6”. Then, if we regress a policy variable, such as the federal tax rate, on these functions, the predicted values from the regression take the shape of a set of spliced line segments, approximating the curve of the left-hand-side variable.
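For example, six interpolation functions named tax1 through tax6 could be created with a command like the following; the dates here are illustrative:

intvar tax 1962.1 1970.1 1978.1 1986.1 1994.1 2002.1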

For example, to approximate the federal personal tax rate in the Quest model, use:

r pitfBR = tax1, tax2, tax3, tax4, tax5, tax6

See also the following topics:

  • Distributed lags

  • Soft Constraints

  • Seemingly Unrelated Regression (SUR)

  • Homogeneity and normality tests

  • Non-linear regression