Ordinary Least Squares

In linear regression, the model specifies the dependent variable as a linear combination of the parameters (plus an error term). The residual is the difference between the observed value of the dependent variable and the value predicted by the model. Ordinary least squares chooses the parameter estimates that minimize the sum of squared residuals, denoted SSE (sum of squared errors) or RSS (residual sum of squares).
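For a single predictor, the SSE-minimizing coefficients have a closed form. The slope is the centered cross-product sum divided by the centered sum of squares of the predictor, and the intercept makes the fitted line pass through the two means. (With several predictors the same idea leads to the normal equations, whose solution is (XᵀX)⁻¹Xᵀy when XᵀX is invertible.) The sketch below is plain Python and independent of MIML; the helper names `ols_simple` and `sse` and the toy data are hypothetical, chosen only for illustration.

```python
def ols_simple(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y ~ b0 + b1*x."""
    n = len(xs)
    x_mean = sum(xs) / float(n)
    y_mean = sum(ys) / float(n)
    # Centered cross-products and centered sum of squares of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_mean - b1 * x_mean   # intercept: line passes through the means
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared residuals, where residual = observed - predicted."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(xs, ys)
print("b0=%.4f b1=%.4f SSE=%.4f" % (b0, b1, sse(xs, ys, b0, b1)))
# prints: b0=0.0500 b1=1.9900 SSE=0.1070
```

Nudging either coefficient away from these values makes the SSE larger, which is exactly the property that defines the least-squares fit.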

The ordinary least squares (OLS) estimator is consistent when the independent variables are exogenous and there is no perfect multicollinearity among them. If, in addition, the errors are homoscedastic and serially uncorrelated with finite variance, the Gauss-Markov theorem states that OLS is the best linear unbiased estimator (BLUE): among all linear unbiased estimators, it has the minimum variance.
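Perfect multicollinearity rules out a unique OLS solution because XᵀX becomes singular. The following sketch illustrates this in plain Python. The helper names `gram` and `det3` and the data are hypothetical. It builds a design matrix with an intercept column and two predictors, then compares det(XᵀX) when the second predictor is independent of the first with the case where it is exactly twice the first.

```python
def gram(X):
    """Return X^T X for a matrix X given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


x1 = [1.0, 2.0, 3.0, 4.0]
x2_indep = [2.0, 1.0, 4.0, 3.0]         # not a linear function of x1
x2_collinear = [2.0 * v for v in x1]    # exactly 2 * x1

# Design matrices: intercept column followed by the two predictors
X_ok = [[1.0, a, b] for a, b in zip(x1, x2_indep)]
X_bad = [[1.0, a, b] for a, b in zip(x1, x2_collinear)]

print(det3(gram(X_ok)))    # 64.0: X^T X invertible, unique OLS solution
print(det3(gram(X_bad)))   # 0.0: X^T X singular, coefficients not identifiable
```

When the determinant is zero, the normal equations have infinitely many solutions, so the individual coefficients of the collinear predictors cannot be separated.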

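import os

from miml import datasets
from miml.regression import OLS

# Load the 2dplanes regression dataset (ARFF format) from the MIML data home
fn = os.path.join(datasets.get_data_home(), 'weka', 'regression',
    '2dplanes.arff')
ds = datasets.load_arff(fn, 10)
x = ds.x    # independent variables
y = ds.y    # dependent variable

# Fit an ordinary least squares model to all samples
model = OLS()
model.fit(x, y)

# Predict the dependent variable for the first 10 rows of x
r = model.predict(x[:10,:])

print(r)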
>>> run script...
array([5.073347387304948])
>>> model
Linear Model:

Residuals:
           Min              1Q          Median              3Q             Max
       -8.5260         -1.6514         -0.0049          1.6755          7.8116

Coefficients:
             Estimate    Std. Error       t value    Pr(>|t|)
Intercept     -0.0148        0.0118       -1.2503      0.2112
Var 1          2.9730        0.0118      251.7998      0.0000 ***
Var 2          1.5344        0.0145      105.8468      0.0000 ***
Var 3          1.0357        0.0144       71.7815      0.0000 ***
Var 4          0.5281        0.0145       36.4827      0.0000 ***
Var 5          1.4766        0.0144      102.2472      0.0000 ***
Var 6          1.0044        0.0144       69.5380      0.0000 ***
Var 7          0.5238        0.0145       36.1696      0.0000 ***
Var 8         -0.0011        0.0145       -0.0750      0.9402
Var 9          0.0024        0.0145        0.1649      0.8690
Var 10        -0.0278        0.0145       -1.9239      0.0544 .
---------------------------------------------------------------------
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.3838 on 40757 degrees of freedom
Multiple R-squared: 0.7056,    Adjusted R-squared: 0.7055
F-statistic: 9766.9504 on 10 and 40757 DF,  p-value: 0.000
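The summary lines can be roughly cross-checked from the printed numbers, assuming MIML uses the textbook OLS formulas. The sketch below does not call MIML and is not taken from its source. It assumes these definitions:

- t value = estimate / standard error
- adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1)
- F = (R²/p) / ((1 - R²)/(n - p - 1))
- residual standard error = sqrt(SSE/(n - p - 1))

The p-value helper uses the normal approximation to Student's t, which is reasonable here only because the residual degrees of freedom (40757) are very large. The function names are hypothetical. Because the inputs are the rounded printed values, the results match the output only approximately.

```python
import math

# Values copied from the printed summary (rounded as displayed)
df_resid = 40757            # "on 40757 degrees of freedom" = n - p - 1
p = 10                      # number of predictors ("on 10 and 40757 DF")
n = df_resid + p + 1        # implied number of samples
r2 = 0.7056                 # Multiple R-squared
rse = 2.3838                # Residual standard error

# Adjusted R^2 penalizes R^2 for the number of predictors
adj_r2 = 1.0 - (1.0 - r2) * float(n - 1) / df_resid

# Overall F statistic: explained variance per predictor over residual variance
f_stat = (r2 / p) / ((1.0 - r2) / df_resid)

# Residual standard error = sqrt(SSE / df_resid), so SSE = rse^2 * df_resid
sse_implied = rse ** 2 * df_resid

# t value of a coefficient = estimate / standard error (row "Var 1")
t_var1 = 2.9730 / 0.0118


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal distribution.

    Approximates Student's t; adequate only when the residual df is large."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def signif_code(pval):
    """Map a p-value to the legend's significance codes (assumes strict <)."""
    for cutoff, code in ((0.001, '***'), (0.01, '**'), (0.05, '*'), (0.1, '.')):
        if pval < cutoff:
            return code
    return ' '


print("n = %d" % n)
print("adjusted R^2 = %.4f" % adj_r2)
print("F = %.1f" % f_stat)
print("implied SSE = %.0f" % sse_implied)
print("t (Var 1) = %.1f" % t_var1)
print("p (Intercept) = %.4f" % two_sided_p_normal(-1.2503))
print("code (Var 10) = %r" % signif_code(two_sided_p_normal(-1.9239)))
```

With these inputs, n comes out as 40768 and the adjusted R² as 0.7055, matching the summary. The F statistic and the Var 1 t value land within about 0.1% of the printed values; the gap comes from using rounded inputs.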