Ridge Regression

Coefficient estimates for multiple linear regression models depend on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix X'X becomes close to singular. As a result, the least-squares estimate is highly sensitive to random errors in the observed response Y, and the variance of the estimated coefficients is inflated.

Ridge regression is one method to address these issues. In ridge regression, the matrix X'X is perturbed by adding a positive shrinkage parameter λ to each of its diagonal elements, so that its determinant is appreciably different from 0 and the system is well conditioned.
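
Both points are easy to see numerically. The sketch below (plain NumPy with synthetic data, not miml's implementation) builds a design matrix with two nearly collinear columns, shows that the ordinary least-squares estimate swings wildly under a tiny perturbation of the response, and shows that adding λ to the diagonal of X'X stabilizes the solution. The data and the value of lam are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)      # nearly collinear with x1
X = np.column_stack([x1, x2])            # X'X is close to singular
y = x1 + x2 + 0.1 * rng.normal(size=n)   # true coefficients are (1, 1)

# Ordinary least squares: a tiny perturbation of y changes the answer drastically
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_perturbed = np.linalg.lstsq(X, y + 1e-4 * rng.normal(size=n), rcond=None)[0]

# Ridge: add lam to the diagonal of X'X before solving the normal equations
lam = 0.1
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(np.linalg.cond(X.T @ X))   # enormous condition number
print(b_ols)                     # unstable estimates, very large in magnitude
print(b_perturbed)               # differs markedly from b_ols
print(b_ridge)                   # both coefficients close to 1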

Ridge regression is a special case of Tikhonov regularization, the most commonly used method for regularizing ill-posed problems. Ridge regression shrinks the regression coefficients by imposing a penalty on their size: it minimizes the residual sum of squares plus λ times the squared Euclidean norm of the coefficient vector. By accepting a small amount of bias in the estimates, more reasonable coefficients can often be obtained, and this small bias frequently buys a dramatic reduction in the variance of the estimated model coefficients.
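
The shrinkage effect itself can be demonstrated in a few lines. The sketch below (again plain NumPy with synthetic data, not miml code) computes the closed-form ridge solution for increasing values of λ; the norm of the coefficient vector decreases monotonically toward zero as the penalty grows.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X @ np.array([3.0, -2.0, 0.5, 1.0, -1.5]) + rng.normal(size=40)

# Ridge solution for a range of penalties; lambda = 0 recovers plain OLS
for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    print(f"lambda={lam:7.1f}  ||beta|| = {np.linalg.norm(beta):.4f}")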

Another interpretation of ridge regression is available through Bayesian estimation. In this setting, the belief that the weights should be small is encoded in a zero-mean Gaussian prior on the coefficients; the ridge solution is then the posterior mode (the MAP estimate), with λ equal to the ratio of the noise variance to the prior variance.
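
Concretely, with Gaussian noise of variance σ² and a zero-mean Gaussian prior of variance τ² on each coefficient, maximizing the posterior is equivalent to ridge regression with λ = σ²/τ². The sketch below (NumPy/SciPy, synthetic data and arbitrary variances chosen for illustration) minimizes the negative log-posterior numerically and checks that the result matches the closed-form ridge solution.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
beta_true = np.array([1.0, -2.0, 0.5])
sigma2, tau2 = 0.25, 5.0                   # noise and prior variances
y = X @ beta_true + np.sqrt(sigma2) * rng.normal(size=30)

# Negative log-posterior (up to additive constants):
# Gaussian likelihood term plus Gaussian prior term
def neg_log_post(b):
    return np.sum((y - X @ b) ** 2) / (2 * sigma2) + np.sum(b ** 2) / (2 * tau2)

b_map = minimize(neg_log_post, np.zeros(3)).x

# Closed-form ridge solution with lambda = sigma^2 / tau^2
lam = sigma2 / tau2
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print(b_map)
print(b_ridge)                              # agrees with the MAP estimate
print(np.allclose(b_map, b_ridge, atol=1e-4))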

from numpy import array
from miml.regression import RidgeRegression

# Longley macroeconomic dataset (1947-1962), a classic example of highly
# collinear predictors. Columns of x: GNP, Unemployed, Armed Forces,
# Population, Year, Employed.
x = array([[234.289, 235.6, 159.0, 107.608, 1947, 60.323],
           [259.426, 232.5, 145.6, 108.632, 1948, 61.122],
           [258.054, 368.2, 161.6, 109.773, 1949, 60.171],
           [284.599, 335.1, 165.0, 110.929, 1950, 61.187],
           [328.975, 209.9, 309.9, 112.075, 1951, 63.221],
           [346.999, 193.2, 359.4, 113.270, 1952, 63.639],
           [365.385, 187.0, 354.7, 115.094, 1953, 64.989],
           [363.112, 357.8, 335.0, 116.219, 1954, 63.761],
           [397.469, 290.4, 304.8, 117.388, 1955, 66.019],
           [419.180, 282.2, 285.7, 118.734, 1956, 67.857],
           [442.769, 293.6, 279.8, 120.445, 1957, 68.169],
           [444.546, 468.1, 263.7, 121.950, 1958, 66.513],
           [482.704, 381.3, 255.2, 123.366, 1959, 68.655],
           [502.601, 393.1, 251.4, 125.368, 1960, 69.564],
           [518.173, 480.6, 257.2, 127.852, 1961, 69.331],
           [554.894, 400.7, 282.7, 130.081, 1962, 70.551]])
# Response: GNP price deflator.
y = array([83.0,  88.5,  88.2,  89.5,  96.2,  98.1,  99.0, 100.0, 101.2,
           104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9])
model = RidgeRegression(0.0057)   # shrinkage parameter lambda = 0.0057
model.fit(x, y)

print(model.predict(x[:1, :]))
array([83.71913397911655])
>>> model
Ridge Regression:

Residuals:
       Min        1Q    Median        3Q       Max
   -2.0691   -0.5736    0.2619    0.4844    1.6328

Coefficients:
            Estimate    Std. Error    t value    Pr(>|t|)
Intercept  -247.2810            NA         NA          NA
Var 1         0.1789        7.8561     0.0228      0.9823
Var 2         0.0197        2.0319     0.0097      0.9925
Var 3         0.0066        0.8647     0.0076      0.9941
Var 4        -1.3433        4.2777    -0.3140      0.7607
Var 5         0.2216        9.7525     0.0227      0.9824
Var 6        -0.0575        3.7635    -0.0153      0.9881
---------------------------------------------------------------------
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.2361 on 9 degrees of freedom
Multiple R-squared: 0.9921,    Adjusted R-squared: 0.9869
F-statistic: 189.0534 on 6 and 9 DF,  p-value: 6.011e-09
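
For readers who want to sanity-check the fit outside miml, the same model can be run with scikit-learn's Ridge on the x and y arrays defined above. Solver details and intercept handling differ between libraries, so the numbers should be close but need not match to the last digit.

from sklearn.linear_model import Ridge

skl = Ridge(alpha=0.0057).fit(x, y)   # same shrinkage parameter as above
print(skl.intercept_, skl.coef_)      # compare with the coefficient table
print(skl.predict(x[:1, :]))          # should be close to 83.719...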