AMMI notes - Ordinary least squares solution
Solution to the optimization of the squared error, with accompanying code.
Linear Regression Exercise (Closed Form Solution)
In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables) [Wikipedia]. The closed-form solution for the parameter $\theta$ of a linear regression model is given by $$\theta = (X^TX)^{-1}X^TY$$ where $X$ is the feature matrix and $Y$ is the target vector.
Let $d$ be the number of features and $n$ the number of examples. The dimensions are as follows:
- $\theta$ is (d,1)
- $X$ is (n,d)
- $Y$ is (n,1)
Prediction is done using:
- $\hat{Y} = X \theta$
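As a quick sanity check of these shapes, here is a minimal, self-contained sketch on random data (independent of the wine dataset used below):
import numpy as np

n, d = 5, 3                    # 5 examples, 3 features
X = np.random.randn(n, d)      # feature matrix of shape (n, d)
theta = np.random.randn(d, 1)  # parameter vector of shape (d, 1)
Y_hat = X @ theta              # predictions, shape (n, 1)
Y_hat.shape                    # (5, 1)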
We are trying to find the value of $\theta$ that minimizes the squared error which means finding the solution to: $\underset{\theta}{argmin} \|{X \theta - Y}\|_2^2$
Since the squared error is convex in $\theta$, we can find that value by taking the derivative of the expression and solving for the $\theta$ that makes it 0.
First let's expand $\|{X \theta - Y}\|_2^2$:
$$
\begin{aligned}
\|X \theta - Y\|_2^2 &= (X \theta - Y)^T(X \theta - Y) \\
&= (\theta^T X^T - Y^T)(X \theta - Y) \\
&= \theta^T X^T X \theta - Y^T X \theta - \theta^T X^T Y + Y^T Y \\
&= \theta^T X^T X \theta - (\theta^T X^T Y)^T - \theta^T X^T Y + Y^T Y \\
&= \theta^T X^T X \theta - 2\,\theta^T X^T Y + Y^T Y \quad \text{because } \theta^T X^T Y \text{ is a scalar} \\
\frac{\partial \|X \theta - Y\|_2^2}{\partial \theta} &= 2 X^T X \theta - 2 X^T Y
\end{aligned}
$$
By setting the derivative to 0 we get:
$$
\begin{aligned}
2 X^T X \theta - 2 X^T Y &= 0 \\
X^T X \theta - X^T Y &= 0 \\
X^T X \theta &= X^T Y \\
\theta &= (X^T X)^{-1} X^T Y
\end{aligned}
$$
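Before moving to real data, the result can be checked numerically. This is a minimal sketch on synthetic data; np.linalg.lstsq is only used as an independent reference solver:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # synthetic features, shape (n, d)
Y = rng.normal(size=(100, 1))   # synthetic targets, shape (n, 1)

# theta from the normal equation X^T X theta = X^T Y, solved as a linear system
theta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# independent least-squares solution for comparison
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

np.allclose(theta_normal, theta_lstsq)  # True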
Here is an implementation using numpy and the wine quality dataset from the UCI Machine Learning Repository.
import pandas as pd
import numpy as np
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv -P data
!ls data
data = pd.read_csv("data/winequality-red.csv",sep=';')
data.head(3)
cols = ["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"]
target = "quality"
data = data.sample(frac=1)  # shuffle the rows of the dataset
X = data[cols].values
Y = data[[target]].values
X.shape, Y.shape
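If everything loaded correctly, X should have shape (n, 11), one column per feature listed above, and Y should have shape (n, 1).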
We also implement the bias parameter by appending a feature with fixed value 1 to every data point. The prediction for a single example then becomes $$ \sum_{i=1}^{d}(\theta_i \cdot x_i) + \theta_{d+1} \cdot 1 $$ where $\theta_{d+1}$ is the bias parameter.
def add_ones(X):
    """Append a column of ones so the last component of theta acts as the bias."""
    return np.hstack([X, np.ones((X.shape[0], 1))])
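For example, on a toy array (a quick check of add_ones as defined above):
A = np.array([[1., 2.],
              [3., 4.]])
add_ones(A)
# array([[1., 2., 1.],
#        [3., 4., 1.]])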
class LinearReg:
    """
    Basic linear regression implementation using numpy
    """
    def __init__(self, bias=False):
        """
        Initialization of theta and a boolean to determine whether to use a bias or not
        """
        self.theta = None
        self.bias = bias

    def fit(self, X, Y):
        """
        Fit function. Uses the normal equation to compute theta
        """
        if self.bias:
            X = add_ones(X)
        A = X.T @ X
        B = X.T @ Y
        # Solve A theta = B rather than forming the inverse explicitly
        self.theta = np.linalg.solve(A, B)
        # self.theta = np.linalg.inv(A) @ B

    def predict(self, X):
        """
        Prediction function
        """
        if self.bias:
            X = add_ones(X)
        return X @ self.theta

    @staticmethod
    def mse(y_hat, y):
        """
        Static method implementing the mean squared error
        """
        return np.mean((y - y_hat) ** 2)
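Note that fit solves the linear system $X^TX\theta = X^TY$ with np.linalg.solve rather than forming $(X^TX)^{-1}$ explicitly (the commented-out alternative); solving the system directly is generally faster and more numerically stable than inverting the matrix.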
model1 = LinearReg()  # model without a bias term
model1.fit(X,Y)
LinearReg.mse(model1.predict(X),Y)
model2 = LinearReg(bias=True)  # model with a bias term
model2.fit(X,Y)
LinearReg.mse(model2.predict(X),Y)
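As a final sanity check (a minimal sketch that assumes the cells above have been run), the parameters of the bias model can be compared against NumPy's least-squares solver on the same design matrix:
X_b = add_ones(X)                                    # design matrix with the extra column of ones
theta_ref, *_ = np.linalg.lstsq(X_b, Y, rcond=None)  # independent least-squares solution
np.allclose(model2.theta, theta_ref)                 # expected: True if the features are not collinear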