Lecture Week 4 Mon 10/21#
Problem#
Given the training dataset \(x_i\in\mathbb{R}\), \(y_i\in\mathbb{R}\), \(i = 1, 2, \dots, N\), we want to find the linear function

\[ y \approx f(x) = wx + b \]

that fits the relation between \(x_i\) and \(y_i\), so that given any \(x\), we can make the prediction

\[ \hat{y} = wx + b. \]
Loss Function and Optimization#
With the training dataset, define the loss function \(L(w,b)\) of the parameters \(w\) and \(b\), which is also called the mean squared error (MSE):

\[ L(w,b) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}^{(i)}\right)^2, \]

where \(\hat{y}^{(i)}\) denotes the predicted value of \(y\) at \(x_i\), i.e. \(\hat{y}^{(i)} = wx_i + b\).
Our goal is to find the optimal \(w\) and \(b\) that minimize the loss function \(L(w,b)\), i.e.

\[ w^{*}, b^{*} = \arg\min_{w,b} L(w,b). \]

\(L\) is a function of \(w\) and \(b\), and we can analytically solve \(\partial_{w}L = \partial_{b}L = 0\), which yields

\[ w^{*} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}, \qquad b^{*} = \bar{Y} - w^{*}\bar{X}, \]

where \(\bar{X}\) and \(\bar{Y}\) are the means of \(x\) and \(y\), \(\text{Cov}(X,Y)\) denotes the estimated covariance (also called the sample covariance) between \(X\) and \(Y\), and \(\text{Var}(X)\) denotes the sample variance of \(X\).
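For completeness, here is a brief sketch of where these formulas come from. Setting the partial derivatives to zero gives

\[ \frac{\partial L}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\left(y_i - w x_i - b\right) = 0 \quad\Longrightarrow\quad b = \bar{Y} - w\bar{X}, \]

\[ \frac{\partial L}{\partial w} = -\frac{2}{N}\sum_{i=1}^{N} x_i\left(y_i - w x_i - b\right) = 0, \]

and substituting \(b = \bar{Y} - w\bar{X}\) into the second equation gives \(w \sum_i (x_i - \bar{X})\,x_i = \sum_i (y_i - \bar{Y})\,x_i\), i.e. \(w = \text{Cov}(X,Y)/\text{Var}(X)\).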
What is the best constant model?#
What is the best constant \(b\) that minimizes the loss function \(L(b) = \frac{1}{N}\sum_{i=1}^{N}(y_i - b)^2\)?

We can analytically solve \(\partial_{b}L = 0\), which yields

\[ b^{*} = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \]

where \(\bar{y}\) is the mean of \(y\).
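As a quick numerical sanity check (a minimal sketch, not part of the original derivation), the mean of \(y\) should give a smaller constant-model MSE than any other constant:

import numpy as np
y = np.array([0, 2, 0, 6])   # toy data, same as used below

def constant_mse(b, y):
    # MSE of the constant model yhat = b
    return np.mean((y - b) ** 2)

# compare the MSE at b = mean(y) with a grid of other constants
b_grid = np.linspace(-5, 10, 301)
mse_grid = np.array([constant_mse(b, y) for b in b_grid])
print(f'MSE at b = mean(y) = {np.mean(y):.2f}: {constant_mse(np.mean(y), y):.2f}')
print(f'smallest MSE on the grid: {mse_grid.min():.2f} at b = {b_grid[mse_grid.argmin()]:.2f}')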
Evaluating the model#
MSE: A smaller MSE indicates better performance.

MSE depends on the units. For example, if we replace \(y\,[\text{m}]\) by \(y\,[\text{mm}]\), then the MSE will be \(1000^2\) times the original MSE.

R-Squared: A larger \(R^{2}\) (closer to 1) indicates better performance. Compared with MSE, R-squared is dimensionless and does not depend on the units of the variables. It is defined as

\[ R^{2} = 1 - \frac{\text{MSE}}{\text{Var}(Y)}. \]

Intuitively, MSE is the “unexplained” variance, and \(R^{2}\) is the proportion of the variance that is explained by the model: if our model is perfect, then MSE is 0 and \(R^{2}\) is 1; if we use a constant model, then our best prediction is the mean of \(y\), the MSE equals the variance of \(y\), and \(R^{2}\) is 0.
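To see the unit dependence concretely, here is a small sketch (not from the original notes) that fits the same toy dataset introduced below, once with \(y\) in meters and once rescaled to millimeters; the MSE changes by a factor of \(1000^2\) while \(R^{2}\) stays the same:

import numpy as np

def fit_and_score(x, y):
    # closed-form fit, then report (MSE, R^2)
    w = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b = np.mean(y) - w * np.mean(x)
    mse = np.mean((y - (w * x + b)) ** 2)
    return mse, 1 - mse / np.var(y)

x = np.array([0, 1, 2, 3])
y_m = np.array([0, 2, 0, 6])   # y in meters
y_mm = 1000 * y_m              # the same data in millimeters

print(fit_and_score(x, y_m))   # MSE = 2.8,    R^2 ≈ 0.533
print(fit_and_score(x, y_mm))  # MSE = 2.8e6,  R^2 ≈ 0.533 (unchanged)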
import numpy as np
x = np.array([0,1,2,3])
y = np.array([0,2,0,6])
Poll: What are the optimal slope and intercept for this data?#
class myLinearRegression:
    '''
    The single-variable linear regression estimator.
    This serves as an example of the regression models from sklearn, with methods fit, predict, and score.
    '''

    def __init__(self):
        '''
        Initialize the parameters w (slope) and b (intercept).
        '''
        self.w = None
        self.b = None

    def fit(self, x, y):
        # covariance matrix of x and y;
        # bias = True makes the factor 1/N, otherwise 1/(N-1),
        # but it doesn't matter here, since the factor cancels out in the calculation of w
        cov_mat = np.cov(x, y, bias=True)
        # cov_mat[0, 1] is the covariance of x and y, and cov_mat[0, 0] is the variance of x
        self.w = cov_mat[0, 1] / cov_mat[0, 0]
        self.b = np.mean(y) - self.w * np.mean(x)
        # :.3f means 3 decimal places
        print(f'w = {self.w:.3f}, b = {self.b:.3f}')

    def predict(self, x):
        '''
        Predict the output values for the input value x, based on the trained parameters
        '''
        ypred = self.w * x + self.b
        return ypred

    def score(self, x, y):
        '''
        Calculate the R^2 score of the model
        '''
        mse = np.mean((y - self.predict(x))**2)
        var = np.mean((y - np.mean(y))**2)
        Rsquare = 1 - mse / var
        return Rsquare
lm = myLinearRegression()
lm.fit(x, y)
w = 1.600, b = -0.400
score = lm.score(x, y)
print(f'R^2 score = {score:.3f}')
R^2 score = 0.533
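The predict method was not used above; a quick usage sketch (the new input \(x = 4\) is just for illustration):

# prediction at a new input x = 4: w*x + b = 1.6*4 - 0.4 = 6.0
print(lm.predict(np.array([4.0])))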
Use one of the following commands to install scikit-learn:
conda install scikit-learn
or
pip install scikit-learn
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
# For sklearn, the input x has to be n-by-d, where n is the number of samples and d is the number of features
# x.reshape(-1, 1) reshapes the 1D array x to a 2D array with 1 column
#
reg.fit(x.reshape(-1, 1), y)
print(f'w = {reg.coef_[0]:.3f}, b = {reg.intercept_:.3f}')
score = reg.score(x.reshape(-1, 1), y)
print(f'score = {score:.3f}')
w = 1.600, b = -0.400
score = 0.533
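Note that sklearn's predict also expects a 2D input of shape (n_samples, n_features). A short sketch (the new inputs 4 and 5 are just for illustration):

x_new = np.array([[4.0], [5.0]])   # 2-by-1 array: two samples, one feature
print(reg.predict(x_new))          # approximately [6.0, 7.6], since w = 1.6 and b = -0.4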
# side note on numpy.reshape
x = np.array([0,1,2,3,4,5])
# reshape to 2D array with 2 rows and 3 columns
print(x.reshape(2, 3))
# if we set the number of rows to -1, numpy will automatically calculate the number of rows
print(x.reshape(-1, 3))
# reshape to n-by-1 array
print(x.reshape(-1, 1))
[[0 1 2]
[3 4 5]]
[[0 1 2]
[3 4 5]]
[[0]
[1]
[2]
[3]
[4]
[5]]