Logistic Regression for Multiclass Classification#

In this part, we consider logistic regression for $K > 2$ classes. We have data $(x_i, y_i)$ for $i = 1, 2, \dots, N$, where $x_i \in \mathbb{R}^p$ is the input/feature and $y_i$ is the output/label.

We treat the output $y_i$ as a categorical variable, which indicates the class of the input.

Model#

We consider the augmented data $x_i = [1, x_{i1}, x_{i2}, \dots, x_{ip}]$ for $i = 1, 2, \dots, N$, where $x_{ij}$ is the $j$-th feature of the $i$-th input, and we assume that the output $y_i$ can take $K$ different values, $1, \dots, K$.

We assume that the probability of the input $x$ belonging to class $1$ to $K$ is given by a vector of probabilities:

$$f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_K(x) \end{bmatrix} = \frac{1}{\sum_{k=1}^{K} \exp(w_k^T x)} \begin{bmatrix} \exp(w_1^T x) \\ \exp(w_2^T x) \\ \vdots \\ \exp(w_K^T x) \end{bmatrix}$$
- $w_i = [w_{i0}, w_{i1}, w_{i2}, \dots, w_{ip}]$ is the $(p+1)$-dimensional vector of coefficients for class $i$.

- $w_i^T x = w_{i0} + w_{i1} x_1 + w_{i2} x_2 + \dots + w_{ip} x_p$. As in linear regression and binary logistic regression, we are taking a linear combination of the features, and the coefficients are learned from the data.

- $f_j(x; W)$ is the probability of the input $x$ belonging to class $j$. By construction, $\sum_{j=1}^{K} f_j(x; W) = 1$ for all $x$. That is, the probabilities of the input $x$ belonging to class $1$ to $K$ sum to $1$.

- $W$ is the matrix of all the coefficients $w_i$ for $i = 1, 2, \dots, K$:

$$W = \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_K^T \end{bmatrix} = \begin{bmatrix} w_{10} & w_{11} & \cdots & w_{1p} \\ w_{20} & w_{21} & \cdots & w_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ w_{K0} & w_{K1} & \cdots & w_{Kp} \end{bmatrix}$$

  where $w_{ij}$ is the $j$-th coefficient for class $i$.

- We also write $f(x; W)$ to indicate the dependence of the probabilities on the coefficients $W$; a small code sketch of the model appears right after this list.
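To make the model concrete, here is a minimal NumPy sketch of the probability vector $f(x; W)$. The function name softmax_probs and the toy numbers are illustrative, not part of the text above.

import numpy as np

def softmax_probs(x_aug, W):
    # x_aug is the augmented input [1, x_1, ..., x_p]; W has shape (K, p+1),
    # with row k holding the coefficients w_k for class k.
    scores = W @ x_aug                    # w_k^T x for each class k
    scores = scores - scores.max()        # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # entries sum to 1 by construction

# Toy example: K = 3 classes, p = 2 features (numbers made up)
W = np.array([[ 0.1,  1.0, -0.5],
              [ 0.0, -1.0,  0.5],
              [-0.2,  0.3,  0.8]])
x_aug = np.array([1.0, 2.0, -1.0])
f = softmax_probs(x_aug, W)
print(f, f.sum())                         # probabilities, and their sum (1.0)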

Cross-entropy loss#

Define the indicator variable $y_{ik}$ as

$$y_{ik} = \begin{cases} 1 & \text{if } y_i \text{ is class } k \\ 0 & \text{otherwise} \end{cases}$$

Another way to think about this is that we encode the categorical variable $y_i$ as a vector in $\mathbb{R}^K$ with a $1$ at the $k$-th position and $0$ elsewhere. For example, if there are 3 classes and the output $y_i$ is class 2, then $y_i = [0, 1, 0]$.
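A quick sketch of this one-hot encoding in NumPy; the integer labels below are made up for illustration, and the np.eye indexing trick is just one convenient way to build the vectors.

import numpy as np

K = 3
y = np.array([2, 1, 3, 2])   # labels y_i taking values in {1, ..., K}
Y = np.eye(K)[y - 1]         # row i is the one-hot encoding of y_i
print(Y)
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]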

The cross-entropy loss is given by

$$L(W) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log\big(f_k(x_i; W)\big)$$

The summation over $i$ adds up the error from each sample; the summation over $k$ adds up the error from each class. By our definition of $y_{ik}$, only the term corresponding to the true class $y_i$ contributes to the loss. For example, if there are 3 classes and $x_1$ belongs to class 2, then

$$-\sum_{k=1}^{K} y_{1k} \log\big(f_k(x_1; W)\big) = -\log\big(f_2(x_1; W)\big)$$

where $f_2(x_1; W)$ is the predicted probability of $x_1$ belonging to class 2. If it is close to 100%, then the loss contributed by sample 1 is close to 0. If $f_2(x_1; W)$ is close to 0%, then the contributed loss is large.

The optimal weight matrix $W$ is obtained by minimizing the loss function $L(W)$.
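Putting the pieces together, here is a hedged sketch of evaluating $L(W)$ on a toy dataset. The helper name cross_entropy and the numbers are invented for illustration; in practice scikit-learn handles the minimization for us, as in the examples below.

import numpy as np

def cross_entropy(X_aug, Y, W):
    # X_aug: (N, p+1) augmented inputs, Y: (N, K) one-hot labels, W: (K, p+1)
    scores = X_aug @ W.T                         # entry (i, k) is w_k^T x_i
    scores -= scores.max(axis=1, keepdims=True)  # stabilize the exponentials
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)            # row i is f(x_i; W)
    return -np.sum(Y * np.log(P))                # L(W)

# Toy data: N = 2 samples, p = 2 features, K = 3 classes
X_aug = np.array([[1.0,  0.5, -1.0],
                  [1.0, -0.3,  2.0]])
Y = np.array([[0, 1, 0],    # sample 1 is class 2
              [0, 0, 1]])   # sample 2 is class 3
W = np.zeros((3, 3))        # all-zero weights give uniform probabilities 1/3
print(cross_entropy(X_aug, Y, W))   # 2 * log(3), approximately 2.197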

Exercise: show that when $K = 2$, this reduces to the binary logistic regression loss.

Visualization#

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import DecisionBoundaryDisplay

# Set random seed for reproducibility
np.random.seed(0)

# Number of samples per class
N = 100

# Generate data for three classes, each class has a different mean
x_class1 = np.random.multivariate_normal([1, 1], np.eye(2), N)
x_class2 = np.random.multivariate_normal([-1, -1], np.eye(2), N)
x_class3 = np.random.multivariate_normal([0, 3], np.eye(2), N)

# Combine into a single dataset
X = np.vstack((x_class1, x_class2, x_class3))
y = np.concatenate((np.zeros(N), np.ones(N), 2*np.ones(N)))

# Create a logistic regression classifier (scikit-learn uses the multinomial
# formulation by default for multi-class problems)
clf = LogisticRegression(solver='lbfgs', max_iter=1000)
clf.fit(X, y)

# Plot the decision boundaries using DecisionBoundaryDisplay
fig, ax = plt.subplots()
db_display = DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    grid_resolution=200,
    response_method="predict",  # Can be "predict_proba" for probability contours
    cmap='coolwarm',
    alpha=0.5,
    ax=ax
)
# Scatter plot of the data points
scatter = ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap='coolwarm')

# Adding title and labels
ax.set_title('3-Class Logistic Regression Decision Boundary')
ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')

# Show plot
plt.show()

Classification using penguins dataset#

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix


# Load the dataset
df = sns.load_dataset('penguins')

# Drop rows with missing values
df.dropna(inplace=True)

features = ['bill_length_mm', 'bill_depth_mm']

# Select features
X = df[features]
y = df['species']


# Initialize and train the logistic regression model
clf = LogisticRegression()
clf.fit(X, y)

# Calculate the training accuracy
score = clf.score(X, y)
print(f"Training accuracy: {score:.2f}")


# Predict on the training set
y_pred = clf.predict(X)

# Evaluate the model
conf_matrix = confusion_matrix(y, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Plotting the confusion matrix
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=clf.classes_, yticklabels=clf.classes_)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix of Species Classification')
plt.show()
Training accuracy: 0.96
Confusion Matrix:
 [[149   2   0]
 [  4  60   4]
 [  0   2 121]]
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay

fig, ax = plt.subplots()
db_display = DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    grid_resolution=200,
    response_method="predict",  # Can be "predict_proba" for probability contours
    cmap='coolwarm',
    alpha=0.5,
    ax=ax
)

# Scatter plot of the data points
scatter = sns.scatterplot(data=df, x='bill_length_mm', y='bill_depth_mm', hue='species', ax=ax)

# Adding title and labels
ax.set_title('3-Class Logistic Regression Decision Boundary')
ax.set_xlabel(features[0])
ax.set_ylabel(features[1])

# Show plot
plt.show()