OOP and Gradient Descent#

Class and Object#

A class bundles data (attributes) and functionality (methods) together

An object is an instance of a class

from math import sqrt

class Vector:
    # custom type for 2d vectors

    # class attribute shared by all objects of this class
    dim = 2
    
    def __init__(self, input_x, input_y):
        # constructor, called when new object is created, Vector(x,y)
        # self is the object being created
        # .x and .y are object attributes
        self.x = input_x
        self.y = input_y
    
    def length(self):
        # length method, returns length of vector
        l = sqrt(self.x**2+self.y**2)
        return l
    
    def scale(self, c):
        # method that scales the vector by a constant c
        # changes the object itself
        # no return value, so returns None
        self.x = self.x * c
        self.y = self.y * c

    def __repr__(self):
        # string representation of the object
        # without this method, it prints the memory address
        return f'({self.x},{self.y})'
            
    
    def add(self, other_vector):
        # method that adds another vector to this vector, returns new vector
        x_new = self.x + other_vector.x
        y_new = self.y + other_vector.y
        return Vector(x_new, y_new)
        
    def __add__(self, other_vector):
        # special method that overloads the + operator
        # without this method, vector + vector would raise an error
        x_new = self.x + other_vector.x
        y_new = self.y + other_vector.y
        return Vector(x_new, y_new)
    
    def normalize(self):
        # method that scales the vector to unit length
        # can call other methods of the same object
        l = self.length()
        self.scale(1/l)
v = Vector(3,4)
v.x
print(v.x)
v.normalize()
print(v.length())
print(v)
3
1.0
(0.6000000000000001,0.8)
v = Vector(1,1)
w = Vector(2,3)

# custom add method
p = v.add(w)
print(p)

# this is actually calling the __add__ method
q = v + w
print(q)
(3,4)
(3,4)

Gradient Descent (GD)#

Problem: minimize \(f(x)\), \(x \in \mathbb{R}^d\)

Suppose \(\nabla f(x) = \left[\frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \ldots, \frac{\partial f}{\partial x_d}(x)\right]^T\) is the gradient of the function \(f\) at the point \(x\), and we have some initial guess \(x_0\).

Our goal is to generate a sequence of points \(x_0, x_1, x_2, \ldots\) that iteratively approaches a local minimizer of the function.

Idea: the gradient of a function at a point gives the direction of steepest ascent of the function at that point. So, if we move in the direction opposite to the gradient (with a small enough step), the function value decreases and we move towards a local minimum.

\[x_{i+1} = x_i - \alpha \nabla f(x_i)\]

where \(\alpha\) is the step size or learning rate. We repeat this process until \(\|\nabla f(x_i)\|\) is close to 0 for some \(i\).

Note:

  • This algorithm tends to go to the closest local minimum.

  • If \(\nabla f(x_i)=0\), then we are at a stationary point, so the iteration will not move.

  • In the simplest form, \(\alpha\) is fixed by the user. In more advanced versions, it is adaptive or randomized.
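
As a quick illustration of the update rule and the stopping criterion, here is a minimal one-dimensional sketch. It uses the 1D analogue of test function 2 from the implementation below; the starting point, step size, and tolerance are chosen arbitrarily for illustration.

import numpy as np

# 1D test function and its derivative
f = lambda x: x**2 + 10 * np.sin(x)
df = lambda x: 2 * x + 10 * np.cos(x)

x = 1.0          # initial guess x_0 (chosen arbitrarily)
alpha = 0.1      # fixed step size / learning rate
tol = 1e-6       # stop when |f'(x_i)| is close to 0

for i in range(1000):       # cap on the number of iterations
    g = df(x)
    if abs(g) < tol:        # stopping criterion
        break
    x = x - alpha * g       # x_{i+1} = x_i - alpha * f'(x_i)

print(f'stopped after {i} iterations at x = {x:.6f}, f(x) = {f(x):.6f}')

Starting from a different initial guess (for example x = 5.0), the same loop settles near a different stationary point (a local minimum around x ≈ 3.8), which is the behavior described in the first note above.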

Interactive Visualization#

1D visualization

2D visualization

Implementing GD using the Vector class#

import numpy as np
import matplotlib.pyplot as plt


# test function 1, simple quadratic
# f = lambda x, y: x**2 + y**2 
# grad_f = lambda x, y: Vector(2 * x , 2 * y)

# test function 2, more complex landscape
f = lambda x, y: x**2 + y**2 + 10 * np.sin(x) + 10 * np.sin(y)
grad_f = lambda x, y: Vector(2 * x + 10 * np.cos(x), 2 * y + 10 * np.cos(y))

# initial guess and learning rate, this is x0
xi = Vector(1,0) 
learning_rate = 0.1
trajectory = [xi]
n_steps = 10

# Gradient descent algorithm
for _ in range(n_steps):  # Fewer iterations for clarity

    # compute gradient vector
    grad = grad_f(xi.x, xi.y)

    # this is -learning_rate * grad
    grad.scale(-learning_rate)

    # this is x_{i+1} = x_i  - learning_rate * grad
    xi = xi + grad

    # append new point to trajectory
    trajectory.append(xi)

# why not write "xi - learning_rate * grad" ?
# because our custom class does not define subtraction and multiplication
# we only defined addition and scaling
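# (a sketch that adds __sub__ and __mul__ to Vector appears at the end of this section)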


### the following code is for visualizing the contour of the objective function f(x,y)
# see the documentation and example for contour plot
# https://matplotlib.org/stable/gallery/images_contours_and_fields/contour_demo.html#sphx-glr-gallery-images-contours-and-fields-contour-demo-py
# Create a grid of x, y values for the contour plot
x = np.linspace(-3, 3, 400)
y = np.linspace(-3, 3, 400)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# Contour plot
plt.contour(X, Y, Z, levels=10)  # Adjust the number of levels for more detail
### end of plotting contour

# plot trajectory
x_coords = [p.x for p in trajectory]
y_coords = [p.y for p in trajectory]
plt.plot(x_coords, y_coords, '-o')

plt.title('Gradient Descent on f(x, y)')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

# print final objective value and final point
print(f'Final objective value: {f(xi.x, xi.y)} at point {xi}')
[Figure: contour plot of f(x, y) with the gradient descent trajectory plotted on top]
Final objective value: -15.891646751230518 at point (-1.306439918487552,-1.3064400276166004)
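
The comment in the gradient descent loop notes that writing xi - learning_rate * grad would fail because Vector only defines addition and in-place scaling. As a sketch of one way to support that syntax (this trimmed-down class and its __sub__, __mul__, and __rmul__ methods are additions for illustration, not part of the code above), the - and * operators can be overloaded just like + was:

class Vector:
    # trimmed-down 2D vector, extended with - and * operators

    def __init__(self, input_x, input_y):
        self.x = input_x
        self.y = input_y

    def __repr__(self):
        return f'({self.x},{self.y})'

    def __add__(self, other_vector):
        # overloads +, as in the original class
        return Vector(self.x + other_vector.x, self.y + other_vector.y)

    def __sub__(self, other_vector):
        # overloads the - operator, mirroring __add__
        return Vector(self.x - other_vector.x, self.y - other_vector.y)

    def __mul__(self, c):
        # overloads vector * c for a scalar c, returning a new Vector
        return Vector(self.x * c, self.y * c)

    def __rmul__(self, c):
        # overloads c * vector, so learning_rate * grad also works
        return self * c

v = Vector(1, 2)
print(v - Vector(0.5, 0.5))   # (0.5,1.5)
print(2 * v)                  # (2,4)

With these methods added to the full class, the update step could be written directly as xi = xi - learning_rate * grad.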