Analysis of Tesla Stock Price in the Most Recent 6 Months


Author: [Chenxi Yang]

Course Project, UC Irvine, Math 10, S24

I would like to post my notebook on the course’s website. [Yes]

The dataset I am analyzing contains six months of daily Tesla (TSLA) stock prices and trading volume, downloaded from the NASDAQ website. I analyze it using methods learned in Math 10.

1 Importing and Cleaning Data

In this step, I import the libraries I need and load the dataset into the notebook. I then check basic information about the data to make sure nothing is wrong with it. Because the Volume values are very large (tens of millions of shares per day), I also set a display format so that floating-point numbers print with two decimal places instead of in scientific notation.
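A minimal sketch of a slightly more defensive loading step, not the one used in this notebook. It assumes (unverified here) that the raw NASDAQ download may store prices as text with a "$" prefix; if the prices are already numeric, the cleaning loop is a no-op. The helper name `load_prices` and the inline `RAW_CSV` text are hypothetical, and the rows are re-typed from the table printed below. The sketch also sorts the rows oldest-first, since the file is ordered newest-first.

```python
import io

import pandas as pd

# Hypothetical raw text in the NASDAQ layout. The "$" prefixes are an assumption
# about the raw download; the values are rows from the table printed below.
RAW_CSV = """Date,Close/Last,Volume,Open,High,Low
6/12/2024,$177.29,90389450,$171.12,$180.55,$169.80
6/11/2024,$170.66,64761930,$173.92,$174.75,$167.41
12/12/2023,$237.01,95328310,$238.55,$238.99,$233.87
"""

PRICE_COLS = ["Close/Last", "Open", "High", "Low"]


def load_prices(source):
    """Read a NASDAQ-style CSV and return a numeric table sorted oldest-first."""
    df = pd.read_csv(source)
    # An explicit format removes any month/day versus day/month ambiguity.
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    for col in PRICE_COLS:
        # regex=False treats "$" as a literal character, not a regex anchor.
        df[col] = df[col].astype(str).str.replace("$", "", regex=False).astype(float)
    df["Volume"] = df["Volume"].astype(float)
    return df.sort_values("Date").reset_index(drop=True)


prices = load_prices(io.StringIO(RAW_CSV))

# option_context applies the two-decimal format only inside this block, so large
# Volume values print in full without changing the global pandas setting.
with pd.option_context("display.float_format", "{:.2f}".format):
    print(prices)
```

Using `pd.option_context` instead of assigning `pd.options.display.float_format` keeps the formatting change local to one output.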

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

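# Load the CSV exported from NASDAQ and show it
data = pd.read_csv('tesla_stock.csv')
print(data)
# From here on, print floats with two decimals (this does not affect the print above)
pd.options.display.float_format = '{:.2f}'.format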
           Date  Close/Last       Volume    Open    High     Low
0     6/12/2024      177.29   90389450.0  171.12  180.55  169.80
1     6/11/2024      170.66   64761930.0  173.92  174.75  167.41
2     6/10/2024      173.79   50869680.0  176.06  178.57  173.17
3      6/7/2024      177.48   56244930.0  176.13  179.35  175.58
4      6/6/2024      177.94   69887020.0  174.60  179.73  172.73
..          ...         ...          ...     ...     ...     ...
121  12/18/2023      252.08  116416500.0  253.78  258.74  251.36
122  12/15/2023      253.50  135932800.0  251.21  254.13  248.30
123  12/14/2023      251.05  160829200.0  241.22  253.88  240.79
124  12/13/2023      239.29  146286300.0  234.19  240.30  228.20
125  12/12/2023      237.01   95328310.0  238.55  238.99  233.87

[126 rows x 6 columns]
## Here, I convert Date to a datetime type and check for missing values, since some values may have been lost when the file was saved.
data['Date'] = pd.to_datetime(data['Date'])
print(data.isnull().sum())
print(data.describe())
print(data.info())
Date          0
Close/Last    0
Volume        0
Open          0
High          0
Low           0
dtype: int64
                      Date  Close/Last       Volume   Open   High    Low
count                  126      126.00       126.00 126.00 126.00 126.00
mean   2024-03-13 09:20:00      191.54  98652550.56 191.70 195.12 188.08
min    2023-12-12 00:00:00      142.05  50869680.00 140.56 144.44 138.80
25%    2024-01-29 06:00:00      174.14  81622607.50 173.64 176.38 170.96
50%    2024-03-13 12:00:00      180.06  96402245.00 182.05 184.59 177.25
75%    2024-04-28 06:00:00      201.52 111163200.00 201.68 205.09 198.39
max    2024-06-12 00:00:00      261.44 243869700.00 263.66 265.13 257.52
std                    NaN       28.21  27664733.79  28.56  28.53  28.11
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 126 entries, 0 to 125
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        126 non-null    datetime64[ns]
 1   Close/Last  126 non-null    float64       
 2   Volume      126 non-null    float64       
 3   Open        126 non-null    float64       
 4   High        126 non-null    float64       
 5   Low         126 non-null    float64       
dtypes: datetime64[ns](1), float64(5)
memory usage: 6.0 KB
None
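Checking for missing values does not catch every kind of bad row. A minimal sketch of a few extra sanity checks for daily price data, assuming the same column names as this dataset: duplicate dates, a High below the Open or Close, a Low above them, and non-positive values. The function `sanity_report` is hypothetical, and the sample is the first five rows of the printed dataset re-entered by hand.

```python
import pandas as pd


def sanity_report(df):
    """Count common problems in a daily OHLCV table (all zeros means clean)."""
    body = df[["Open", "Close/Last"]]
    numeric = df[["Close/Last", "Volume", "Open", "High", "Low"]]
    return {
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_dates": int(df["Date"].duplicated().sum()),
        # The daily High should be at least the larger of Open and Close.
        "high_below_open_or_close": int((df["High"] < body.max(axis=1)).sum()),
        # The daily Low should be at most the smaller of Open and Close.
        "low_above_open_or_close": int((df["Low"] > body.min(axis=1)).sum()),
        "non_positive_values": int((numeric <= 0).sum().sum()),
    }


# The first five rows of the printed dataset, re-entered by hand.
sample = pd.DataFrame({
    "Date": pd.to_datetime(
        ["6/12/2024", "6/11/2024", "6/10/2024", "6/7/2024", "6/6/2024"],
        format="%m/%d/%Y",
    ),
    "Close/Last": [177.29, 170.66, 173.79, 177.48, 177.94],
    "Volume": [90389450.0, 64761930.0, 50869680.0, 56244930.0, 69887020.0],
    "Open": [171.12, 173.92, 176.06, 176.13, 174.60],
    "High": [180.55, 174.75, 178.57, 179.35, 179.73],
    "Low": [169.80, 167.41, 173.17, 175.58, 172.73],
})
report = sanity_report(sample)
print(report)  # every count is 0 for these five rows
```

On the full table, the same call would be `sanity_report(data)` after the Date conversion above.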

2 Basic Descriptive Statistics of the Dataset

Here, I compute the basic descriptive statistics of the dataset to prepare for the analysis later on.

## Summary statistics (count, mean, std, min, quartiles, max) for each numeric column
summary = data[['Close/Last', 'Volume', 'Open', 'High', 'Low']].describe()
# Display summary statistics
print("Summary Statistics:")
print(summary)
## These are the basic statistics of Tesla's stock over the most recent six months. On their own, they say nothing about how the
## price moves over time, so they are not enough for analyzing price behavior. They do show that the average closing price over
## the period was about $191.54, and that the closing price stayed within the interval [142.05, 261.44].
Summary Statistics:
       Close/Last       Volume   Open   High    Low
count      126.00       126.00 126.00 126.00 126.00
mean       191.54  98652550.56 191.70 195.12 188.08
std         28.21  27664733.79  28.56  28.53  28.11
min        142.05  50869680.00 140.56 144.44 138.80
25%        174.14  81622607.50 173.64 176.38 170.96
50%        180.06  96402245.00 182.05 184.59 177.25
75%        201.52 111163200.00 201.68 205.09 198.39
max        261.44 243869700.00 263.66 265.13 257.52
# Median for numeric columns
medians = data[['Close/Last', 'Volume', 'Open', 'High', 'Low']].median()

# Display medians
print("\nMedians:")
print(medians)

# Variance
variance = data[['Close/Last', 'Volume', 'Open', 'High', 'Low']].var()

# Standard deviation
std_dev = data[['Close/Last', 'Volume', 'Open', 'High', 'Low']].std()

# Display variance
print("\nVariance:")
print(variance)

# Display standard deviation
print("\nStandard Deviation:")
print(std_dev)
Medians:
Close/Last        180.06
Volume       96402245.00
Open              182.05
High              184.59
Low               177.25
dtype: float64

Variance:
Close/Last               795.98
Volume       765337495676689.88
Open                     815.68
High                     813.88
Low                      790.04
dtype: float64

Standard Deviation:
Close/Last         28.21
Volume       27664733.79
Open               28.56
High               28.53
Low                28.11
dtype: float64
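The variance of Volume (about 7.65 × 10^14) cannot be compared directly with the variance of the prices (about 796), because the two are measured in different units. One unit-free way to compare spread, not used in the original analysis, is the coefficient of variation (standard deviation divided by mean). A minimal sketch using the means and standard deviations from the summary table above; on the full dataset the equivalent would be `data[cols].std() / data[cols].mean()`.

```python
import pandas as pd

# Means and standard deviations copied from the summary table above.
mean = pd.Series({"Close/Last": 191.54, "Volume": 98652550.56})
std = pd.Series({"Close/Last": 28.21, "Volume": 27664733.79})

# Coefficient of variation: spread relative to the average level, with no units.
cv = (std / mean).round(3)
print(cv)
```

This gives about 0.147 for the closing price and 0.280 for Volume, so relative to its own average, daily volume varied roughly twice as much as the closing price over this period.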

3 Linear Regression on Price and Volume

I apply linear regression here to explore the relationship between the price and the volume of Tesla stock. A commonly cited pattern is that volume confirms a price move, after which the price continues to rise. I want to test whether this pattern holds for Tesla. To start, I make a scatter plot of Price against Volume. I intentionally put Price on the y-axis so that it is easier to read: Volume is treated as the independent variable and Price as the dependent variable.
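The "volume confirms price, then price rises further" idea is about what happens after a high-volume day, which a plot of same-day volume against price level cannot show directly. A minimal sketch of one way to look at it, not part of the original analysis: sort the days oldest-first, compute each day's return, and correlate each day's Volume with the next trading day's return. The function name `volume_vs_next_day_return` is hypothetical, and the five consecutive trading days below (re-typed from the printed dataset) are far too few to conclude anything.

```python
import pandas as pd


def volume_vs_next_day_return(df):
    """Correlate each day's Volume with the NEXT trading day's return.

    Assumes df has 'Date', 'Volume', and 'Close/Last' columns, in any order.
    Returns (correlation, number of day pairs used).
    """
    ordered = df.sort_values("Date").reset_index(drop=True)
    # pct_change gives (close_t - close_{t-1}) / close_{t-1}; shift(-1) moves the
    # following day's return onto day t, pairing today's volume with tomorrow's move.
    next_return = ordered["Close/Last"].pct_change().shift(-1)
    paired = pd.DataFrame(
        {"Volume": ordered["Volume"], "next_return": next_return}
    ).dropna()
    return paired["Volume"].corr(paired["next_return"]), len(paired)


# Five consecutive trading days from the printed dataset (illustration only).
recent = pd.DataFrame({
    "Date": pd.to_datetime(
        ["6/12/2024", "6/11/2024", "6/10/2024", "6/7/2024", "6/6/2024"],
        format="%m/%d/%Y",
    ),
    "Close/Last": [177.29, 170.66, 173.79, 177.48, 177.94],
    "Volume": [90389450.0, 64761930.0, 50869680.0, 56244930.0, 69887020.0],
})
r, n = volume_vs_next_day_return(recent)
print(f"correlation over {n} day pairs: {r:.3f}")
```

Run on all 126 days, a correlation near zero would suggest that high volume does not, on its own, precede rising prices for this stock over this period.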

# Scatter plot of Volume vs Close/Last
plt.figure(figsize=(10, 6))
plt.scatter(data['Volume'], data['Close/Last'], color='skyblue', alpha=0.6)
plt.title('Scatter Plot of Volume vs Close/Last')
plt.xlabel('Volume')
plt.ylabel('Close/Last')
plt.grid(True)
plt.tight_layout()
plt.show()
[Figure: Scatter Plot of Volume vs Close/Last (x-axis: Volume, y-axis: Close/Last)]

Linear Regression

The scatter plot suggests a positive, roughly linear relationship: as Volume goes up, the price usually goes up as well. However, the points are widely scattered and the slope appears small, so the pattern is not strong or clear enough for me to make use of it.

# Define X and y 
X = data['Volume']
y = data['Close/Last']

# Add a constant for the intercept
X = sm.add_constant(X)

# Fit the linear regression model
model = sm.OLS(y, X)
results = model.fit()

# Visualizing the Linear Regression
plt.figure(figsize=(10, 6))
plt.scatter(data['Volume'], data['Close/Last'], color='skyblue', alpha=0.6, label='Data Points')
plt.plot(data['Volume'], results.predict(X), color='red', linewidth=2, label='Linear Regression')
plt.title('Linear Regression: Volume vs Close/Last')
plt.xlabel('Volume')
plt.ylabel('Close/Last')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
[Figure: Linear Regression: Volume vs Close/Last (data points with fitted regression line)]

4 K-NN Classification on the Dataset

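# Create a new column 'Target' by comparing each row's close with the close in the next row.
# Note: the file is ordered newest-first (row 0 is 6/12/2024), so the "next row" is the
# previous trading day. Target = 1 means the previous close was >= this day's close.
data['Target'] = (data['Close/Last'].shift(-1) >= data['Close/Last']).astype(int)

# shift(-1) leaves NaN only in the last row, but comparing with NaN gives False,
# so that row gets Target = 0 and dropna() below does not remove anything.
data.dropna(inplace=True)

# Separate features (X) and target (y)
X = data[['Volume']]  # Features: only 'Volume' is used, as an example
y = data['Target']     # Target: 1 or 0, as defined above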

# Split data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
k = 5  # Example value of K
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
Confusion Matrix:
 [[ 4 16]
 [ 8 10]]

Classification Report:
               precision    recall  f1-score   support

           0       0.33      0.20      0.25        20
           1       0.38      0.56      0.45        18

    accuracy                           0.37        38
   macro avg       0.36      0.38      0.35        38
weighted avg       0.36      0.37      0.35        38

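The K-NN classifier does not work well on the Tesla dataset. Overall accuracy is 0.37 on 38 test days, and precision is only 0.33 for class 0 and 0.38 for class 1. This is worse than a coin flip, and worse than always predicting class 0, which would be right on 20 of the 38 test days (about 0.53). With Volume as the only feature, K-NN is not useful for predicting the direction of Tesla's price.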
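A more careful evaluation of the same idea would label each day by the next day's move in calendar order, train on the earlier days, test on the later ones, and compare K-NN against a majority-class baseline. A minimal sketch under those assumptions, which differ from the notebook's setup: the functions `make_next_day_target` and `evaluate` are hypothetical, the chronological 70/30 split replaces `train_test_split`'s random shuffle, and the demo runs on synthetic random-walk data, NOT the Tesla prices.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_next_day_target(df):
    """Sort oldest-first and label 1 if the NEXT day's close is >= today's close."""
    ordered = df.sort_values("Date").reset_index(drop=True)
    next_close = ordered["Close/Last"].shift(-1)
    ordered["Target"] = (next_close >= ordered["Close/Last"]).astype(int)
    # The newest day has no next close; drop it explicitly instead of letting
    # the NaN comparison silently become 0.
    return ordered[next_close.notna()].copy()


def evaluate(df, k=5, test_fraction=0.3):
    """Train on the earliest days, test on the latest; return (K-NN, baseline) accuracy."""
    labeled = make_next_day_target(df)
    split = int(len(labeled) * (1 - test_fraction))
    train, test = labeled.iloc[:split], labeled.iloc[split:]
    X_train, y_train = train[["Volume"]], train["Target"]
    X_test, y_test = test[["Volume"]], test["Target"]

    # The pipeline fits the scaler on the training days only.
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    return knn.score(X_test, y_test), baseline.score(X_test, y_test)


# Synthetic stand-in data (NOT Tesla): a random-walk close with random volume.
rng = np.random.default_rng(0)
n_days = 126
demo = pd.DataFrame({
    "Date": pd.bdate_range("2023-12-12", periods=n_days),
    "Close/Last": 200 + rng.normal(0, 3, n_days).cumsum(),
    "Volume": rng.uniform(5e7, 2.5e8, n_days),
})
knn_acc, base_acc = evaluate(demo)
print(f"K-NN accuracy: {knn_acc:.2f}, majority-class baseline: {base_acc:.2f}")
```

The same `evaluate` call could be applied to this notebook's `data`, which already has the `Date`, `Close/Last`, and `Volume` columns it needs. A K-NN accuracy that does not beat the baseline indicates that the feature carries no usable signal.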

5 Reference: The dataset was downloaded from https://www.nasdaq.com/market-activity/stocks/tsla/historical?page=13&rows_per_page=10&timeline=m6