IONQ in Focus: Predicting the Future of Quantum Tech Stocks#
Author: [Hongrui Wang]
Course Project, UC Irvine, Math 10, Spring 25
I would like to post my notebook on the course’s website. [Yes]
1. Introduction#
The purpose of this project is to forecast the price of IONQ stock. I chose IONQ because it is the stock I picked to rise in the course stock competition. The dataset contains historical IONQ price data from January 1, 2024 to May 26, 2025. I will analyze the data and build models for stock price prediction through feature engineering, regression modeling, and model evaluation.
2. Import data#
I import all the necessary libraries for data analysis, visualization, and building a linear regression model, then use yfinance to download historical stock data for IONQ from January 1, 2024, to May 26, 2025, and display the first few rows to check the dataset.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
data = yf.download("IONQ", start="2024-01-01", end="2025-05-26")
print(data.head())
Price Close High Low Open Volume
Ticker IONQ IONQ IONQ IONQ IONQ
Date
2024-01-02 12.16 12.59 12.100 12.35 5951100
2024-01-03 11.93 12.09 11.770 11.81 6225700
2024-01-04 12.28 12.61 11.850 12.06 7264400
2024-01-05 12.00 12.38 11.950 12.09 6100000
2024-01-08 12.64 12.73 12.025 12.12 6044300
Cleaning data#
I make a copy of the original dataset, remove all rows with missing values, and rename several columns (like ‘Open’ to ‘Opening Price’) to make the dataset cleaner and more readable.
data = data.copy()
data.dropna(inplace=True)
data.rename(columns={'Open': 'Opening Price',
'High': 'Highest Price',
'Low': 'Lowest Price',
'Close': 'Closing Price'},
inplace=True)
data.head()
| Price | Closing Price | Highest Price | Lowest Price | Opening Price | Volume |
|---|---|---|---|---|---|
| Ticker | IONQ | IONQ | IONQ | IONQ | IONQ |
| Date | | | | | |
| 2024-01-02 | 12.16 | 12.59 | 12.100 | 12.35 | 5951100 |
| 2024-01-03 | 11.93 | 12.09 | 11.770 | 11.81 | 6225700 |
| 2024-01-04 | 12.28 | 12.61 | 11.850 | 12.06 | 7264400 |
| 2024-01-05 | 12.00 | 12.38 | 11.950 | 12.09 | 6100000 |
| 2024-01-08 | 12.64 | 12.73 | 12.025 | 12.12 | 6044300 |
Describe#
I use data.describe() to get summary statistics (like mean, standard deviation, min, and max) for the numerical columns, and data.info() to check the overall structure of the dataset, including the number of entries, column types, and whether there are any missing values.
print(data.describe())
print(data.info())
Price Closing Price Highest Price Lowest Price Opening Price Volume
Ticker IONQ IONQ IONQ IONQ IONQ
count 350.000000 350.000000 350.000000 350.000000 3.500000e+02
mean 18.027314 18.889446 17.092691 17.907497 1.426960e+07
std 12.145853 12.892214 11.266028 11.946832 1.465106e+07
min 6.570000 6.765000 6.220000 6.270000 2.062300e+06
25% 8.220000 8.466250 7.992500 8.222500 4.768775e+06
50% 10.915000 11.275000 10.700000 11.085000 8.196450e+06
75% 27.439999 28.467499 25.785750 27.775000 1.988050e+07
max 51.070000 54.740002 48.730000 51.279999 1.283438e+08
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 350 entries, 2024-01-02 to 2025-05-23
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 (Closing Price, IONQ) 350 non-null float64
1 (Highest Price, IONQ) 350 non-null float64
2 (Lowest Price, IONQ) 350 non-null float64
3 (Opening Price, IONQ) 350 non-null float64
4 (Volume, IONQ) 350 non-null int64
dtypes: float64(4), int64(1)
memory usage: 16.4 KB
None
3. Visualizing#
1) Line graph#
I create a line plot of the IONQ stock’s closing prices to more clearly see the stock’s overall trend over time, setting the figure size, adding a title and axis labels, enabling the grid, and displaying the plot for better visualization.
closing_prices = data['Closing Price']
plt.figure(figsize=(19, 6))
plt.plot(closing_prices)
plt.title("IONQ Stock Closing Prices - 2025")
plt.xlabel("Date")
plt.ylabel("Closing Price (USD)")
plt.grid(True)
plt.show()
2) Correlation heatmap#
I use a correlation heatmap to see how strongly the features are related. All the price-related features show very high positive correlations with one another, while Volume is only weakly correlated with the prices.
Correlation coefficients range from -1 to 1: -1 means a perfect negative correlation, 0 means no correlation, and 1 means a perfect positive correlation.
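Concretely, data.corr() computes the Pearson correlation coefficient by default; for two columns $x$ and $y$ it is

$$r_{xy}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$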
corr = data[['Opening Price', 'Highest Price', 'Lowest Price', 'Closing Price', 'Volume']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
plt.xticks(fontsize = 6)
plt.yticks(fontsize = 6)
plt.show()
3) Pairplot#
The pairplot helped me visualize these relationships and confirmed that the price features follow clear linear trends, making them suitable for linear regression.
selected_columns = ['Opening Price', 'Highest Price', 'Lowest Price', 'Closing Price', 'Volume']
sns.pairplot(data[selected_columns])
plt.suptitle("Pairwise Relationships in IONQ Stock Data", y = 1)
plt.show()
4. Linear regression#
Linear regression is a model that estimates the relationship between a scalar response (the dependent variable) and one or more explanatory variables.
It predicts the output by finding the best-fitting line that minimizes the difference between actual and predicted values.
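Written out, the fitted model and the ordinary least squares objective it minimizes are:

$$\hat{y}=\beta_0+\beta_1 x_1+\cdots+\beta_k x_k,\qquad \min_{\beta_0,\ldots,\beta_k}\ \sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2$$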
We also create lag prices and moving averages, which enhance the predictive power of a linear regression model when forecasting stock prices. Lag features allow the model to account for temporal dependencies by including past closing prices, which are often strong indicators of future price movements. Moving averages help smooth out short-term volatility and reveal underlying trends in the data. Without these time-based features, the model would rely solely on current-day information and could fail to capture the sequential patterns that are essential for stock market prediction.
Create model#
# Create lag features
data['Lag1'] = data['Closing Price'].shift(1)
data['Lag2'] = data['Closing Price'].shift(2)
data['Lag3'] = data['Closing Price'].shift(3)
# Create moving averages
data['MA5'] = data['Closing Price'].rolling(window=5).mean()
data['MA10'] = data['Closing Price'].rolling(window=10).mean()
data = data.dropna()
# Define features and target
X = data[['Lag1', 'Lag2', 'Lag3', 'MA5', 'MA10']]
y = data['Closing Price']
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
print(f"Mean Absolute Error (MAE): {mae:.4f}")
Mean Squared Error: 1.4899574342191144
R^2 Score: 0.9518347432114455
Mean Absolute Error (MAE): 0.9445
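For reference, the three metrics printed above follow the standard definitions used by scikit-learn, where $y_i$ is the actual closing price, $\hat{y}_i$ the predicted one, and $\bar{y}$ the mean of the actual values:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2,\qquad \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,\qquad R^2=1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$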
Visualizing#
# Draw diagram
x_dates = y_test.index
plt.figure(figsize=(8, 4))
plt.plot(x_dates, y_test.values, label='Actual Prices')
plt.plot(x_dates, y_pred, label='Predicted Prices')
plt.title('Stock Price Prediction with Linear Regression')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.xticks(rotation=45)
plt.legend()
plt.tight_layout()
plt.show()
plt.figure(figsize=(6, 4))
plt.scatter(y_test, y_pred, alpha=0.4, color='blue',
label='Predicted vs Actual')
plt.plot([y_test.min(), y_test.max()],[y_test.min(), y_test.max()],color='red', linestyle='--', label='Perfect Prediction')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs. Predicted Prices (Linear Regression)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
Predicting future prices#
This code predicts the stock price for the next five days. It builds features from the most recent data, uses the trained model to predict each day's closing price one step at a time, and updates the inputs after each prediction, producing a multi-step rolling forecast.
from collections import deque
n_days = 5
future_predictions = []
recent_prices = deque(data['Closing Price'].values[-10:], maxlen=10)
latest_data = data.iloc[-1]
for _ in range(n_days):
    features = latest_data[['Lag1', 'Lag2', 'Lag3', 'MA5', 'MA10']].values.reshape(1, -1)
    pred = float(model.predict(features).ravel()[0])
    future_predictions.append(pred)
    new_row = latest_data.copy()
    # Shift the lag features forward one day
    new_row['Lag3'] = new_row['Lag2']
    new_row['Lag2'] = new_row['Lag1']
    new_row['Lag1'] = pred
    # Refresh the moving averages with the new prediction
    recent_prices.append(pred)
    new_row['MA5'] = float(np.mean(list(recent_prices)[-5:]))
    new_row['MA10'] = float(np.mean(recent_prices))
    latest_data = new_row
future_predictions
[array([45.59288915]),
array([40.58337457]),
array([29.57693903]),
array([34.19327268]),
array([44.41969461])]
Conclusion#
The linear regression model demonstrates strong predictive performance in forecasting stock prices. With a low mean squared error (MSE) of 1.49 and a mean absolute error (MAE) of 0.9445, the model's predictions are closely aligned with the actual values. Furthermore, the high R-squared value of 0.9518 indicates that the model explains approximately 95.18% of the variance in the target variable. These metrics collectively suggest that the model fits this dataset well, though it is worth noting that much of this accuracy comes from the lag features, since the previous day's price is already a strong predictor of the next day's.
5. Tree-Based Regression Models#
In addition to linear regression, I explored tree-based regression models to improve prediction performance and capture potential nonlinear relationships in the stock data.
Tree-based models do not require assumptions about the underlying data distribution or linearity. Instead, they split the data into decision rules based on feature values, allowing them to model complex patterns and interactions.
In this project, I used a Random Forest Regressor, which is an ensemble method that builds multiple decision trees and averages their predictions. This approach helps reduce overfitting and improves generalization compared to a single decision tree.
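As a quick sanity check of that averaging claim, here is a minimal sketch; it is runnable after the "Create model" cell below, since it assumes model_2 is the fitted RandomForestRegressor and X_test its test features from that cell:

import numpy as np
# A forest's regression prediction is the mean of its individual trees' predictions
tree_preds = np.stack([tree.predict(X_test.values) for tree in model_2.estimators_])
assert np.allclose(tree_preds.mean(axis=0), model_2.predict(X_test))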
The following sections present the implementation and evaluation of the tree-based model using cross-validation.
! pip install scikit-learn xgboost lightgbm
Create model#
X = data[['Lag1', 'Lag2', 'Lag3', 'MA5', 'MA10']]
y = data['Closing Price'].to_numpy().ravel()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model_2 = RandomForestRegressor(n_estimators=100, random_state=42)
model_2.fit(X_train, y_train)
y_pred = model_2.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
print(f"Mean Absolute Error (MAE): {mae:.4f}")
Mean Squared Error: 12.107833043109688
R^2 Score: 0.6085949341365131
Mean Absolute Error (MAE): 2.8224
Visualizing#
plt.figure(figsize=(6,4))
plt.scatter(y_test, y_pred, alpha=0.7)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], 'r--')
plt.title('True vs Predicted (Scatter Plot)')
plt.xlabel('Closing Price')
plt.ylabel('Predicted')
plt.grid(True)
plt.tight_layout()
plt.show()
Conclusion#
This study applied a Random Forest regression model to predict the closing price of IONQ stock using lag features (Lag1-Lag3) and moving averages (MA5, MA10).
The model achieved an R² score of 0.61, indicating moderate ability to explain price variations. The MAE of 2.82 and MSE of 12.11 suggest noticeable prediction errors. The scatter plot shows that predicted values deviate from the ideal diagonal, especially at higher price levels.
Overall, while the model shows potential, its performance could be improved by including more informative features or tuning model parameters.
6. Cross-validation for linear and tree-based models#
Cross-validation is a technique used to evaluate the performance and generalizability of a machine learning model on unseen data. It helps avoid overfitting and gives a more reliable estimate of how well the model will perform in the real world.
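One caveat worth flagging: the cv=5 splits used below are ordinary K-fold splits, so some folds train on data that comes after the dates they predict, which leaks future information in a time series. A minimal sketch of a leakage-free alternative, assuming X and y as defined in Section 5 (TimeSeriesSplit keeps every training fold strictly before its test fold):

from sklearn.model_selection import TimeSeriesSplit, cross_val_score
# Each training fold ends before its test fold begins, so no future data leaks in
tscv = TimeSeriesSplit(n_splits=5)
ts_scores = cross_val_score(LinearRegression(), X, y, cv=tscv, scoring='r2')
print("Time-series CV R^2 scores:", ts_scores)
print("Average R^2:", ts_scores.mean())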
1) Linear regression#
model = LinearRegression()
scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print("Cross-validated R^2 scores:", scores)
print("Average R^2 score:", scores.mean())
mae_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_absolute_error')
print("MAE per fold:", -mae_scores)
print("Average MAE:", -mae_scores.mean())
mse_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print("MSE per fold:", -mse_scores)
print("Average MSE:", -mse_scores.mean())
Cross-validated R^2 scores: [0.94092907 0.87633618 0.98570726 0.86650019 0.95163048]
Average R^2 score: 0.9242206382600383
MAE per fold: [0.23165961 0.19841167 0.26873789 1.81630581 0.92365185]
Average MAE: 0.6877533662929997
MSE per fold: [0.08605478 0.05900285 0.14762031 6.9564641 1.44528434]
Average MSE: 1.7388852769820762
2) Tree-based#
model = RandomForestRegressor()
scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print("Cross-validated R^2 scores:", scores)
print("Average R^2 score:", scores.mean())
mae_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_absolute_error')
print("MAE per fold:", -mae_scores)
print("Average MAE:", -mae_scores.mean())
mse_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print("MSE per fold:", -mse_scores)
print("Average MSE:", -mse_scores.mean())
Cross-validated R^2 scores: [0.00890599 0.79296087 0.85143157 0.6469417 0.617109 ]
Average R^2 score: 0.5834698264015312
MAE per fold: [1.17354211 0.23266321 0.72862067 3.21907968 2.94603969]
Average MAE: 1.6599890712468117
MSE per fold: [ 1.82720444 0.10337599 1.5563836 21.47015148 11.5589537 ]
Average MSE: 7.303213839786902
3) Conclusion#
In cross-validation, Linear Regression significantly outperformed the Random Forest model. It achieved a higher average R² and lower MAE and MSE. The Random Forest model showed large variance in performance across folds, with one fold's R² close to zero, indicating poor generalization. This suggests that for this dataset, linear relationships dominate, and simpler models like Linear Regression are more suitable.
7. ARIMA Model#
ARIMA models are mainly used for time series forecasting, that is, predicting future values or trends based on past data.
ARIMA can handle non-stationary data: the "Integrated" step differences the series until it becomes stationary.
Components:
AutoRegressive (AR)
Integrated (I)
Moving Average (MA)
Notation: ARIMA(p, d, q), where p is the number of AR terms, d is the number of times the data is differenced to achieve stationarity, and q is the number of MA terms.
d: Look at the trend. If the data is not stationary (trending up or down), take differences until it looks flat; the number of times you difference is d.
p: Check the PACF plot to find p.
q: Check the ACF plot to find q.
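Putting the notation together (this is the standard textbook formulation, not something specific to this project): with $y'_t$ the series after differencing $d$ times, $\phi_j$ the AR coefficients, $\theta_j$ the MA coefficients, and $\varepsilon_t$ white noise,

$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$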
The usual procedure for identifying an ARIMA model is to first run the ADF test to check whether the data is stationary and, if it is not, difference the data; then use the PACF plot to choose p and the ACF plot to choose q. Here, I let Python find the most suitable ARIMA model directly with auto_arima; a manual ADF check is sketched below for reference.
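A minimal sketch of that manual first step, using statsmodels' adfuller (auto_arima below chooses d with its own stationarity tests, so this is only illustrative):

from statsmodels.tsa.stattools import adfuller
# ADF test on the raw closing prices: a p-value above 0.05 suggests non-stationarity
adf_stat, p_value, *_ = adfuller(data['Closing Price'].squeeze())
print(f"ADF p-value (levels): {p_value:.4f}")
# ADF test after one difference: a small p-value here points to d = 1
diff_1 = data['Closing Price'].squeeze().diff().dropna()
adf_stat_d, p_value_d, *_ = adfuller(diff_1)
print(f"ADF p-value (1st difference): {p_value_d:.4f}")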
Choose model#
We use the IONQ stock as an example.
! pip install pmdarima
from pmdarima.arima import auto_arima
model = auto_arima(data['Closing Price'], seasonal=False, trace=True)
Performing stepwise search to minimize aic
ARIMA(2,1,2)(0,0,0)[0] intercept : AIC=1448.891, Time=0.15 sec
ARIMA(0,1,0)(0,0,0)[0] intercept : AIC=1455.443, Time=0.01 sec
ARIMA(1,1,0)(0,0,0)[0] intercept : AIC=1456.616, Time=0.01 sec
ARIMA(0,1,1)(0,0,0)[0] intercept : AIC=1456.466, Time=0.02 sec
ARIMA(0,1,0)(0,0,0)[0] : AIC=1454.290, Time=0.00 sec
ARIMA(1,1,2)(0,0,0)[0] intercept : AIC=1453.482, Time=0.04 sec
ARIMA(2,1,1)(0,0,0)[0] intercept : AIC=1453.012, Time=0.05 sec
ARIMA(3,1,2)(0,0,0)[0] intercept : AIC=1445.653, Time=0.09 sec
ARIMA(3,1,1)(0,0,0)[0] intercept : AIC=1452.809, Time=0.05 sec
ARIMA(4,1,2)(0,0,0)[0] intercept : AIC=1446.378, Time=0.12 sec
ARIMA(3,1,3)(0,0,0)[0] intercept : AIC=1446.664, Time=0.20 sec
ARIMA(2,1,3)(0,0,0)[0] intercept : AIC=1445.146, Time=0.12 sec
ARIMA(1,1,3)(0,0,0)[0] intercept : AIC=1454.780, Time=0.06 sec
ARIMA(2,1,4)(0,0,0)[0] intercept : AIC=1447.958, Time=0.09 sec
ARIMA(1,1,4)(0,0,0)[0] intercept : AIC=1452.390, Time=0.07 sec
ARIMA(3,1,4)(0,0,0)[0] intercept : AIC=1449.580, Time=0.16 sec
ARIMA(2,1,3)(0,0,0)[0] : AIC=1444.451, Time=0.08 sec
ARIMA(1,1,3)(0,0,0)[0] : AIC=1454.278, Time=0.03 sec
ARIMA(2,1,2)(0,0,0)[0] : AIC=1446.362, Time=0.09 sec
ARIMA(3,1,3)(0,0,0)[0] : AIC=1446.082, Time=0.15 sec
ARIMA(2,1,4)(0,0,0)[0] : AIC=1447.520, Time=0.07 sec
ARIMA(1,1,2)(0,0,0)[0] : AIC=1452.950, Time=0.03 sec
ARIMA(1,1,4)(0,0,0)[0] : AIC=1451.956, Time=0.04 sec
ARIMA(3,1,2)(0,0,0)[0] : AIC=1444.882, Time=0.06 sec
ARIMA(3,1,4)(0,0,0)[0] : AIC=1449.134, Time=0.12 sec
Best model: ARIMA(2,1,3)(0,0,0)[0]
Total fit time: 1.886 seconds
print(model.summary())
SARIMAX Results
==============================================================================
Dep. Variable: y No. Observations: 341
Model: SARIMAX(2, 1, 3) Log Likelihood -716.226
Date: Tue, 10 Jun 2025 AIC 1444.451
Time: 23:10:26 BIC 1467.425
Sample: 0 HQIC 1453.605
- 341
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 1.0849 0.066 16.406 0.000 0.955 1.215
ar.L2 -0.9163 0.055 -16.812 0.000 -1.023 -0.809
ma.L1 -1.1761 0.077 -15.366 0.000 -1.326 -1.026
ma.L2 0.9735 0.085 11.428 0.000 0.807 1.140
ma.L3 -0.1284 0.049 -2.610 0.009 -0.225 -0.032
sigma2 3.9516 0.098 40.417 0.000 3.760 4.143
===================================================================================
Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 9860.46
Prob(Q): 0.99 Prob(JB): 0.00
Heteroskedasticity (H): 67.69 Skew: -0.85
Prob(H) (two-sided): 0.00 Kurtosis: 29.33
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
This code performs a one-day-ahead rolling forecast of the closing prices with the ARIMA(2,1,3) model. The data are split into a training set and a test set; at each step the model is fit on the history to predict the next value, and the true value is then appended to the history for the next prediction. Finally, the mean squared error (MSE), R², and mean absolute error (MAE) are computed to evaluate the predictions.
from statsmodels.tsa.arima.model import ARIMA
import numpy as np

def arima_forecast(history):
    # Fit ARIMA(2,1,3) on the history and return the one-step-ahead forecast
    model = ARIMA(history, order=(2,1,3))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    return yhat

X = closing_prices.values
size = int(len(X) * 0.8)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()
for t in range(len(test)):
    # Predict the next value, then append the true value to the history (rolling forecast)
    yhat = arima_forecast(history)
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)
mse = mean_squared_error(test, predictions)
print(f"Mean Squared Error (MSE): {mse:.4f}")
r2 = r2_score(test, predictions)
print(f"R-squared (R²): {r2:.4f}")
mae = mean_absolute_error(test, predictions)
print(f"Mean Absolute Error (MAE): {mae:.4f}")
Mean Squared Error (MSE): 5.1380
R-squared (R²): 0.8408
Mean Absolute Error (MAE): 1.4483
Visualizing#
import matplotlib.pyplot as plt
plt.figure(figsize=(10,5), dpi=100)
plt.plot(closing_prices.iloc[size:,:].index, test, label='Real')
plt.plot(closing_prices.iloc[size:,:].index, predictions, color='red', label='Predicted')
plt.title('ARIMA Predictions vs Actual Values')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
Predicting future prices#
model = ARIMA(closing_prices, order=(2,1,3))
result = model.fit()
forecast_steps = 5
forecast = result.forecast(steps=forecast_steps)
print("Future predictions:")
print(forecast)
Future predictions:
350 45.197194
351 44.306404
352 43.641951
353 43.737226
354 44.449449
Name: predicted_mean, dtype: float64
Conclusion#
The ARIMA(2,1,3) model demonstrates reasonably strong predictive capability in forecasting stock closing prices. With a mean squared error (MSE) of 5.1380 and a mean absolute error (MAE) of 1.4483, the model maintains a moderate level of accuracy. Additionally, the R-squared value of 0.8408 indicates that the model can explain approximately 84% of the variance in stock prices.
Although it is not the most accurate among all models, ARIMA(2,1,3) still serves as a reliable and interpretable approach for time series forecasting.
8. Reflection#
In this project, I aimed to forecast the stock price of IONQ using three different modeling approaches: Linear Regression, Random Forest Regression, and ARIMA. I engineered lag-based and moving average features from historical data and evaluated model performance using both a train-test split and 5-fold cross-validation to ensure reliability and robustness.
The Linear Regression model delivered the strongest and most consistent results. It achieved a high R² score of 0.9518 on the test set and an average R² score of 0.9242 across cross-validation folds. Its errors were relatively low, with an average mean absolute error (MAE) of 0.69 and mean squared error (MSE) of 1.74, indicating strong predictive accuracy.
The ARIMA model, which is well-suited for time series data, performed moderately well, with a test set R² of 0.8408, MSE of 5.14, and MAE of 1.45. Although it captured temporal dynamics, it lagged behind linear regression in predictive precision.
In contrast, the Random Forest Regression model showed weaker and less stable performance. It had a lower test set R² of 0.6086 and an average cross-validated R² of 0.5835, along with a relatively high MAE of 1.66 and MSE of 7.30. These results suggest that the model may have overfit or failed to capture time-based trends effectively.
Overall, this project enhanced my understanding of time series forecasting, feature engineering, and model evaluation. It also highlighted how simpler models like linear regression can outperform more complex ones when the features are thoughtfully constructed. In future work, I plan to experiment with additional external factors such as news sentiment and macroeconomic indicators to improve forecast performance.
9. References#
IONQ stock price dataset: https://finance.yahoo.com/quote/IONQ/