{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Prediction of High School Students' Academic Performance\n", "Author: Zhang Zhang\n", "\n", "Course Project, UC Irvine, Math 10, Fall 24\n", "\n", "I would like to post my notebook on the course's website. [Yes]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "Students' academic performance can be influenced by multiple factors. In this project, we aim to perform multi-class classification of the grades of a large set of high school students based on their behaviors. Several models, including Logistic Regression, Support Vector Machine, Random Forest, KNN, and a Neural Network, are attempted in this project. The target variable, **GradeClass**, categorizes students' grades, making the dataset well suited to educational research, predictive modeling, and statistical analysis. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Pre-processing\n", "\n", "The [dataset](https://www.kaggle.com/datasets/rabieelkharoua/students-performance-dataset/data) chosen for this project contains comprehensive information on **2392 high school students**. It provides both numerical and categorical data for each student, reflecting aspects related to their school performance. In this section, we pre-process the data, which includes loading, cleaning, and transforming the raw data into a format suitable for model training and analysis. We also take a first overall look at the dataset. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we first load the dataset and drop all null values. For convenience, we remap the GradeClass variable from 0.0-4.0 to 1-5, where 1 corresponds to grade A and 5 corresponds to grade F." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GradeClass\n", "5.0 1211\n", "4.0 414\n", "3.0 391\n", "2.0 269\n", "1.0 107\n", "Name: count, dtype: int64\n" ] }, { "data": { "text/html": [ "
|  | StudentID | Age | Gender | Ethnicity | ParentalEducation | StudyTimeWeekly | Absences | Tutoring | ParentalSupport | Extracurricular | Sports | Music | Volunteering | GPA | GradeClass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 17 | 1 | 0 | 2 | 19.833723 | 7 | 1 | 2 | 0 | 0 | 1 | 0 | 2.929196 | 3.0 |
| 1 | 1002 | 18 | 0 | 0 | 1 | 15.408756 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3.042915 | 2.0 |
| 2 | 1003 | 15 | 0 | 2 | 3 | 4.210570 | 26 | 0 | 2 | 0 | 0 | 0 | 0 | 0.112602 | 5.0 |
| 3 | 1004 | 17 | 1 | 0 | 3 | 10.028829 | 14 | 0 | 3 | 1 | 0 | 0 | 0 | 2.054218 | 4.0 |
| 4 | 1005 | 17 | 1 | 0 | 2 | 4.672495 | 17 | 1 | 3 | 0 | 0 | 0 | 0 | 1.288061 | 5.0 |
| 5 | 1006 | 18 | 0 | 0 | 1 | 8.191219 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 3.084184 | 2.0 |
| 6 | 1007 | 15 | 0 | 1 | 1 | 15.601680 | 10 | 0 | 3 | 0 | 1 | 0 | 0 | 2.748237 | 3.0 |
| 7 | 1008 | 15 | 1 | 1 | 4 | 15.424496 | 22 | 1 | 1 | 1 | 0 | 0 | 0 | 1.360143 | 5.0 |
| 8 | 1009 | 17 | 0 | 0 | 0 | 4.562008 | 1 | 0 | 2 | 0 | 1 | 0 | 1 | 2.896819 | 3.0 |
| 9 | 1010 | 16 | 1 | 0 | 1 | 18.444466 | 0 | 0 | 3 | 1 | 0 | 0 | 0 | 3.573474 | 1.0 |

|  | Age | Gender | Ethnicity | ParentalEducation | StudyTimeWeekly | Absences | Tutoring | ParentalSupport | Extracurricular | Sports | Music | Volunteering | GPA | GradeClass |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | 1 | 0 | 2 | 19.833723 | 7 | 1 | 2 | 0 | 0 | 1 | 0 | 2.929196 | 3.0 |
| 1 | 18 | 0 | 0 | 1 | 15.408756 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3.042915 | 2.0 |
| 2 | 15 | 0 | 2 | 3 | 4.210570 | 26 | 0 | 2 | 0 | 0 | 0 | 0 | 0.112602 | 5.0 |
| 3 | 17 | 1 | 0 | 3 | 10.028829 | 14 | 0 | 3 | 1 | 0 | 0 | 0 | 2.054218 | 4.0 |
| 4 | 17 | 1 | 0 | 2 | 4.672495 | 17 | 1 | 3 | 0 | 0 | 0 | 0 | 1.288061 | 5.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2387 | 18 | 1 | 0 | 3 | 10.680555 | 2 | 0 | 4 | 1 | 0 | 0 | 0 | 3.455509 | 1.0 |
| 2388 | 17 | 0 | 0 | 1 | 7.583217 | 4 | 1 | 4 | 0 | 1 | 0 | 0 | 3.279150 | 5.0 |
| 2389 | 16 | 1 | 0 | 2 | 6.805500 | 20 | 0 | 2 | 0 | 0 | 0 | 1 | 1.142333 | 3.0 |
| 2390 | 16 | 1 | 1 | 0 | 12.416653 | 17 | 0 | 2 | 0 | 1 | 1 | 0 | 1.803297 | 2.0 |
| 2391 | 16 | 1 | 0 | 2 | 17.819907 | 13 | 0 | 2 | 0 | 0 | 0 | 1 | 2.140014 | 2.0 |

2392 rows × 14 columns
\n", "