{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An Analysis on the Likeliness of People Changing Their Occupation\n", "\n", "Author: Nikolina Sentovich\n", "\n", "Course Project, UC Irvine, Math 10, F24\n", "\n", "I would like to post my notebook on the course's website. [Yes]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this project, I will analyze what causes people to change jobs. As a college student who is not 100% set on what I want to do, I am curious to find out which factors have the biggest impact on prompting a career switch. Are there certain fields of study that are more likely to lead to a career change? What about certain occupations? Are there other important factors? \n", " \n", "The goal of this project is to clean the data and explore the dataset by building different models, in order to better understand industry, the job market, and the likelihood of changing occupation. I found a dataset on Kaggle which, according to the creator, has 30,000+ records and 22 features. 
\n", " \n", "[Kaggle Dataset: Field Of Study vs Occupation](https://www.kaggle.com/datasets/jahnavipaliwal/field-of-study-vs-occupation)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Libraries and Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from IPython.display import display\n", "from sklearn.preprocessing import OneHotEncoder\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import Lasso\n", "import numpy as np\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import cross_val_score\n", "from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix\n", "from pandas.plotting import scatter_matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('career_change_prediction_dataset.csv').copy()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Field of Study | Current Occupation | Age | Gender | Years of Experience | Education Level | Industry Growth Rate | Job Satisfaction | Work-Life Balance | Job Opportunities | ... | Skills Gap | Family Influence | Mentorship Available | Certifications | Freelancing Experience | Geographic Mobility | Professional Networks | Career Change Events | Technology Adoption | Likely to Change Occupation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Medicine | Business Analyst | 48 | Male | 7 | High School | High | 7 | 10 | 83 | ... | 8 | High | 0 | 0 | 0 | 1 | 2 | 0 | 1 | 0 |
| 1 | Education | Economist | 44 | Male | 26 | Master's | Low | 10 | 3 | 55 | ... | 3 | Medium | 0 | 0 | 1 | 1 | 2 | 1 | 9 | 0 |
| 2 | Education | Biologist | 21 | Female | 27 | Master's | Low | 8 | 3 | 78 | ... | 4 | Low | 0 | 0 | 0 | 0 | 2 | 1 | 2 | 0 |
| 3 | Education | Business Analyst | 33 | Male | 14 | PhD | Medium | 7 | 9 | 62 | ... | 2 | Medium | 1 | 0 | 0 | 0 | 9 | 0 | 1 | 0 |
| 4 | Arts | Doctor | 28 | Female | 0 | PhD | Low | 3 | 1 | 8 | ... | 5 | Low | 0 | 0 | 1 | 0 | 2 | 0 | 7 | 1 |

5 rows × 23 columns

| | Age | Years of Experience | Job Satisfaction | Work-Life Balance | Job Opportunities | Salary | Job Security | Career Change Interest | Skills Gap | Mentorship Available | ... | Education Level_High School | Education Level_Master's | Education Level_PhD | Industry Growth Rate_High | Industry Growth Rate_Low | Industry Growth Rate_Medium | Family Influence_High | Family Influence_Low | Family Influence_Medium | Family Influence_nan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48 | 7 | 7 | 10 | 83 | 198266 | 8 | 0 | 8 | 0 | ... | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 44 | 26 | 10 | 3 | 55 | 96803 | 9 | 0 | 3 | 0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 21 | 27 | 8 | 3 | 78 | 65920 | 4 | 0 | 4 | 0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 3 | 33 | 14 | 7 | 9 | 62 | 85591 | 5 | 0 | 2 | 1 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | 28 | 0 | 3 | 1 | 8 | 43986 | 3 | 0 | 5 | 0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |

5 rows × 50 columns

| | Age | Years of Experience | Job Satisfaction | Job Opportunities | Salary | Career Change Interest | Mentorship Available | Certifications | Freelancing Experience | Geographic Mobility | ... | Current Occupation_Lawyer | Current Occupation_Mechanical Engineer | Current Occupation_Software Developer | Current Occupation_Teacher | Gender_Female | Gender_Male | Education Level_High School | Industry Growth Rate_Medium | Family Influence_High | Likely to Change Occupation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48 | 7 | 7 | 83 | 198266 | 0 | 0 | 0 | 0 | 1 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0 |
| 1 | 44 | 26 | 10 | 55 | 96803 | 0 | 0 | 0 | 1 | 1 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 2 | 21 | 27 | 8 | 78 | 65920 | 0 | 0 | 0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 |
| 3 | 33 | 14 | 7 | 62 | 85591 | 0 | 1 | 0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 4 | 28 | 0 | 3 | 8 | 43986 | 0 | 0 | 0 | 1 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38439 | 24 | 34 | 8 | 92 | 117728 | 1 | 0 | 0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1 |
| 38440 | 21 | 24 | 2 | 73 | 132500 | 1 | 1 | 0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1 |
| 38441 | 35 | 21 | 4 | 77 | 55301 | 0 | 1 | 1 | 1 | 0 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1 |
| 38442 | 35 | 11 | 9 | 63 | 171459 | 0 | 0 | 0 | 0 | 1 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |
| 38443 | 37 | 23 | 6 | 49 | 189967 | 0 | 1 | 0 | 0 | 1 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0 |

38444 rows × 29 columns
\n", "