{ "cells": [ { "cell_type": "markdown", "id": "41b4b1fb", "metadata": {}, "source": [ "# Phone cost prediction \n", "
\n", "Author: Presnyakov Oleg\n", "
\n", "Course Project, UC Irvine, Math 10, S24\n", "
\n", "I would like to post my notebook on the course’s website. Yes" ] }, { "cell_type": "markdown", "id": "4ccf31fc", "metadata": {}, "source": [ "## Introduction:\n", "In this project I aim to build a model that predicts the price of a phone from its features. \n", "
\n", "The data was taken from a relatively new dataset on Kaggle:
\n", "https://www.kaggle.com/datasets/dewangmoghe/mobile-phone-price-prediction/data" ] }, { "cell_type": "markdown", "id": "5e4fa5b6", "metadata": {}, "source": [ "## Importing and setting up the data" ] }, { "cell_type": "code", "execution_count": 303, "id": "0d70f9e2", "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.impute import SimpleImputer\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.metrics import mean_squared_error, r2_score\n", "from sklearn.model_selection import KFold, train_test_split\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "code", "execution_count": 304, "id": "fa049b5d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<p>First five rows of df (18 columns); the text/plain rendering of this output lists the values.</p>\n", "
" ], "text/plain": [ " Unnamed: 0 Name Rating Spec_score \\\n", "0 0 Samsung Galaxy F14 5G 4.65 68 \n", "1 1 Samsung Galaxy A11 4.20 63 \n", "2 2 Samsung Galaxy A13 4.30 75 \n", "3 3 Samsung Galaxy F23 4.10 73 \n", "4 4 Samsung Galaxy A03s (4GB RAM + 64GB) 4.10 69 \n", "\n", " No_of_sim Ram Battery Display \\\n", "0 Dual Sim, 3G, 4G, 5G, VoLTE, 4 GB RAM 6000 mAh Battery 6.6 inches \n", "1 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 4000 mAh Battery 6.4 inches \n", "2 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 5000 mAh Battery 6.6 inches \n", "3 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 6000 mAh Battery 6.4 inches \n", "4 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 5000 mAh Battery 6.5 inches \n", "\n", " Camera \\\n", "0 50 MP + 2 MP Dual Rear & 13 MP Front Camera \n", "1 13 MP + 5 MP + 2 MP Triple Rear & 8 MP Fro... \n", "2 50 MP Quad Rear & 8 MP Front Camera \n", "3 48 MP Quad Rear & 13 MP Front Camera \n", "4 13 MP + 2 MP + 2 MP Triple Rear & 5 MP Fro... \n", "\n", " External_Memory Android_version Price company \\\n", "0 Memory Card Supported, upto 1 TB 13 9,999 Samsung \n", "1 Memory Card Supported, upto 512 GB 10 9,990 Samsung \n", "2 Memory Card Supported, upto 1 TB 12 11,999 Samsung \n", "3 Memory Card Supported, upto 1 TB 12 11,999 Samsung \n", "4 Memory Card Supported, upto 1 TB 11 11,999 Samsung \n", "\n", " Inbuilt_memory fast_charging \\\n", "0 128 GB inbuilt 25W Fast Charging \n", "1 32 GB inbuilt 15W Fast Charging \n", "2 64 GB inbuilt 25W Fast Charging \n", "3 64 GB inbuilt NaN \n", "4 64 GB inbuilt 15W Fast Charging \n", "\n", " Screen_resolution Processor \\\n", "0 2408 x 1080 px Display with Water Drop Notch Octa Core Processor \n", "1 720 x 1560 px Display with Punch Hole 1.8 GHz Processor \n", "2 1080 x 2408 px Display with Water Drop Notch 2 GHz Processor \n", "3 720 x 1600 px Octa Core \n", "4 720 x 1600 px Display with Water Drop Notch Octa Core \n", "\n", " Processor_name \n", "0 Exynos 1330 \n", "1 Octa Core \n", "2 Octa Core \n", "3 Helio G88 \n", "4 Helio P35 " ] }, 
"execution_count": 304, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"/Users/olegpresnyakov/Desktop/prediction.csv\")\n", "df.head(5)" ] }, { "cell_type": "markdown", "id": "b26badb9", "metadata": {}, "source": [ "At first let's look at all mising values that we have in our data and try to figure out how we can fill them in. While finding solution for that problem, we also want to clear data to numerical, that will allows us to apply numerical algorithms on data" ] }, { "cell_type": "code", "execution_count": 305, "id": "94ab1591", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 305, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isnull().values.any()" ] }, { "cell_type": "code", "execution_count": 306, "id": "72751ea9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Columns with missing values:\n", "Index(['Android_version', 'Inbuilt_memory', 'fast_charging',\n", " 'Screen_resolution', 'Processor'],\n", " dtype='object')\n" ] } ], "source": [ "columns_with_missing_values = df.columns[df.isnull().any()]\n", "print(\"Columns with missing values:\")\n", "print(columns_with_missing_values)" ] }, { "cell_type": "code", "execution_count": 307, "id": "b3673d1e", "metadata": {}, "outputs": [], "source": [ "df['Ram'] = df['Ram'].str.extract('(\\d+)').astype(float)\n", "df['Battery'] = df['Battery'].str.extract('(\\d+)').astype(float)\n", "df['Display'] = df['Display'].str.extract('(\\d+.\\d+)').astype(float)\n", "df['Price'] = df['Price'].str.replace(',', '').astype(float)\n", "df['fast_charging'] = df['fast_charging'].str.extract('(\\d+)').astype(float)" ] }, { "cell_type": "code", "execution_count": 308, "id": "7c6ba81a", "metadata": {}, "outputs": [], "source": [ "df['fast_charging'].fillna('0', inplace=True)\n", "df['Rating'].fillna(df['Rating'].median(), inplace=True)\n", "df['Spec_score'].fillna(df['Spec_score'].median(), 
inplace=True)\n", "df['Ram'].fillna(df['Ram'].median(), inplace=True)\n", "df['Battery'].fillna(df['Battery'].median(), inplace=True)\n", "df['Display'].fillna(df['Display'].median(), inplace=True)\n", "df['Price'].fillna(df['Price'].median(), inplace=True)" ] }, { "cell_type": "markdown", "id": "bef7c678", "metadata": {}, "source": [ "## Working with a memory representation and screen resolution" ] }, { "cell_type": "code", "execution_count": 309, "id": "6e938a2f", "metadata": {}, "outputs": [], "source": [ "df['EXT_Memory_GB'] = df['External_Memory'].str.extract(r'(\\d+)').astype(float)\n", "df['Unit'] = df['External_Memory'].str.extract(r'(\\w+)$')\n", "\n", "conversion_factors = {'TB': 1024, 'GB': 1}\n", "df['EXT_Memory_GB'] *= df['Unit'].map(conversion_factors)\n", "df.drop(columns=['External_Memory', 'Unit'], inplace=True)" ] }, { "cell_type": "code", "execution_count": 310, "id": "5fda2b5d", "metadata": {}, "outputs": [], "source": [ "df['Memory_GB'] = df['Inbuilt_memory'].str.extract(r'(\\d+)').astype(float)\n", "df['Unit'] = df['Inbuilt_memory'].str.extract(r'(\\w+)$')\n", "df.loc[df['Unit'] == 'TB', 'Memory_GB'] *= 1024\n", "df.drop(columns=['Unit'], inplace=True)" ] }, { "cell_type": "code", "execution_count": 311, "id": "64e904be", "metadata": {}, "outputs": [], "source": [ "df[['px', 'Feature']] = df['Screen_resolution'].str.extract(r'(\\d+ x \\d+) px Display with (Water Drop Notch|Punch Hole)')\n", "df['Display with Water Drop Notch'] = df['Feature'].apply(lambda x: 1 if x == 'Water Drop Notch' else 0)\n", "df['Display with Punch Hole'] = df['Feature'].apply(lambda x: 1 if x == 'Punch Hole' else 0)\n", "df.drop(columns=['Screen_resolution', 'Feature'], inplace=True)\n", "df['No_of_sim_count'] = df['No_of_sim'].str.count(',') + 1\n", "df.drop(columns=['No_of_sim'], inplace=True)" ] }, { "cell_type": "code", "execution_count": 312, "id": "e3c67567", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<p>First five rows of df after the memory, screen and SIM columns were converted (5 rows × 21 columns); the text/plain rendering of this output lists the values.</p>\n", "
" ], "text/plain": [ " Unnamed: 0 Name Rating Spec_score Ram \\\n", "0 0 Samsung Galaxy F14 5G 4.65 68 4.0 \n", "1 1 Samsung Galaxy A11 4.20 63 2.0 \n", "2 2 Samsung Galaxy A13 4.30 75 4.0 \n", "3 3 Samsung Galaxy F23 4.10 73 4.0 \n", "4 4 Samsung Galaxy A03s (4GB RAM + 64GB) 4.10 69 4.0 \n", "\n", " Battery Display Camera \\\n", "0 6000.0 6.6 50 MP + 2 MP Dual Rear & 13 MP Front Camera \n", "1 4000.0 6.4 13 MP + 5 MP + 2 MP Triple Rear & 8 MP Fro... \n", "2 5000.0 6.6 50 MP Quad Rear & 8 MP Front Camera \n", "3 6000.0 6.4 48 MP Quad Rear & 13 MP Front Camera \n", "4 5000.0 6.5 13 MP + 2 MP + 2 MP Triple Rear & 5 MP Fro... \n", "\n", " Android_version Price ... Inbuilt_memory fast_charging \\\n", "0 13 9999.0 ... 128 GB inbuilt 25.0 \n", "1 10 9990.0 ... 32 GB inbuilt 15.0 \n", "2 12 11999.0 ... 64 GB inbuilt 25.0 \n", "3 12 11999.0 ... 64 GB inbuilt 0 \n", "4 11 11999.0 ... 64 GB inbuilt 15.0 \n", "\n", " Processor Processor_name EXT_Memory_GB Memory_GB px \\\n", "0 Octa Core Processor Exynos 1330 1024.0 128.0 2408 x 1080 \n", "1 1.8 GHz Processor Octa Core 512.0 32.0 720 x 1560 \n", "2 2 GHz Processor Octa Core 1024.0 64.0 1080 x 2408 \n", "3 Octa Core Helio G88 1024.0 64.0 NaN \n", "4 Octa Core Helio P35 1024.0 64.0 720 x 1600 \n", "\n", " Display with Water Drop Notch Display with Punch Hole No_of_sim_count \n", "0 1 0 6 \n", "1 0 1 5 \n", "2 1 0 5 \n", "3 0 0 5 \n", "4 1 0 5 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 312, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "46cfa0e8", "metadata": {}, "source": [ "## Working with camera" ] }, { "cell_type": "code", "execution_count": 313, "id": "509db050", "metadata": {}, "outputs": [], "source": [ "def sum_megapixels(camera_str):\n", " rear_megapixels = sum(map(int, re.findall(r'(\\d+)\\s*MP', camera_str.split('&')[0])))\n", " if len(camera_str.split('&')) > 1:\n", " front_megapixels = sum(map(int, re.findall(r'(\\d+)\\s*MP', 
camera_str.split('&')[1])))\n", " else:\n", " front_megapixels = 0\n", " return rear_megapixels, front_megapixels\n", "\n", "df['Rear'], df['Front'] = zip(*df['Camera'].apply(sum_megapixels))\n", "df.drop(columns=['Camera'], inplace=True)" ] }, { "cell_type": "code", "execution_count": 314, "id": "eca0d7fa", "metadata": {}, "outputs": [], "source": [ "def resolution_to_product(resolution_str):\n", " if isinstance(resolution_str, str):\n", " parts = resolution_str.split('x')\n", " if len(parts) == 2:\n", " try:\n", " width = int(parts[0])\n", " height = int(parts[1])\n", " return width * height\n", " except ValueError:\n", " return None\n", " else:\n", " return None\n", " elif isinstance(resolution_str, int):\n", " return resolution_str\n", " else:\n", " return None\n", "\n", "df['px'] = df['px'].apply(resolution_to_product)" ] }, { "cell_type": "code", "execution_count": 315, "id": "a85ad5c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<p>First five rows of df after the camera and resolution columns were converted (5 rows × 22 columns); the text/plain rendering of this output lists the values.</p>\n", "
" ], "text/plain": [ " Unnamed: 0 Name Rating Spec_score Ram \\\n", "0 0 Samsung Galaxy F14 5G 4.65 68 4.0 \n", "1 1 Samsung Galaxy A11 4.20 63 2.0 \n", "2 2 Samsung Galaxy A13 4.30 75 4.0 \n", "3 3 Samsung Galaxy F23 4.10 73 4.0 \n", "4 4 Samsung Galaxy A03s (4GB RAM + 64GB) 4.10 69 4.0 \n", "\n", " Battery Display Android_version Price company ... \\\n", "0 6000.0 6.6 13 9999.0 Samsung ... \n", "1 4000.0 6.4 10 9990.0 Samsung ... \n", "2 5000.0 6.6 12 11999.0 Samsung ... \n", "3 6000.0 6.4 12 11999.0 Samsung ... \n", "4 5000.0 6.5 11 11999.0 Samsung ... \n", "\n", " Processor Processor_name EXT_Memory_GB Memory_GB px \\\n", "0 Octa Core Processor Exynos 1330 1024.0 128.0 2600640.0 \n", "1 1.8 GHz Processor Octa Core 512.0 32.0 1123200.0 \n", "2 2 GHz Processor Octa Core 1024.0 64.0 2600640.0 \n", "3 Octa Core Helio G88 1024.0 64.0 NaN \n", "4 Octa Core Helio P35 1024.0 64.0 1152000.0 \n", "\n", " Display with Water Drop Notch Display with Punch Hole No_of_sim_count \\\n", "0 1 0 6 \n", "1 0 1 5 \n", "2 1 0 5 \n", "3 0 0 5 \n", "4 1 0 5 \n", "\n", " Rear Front \n", "0 52 13 \n", "1 20 8 \n", "2 50 8 \n", "3 48 13 \n", "4 17 5 \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 315, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 316, "id": "af03f5ea", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<p>First five rows of df after one-hot encoding company, Processor and Processor_name (5 rows × 335 columns); the text/plain rendering of this output lists the values.</p>\n", "
" ], "text/plain": [ " Unnamed: 0 Name Rating Spec_score Ram \\\n", "0 0 Samsung Galaxy F14 5G 4.65 68 4.0 \n", "1 1 Samsung Galaxy A11 4.20 63 2.0 \n", "2 2 Samsung Galaxy A13 4.30 75 4.0 \n", "3 3 Samsung Galaxy F23 4.10 73 4.0 \n", "4 4 Samsung Galaxy A03s (4GB RAM + 64GB) 4.10 69 4.0 \n", "\n", " Battery Display Android_version Price Inbuilt_memory ... \\\n", "0 6000.0 6.6 13 9999.0 128 GB inbuilt ... \n", "1 4000.0 6.4 10 9990.0 32 GB inbuilt ... \n", "2 5000.0 6.6 12 11999.0 64 GB inbuilt ... \n", "3 6000.0 6.4 12 11999.0 64 GB inbuilt ... \n", "4 5000.0 6.5 11 11999.0 64 GB inbuilt ... \n", "\n", " Processor_name_Tiger T616 Processor_name_Unisoc SC9836A \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", " Processor_name_Unisoc SC9863A Processor_name_Unisoc T606 \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", " Processor_name_Unisoc SC9832E Processor_name_Unisoc SC9863A \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", " Processor_name_Unisoc T603 Processor_name_Unisoc T606 \\\n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", " Processor_name_Unisoc T610 Processor_name_Unisoc T612 \n", "0 0 0 \n", "1 0 0 \n", "2 0 0 \n", "3 0 0 \n", "4 0 0 \n", "\n", "[5 rows x 335 columns]" ] }, "execution_count": 316, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columns_for_dummies = ['company', 'Processor', 'Processor_name']\n", "dummy_variables = pd.get_dummies(df[columns_for_dummies])\n", "df_with_dummies = pd.concat([df, dummy_variables], axis=1)\n", "df_with_dummies.drop(columns_for_dummies, axis=1, inplace=True)\n", "df = df_with_dummies\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "id": "f0cd0895", "metadata": {}, "source": [ "Now let's finally apply algorithms to predict price for a phone:" ] }, { "cell_type": "markdown", "id": "93387aaf", "metadata": {}, "source": [ "#### Simple linear regression" ] }, { "cell_type": "code", "execution_count": 
317, "id": "43300ec8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train RMSE: 10665.29\n", "Test RMSE: 203643.32\n", "Train R^2: 0.88\n", "Test R^2: -45.00\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "numeric_cols = df.select_dtypes(include=['number']).columns\n", "df_numeric = df[numeric_cols]\n", "\n", "imputer = SimpleImputer(strategy='mean')\n", "df_imputed = pd.DataFrame(imputer.fit_transform(df_numeric), columns=df_numeric.columns)\n", "\n", "X = df_imputed.drop(columns=['Price'])\n", "y = df_imputed['Price']\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "\n", "model = LinearRegression()\n", "model.fit(X_train, y_train)\n", "\n", "train_rmse = mean_squared_error(y_train, model.predict(X_train), squared=False)\n", "test_rmse = mean_squared_error(y_test, model.predict(X_test), squared=False)\n", "r2_train = r2_score(y_train, model.predict(X_train))\n", "r2_test = r2_score(y_test, model.predict(X_test))\n", "\n", "print(f'Train RMSE: {train_rmse:.2f}')\n", "print(f'Test RMSE: {test_rmse:.2f}')\n", "print(f'Train R^2: {r2_train:.2f}')\n", "print(f'Test R^2: {r2_test:.2f}')\n", "\n", "y_pred = model.predict(X_test)\n", "plt.scatter(y_test, y_pred)\n", "plt.xlabel('Actual Price')\n", "plt.ylabel('Predicted Price')\n", "plt.title('Actual vs Predicted Price')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "dcd2ce4e", "metadata": {}, "source": [ "#### Simple linear regression with KFold" ] }, { "cell_type": "code", "execution_count": 318, "id": "922917d7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Train R^2: 0.01\n", "Mean Test R^2: 0.01\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "numeric_cols = df.select_dtypes(include=['number']).columns\n", "df_numeric = df[numeric_cols]\n", "\n", "imputer = SimpleImputer(strategy='mean')\n", "df_imputed = pd.DataFrame(imputer.fit_transform(df_numeric), columns=df_numeric.columns)\n", "\n", "y = df_imputed['Price']\n", "\n", "kf = KFold(n_splits=5, shuffle=True, random_state=42)\n", "train_r2_scores = []\n", "test_r2_scores = []\n", "\n", "for train_index, test_index in kf.split(X_pca):\n", " X_train, X_test = X_pca[train_index], X_pca[test_index]\n", " y_train, y_test = y.iloc[train_index], y.iloc[test_index]\n", " \n", " model = LinearRegression()\n", " model.fit(X_train, y_train)\n", " \n", " train_r2_scores.append(r2_score(y_train, model.predict(X_train)))\n", " test_r2_scores.append(r2_score(y_test, model.predict(X_test)))\n", "\n", "mean_train_r2 = np.mean(train_r2_scores)\n", "mean_test_r2 = np.mean(test_r2_scores)\n", "\n", "print(f'Mean Train R^2: {mean_train_r2:.2f}')\n", "print(f'Mean Test R^2: {mean_test_r2:.2f}')\n", "plt.scatter(y_test, model.predict(X_test), label='Actual vs Predicted Price')\n", "plt.plot(y_test, y_test, color='red', label='Perfect Predictions')\n", "plt.xlabel('Actual Price')\n", "plt.ylabel('Predicted Price')\n", "plt.title('Actual vs Predicted Price')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "f9f74831", "metadata": {}, "source": [ "Notice that the model that was constructed using linear regression is very bad in terms of mean R^2. We can only explain 1% of the data using this method. This is why, lets try to implement more advanced structure on the dataset." 
] }, { "cell_type": "markdown", "id": "a4462cd0", "metadata": {}, "source": [ "#### Random forest" ] }, { "cell_type": "code", "execution_count": 319, "id": "1419ac2a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train R^2: 0.86\n", "Test R^2: 0.12\n" ] } ], "source": [ "# X_train, X_test, y_train and y_test are not re-created here: they are the arrays\n", "# left over from the last fold of the K-Fold loop in the previous cell.\n", "imputer = SimpleImputer(strategy='mean')\n", "X_train_imputed = imputer.fit_transform(X_train)\n", "X_test_imputed = imputer.transform(X_test)\n", "\n", "model = RandomForestRegressor(n_estimators=100, random_state=42)\n", "model.fit(X_train_imputed, y_train)\n", "\n", "train_r2 = r2_score(y_train, model.predict(X_train_imputed))\n", "test_r2 = r2_score(y_test, model.predict(X_test_imputed))\n", "\n", "print(f'Train R^2: {train_r2:.2f}')\n", "print(f'Test R^2: {test_r2:.2f}')\n" ] }, { "cell_type": "markdown", "id": "d6bd3ee5", "metadata": {}, "source": [ "#### Random forest using KFold" ] }, { "cell_type": "code", "execution_count": 320, "id": "65828285", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Train R^2: 0.98\n", "Mean Test R^2: 0.83\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "imputer = SimpleImputer(strategy='mean')\n", "X_imputed = imputer.fit_transform(X)\n", "\n", "model = RandomForestRegressor(n_estimators=100, random_state=42)\n", "\n", "kf = KFold(n_splits=5, shuffle=True, random_state=42)\n", "train_r2_scores = []\n", "test_r2_scores = []\n", "\n", "for train_index, test_index in kf.split(X_imputed):\n", " X_train, X_test = X_imputed[train_index], X_imputed[test_index]\n", " y_train, y_test = y[train_index], y[test_index]\n", " \n", " model.fit(X_train, y_train)\n", " \n", " train_r2_scores.append(r2_score(y_train, model.predict(X_train)))\n", " test_r2_scores.append(r2_score(y_test, model.predict(X_test)))\n", "\n", "mean_train_r2 = np.mean(train_r2_scores)\n", "mean_test_r2 = np.mean(test_r2_scores)\n", "\n", "print(f'Mean Train R^2: {mean_train_r2:.2f}')\n", "print(f'Mean Test R^2: {mean_test_r2:.2f}')\n", "\n", "plt.scatter(y_test, model.predict(X_test), label='Actual vs Predicted Price')\n", "plt.plot(y_test, y_test, color='red', label='Perfect Predictions')\n", "plt.xlabel('Actual Price')\n", "plt.ylabel('Predicted Price')\n", "plt.title('Actual vs Predicted Price')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "b3422a51", "metadata": {}, "source": [ "## Summary\n", "We tackled the challenge of predicting smartphone prices using machine learning. Leveraging a dataset with key features, our algorithm, a Random Forest regressor with K-Fold cross-validation, showed promising results in accurately predicting prices. This approach sheds light on pricing dynamics and underscores the potential of data-driven decision-making in the smartphone industry." 
] }, { "cell_type": "markdown", "id": "96e680b4", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 5 }