{ "cells": [ { "cell_type": "markdown", "id": "17a64a70", "metadata": {}, "source": [ "# Analysis of Crime Reports in LA\n", "\n", "Author: Kent Hocaoglu\n", "\n", "Course Project, UC Irvine, Math 10, Fall 24\n", "\n", "I would like to post my notebook on the course's website. [Yes]" ] }, { "cell_type": "markdown", "id": "c6acfb21", "metadata": {}, "source": [ "The data used in this project was retrieved from https://catalog.data.gov/dataset/crime-data-from-2020-to-present.\n" ] }, { "cell_type": "markdown", "id": "25a1bbe9", "metadata": {}, "source": [ "## Introduction:" ] }, { "cell_type": "markdown", "id": "a7b5da05", "metadata": {}, "source": [ "This project aims to apply data science techniques to analyze crime data, discover patterns, identify trends, and draw conclusions to improve public safety in the Los Angeles area.\n", "\n", "Using a dataset containing detailed information about crime incidents, such as location, time, type of crime, and victim demographics, we will use machine learning and statistical analysis to build predictive models that help answer key questions such as:\n", "- Which locations are most affected by crime?\n", "- How do time and date influence the crime rate?\n", "- Are there trends in victim demographics?\n" ] }, { "cell_type": "markdown", "id": "802ed347", "metadata": {}, "source": [ "### Let's begin by cleaning the data.\n", "\n", "First, we can get rid of the information that is not needed for our analysis. Then, we can clean the remaining data by removing the rows that contain missing or filler values." 
] }, { "cell_type": "code", "execution_count": 81, "id": "e8f2f085", "metadata": {}, "outputs": [], "source": [ "# import dependencies\n", "\n", "import pandas as pd\n", "import numpy as np\n", "from sklearn.cluster import KMeans\n", "from sklearn.datasets import make_blobs\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.neighbors import KNeighborsRegressor\n", "from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, mean_absolute_error, confusion_matrix\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D\n", "from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor\n", "from sklearn.preprocessing import LabelEncoder\n", "import seaborn as sns\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": 82, "id": "fcd1be64", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | date | time | age | sex | lat | lon | hour | day_of_week | month |\n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 2020-02-08 | 1800 | 47 | 1 | 34.0444 | -118.2628 | 18 | 5 | 2 |\n",
"| 3 | 2020-03-10 | 2037 | 19 | 1 | 34.1576 | -118.4387 | 20 | 1 | 3 |\n",
"| 11 | 2020-03-01 | 1430 | 27 | 1 | 34.0881 | -118.1877 | 14 | 6 | 3 |\n",
"| 19 | 2020-02-07 | 1615 | 23 | 0 | 34.1016 | -118.3370 | 16 | 4 | 2 |\n",
"| 23 | 2020-07-13 | 2000 | 41 | 1 | 34.1774 | -118.5387 | 20 | 0 | 7 |\n", "