{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# DS 122 Discussion 8" ], "metadata": { "id": "eUStcwQUhsRa" } }, { "cell_type": "markdown", "source": [ "In this discussion, we will review the concepts we have covered in this course. This discussion contains 3 parts:\n", "- EDA using pandas on a dataset\n", "- Sampling using numpy\n", "- Sampling and hypothesis testing on a dataset" ], "metadata": { "id": "27i57wCLcRcw" } }, { "cell_type": "code", "source": [ "#importing libraries\n", "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import scipy.stats as stats\n", "import matplotlib.pyplot as plt" ], "metadata": { "id": "ezfpLVrZ-D7J" }, "execution_count": 32, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Part A: Review of pandas\n", "\n", "In this section, we will analyze data about players from the UCL. The UCL (UEFA Champions League) is a football (soccer) club competition based in Europe. Clubs (or teams) like Liverpool, Manchester United, Real Madrid and Bayern compete in it.\n", "\n", "You have probably heard of players like Ronaldo or Messi. This dataset covers many more players.\n", "\n", "The dataset is taken from Kaggle and can be found at the following link: https://www.kaggle.com/datasets/azminetoushikwasi/ucl-202122-uefa-champions-league?select=attacking.csv.\n", "\n", "We will begin by working with two files, `attacking.csv` and `goals.csv`, to find out which players are good at attacking." ], "metadata": { "id": "vouobfXXkIFW" } }, { "cell_type": "markdown", "source": [ "### Loading the dataset\n", "\n", "The first step of a complete data analysis project is to acquire the data. Let's do this by loading the CSV files given to us."
], "metadata": { "id": "UP0PNRzctXhG" } }, { "cell_type": "code", "source": [ "attack = pd.read_csv('/content/attacking.csv')\n", "goals = pd.read_csv('/content/goals.csv')" ], "metadata": { "id": "yhRWQcqYpe9a" }, "execution_count": 2, "outputs": [] },
{ "cell_type": "markdown", "source": [ "Let us now view our datasets to get an overview of the data." ], "metadata": { "id": "m1g4Ib3E0b5U" } },
{ "cell_type": "code", "source": [ "#view the first five rows of attack\n", "attack.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "IDL7Ko2DphnH", "outputId": "c2a294ab-bbb2-4501-897f-5a5b01c7c47a" }, "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " serial player_name club position assists corner_taken \\\n", "0 1 Bruno Fernandes Man. United Midfielder 7 10 \n", "1 2 Vinícius Júnior Real Madrid Forward 6 3 \n", "2 2 Sané Bayern Midfielder 6 3 \n", "3 4 Antony Ajax Forward 5 3 \n", "4 5 Alexander-Arnold Liverpool Defender 4 36 \n", "\n", " offsides dribbles match_played \n", "0 2 7 7 \n", "1 4 83 13 \n", "2 3 32 10 \n", "3 4 28 7 \n", "4 0 9 9 " ] }, "metadata": {}, "execution_count": 3 } ] },
{ "cell_type": "code", "source": [ "#print the shape (rows, columns) of attack\n", "attack.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2-F095RPaCA9", "outputId": "5a83d359-0d15-4f97-9649-1499336cd65c" }, "execution_count": 4, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(176, 9)" ] }, "metadata": {}, "execution_count": 4 } ] },
{ "cell_type": "code", "source": [ "#view the first five rows of goals\n", "goals.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "coMjcjTP0hxm", "outputId": "090875bd-29ca-46f4-ca76-1b7c16d2ea5b" }, "execution_count": 5, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " serial player_name club position goals right_foot left_foot \\\n", "0 1 Benzema Real Madrid Forward 15 11 1 \n", "1 2 Lewandowski Bayern Forward 13 8 3 \n", "2 3 Haller Ajax Forward 11 3 4 \n", "3 4 Salah Liverpool Forward 8 0 8 \n", "4 5 Nkunku Leipzig Midfielder 7 3 1 \n", "\n", " headers others inside_area outside_areas penalties match_played \n", "0 3 0 13 2 3 12 \n", "1 1 1 13 0 3 10 \n", "2 3 1 11 0 1 8 \n", "3 0 0 7 1 1 13 \n", "4 3 0 7 0 0 6 " ] }, "metadata": {}, "execution_count": 5 } ] },
{ "cell_type": "code", "source": [ "#print the shape (rows, columns) of goals\n", "goals.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GHMj4O-TaFD_", "outputId": "84a5b7d8-b74f-4b5a-faad-b7591779de8d" }, "execution_count": 6, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(183, 13)" ] }, "metadata": {}, "execution_count": 6 } ] },
{ "cell_type": "markdown", "source": [ "### EDA\n", "\n", "The next step in a data analysis pipeline is Exploratory Data Analysis (EDA). Our ultimate goal is to get a list of the top 10 players from an offensive point of view.\n", "\n", "EDA includes cleaning the data. Think about the goal of the project: we are not interested in which foot a player used to score a goal. That would be relevant if we were deciding which side of the field a player should play on, but we just want to know who the good players are.\n", "\n", "Keeping this in mind, let us begin by cleaning the data."
], "metadata": { "id": "2rpZpnns0pWw" } }, { "cell_type": "code", "source": [ "#printing the names of the columns in both dataframes\n", "print(attack.columns)\n", "print(goals.columns)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "grsU8hC30mI2", "outputId": "e58e827c-1ca7-4991-a2fb-27b6030bb308" }, "execution_count": 7, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Index(['serial', 'player_name', 'club', 'position', 'assists', 'corner_taken',\n", " 'offsides', 'dribbles', 'match_played'],\n", " dtype='object')\n", "Index(['serial', 'player_name', 'club', 'position', 'goals', 'right_foot',\n", " 'left_foot', 'headers', 'others', 'inside_area', 'outside_areas',\n", " 'penalties', 'match_played'],\n", " dtype='object')\n" ] } ] },
{ "cell_type": "markdown", "source": [ "Player name, club, position and matches played are common to both dataframes, so let's keep them.\n", "\n", "Let's first work with `attack`. Notice the different columns. Apart from 'serial', which appears to be just a ranking index, they all provide some information on whether the player is good (assists, etc.) or bad (offsides, etc.). So let's preserve them and drop only 'serial'.\n", "\n", "We can delete columns like 'right_foot', 'left_foot', 'headers', 'others', 'inside_area', and 'outside_areas' from `goals`, along with 'serial'." ], "metadata": { "id": "eAhw6DLw3FkY" } },
{ "cell_type": "code", "source": [ "#deleting 'serial' from attack\n", "attack = attack.drop(columns = ['serial'])" ], "metadata": { "id": "IvYphzPQ8UNt" }, "execution_count": 8, "outputs": [] },
{ "cell_type": "code", "source": [ "attack.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "zKUQCmBa8sx4", "outputId": "354aebef-9787-426b-cbe6-dcf8d6d9f68e" }, "execution_count": 9, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " player_name club position assists corner_taken offsides \\\n", "0 Bruno Fernandes Man. United Midfielder 7 10 2 \n", "1 Vinícius Júnior Real Madrid Forward 6 3 4 \n", "2 Sané Bayern Midfielder 6 3 3 \n", "3 Antony Ajax Forward 5 3 4 \n", "4 Alexander-Arnold Liverpool Defender 4 36 0 \n", "\n", " dribbles match_played \n", "0 7 7 \n", "1 83 13 \n", "2 32 10 \n", "3 28 7 \n", "4 9 9 " ] }, "metadata": {}, "execution_count": 9 } ] },
{ "cell_type": "code", "source": [ "#columns to delete from goals\n", "del_col = ['right_foot', 'left_foot', 'headers', 'others', 'inside_area', 'outside_areas', 'serial']\n", "#drop the columns listed above\n", "goals = goals.drop(columns = del_col)" ], "metadata": { "id": "KK5AEy9427Gj" }, "execution_count": 10, "outputs": [] },
{ "cell_type": "markdown", "source": [ "Let us now merge the dataframes." ], "metadata": { "id": "8JLJhyAd8Gnk" } },
{ "cell_type": "code", "source": [ "#merge attack and goals on their shared columns\n", "#(merge defaults to an inner join, so only players present in both dataframes are kept)\n", "ucl = attack.merge(goals, on= ['player_name', 'club', 'position', 'match_played'])\n", "ucl.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "np28EL647q38", "outputId": "2ac41298-2bdb-4aa3-d5a4-891a381d8b27" }, "execution_count": 11, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " player_name club position assists corner_taken offsides \\\n", "0 Vinícius Júnior Real Madrid Forward 6 3 4 \n", "1 Sané Bayern Midfielder 6 3 3 \n", "2 Antony Ajax Forward 5 3 4 \n", "3 De Bruyne Man. 
City Midfielder 4 18 0 \n", "4 Mbappé Paris Forward 4 4 8 \n", "\n", " dribbles match_played goals penalties \n", "0 83 13 4 0 \n", "1 32 10 6 0 \n", "2 28 7 2 0 \n", "3 14 10 2 0 \n", "4 43 8 6 0 " ], "text/html": [ "\n", " <div id=\"df-a1b17598-532c-469b-904a-d8da4c99d3b0\" class=\"colab-df-container\">\n", " <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>player_name</th>\n", " <th>club</th>\n", " <th>position</th>\n", " <th>assists</th>\n", " <th>corner_taken</th>\n", " <th>offsides</th>\n", " <th>dribbles</th>\n", " <th>match_played</th>\n", " <th>goals</th>\n", " <th>penalties</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Vinícius Júnior</td>\n", " <td>Real Madrid</td>\n", " <td>Forward</td>\n", " <td>6</td>\n", " <td>3</td>\n", " <td>4</td>\n", " <td>83</td>\n", " <td>13</td>\n", " <td>4</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Sané</td>\n", " <td>Bayern</td>\n", " <td>Midfielder</td>\n", " <td>6</td>\n", " <td>3</td>\n", " <td>3</td>\n", " <td>32</td>\n", " <td>10</td>\n", " <td>6</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Antony</td>\n", " <td>Ajax</td>\n", " <td>Forward</td>\n", " <td>5</td>\n", " <td>3</td>\n", " <td>4</td>\n", " <td>28</td>\n", " <td>7</td>\n", " <td>2</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>De Bruyne</td>\n", " <td>Man. 
City</td>\n", " <td>Midfielder</td>\n", " <td>4</td>\n", " <td>18</td>\n", " <td>0</td>\n", " <td>14</td>\n", " <td>10</td>\n", " <td>2</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Mbappé</td>\n", " <td>Paris</td>\n", " <td>Forward</td>\n", " <td>4</td>\n", " <td>4</td>\n", " <td>8</td>\n", " <td>43</td>\n", " <td>8</td>\n", " <td>6</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>\n", " <div class=\"colab-df-buttons\">\n", "\n", " <div class=\"colab-df-container\">\n", " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-a1b17598-532c-469b-904a-d8da4c99d3b0')\"\n", " title=\"Convert this dataframe to an interactive table.\"\n", " style=\"display:none;\">\n", "\n", " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n", " <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n", " </svg>\n", " </button>\n", "\n", " <style>\n", " .colab-df-container {\n", " display:flex;\n", " gap: 12px;\n", " }\n", "\n", " .colab-df-convert {\n", " background-color: #E8F0FE;\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: #1967D2;\n", " height: 32px;\n", " padding: 0 0 0 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-convert:hover {\n", " background-color: #E2EBFA;\n", " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: #174EA6;\n", " }\n", "\n", " .colab-df-buttons div {\n", " margin-bottom: 4px;\n", " }\n", "\n", " [theme=dark] .colab-df-convert {\n", " background-color: #3B4455;\n", " fill: #D2E3FC;\n", " }\n", "\n", " [theme=dark] .colab-df-convert:hover {\n", " background-color: #434B5C;\n", " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n", " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n", " 
fill: #FFFFFF;\n", " }\n", " </style>\n", "\n", " <script>\n", " const buttonEl =\n", " document.querySelector('#df-a1b17598-532c-469b-904a-d8da4c99d3b0 button.colab-df-convert');\n", " buttonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", "\n", " async function convertToInteractive(key) {\n", " const element = document.querySelector('#df-a1b17598-532c-469b-904a-d8da4c99d3b0');\n", " const dataTable =\n", " await google.colab.kernel.invokeFunction('convertToInteractive',\n", " [key], {});\n", " if (!dataTable) return;\n", "\n", " const docLinkHtml = 'Like what you see? Visit the ' +\n", " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n", " + ' to learn more about interactive tables.';\n", " element.innerHTML = '';\n", " dataTable['output_type'] = 'display_data';\n", " await google.colab.output.renderOutput(dataTable, element);\n", " const docLink = document.createElement('div');\n", " docLink.innerHTML = docLinkHtml;\n", " element.appendChild(docLink);\n", " }\n", " </script>\n", " </div>\n", "\n", "\n", "<div id=\"df-689d3f77-f47b-465d-a5cb-bcec90d666f5\">\n", " <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-689d3f77-f47b-465d-a5cb-bcec90d666f5')\"\n", " title=\"Suggest charts.\"\n", " style=\"display:none;\">\n", "\n", "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n", " width=\"24px\">\n", " <g>\n", " <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n", " </g>\n", "</svg>\n", " </button>\n", "\n", "<style>\n", " .colab-df-quickchart {\n", " --bg-color: #E8F0FE;\n", " --fill-color: #1967D2;\n", " --hover-bg-color: #E2EBFA;\n", " --hover-fill-color: #174EA6;\n", " --disabled-fill-color: #AAA;\n", " --disabled-bg-color: #DDD;\n", " }\n", "\n", " [theme=dark] .colab-df-quickchart {\n", " --bg-color: #3B4455;\n", " --fill-color: 
#D2E3FC;\n", " --hover-bg-color: #434B5C;\n", " --hover-fill-color: #FFFFFF;\n", " --disabled-bg-color: #3B4455;\n", " --disabled-fill-color: #666;\n", " }\n", "\n", " .colab-df-quickchart {\n", " background-color: var(--bg-color);\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: var(--fill-color);\n", " height: 32px;\n", " padding: 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-quickchart:hover {\n", " background-color: var(--hover-bg-color);\n", " box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: var(--button-hover-fill-color);\n", " }\n", "\n", " .colab-df-quickchart-complete:disabled,\n", " .colab-df-quickchart-complete:disabled:hover {\n", " background-color: var(--disabled-bg-color);\n", " fill: var(--disabled-fill-color);\n", " box-shadow: none;\n", " }\n", "\n", " .colab-df-spinner {\n", " border: 2px solid var(--fill-color);\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " animation:\n", " spin 1s steps(1) infinite;\n", " }\n", "\n", " @keyframes spin {\n", " 0% {\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " border-left-color: var(--fill-color);\n", " }\n", " 20% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " }\n", " 30% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " border-right-color: var(--fill-color);\n", " }\n", " 40% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " }\n", " 60% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " }\n", " 80% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " border-bottom-color: var(--fill-color);\n", " }\n", " 90% {\n", " border-color: 
transparent;\n", " border-bottom-color: var(--fill-color);\n", " }\n", " }\n", "</style>\n", "\n", " <script>\n", " async function quickchart(key) {\n", " const quickchartButtonEl =\n", " document.querySelector('#' + key + ' button');\n", " quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n", " quickchartButtonEl.classList.add('colab-df-spinner');\n", " try {\n", " const charts = await google.colab.kernel.invokeFunction(\n", " 'suggestCharts', [key], {});\n", " } catch (error) {\n", " console.error('Error during call to suggestCharts:', error);\n", " }\n", " quickchartButtonEl.classList.remove('colab-df-spinner');\n", " quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n", " }\n", " (() => {\n", " let quickchartButtonEl =\n", " document.querySelector('#df-689d3f77-f47b-465d-a5cb-bcec90d666f5 button');\n", " quickchartButtonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", " })();\n", " </script>\n", "</div>\n", " </div>\n", " </div>\n" ] }, "metadata": {}, "execution_count": 11 } ] }, { "cell_type": "code", "source": [ "#print shape of ucl\n", "ucl.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-tt6HzR8aKiE", "outputId": "6a4d81d1-f1bc-400f-93c0-1b5b4847e0ab" }, "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(82, 10)" ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "markdown", "source": [ "The merged dataframe has only 82 rows, so it seems many players were dropped. `merge` performs an inner join by default, which keeps only the players that appear in both datasets with matching values in every key column. Let us continue with this smaller dataset for the scope of this question." ], "metadata": { "id": "9ayGAYudZxPw" } }, { "cell_type": "markdown", "source": [ "Now, we have a final dataframe called `ucl` to work with. 
Let us now make a list of the top 10 players.\n", "\n", "Columns like `assists`, `corner_taken`, `dribbles` and `goals` contribute positively to the score.\n", "\n", "Columns like `offsides` and `penalties` contribute negatively to the score.\n", "\n", "We first divide each of these columns by `match_played` to get per-match rates, then add the positive rates, subtract the negative ones, and build a dataframe of the 10 highest-scoring players.\n", "\n", "Remember, in more complex projects we would rescale every column to lie between 0 and 1. We do not do that here, so columns with large raw values, such as `dribbles` (83, 67 and so on), carry much more weight in the score than the others." ], "metadata": { "id": "-1rXgrue85VV" } }, { "cell_type": "code", "source": [ "# divide each statistic by matches played and store the per-match rate in a new column\n", "ucl['assists_normalized'] = ucl['assists']/ucl['match_played']\n", "ucl['corner_taken_normalized'] = ucl['corner_taken']/ucl['match_played']\n", "ucl['dribbles_normalized'] = ucl['dribbles']/ucl['match_played']\n", "ucl['goals_normalized'] = ucl['goals']/ucl['match_played']\n", "ucl['offsides_normalized'] = ucl['offsides']/ucl['match_played']\n", "ucl['penalties_normalized'] = ucl['penalties']/ucl['match_played']" ], "metadata": { "id": "OKszPGlUoj4C" }, "execution_count": 13, "outputs": [] }, { "cell_type": "code", "source": [ "# overall score: add the positive per-match rates and subtract the negative ones\n", "ucl['score'] = ucl['assists_normalized'] + ucl['corner_taken_normalized'] + ucl['dribbles_normalized'] + ucl['goals_normalized'] - ucl['offsides_normalized'] - ucl['penalties_normalized']" ], "metadata": { "id": "e797uAQool8v" }, "execution_count": 14, "outputs": [] }, { "cell_type": "code", "source": [ "# select the 10 players with the highest score\n", "top_10_players = ucl.nlargest(10, 'score')\n", "top_10_players.head(10)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 435 }, "id": "TJQwT7uToZqg", "outputId": "3f2d51b6-ca74-4dff-ea8e-b9a9b64e7ef6" }, "execution_count": 15, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " player_name 
club position assists corner_taken offsides \\\n", "9 Coman Bayern Forward 3 4 4 \n", "0 Vinícius Júnior Real Madrid Forward 6 3 4 \n", "51 Moumi Ngamaleu Young Boys Midfielder 1 1 0 \n", "4 Mbappé Paris Forward 4 4 8 \n", "15 Mahrez Man. City Midfielder 2 30 5 \n", "2 Antony Ajax Forward 5 3 4 \n", "11 Bellingham Dortmund Midfielder 3 1 1 \n", "1 Sané Bayern Midfielder 6 3 3 \n", "19 Pedro Gonçalves Sporting CP Midfielder 2 7 0 \n", "16 Ziyech Chelsea Midfielder 2 23 0 \n", "\n", " dribbles match_played goals penalties assists_normalized \\\n", "9 59 9 2 0 0.333333 \n", "0 83 13 4 0 0.461538 \n", "51 34 6 1 0 0.166667 \n", "4 43 8 6 0 0.500000 \n", "15 28 12 7 2 0.166667 \n", "2 28 7 2 0 0.714286 \n", "11 24 6 1 0 0.500000 \n", "1 32 10 6 0 0.600000 \n", "19 10 5 4 1 0.400000 \n", "16 12 9 1 0 0.222222 \n", "\n", " corner_taken_normalized dribbles_normalized goals_normalized \\\n", "9 0.444444 6.555556 0.222222 \n", "0 0.230769 6.384615 0.307692 \n", "51 0.166667 5.666667 0.166667 \n", "4 0.500000 5.375000 0.750000 \n", "15 2.500000 2.333333 0.583333 \n", "2 0.428571 4.000000 0.285714 \n", "11 0.166667 4.000000 0.166667 \n", "1 0.300000 3.200000 0.600000 \n", "19 1.400000 2.000000 0.800000 \n", "16 2.555556 1.333333 0.111111 \n", "\n", " offsides_normalized penalties_normalized score \n", "9 0.444444 0.000000 7.111111 \n", "0 0.307692 0.000000 7.076923 \n", "51 0.000000 0.000000 6.166667 \n", "4 1.000000 0.000000 6.125000 \n", "15 0.416667 0.166667 5.000000 \n", "2 0.571429 0.000000 4.857143 \n", "11 0.166667 0.000000 4.666667 \n", "1 0.300000 0.000000 4.400000 \n", "19 0.000000 0.200000 4.400000 \n", "16 0.000000 0.000000 4.222222 " ], "text/html": [ "\n", " <div id=\"df-e286e525-2e77-4968-9800-c635972b593f\" class=\"colab-df-container\">\n", " <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " 
text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>player_name</th>\n", " <th>club</th>\n", " <th>position</th>\n", " <th>assists</th>\n", " <th>corner_taken</th>\n", " <th>offsides</th>\n", " <th>dribbles</th>\n", " <th>match_played</th>\n", " <th>goals</th>\n", " <th>penalties</th>\n", " <th>assists_normalized</th>\n", " <th>corner_taken_normalized</th>\n", " <th>dribbles_normalized</th>\n", " <th>goals_normalized</th>\n", " <th>offsides_normalized</th>\n", " <th>penalties_normalized</th>\n", " <th>score</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>9</th>\n", " <td>Coman</td>\n", " <td>Bayern</td>\n", " <td>Forward</td>\n", " <td>3</td>\n", " <td>4</td>\n", " <td>4</td>\n", " <td>59</td>\n", " <td>9</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>0.333333</td>\n", " <td>0.444444</td>\n", " <td>6.555556</td>\n", " <td>0.222222</td>\n", " <td>0.444444</td>\n", " <td>0.000000</td>\n", " <td>7.111111</td>\n", " </tr>\n", " <tr>\n", " <th>0</th>\n", " <td>Vinícius Júnior</td>\n", " <td>Real Madrid</td>\n", " <td>Forward</td>\n", " <td>6</td>\n", " <td>3</td>\n", " <td>4</td>\n", " <td>83</td>\n", " <td>13</td>\n", " <td>4</td>\n", " <td>0</td>\n", " <td>0.461538</td>\n", " <td>0.230769</td>\n", " <td>6.384615</td>\n", " <td>0.307692</td>\n", " <td>0.307692</td>\n", " <td>0.000000</td>\n", " <td>7.076923</td>\n", " </tr>\n", " <tr>\n", " <th>51</th>\n", " <td>Moumi Ngamaleu</td>\n", " <td>Young Boys</td>\n", " <td>Midfielder</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>34</td>\n", " <td>6</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0.166667</td>\n", " <td>0.166667</td>\n", " <td>5.666667</td>\n", " <td>0.166667</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>6.166667</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Mbappé</td>\n", " <td>Paris</td>\n", " <td>Forward</td>\n", " 
<td>4</td>\n", " <td>4</td>\n", " <td>8</td>\n", " <td>43</td>\n", " <td>8</td>\n", " <td>6</td>\n", " <td>0</td>\n", " <td>0.500000</td>\n", " <td>0.500000</td>\n", " <td>5.375000</td>\n", " <td>0.750000</td>\n", " <td>1.000000</td>\n", " <td>0.000000</td>\n", " <td>6.125000</td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>Mahrez</td>\n", " <td>Man. City</td>\n", " <td>Midfielder</td>\n", " <td>2</td>\n", " <td>30</td>\n", " <td>5</td>\n", " <td>28</td>\n", " <td>12</td>\n", " <td>7</td>\n", " <td>2</td>\n", " <td>0.166667</td>\n", " <td>2.500000</td>\n", " <td>2.333333</td>\n", " <td>0.583333</td>\n", " <td>0.416667</td>\n", " <td>0.166667</td>\n", " <td>5.000000</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Antony</td>\n", " <td>Ajax</td>\n", " <td>Forward</td>\n", " <td>5</td>\n", " <td>3</td>\n", " <td>4</td>\n", " <td>28</td>\n", " <td>7</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>0.714286</td>\n", " <td>0.428571</td>\n", " <td>4.000000</td>\n", " <td>0.285714</td>\n", " <td>0.571429</td>\n", " <td>0.000000</td>\n", " <td>4.857143</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Bellingham</td>\n", " <td>Dortmund</td>\n", " <td>Midfielder</td>\n", " <td>3</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>24</td>\n", " <td>6</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0.500000</td>\n", " <td>0.166667</td>\n", " <td>4.000000</td>\n", " <td>0.166667</td>\n", " <td>0.166667</td>\n", " <td>0.000000</td>\n", " <td>4.666667</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Sané</td>\n", " <td>Bayern</td>\n", " <td>Midfielder</td>\n", " <td>6</td>\n", " <td>3</td>\n", " <td>3</td>\n", " <td>32</td>\n", " <td>10</td>\n", " <td>6</td>\n", " <td>0</td>\n", " <td>0.600000</td>\n", " <td>0.300000</td>\n", " <td>3.200000</td>\n", " <td>0.600000</td>\n", " <td>0.300000</td>\n", " <td>0.000000</td>\n", " <td>4.400000</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Pedro Gonçalves</td>\n", " <td>Sporting CP</td>\n", " 
<td>Midfielder</td>\n", " <td>2</td>\n", " <td>7</td>\n", " <td>0</td>\n", " <td>10</td>\n", " <td>5</td>\n", " <td>4</td>\n", " <td>1</td>\n", " <td>0.400000</td>\n", " <td>1.400000</td>\n", " <td>2.000000</td>\n", " <td>0.800000</td>\n", " <td>0.000000</td>\n", " <td>0.200000</td>\n", " <td>4.400000</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>Ziyech</td>\n", " <td>Chelsea</td>\n", " <td>Midfielder</td>\n", " <td>2</td>\n", " <td>23</td>\n", " <td>0</td>\n", " <td>12</td>\n", " <td>9</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0.222222</td>\n", " <td>2.555556</td>\n", " <td>1.333333</td>\n", " <td>0.111111</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>4.222222</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>\n", " <div class=\"colab-df-buttons\">\n", "\n", " <div class=\"colab-df-container\">\n", " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-e286e525-2e77-4968-9800-c635972b593f')\"\n", " title=\"Convert this dataframe to an interactive table.\"\n", " style=\"display:none;\">\n", "\n", " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n", " <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n", " </svg>\n", " </button>\n", "\n", " <style>\n", " .colab-df-container {\n", " display:flex;\n", " gap: 12px;\n", " }\n", "\n", " .colab-df-convert {\n", " background-color: #E8F0FE;\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: #1967D2;\n", " height: 32px;\n", " padding: 0 0 0 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-convert:hover {\n", " background-color: #E2EBFA;\n", " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: #174EA6;\n", " }\n", "\n", " .colab-df-buttons div {\n", " margin-bottom: 
4px;\n", " }\n", "\n", " [theme=dark] .colab-df-convert {\n", " background-color: #3B4455;\n", " fill: #D2E3FC;\n", " }\n", "\n", " [theme=dark] .colab-df-convert:hover {\n", " background-color: #434B5C;\n", " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n", " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n", " fill: #FFFFFF;\n", " }\n", " </style>\n", "\n", " <script>\n", " const buttonEl =\n", " document.querySelector('#df-e286e525-2e77-4968-9800-c635972b593f button.colab-df-convert');\n", " buttonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", "\n", " async function convertToInteractive(key) {\n", " const element = document.querySelector('#df-e286e525-2e77-4968-9800-c635972b593f');\n", " const dataTable =\n", " await google.colab.kernel.invokeFunction('convertToInteractive',\n", " [key], {});\n", " if (!dataTable) return;\n", "\n", " const docLinkHtml = 'Like what you see? Visit the ' +\n", " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n", " + ' to learn more about interactive tables.';\n", " element.innerHTML = '';\n", " dataTable['output_type'] = 'display_data';\n", " await google.colab.output.renderOutput(dataTable, element);\n", " const docLink = document.createElement('div');\n", " docLink.innerHTML = docLinkHtml;\n", " element.appendChild(docLink);\n", " }\n", " </script>\n", " </div>\n", "\n", "\n", "<div id=\"df-02dcf5ed-8573-4642-8cf2-b1e04357a076\">\n", " <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-02dcf5ed-8573-4642-8cf2-b1e04357a076')\"\n", " title=\"Suggest charts.\"\n", " style=\"display:none;\">\n", "\n", "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n", " width=\"24px\">\n", " <g>\n", " <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n", " </g>\n", "</svg>\n", " </button>\n", "\n", "<style>\n", 
" .colab-df-quickchart {\n", " --bg-color: #E8F0FE;\n", " --fill-color: #1967D2;\n", " --hover-bg-color: #E2EBFA;\n", " --hover-fill-color: #174EA6;\n", " --disabled-fill-color: #AAA;\n", " --disabled-bg-color: #DDD;\n", " }\n", "\n", " [theme=dark] .colab-df-quickchart {\n", " --bg-color: #3B4455;\n", " --fill-color: #D2E3FC;\n", " --hover-bg-color: #434B5C;\n", " --hover-fill-color: #FFFFFF;\n", " --disabled-bg-color: #3B4455;\n", " --disabled-fill-color: #666;\n", " }\n", "\n", " .colab-df-quickchart {\n", " background-color: var(--bg-color);\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: var(--fill-color);\n", " height: 32px;\n", " padding: 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-quickchart:hover {\n", " background-color: var(--hover-bg-color);\n", " box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: var(--button-hover-fill-color);\n", " }\n", "\n", " .colab-df-quickchart-complete:disabled,\n", " .colab-df-quickchart-complete:disabled:hover {\n", " background-color: var(--disabled-bg-color);\n", " fill: var(--disabled-fill-color);\n", " box-shadow: none;\n", " }\n", "\n", " .colab-df-spinner {\n", " border: 2px solid var(--fill-color);\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " animation:\n", " spin 1s steps(1) infinite;\n", " }\n", "\n", " @keyframes spin {\n", " 0% {\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " border-left-color: var(--fill-color);\n", " }\n", " 20% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " }\n", " 30% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " border-right-color: var(--fill-color);\n", " }\n", " 40% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " 
border-top-color: var(--fill-color);\n", " }\n", " 60% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " }\n", " 80% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " border-bottom-color: var(--fill-color);\n", " }\n", " 90% {\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " }\n", " }\n", "</style>\n", "\n", " <script>\n", " async function quickchart(key) {\n", " const quickchartButtonEl =\n", " document.querySelector('#' + key + ' button');\n", " quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n", " quickchartButtonEl.classList.add('colab-df-spinner');\n", " try {\n", " const charts = await google.colab.kernel.invokeFunction(\n", " 'suggestCharts', [key], {});\n", " } catch (error) {\n", " console.error('Error during call to suggestCharts:', error);\n", " }\n", " quickchartButtonEl.classList.remove('colab-df-spinner');\n", " quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n", " }\n", " (() => {\n", " let quickchartButtonEl =\n", " document.querySelector('#df-02dcf5ed-8573-4642-8cf2-b1e04357a076 button');\n", " quickchartButtonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 
'block' : 'none';\n", " })();\n", " </script>\n", "</div>\n", " </div>\n", " </div>\n" ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "code", "source": [ "# list the columns so we can decide which ones to drop\n", "print(top_10_players.columns)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bU2cahIJo84-", "outputId": "6cad1f6e-9dfa-4784-c8b4-629f75dfe147" }, "execution_count": 16, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Index(['player_name', 'club', 'position', 'assists', 'corner_taken',\n", " 'offsides', 'dribbles', 'match_played', 'goals', 'penalties',\n", " 'assists_normalized', 'corner_taken_normalized', 'dribbles_normalized',\n", " 'goals_normalized', 'offsides_normalized', 'penalties_normalized',\n", " 'score'],\n", " dtype='object')\n" ] } ] }, { "cell_type": "code", "source": [ "# drop the raw and normalized statistics, keeping only player_name, club, position and score\n", "del_col = ['assists', 'corner_taken',\n", " 'offsides', 'dribbles', 'match_played', 'goals', 'penalties',\n", " 'assists_normalized', 'corner_taken_normalized', 'dribbles_normalized',\n", " 'goals_normalized', 'offsides_normalized', 'penalties_normalized']\n", "top_10_players = top_10_players.drop(columns = del_col)" ], "metadata": { "id": "N2hwWcwvpDg-" }, "execution_count": 17, "outputs": [] }, { "cell_type": "code", "source": [ "top_10_players.head(10)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 363 }, "id": "HtP7mrFJpP9z", "outputId": "4cb62487-4865-4c25-bddd-086e3e5a703a" }, "execution_count": 18, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " player_name club position score\n", "9 Coman Bayern Forward 7.111111\n", "0 Vinícius Júnior Real Madrid Forward 7.076923\n", "51 Moumi Ngamaleu Young Boys Midfielder 6.166667\n", "4 Mbappé Paris Forward 6.125000\n", "15 Mahrez Man. 
City Midfielder 5.000000\n", "2 Antony Ajax Forward 4.857143\n", "11 Bellingham Dortmund Midfielder 4.666667\n", "1 Sané Bayern Midfielder 4.400000\n", "19 Pedro Gonçalves Sporting CP Midfielder 4.400000\n", "16 Ziyech Chelsea Midfielder 4.222222" ], "text/html": [ "\n", " <div id=\"df-660678a5-6d83-4c29-a64d-8006c51a6195\" class=\"colab-df-container\">\n", " <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>player_name</th>\n", " <th>club</th>\n", " <th>position</th>\n", " <th>score</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>9</th>\n", " <td>Coman</td>\n", " <td>Bayern</td>\n", " <td>Forward</td>\n", " <td>7.111111</td>\n", " </tr>\n", " <tr>\n", " <th>0</th>\n", " <td>Vinícius Júnior</td>\n", " <td>Real Madrid</td>\n", " <td>Forward</td>\n", " <td>7.076923</td>\n", " </tr>\n", " <tr>\n", " <th>51</th>\n", " <td>Moumi Ngamaleu</td>\n", " <td>Young Boys</td>\n", " <td>Midfielder</td>\n", " <td>6.166667</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Mbappé</td>\n", " <td>Paris</td>\n", " <td>Forward</td>\n", " <td>6.125000</td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>Mahrez</td>\n", " <td>Man. 
City</td>\n", " <td>Midfielder</td>\n", " <td>5.000000</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Antony</td>\n", " <td>Ajax</td>\n", " <td>Forward</td>\n", " <td>4.857143</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Bellingham</td>\n", " <td>Dortmund</td>\n", " <td>Midfielder</td>\n", " <td>4.666667</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Sané</td>\n", " <td>Bayern</td>\n", " <td>Midfielder</td>\n", " <td>4.400000</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Pedro Gonçalves</td>\n", " <td>Sporting CP</td>\n", " <td>Midfielder</td>\n", " <td>4.400000</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>Ziyech</td>\n", " <td>Chelsea</td>\n", " <td>Midfielder</td>\n", " <td>4.222222</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>\n", " <div class=\"colab-df-buttons\">\n", "\n", " <div class=\"colab-df-container\">\n", " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-660678a5-6d83-4c29-a64d-8006c51a6195')\"\n", " title=\"Convert this dataframe to an interactive table.\"\n", " style=\"display:none;\">\n", "\n", " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n", " <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n", " </svg>\n", " </button>\n", "\n", " <style>\n", " .colab-df-container {\n", " display:flex;\n", " gap: 12px;\n", " }\n", "\n", " .colab-df-convert {\n", " background-color: #E8F0FE;\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: #1967D2;\n", " height: 32px;\n", " padding: 0 0 0 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-convert:hover {\n", " background-color: #E2EBFA;\n", " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: #174EA6;\n", " }\n", "\n", " 
.colab-df-buttons div {\n", " margin-bottom: 4px;\n", " }\n", "\n", " [theme=dark] .colab-df-convert {\n", " background-color: #3B4455;\n", " fill: #D2E3FC;\n", " }\n", "\n", " [theme=dark] .colab-df-convert:hover {\n", " background-color: #434B5C;\n", " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n", " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n", " fill: #FFFFFF;\n", " }\n", " </style>\n", "\n", " <script>\n", " const buttonEl =\n", " document.querySelector('#df-660678a5-6d83-4c29-a64d-8006c51a6195 button.colab-df-convert');\n", " buttonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", "\n", " async function convertToInteractive(key) {\n", " const element = document.querySelector('#df-660678a5-6d83-4c29-a64d-8006c51a6195');\n", " const dataTable =\n", " await google.colab.kernel.invokeFunction('convertToInteractive',\n", " [key], {});\n", " if (!dataTable) return;\n", "\n", " const docLinkHtml = 'Like what you see? Visit the ' +\n", " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n", " + ' to learn more about interactive tables.';\n", " element.innerHTML = '';\n", " dataTable['output_type'] = 'display_data';\n", " await google.colab.output.renderOutput(dataTable, element);\n", " const docLink = document.createElement('div');\n", " docLink.innerHTML = docLinkHtml;\n", " element.appendChild(docLink);\n", " }\n", " </script>\n", " </div>\n", "\n", "\n", "<div id=\"df-bbbcc6fb-fa7e-4614-9183-fbe19b4403dc\">\n", " <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-bbbcc6fb-fa7e-4614-9183-fbe19b4403dc')\"\n", " title=\"Suggest charts.\"\n", " style=\"display:none;\">\n", "\n", "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n", " width=\"24px\">\n", " <g>\n", " <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n", " </g>\n", 
"</svg>\n", " </button>\n", "\n", "<style>\n", " .colab-df-quickchart {\n", " --bg-color: #E8F0FE;\n", " --fill-color: #1967D2;\n", " --hover-bg-color: #E2EBFA;\n", " --hover-fill-color: #174EA6;\n", " --disabled-fill-color: #AAA;\n", " --disabled-bg-color: #DDD;\n", " }\n", "\n", " [theme=dark] .colab-df-quickchart {\n", " --bg-color: #3B4455;\n", " --fill-color: #D2E3FC;\n", " --hover-bg-color: #434B5C;\n", " --hover-fill-color: #FFFFFF;\n", " --disabled-bg-color: #3B4455;\n", " --disabled-fill-color: #666;\n", " }\n", "\n", " .colab-df-quickchart {\n", " background-color: var(--bg-color);\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: var(--fill-color);\n", " height: 32px;\n", " padding: 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-quickchart:hover {\n", " background-color: var(--hover-bg-color);\n", " box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: var(--button-hover-fill-color);\n", " }\n", "\n", " .colab-df-quickchart-complete:disabled,\n", " .colab-df-quickchart-complete:disabled:hover {\n", " background-color: var(--disabled-bg-color);\n", " fill: var(--disabled-fill-color);\n", " box-shadow: none;\n", " }\n", "\n", " .colab-df-spinner {\n", " border: 2px solid var(--fill-color);\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " animation:\n", " spin 1s steps(1) infinite;\n", " }\n", "\n", " @keyframes spin {\n", " 0% {\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " border-left-color: var(--fill-color);\n", " }\n", " 20% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " }\n", " 30% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " border-right-color: var(--fill-color);\n", " }\n", " 40% {\n", " border-color: transparent;\n", 
" border-right-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " }\n", " 60% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " }\n", " 80% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " border-bottom-color: var(--fill-color);\n", " }\n", " 90% {\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " }\n", " }\n", "</style>\n", "\n", " <script>\n", " async function quickchart(key) {\n", " const quickchartButtonEl =\n", " document.querySelector('#' + key + ' button');\n", " quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n", " quickchartButtonEl.classList.add('colab-df-spinner');\n", " try {\n", " const charts = await google.colab.kernel.invokeFunction(\n", " 'suggestCharts', [key], {});\n", " } catch (error) {\n", " console.error('Error during call to suggestCharts:', error);\n", " }\n", " quickchartButtonEl.classList.remove('colab-df-spinner');\n", " quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n", " }\n", " (() => {\n", " let quickchartButtonEl =\n", " document.querySelector('#df-bbbcc6fb-fa7e-4614-9183-fbe19b4403dc button');\n", " quickchartButtonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", " })();\n", " </script>\n", "</div>\n", " </div>\n", " </div>\n" ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "markdown", "source": [ "## Part B - Review of NumPy\n", "\n", "Let us now review numpy." ], "metadata": { "id": "jVxrL9Xd8uzi" } }, { "cell_type": "markdown", "source": [ "**1. 
Create a 2-dimensional NumPy array with 3 rows and 4 columns, filled with random integers between 0 and 100 (inclusive).**" ], "metadata": { "id": "NSm0sKHH8y31" } }, { "cell_type": "code", "source": [ "array_2d = np.random.randint(0,101,(3,4))\n", "print(array_2d)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wP8aUQ_s8yFp", "outputId": "4118d1c1-b617-4850-f264-047a79a8105c" }, "execution_count": 19, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[[ 23 24 38 91]\n", " [100 92 80 54]\n", " [ 76 98 64 31]]\n" ] } ] }, { "cell_type": "markdown", "source": [ "**2. Generate an array of 20 random integers between 1 and 100 (inclusive). Calculate the mean, median, and mode of the generated array. Create a new array containing 100 random samples from a standard normal distribution (mean=0, std=1).**" ], "metadata": { "id": "mSo-9GZV87g1" } }, { "cell_type": "code", "source": [ "random_integers = np.random.randint(1,101, 20)\n", "\n", "mean_sample = np.mean(random_integers)\n", "median_sample = np.median(random_integers)\n", "unique_values, counts = np.unique(random_integers, return_counts = True)\n", "mode_sample = unique_values[counts.argmax()]\n", "\n", "standard_normal_samples = np.random.normal(0,1,100)\n", "\n", "print(random_integers)\n", "print(\"Mean:\", mean_sample)\n", "print(\"Median:\", median_sample)\n", "print(\"Mode:\", mode_sample)\n", "print(standard_normal_samples[:10])\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "joYkt9-k89ry", "outputId": "2db74cfe-adf5-46af-a4db-6894c7ab634b" }, "execution_count": 20, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[ 21 62 55 50 69 30 47 15 75 6 37 93 94 58 39 26 100 40\n", " 92 50]\n", "Mean: 52.95\n", "Median: 50.0\n", "Mode: 50\n", "[ 0.04254721 -1.38797692 -0.23088044 0.27202176 0.93998854 1.43665346\n", " -0.47827033 -1.07138487 1.03319647 -0.78915187]\n" ] } ] }, { "cell_type": "markdown", "source": [ "**3. 
Create an array of 10 unique integers between 1 and 20 (inclusive). Sample 5 random integers from this array, allowing for replacement. Calculate the mean and standard deviation of the sampled integers.**" ], "metadata": { "id": "nOSnUcsh9AIB" } }, { "cell_type": "code", "source": [ "# Caveat: np.unique only drops duplicates from the 10 draws, so this can yield fewer than 10 values.\n", "# np.random.choice(np.arange(1, 21), 10, replace = False) would guarantee exactly 10 unique integers.\n", "unique_integers = np.unique(np.random.randint(1,21,10))\n", "\n", "sample_with_replacement = np.random.choice(unique_integers, 5, replace = True)\n", "\n", "mean_sample_replacement = np.mean(sample_with_replacement)\n", "# np.std defaults to the population standard deviation (ddof = 0)\n", "std_dev_sample_replacement = np.std(sample_with_replacement)\n", "\n", "print(unique_integers)\n", "print(sample_with_replacement)\n", "print(\"Mean:\", mean_sample_replacement)\n", "print(\"Standard Deviation:\", std_dev_sample_replacement)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iMPWjo969CkJ", "outputId": "82d5d471-0c18-4eef-945c-e4e744cddb31" }, "execution_count": 21, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[ 1 3 4 5 6 13 14 16]\n", "[14 6 3 4 3]\n", "Mean: 6.0\n", "Standard Deviation: 4.147288270665544\n" ] } ] }, { "cell_type": "markdown", "source": [ "**4. Create an array of 15 random integers between 1 and 10 (inclusive). Sort the array in ascending order. 
Find the unique values in the sorted array.**" ], "metadata": { "id": "UKdfUmnV9HXO" } }, { "cell_type": "code", "source": [ "array_15_random = np.random.randint(1,11,15)\n", "\n", "sorted_array = np.sort(array_15_random)\n", "\n", "unique_sorted_values = np.unique(sorted_array)\n", "\n", "print(array_15_random)\n", "print(sorted_array)\n", "print(unique_sorted_values)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Rb3FfBRL9H2h", "outputId": "28074627-19d8-400c-a534-1ef40db80222" }, "execution_count": 22, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[ 9 8 6 10 1 6 7 7 1 7 9 8 7 7 9]\n", "[ 1 1 6 6 7 7 7 7 7 8 8 9 9 9 10]\n", "[ 1 6 7 8 9 10]\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Part C - Sampling and Hypothesis testing\n", "\n", "In this part, we will work with the `tips` dataset that can be downloaded from the `seaborn` library.\n", "\n", "The \"tips\" dataset contains information about restaurant tips, including total bills, tips, and other relevant details." 
], "metadata": { "id": "gFjCQNBnfDZX" } }, { "cell_type": "code", "source": [ "#downloading the dataset into a dataframe\n", "tips = sns.load_dataset('tips')\n", "tips.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "rcg4FQdgwVQO", "outputId": "ec0583c3-da30-48ce-fd49-b56d70bfc480" }, "execution_count": 23, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 1.01 Female No Sun Dinner 2\n", "1 10.34 1.66 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4" ], "text/html": [ "\n", " <div id=\"df-190e1aee-7e18-43ca-8639-ae2d7e873361\" class=\"colab-df-container\">\n", " <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>total_bill</th>\n", " <th>tip</th>\n", " <th>sex</th>\n", " <th>smoker</th>\n", " <th>day</th>\n", " <th>time</th>\n", " <th>size</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>16.99</td>\n", " <td>1.01</td>\n", " <td>Female</td>\n", " <td>No</td>\n", " <td>Sun</td>\n", " <td>Dinner</td>\n", " <td>2</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>10.34</td>\n", " <td>1.66</td>\n", " <td>Male</td>\n", " <td>No</td>\n", " <td>Sun</td>\n", " <td>Dinner</td>\n", " <td>3</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>21.01</td>\n", " <td>3.50</td>\n", " <td>Male</td>\n", " <td>No</td>\n", " <td>Sun</td>\n", " <td>Dinner</td>\n", " <td>3</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>23.68</td>\n", " <td>3.31</td>\n", " <td>Male</td>\n", " <td>No</td>\n", " 
<td>Sun</td>\n", " <td>Dinner</td>\n", " <td>2</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>24.59</td>\n", " <td>3.61</td>\n", " <td>Female</td>\n", " <td>No</td>\n", " <td>Sun</td>\n", " <td>Dinner</td>\n", " <td>4</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>\n", " <div class=\"colab-df-buttons\">\n", "\n", " <div class=\"colab-df-container\">\n", " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-190e1aee-7e18-43ca-8639-ae2d7e873361')\"\n", " title=\"Convert this dataframe to an interactive table.\"\n", " style=\"display:none;\">\n", "\n", " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n", " <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n", " </svg>\n", " </button>\n", "\n", " <style>\n", " .colab-df-container {\n", " display:flex;\n", " gap: 12px;\n", " }\n", "\n", " .colab-df-convert {\n", " background-color: #E8F0FE;\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: #1967D2;\n", " height: 32px;\n", " padding: 0 0 0 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-convert:hover {\n", " background-color: #E2EBFA;\n", " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: #174EA6;\n", " }\n", "\n", " .colab-df-buttons div {\n", " margin-bottom: 4px;\n", " }\n", "\n", " [theme=dark] .colab-df-convert {\n", " background-color: #3B4455;\n", " fill: #D2E3FC;\n", " }\n", "\n", " [theme=dark] .colab-df-convert:hover {\n", " background-color: #434B5C;\n", " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n", " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n", " fill: #FFFFFF;\n", " }\n", " </style>\n", "\n", " <script>\n", " const buttonEl =\n", " document.querySelector('#df-190e1aee-7e18-43ca-8639-ae2d7e873361 
button.colab-df-convert');\n", " buttonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", "\n", " async function convertToInteractive(key) {\n", " const element = document.querySelector('#df-190e1aee-7e18-43ca-8639-ae2d7e873361');\n", " const dataTable =\n", " await google.colab.kernel.invokeFunction('convertToInteractive',\n", " [key], {});\n", " if (!dataTable) return;\n", "\n", " const docLinkHtml = 'Like what you see? Visit the ' +\n", " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n", " + ' to learn more about interactive tables.';\n", " element.innerHTML = '';\n", " dataTable['output_type'] = 'display_data';\n", " await google.colab.output.renderOutput(dataTable, element);\n", " const docLink = document.createElement('div');\n", " docLink.innerHTML = docLinkHtml;\n", " element.appendChild(docLink);\n", " }\n", " </script>\n", " </div>\n", "\n", "\n", "<div id=\"df-5be367d3-7746-423a-8ac0-72da0fab7eb7\">\n", " <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-5be367d3-7746-423a-8ac0-72da0fab7eb7')\"\n", " title=\"Suggest charts.\"\n", " style=\"display:none;\">\n", "\n", "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n", " width=\"24px\">\n", " <g>\n", " <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n", " </g>\n", "</svg>\n", " </button>\n", "\n", "<style>\n", " .colab-df-quickchart {\n", " --bg-color: #E8F0FE;\n", " --fill-color: #1967D2;\n", " --hover-bg-color: #E2EBFA;\n", " --hover-fill-color: #174EA6;\n", " --disabled-fill-color: #AAA;\n", " --disabled-bg-color: #DDD;\n", " }\n", "\n", " [theme=dark] .colab-df-quickchart {\n", " --bg-color: #3B4455;\n", " --fill-color: #D2E3FC;\n", " --hover-bg-color: #434B5C;\n", " --hover-fill-color: #FFFFFF;\n", " --disabled-bg-color: #3B4455;\n", " --disabled-fill-color: #666;\n", " 
}\n", "\n", " .colab-df-quickchart {\n", " background-color: var(--bg-color);\n", " border: none;\n", " border-radius: 50%;\n", " cursor: pointer;\n", " display: none;\n", " fill: var(--fill-color);\n", " height: 32px;\n", " padding: 0;\n", " width: 32px;\n", " }\n", "\n", " .colab-df-quickchart:hover {\n", " background-color: var(--hover-bg-color);\n", " box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n", " fill: var(--button-hover-fill-color);\n", " }\n", "\n", " .colab-df-quickchart-complete:disabled,\n", " .colab-df-quickchart-complete:disabled:hover {\n", " background-color: var(--disabled-bg-color);\n", " fill: var(--disabled-fill-color);\n", " box-shadow: none;\n", " }\n", "\n", " .colab-df-spinner {\n", " border: 2px solid var(--fill-color);\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " animation:\n", " spin 1s steps(1) infinite;\n", " }\n", "\n", " @keyframes spin {\n", " 0% {\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " border-left-color: var(--fill-color);\n", " }\n", " 20% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " }\n", " 30% {\n", " border-color: transparent;\n", " border-left-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " border-right-color: var(--fill-color);\n", " }\n", " 40% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " border-top-color: var(--fill-color);\n", " }\n", " 60% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " }\n", " 80% {\n", " border-color: transparent;\n", " border-right-color: var(--fill-color);\n", " border-bottom-color: var(--fill-color);\n", " }\n", " 90% {\n", " border-color: transparent;\n", " border-bottom-color: var(--fill-color);\n", " }\n", " }\n", "</style>\n", "\n", " <script>\n", " async function quickchart(key) {\n", " 
const quickchartButtonEl =\n", " document.querySelector('#' + key + ' button');\n", " quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n", " quickchartButtonEl.classList.add('colab-df-spinner');\n", " try {\n", " const charts = await google.colab.kernel.invokeFunction(\n", " 'suggestCharts', [key], {});\n", " } catch (error) {\n", " console.error('Error during call to suggestCharts:', error);\n", " }\n", " quickchartButtonEl.classList.remove('colab-df-spinner');\n", " quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n", " }\n", " (() => {\n", " let quickchartButtonEl =\n", " document.querySelector('#df-5be367d3-7746-423a-8ac0-72da0fab7eb7 button');\n", " quickchartButtonEl.style.display =\n", " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", " })();\n", " </script>\n", "</div>\n", " </div>\n", " </div>\n" ] }, "metadata": {}, "execution_count": 23 } ] }, { "cell_type": "code", "source": [ "print(tips.shape)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2rPv5olHtHv7", "outputId": "9a85e329-8c61-4308-f21b-b3659cbc6191" }, "execution_count": 24, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "(244, 7)\n" ] } ] }, { "cell_type": "markdown", "source": [ "**1. You want to analyze the tipping behavior of a sample of customers. Perform simple random sampling to select a sample of 30 customers from the \"tips\" dataset. 
Calculate the sample mean and sample variance of the \"tip\" column for this sample.**" ], "metadata": { "id": "kqi0F-DZw3tX" } }, { "cell_type": "code", "source": [ "# simple random sample without replacement; random_state makes the draw reproducible\n", "sample_size = 30\n", "sample = tips.sample(n = sample_size, random_state = 42)" ], "metadata": { "id": "EEHL4DaOw3C6" }, "execution_count": 25, "outputs": [] }, { "cell_type": "code", "source": [ "sample_mean = sample['tip'].mean()\n", "# pandas .var() uses ddof = 1, i.e. the unbiased sample variance\n", "sample_variance = sample['tip'].var()" ], "metadata": { "id": "9wcN7RZ6x4XM" }, "execution_count": 26, "outputs": [] }, { "cell_type": "code", "source": [ "print(\"Sample Mean:\", sample_mean)\n", "print(\"Sample Variance:\", sample_variance)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E25a6rpsx6rB", "outputId": "77c4f0c6-0b57-46af-f601-0fcabb7bd0fd" }, "execution_count": 27, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Sample Mean: 2.832333333333333\n", "Sample Variance: 1.3560667816091954\n" ] } ] }, { "cell_type": "markdown", "source": [ "**2. You suspect that the tipping behavior varies by the day of the week. Perform stratified sampling to select a sample of 20 customers from each day (Thursday, Friday, Saturday, and Sunday) from the \"tips\" dataset. 
Calculate the sample mean and sample variance of the \"tip\" column for each stratum.**" ], "metadata": { "id": "SScNR11Lwxb5" } }, { "cell_type": "code", "source": [ "#initialize an empty list to store the sample means and variances for each stratum\n", "stratum_sample_statistics = []" ], "metadata": { "id": "75C1J4Hbyv_U" }, "execution_count": 29, "outputs": [] }, { "cell_type": "code", "source": [ "#sampling\n", "days_of_week = ['Thur', 'Fri', 'Sat', 'Sun']\n", "sample_size_per_stratum = 20\n", "\n", "for day in days_of_week:\n", "    stratum = tips[tips['day'] == day]\n", "    # replace = True because a stratum with fewer than 20 rows (Friday here) cannot be sampled without replacement\n", "    stratum_sample = stratum.sample(n = sample_size_per_stratum, random_state = 42, replace = True)\n", "    stratum_mean = stratum_sample['tip'].mean()\n", "    stratum_variance = stratum_sample['tip'].var()\n", "    stratum_sample_statistics.append({\n", "        'Day': day,\n", "        'Sample Mean': stratum_mean,\n", "        'Sample Variance': stratum_variance\n", "    })" ], "metadata": { "id": "b4YcEMR0y0SY" }, "execution_count": 30, "outputs": [] }, { "cell_type": "code", "source": [ "# the loop variable is named entry, not stats, so it does not shadow the scipy.stats module imported as stats\n", "for entry in stratum_sample_statistics:\n", "    print(f\"Day: {entry['Day']}, Sample Mean: {entry['Sample Mean']}, Sample Variance: {entry['Sample Variance']}\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iGEV1lldwxPN", "outputId": "a7196c9b-41c4-454d-c369-0dcfc93738af" }, "execution_count": 31, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Day: Thur, Sample Mean: 2.894, Sample Variance: 1.6887410526315791\n", "Day: Fri, Sample Mean: 2.7265, Sample Variance: 1.1247186842105263\n", "Day: Sat, Sample Mean: 3.2269999999999994, Sample Variance: 3.2082957894736843\n", "Day: Sun, Sample Mean: 2.9415, Sample Variance: 0.9116028947368421\n" ] } ] }, { "cell_type": "markdown", "source": [ "**3. You want to test whether the average total bill in the dataset is significantly greater than $20. 
Perform a one-tailed t-test to answer this question.**" ], "metadata": { "id": "paeQR9zizE3O" } }, { "cell_type": "code", "source": [ "# Null hypothesis (H0): the mean total bill equals $20\n", "# Alternative hypothesis (H1): the mean total bill is greater than $20\n", "\n", "alpha = 0.05\n", "\n", "# We use a one-tailed, one-sample t-test because we are testing whether the mean is greater than $20.\n", "t_statistic, p_value = stats.ttest_1samp(tips['total_bill'], popmean = 20, alternative = 'greater')\n", "\n", "if p_value < alpha:\n", "    print(\"Reject the null hypothesis. The average total bill is significantly greater than $20.\")\n", "else:\n", "    print(\"Fail to reject the null hypothesis. There is no significant evidence that the average total bill is greater than $20.\")\n", "\n", "print(\"t-statistic:\", t_statistic)\n", "print(\"p-value:\", p_value)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "UT2KgHTUzPkz", "outputId": "2afe831b-fd15-49a0-d340-291774b84ef1" }, "execution_count": 33, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Fail to reject the null hypothesis. There is no significant evidence that the average total bill is greater than $20.\n", "t-statistic: -0.37559294451919506\n", "p-value: 0.646226403218664\n" ] } ] }, { "cell_type": "markdown", "source": [ "**4. You suspect that there is a significant difference in the total bill between lunch and dinner. 
Perform a two-tailed t-test to test this hypothesis.**" ], "metadata": { "id": "9QRGjc1PzgLU" } }, { "cell_type": "code", "source": [ "#Separate the total bill data into two groups: lunch and dinner\n", "total_bill_lunch = tips[tips['time']=='Lunch']['total_bill']\n", "total_bill_dinner = tips[tips['time']=='Dinner']['total_bill']\n", "\n", "#Define the null and alternative hypotheses\n", "# Null hypothesis (H0): the mean total bill at lunch equals the mean total bill at dinner\n", "# Alternative hypothesis (H1): the mean total bills at lunch and dinner differ\n", "\n", "alpha = 0.05\n", "\n", "# equal_var = False runs Welch's t-test, which does not assume equal variances; the test is two-sided by default\n", "t_statistic, p_value = stats.ttest_ind(total_bill_lunch, total_bill_dinner, equal_var = False)\n", "\n", "if p_value < alpha:\n", "    print(\"Reject the null hypothesis. There is a significant difference in total bill between lunch and dinner.\")\n", "else:\n", "    print(\"Fail to reject the null hypothesis. There is no significant difference in total bill between lunch and dinner.\")\n", "\n", "print(\"t-statistic:\", t_statistic)\n", "print(\"p-value:\", p_value)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "pA20ZP6hzkJV", "outputId": "df968596-094e-4e93-d5d2-49cab568d7a4" }, "execution_count": 34, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Reject the null hypothesis. There is a significant difference in total bill between lunch and dinner.\n", "t-statistic: -3.122986183296264\n", "p-value: 0.0021665735148038933\n" ] } ] }, { "cell_type": "markdown", "source": [ "**5. You are interested in estimating the mean total bill amount for all the restaurant visits in the \"tips\" dataset. Create a 95% confidence interval for this estimate.**\n", "\n", "a. Calculate the sample mean and sample standard deviation for the total bill amounts." 
], "metadata": { "id": "vmyPGGUqt1pN" } }, { "cell_type": "code", "source": [ "sample_mean = tips['total_bill'].mean()\n", "sample_std = tips['total_bill'].std()\n", "\n", "print(\"Sample Mean:\", sample_mean)\n", "print(\"Sample Standard Deviation:\", sample_std)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Sj2NaZ6Oz4lU", "outputId": "2258823a-7860-447c-fd13-c75c69dac859" }, "execution_count": 35, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Sample Mean: 19.78594262295082\n", "Sample Standard Deviation: 8.902411954856856\n" ] } ] }, { "cell_type": "markdown", "source": [ "b. Determine the margin of error for a 95% confidence interval.\n", "\n", "The margin of error for a 95% confidence interval can be determined using the formula:\n", "\n", "Margin of Error = (Critical Value) * (Standard Deviation / √Sample Size)" ], "metadata": { "id": "iuNXkK-Vz_zE" } }, { "cell_type": "code", "source": [ "# Set the confidence level and find the critical value (z-value for a normal distribution)\n", "confidence_level = 0.95\n", "alpha = 1 - confidence_level\n", "# ppf is the inverse CDF (quantile function); pdf would return a density, not a critical value\n", "critical_value = stats.norm.ppf(1-alpha/2)\n", "\n", "# Set the sample size\n", "sample_size = len(tips)\n", "\n", "# Calculate the margin of error\n", "margin_of_error = critical_value * (sample_std/np.sqrt(sample_size))\n", "\n", "print(\"Margin of Error: {:.2f}\".format(margin_of_error))\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "pD0dKKR2z4jG", "outputId": "cb2262fc-5eaf-4c4e-cc09-ccae5f7b6b70" }, "execution_count": 37, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Margin of Error: 1.12\n" ] } ] }, { "cell_type": "markdown", "source": [ "c. Calculate the lower and upper bounds of the confidence interval." 
], "metadata": { "id": "IBERLwiuAwIl" } }, { "cell_type": "code", "source": [ "# Calculate the lower and upper bounds of the confidence interval\n", "lower_bound = sample_mean - margin_of_error\n", "upper_bound = sample_mean + margin_of_error\n", "\n", "print(\"Confidence Interval (95%): [{:.2f}, {:.2f}]\".format(lower_bound, upper_bound))\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fdfJ-FaTz4g8", "outputId": "5e130e1e-362a-4212-91b5-cdaa8261f7e4" }, "execution_count": 38, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Confidence Interval (95%): [18.67, 20.90]\n" ] } ] }, { "cell_type": "code", "source": [], "metadata": { "id": "PU9x__AgNYOb" }, "execution_count": null, "outputs": [] } ] }