{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# DS 122 Discussion 8"
      ],
      "metadata": {
        "id": "eUStcwQUhsRa"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this discussion, we will review the different concepts we have discussed in this course. This discussion contains 3 parts:\n",
        "- EDA using pandas on a dataset\n",
        "- Sampling using numpy\n",
        "- Sampling and Hypothesis testing on a dataset"
      ],
      "metadata": {
        "id": "27i57wCLcRcw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#importing libraries\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "import seaborn as sns\n",
        "import scipy.stats as stats\n",
        "import matplotlib.pyplot as plt"
      ],
      "metadata": {
        "id": "ezfpLVrZ-D7J"
      },
      "execution_count": 32,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Part A: Review of pandas\n",
        "\n",
        "In this section, we will analyze data about players from the UCL. UCL is a football (soccer) league based in Europe. It has various Clubs (or teams) like Liverpool, Manchester United, Real Madrid and Bayern.\n",
        "\n",
        "You all must have heard about names like Ronaldo or Messi. This dataset contains the names of more players.\n",
        "\n",
        "The dataset is taken from Kaggle and can be found at the following link: https://www.kaggle.com/datasets/azminetoushikwasi/ucl-202122-uefa-champions-league?select=attacking.csv.\n",
        "\n",
        "We will begin working with files called `attacking.csv` and `goals.csv` to find out players that are good at attacking."
      ],
      "metadata": {
        "id": "vouobfXXkIFW"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Loading the dataset\n",
        "\n",
        "The first step of a complete data analysis project is to acquire the data. Let's do this by loading the CSV files given to us."
      ],
      "metadata": {
        "id": "UP0PNRzctXhG"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "attack = pd.read_csv('/content/attacking.csv')\n",
        "goals = pd.read_csv('/content/goals.csv')"
      ],
      "metadata": {
        "id": "yhRWQcqYpe9a"
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let us now view our datasets to get an overview of the data."
      ],
      "metadata": {
        "id": "m1g4Ib3E0b5U"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#view attack\n",
        "attack.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "IDL7Ko2DphnH",
        "outputId": "c2a294ab-bbb2-4501-897f-5a5b01c7c47a"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   serial       player_name         club    position  assists  corner_taken  \\\n",
              "0       1   Bruno Fernandes  Man. United  Midfielder        7            10   \n",
              "1       2   Vinícius Júnior  Real Madrid     Forward        6             3   \n",
              "2       2              Sané       Bayern  Midfielder        6             3   \n",
              "3       4            Antony         Ajax     Forward        5             3   \n",
              "4       5  Alexander-Arnold    Liverpool    Defender        4            36   \n",
              "\n",
              "   offsides  dribbles  match_played  \n",
              "0         2         7             7  \n",
              "1         4        83            13  \n",
              "2         3        32            10  \n",
              "3         4        28             7  \n",
              "4         0         9             9  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-b5d3a1c4-f92e-4b50-8611-362dfab59743\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>serial</th>\n",
              "      <th>player_name</th>\n",
              "      <th>club</th>\n",
              "      <th>position</th>\n",
              "      <th>assists</th>\n",
              "      <th>corner_taken</th>\n",
              "      <th>offsides</th>\n",
              "      <th>dribbles</th>\n",
              "      <th>match_played</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>Bruno Fernandes</td>\n",
              "      <td>Man. United</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>7</td>\n",
              "      <td>10</td>\n",
              "      <td>2</td>\n",
              "      <td>7</td>\n",
              "      <td>7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>Vinícius Júnior</td>\n",
              "      <td>Real Madrid</td>\n",
              "      <td>Forward</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>83</td>\n",
              "      <td>13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>2</td>\n",
              "      <td>Sané</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>3</td>\n",
              "      <td>32</td>\n",
              "      <td>10</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>Antony</td>\n",
              "      <td>Ajax</td>\n",
              "      <td>Forward</td>\n",
              "      <td>5</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>28</td>\n",
              "      <td>7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>Alexander-Arnold</td>\n",
              "      <td>Liverpool</td>\n",
              "      <td>Defender</td>\n",
              "      <td>4</td>\n",
              "      <td>36</td>\n",
              "      <td>0</td>\n",
              "      <td>9</td>\n",
              "      <td>9</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b5d3a1c4-f92e-4b50-8611-362dfab59743')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-b5d3a1c4-f92e-4b50-8611-362dfab59743 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-b5d3a1c4-f92e-4b50-8611-362dfab59743');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-d28b4157-1cbd-43cc-a3bb-353892ff2939\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-d28b4157-1cbd-43cc-a3bb-353892ff2939')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-d28b4157-1cbd-43cc-a3bb-353892ff2939 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#print attack's size\n",
        "attack.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2-F095RPaCA9",
        "outputId": "5a83d359-0d15-4f97-9649-1499336cd65c"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(176, 9)"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#view goals\n",
        "goals.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "coMjcjTP0hxm",
        "outputId": "090875bd-29ca-46f4-ca76-1b7c16d2ea5b"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   serial  player_name         club    position  goals  right_foot  left_foot  \\\n",
              "0       1      Benzema  Real Madrid     Forward     15          11          1   \n",
              "1       2  Lewandowski       Bayern     Forward     13           8          3   \n",
              "2       3       Haller         Ajax     Forward     11           3          4   \n",
              "3       4        Salah    Liverpool     Forward      8           0          8   \n",
              "4       5       Nkunku      Leipzig  Midfielder      7           3          1   \n",
              "\n",
              "   headers  others  inside_area  outside_areas  penalties  match_played  \n",
              "0        3       0           13              2          3            12  \n",
              "1        1       1           13              0          3            10  \n",
              "2        3       1           11              0          1             8  \n",
              "3        0       0            7              1          1            13  \n",
              "4        3       0            7              0          0             6  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-f05daf5c-43c1-4ff4-9938-7f890d679d76\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>serial</th>\n",
              "      <th>player_name</th>\n",
              "      <th>club</th>\n",
              "      <th>position</th>\n",
              "      <th>goals</th>\n",
              "      <th>right_foot</th>\n",
              "      <th>left_foot</th>\n",
              "      <th>headers</th>\n",
              "      <th>others</th>\n",
              "      <th>inside_area</th>\n",
              "      <th>outside_areas</th>\n",
              "      <th>penalties</th>\n",
              "      <th>match_played</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>Benzema</td>\n",
              "      <td>Real Madrid</td>\n",
              "      <td>Forward</td>\n",
              "      <td>15</td>\n",
              "      <td>11</td>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>13</td>\n",
              "      <td>2</td>\n",
              "      <td>3</td>\n",
              "      <td>12</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>Lewandowski</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Forward</td>\n",
              "      <td>13</td>\n",
              "      <td>8</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>13</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>10</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3</td>\n",
              "      <td>Haller</td>\n",
              "      <td>Ajax</td>\n",
              "      <td>Forward</td>\n",
              "      <td>11</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "      <td>11</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>8</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>Salah</td>\n",
              "      <td>Liverpool</td>\n",
              "      <td>Forward</td>\n",
              "      <td>8</td>\n",
              "      <td>0</td>\n",
              "      <td>8</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>7</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>Nkunku</td>\n",
              "      <td>Leipzig</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>7</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>7</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>6</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-f05daf5c-43c1-4ff4-9938-7f890d679d76')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-f05daf5c-43c1-4ff4-9938-7f890d679d76 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-f05daf5c-43c1-4ff4-9938-7f890d679d76');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-25308461-65a3-4836-817d-ea597ffe8612\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-25308461-65a3-4836-817d-ea597ffe8612')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-25308461-65a3-4836-817d-ea597ffe8612 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#print shape of goals\n",
        "goals.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "GHMj4O-TaFD_",
        "outputId": "84a5b7d8-b74f-4b5a-faad-b7591779de8d"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(183, 13)"
            ]
          },
          "metadata": {},
          "execution_count": 6
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### EDA\n",
        "\n",
        "The next step of a data analysis project pipeline is to perform EDA or Exploratory Data Analysis. Our ultimate goal is to get a list of top 10 players from an offensive point of view.\n",
        "\n",
        "EDA includes cleaning the data. Think about the goal of the project. We are not interested in the foot a player used to score the goal. It would've been relevant if we were thinking about which side of the field we want the player to be in. But we just want to know who are good players.\n",
        "\n",
        "Keeping this in mind, let us first begin by cleaning the data."
      ],
      "metadata": {
        "id": "2rpZpnns0pWw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#printing the names of the columns in all the dataframes\n",
        "print(attack.columns)\n",
        "print(goals.columns)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "grsU8hC30mI2",
        "outputId": "e58e827c-1ca7-4991-a2fb-27b6030bb308"
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Index(['serial', 'player_name', 'club', 'position', 'assists', 'corner_taken',\n",
            "       'offsides', 'dribbles', 'match_played'],\n",
            "      dtype='object')\n",
            "Index(['serial', 'player_name', 'club', 'position', 'goals', 'right_foot',\n",
            "       'left_foot', 'headers', 'others', 'inside_area', 'outside_areas',\n",
            "       'penalties', 'match_played'],\n",
            "      dtype='object')\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Player names, club, matches played and position are common in all the 4 datasets, so let's keep them.\n",
        "\n",
        "Let's first work with `attack`. Notice the different columns. They all provide some information on whether the player is good (assists etc) or bad (off-sides etc). So let's preserve them.\n",
        "\n",
        "We can delete columns like 'right_foot', 'left_foot', 'headers', 'others', 'inside_area', and 'outside_areas' from `goals`."
      ],
      "metadata": {
        "id": "eAhw6DLw3FkY"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#deleting 'serial' from attack\n",
        "attack = attack.drop(columns = ['serial'])"
      ],
      "metadata": {
        "id": "IvYphzPQ8UNt"
      },
      "execution_count": 8,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "attack.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "zKUQCmBa8sx4",
        "outputId": "354aebef-9787-426b-cbe6-dcf8d6d9f68e"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        player_name         club    position  assists  corner_taken  offsides  \\\n",
              "0   Bruno Fernandes  Man. United  Midfielder        7            10         2   \n",
              "1   Vinícius Júnior  Real Madrid     Forward        6             3         4   \n",
              "2              Sané       Bayern  Midfielder        6             3         3   \n",
              "3            Antony         Ajax     Forward        5             3         4   \n",
              "4  Alexander-Arnold    Liverpool    Defender        4            36         0   \n",
              "\n",
              "   dribbles  match_played  \n",
              "0         7             7  \n",
              "1        83            13  \n",
              "2        32            10  \n",
              "3        28             7  \n",
              "4         9             9  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-c0fb41da-8287-4005-8c78-d295f8563a81\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>player_name</th>\n",
              "      <th>club</th>\n",
              "      <th>position</th>\n",
              "      <th>assists</th>\n",
              "      <th>corner_taken</th>\n",
              "      <th>offsides</th>\n",
              "      <th>dribbles</th>\n",
              "      <th>match_played</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Bruno Fernandes</td>\n",
              "      <td>Man. United</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>7</td>\n",
              "      <td>10</td>\n",
              "      <td>2</td>\n",
              "      <td>7</td>\n",
              "      <td>7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Vinícius Júnior</td>\n",
              "      <td>Real Madrid</td>\n",
              "      <td>Forward</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>83</td>\n",
              "      <td>13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Sané</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>3</td>\n",
              "      <td>32</td>\n",
              "      <td>10</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Antony</td>\n",
              "      <td>Ajax</td>\n",
              "      <td>Forward</td>\n",
              "      <td>5</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>28</td>\n",
              "      <td>7</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Alexander-Arnold</td>\n",
              "      <td>Liverpool</td>\n",
              "      <td>Defender</td>\n",
              "      <td>4</td>\n",
              "      <td>36</td>\n",
              "      <td>0</td>\n",
              "      <td>9</td>\n",
              "      <td>9</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-c0fb41da-8287-4005-8c78-d295f8563a81')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-c0fb41da-8287-4005-8c78-d295f8563a81 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-c0fb41da-8287-4005-8c78-d295f8563a81');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-8abe4ed4-016c-491d-a36d-ad4b8d5b7299\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8abe4ed4-016c-491d-a36d-ad4b8d5b7299')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-8abe4ed4-016c-491d-a36d-ad4b8d5b7299 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#deleting columns from goals\n",
        "del_col = ['right_foot', 'left_foot', 'headers', 'others', 'inside_area', 'outside_areas', 'serial']\n",
        "#delete columns here\n",
        "goals = goals.drop(columns = del_col)"
      ],
      "metadata": {
        "id": "KK5AEy9427Gj"
      },
      "execution_count": 10,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let us now merge the dataframes."
      ],
      "metadata": {
        "id": "8JLJhyAd8Gnk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#merge ucl here\n",
        "ucl = attack.merge(goals, on= ['player_name', 'club', 'position', 'match_played'])\n",
        "ucl.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "np28EL647q38",
        "outputId": "2ac41298-2bdb-4aa3-d5a4-891a381d8b27"
      },
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "       player_name         club    position  assists  corner_taken  offsides  \\\n",
              "0  Vinícius Júnior  Real Madrid     Forward        6             3         4   \n",
              "1             Sané       Bayern  Midfielder        6             3         3   \n",
              "2           Antony         Ajax     Forward        5             3         4   \n",
              "3        De Bruyne    Man. City  Midfielder        4            18         0   \n",
              "4           Mbappé        Paris     Forward        4             4         8   \n",
              "\n",
              "   dribbles  match_played  goals  penalties  \n",
              "0        83            13      4          0  \n",
              "1        32            10      6          0  \n",
              "2        28             7      2          0  \n",
              "3        14            10      2          0  \n",
              "4        43             8      6          0  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-a1b17598-532c-469b-904a-d8da4c99d3b0\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>player_name</th>\n",
              "      <th>club</th>\n",
              "      <th>position</th>\n",
              "      <th>assists</th>\n",
              "      <th>corner_taken</th>\n",
              "      <th>offsides</th>\n",
              "      <th>dribbles</th>\n",
              "      <th>match_played</th>\n",
              "      <th>goals</th>\n",
              "      <th>penalties</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Vinícius Júnior</td>\n",
              "      <td>Real Madrid</td>\n",
              "      <td>Forward</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>83</td>\n",
              "      <td>13</td>\n",
              "      <td>4</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Sané</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>3</td>\n",
              "      <td>32</td>\n",
              "      <td>10</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Antony</td>\n",
              "      <td>Ajax</td>\n",
              "      <td>Forward</td>\n",
              "      <td>5</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>28</td>\n",
              "      <td>7</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>De Bruyne</td>\n",
              "      <td>Man. City</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>4</td>\n",
              "      <td>18</td>\n",
              "      <td>0</td>\n",
              "      <td>14</td>\n",
              "      <td>10</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Mbappé</td>\n",
              "      <td>Paris</td>\n",
              "      <td>Forward</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>8</td>\n",
              "      <td>43</td>\n",
              "      <td>8</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-a1b17598-532c-469b-904a-d8da4c99d3b0')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-a1b17598-532c-469b-904a-d8da4c99d3b0 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-a1b17598-532c-469b-904a-d8da4c99d3b0');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-689d3f77-f47b-465d-a5cb-bcec90d666f5\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-689d3f77-f47b-465d-a5cb-bcec90d666f5')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-689d3f77-f47b-465d-a5cb-bcec90d666f5 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 11
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#print shape of ucl\n",
        "ucl.shape"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "-tt6HzR8aKiE",
        "outputId": "6a4d81d1-f1bc-400f-93c0-1b5b4847e0ab"
      },
      "execution_count": 12,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(82, 10)"
            ]
          },
          "metadata": {},
          "execution_count": 12
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "It seems like a lot of players were dropped because they didn't exist in both the datasets. Let us continue with a smaller dataset for the scope of this question."
      ],
      "metadata": {
        "id": "9ayGAYudZxPw"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now, we have a final dataframe called `ucl` to work with. Let us now make a list of the top 10 players.\n",
        "\n",
        "Columns like `assists`, `corner_taken`, `dribbles` and `goals` give a positive score.\n",
        "\n",
        "Columns like `offsides` and `penalties` give a negative score.\n",
        "\n",
        "Let us first normalize all the columns using `match_played`, then simply add or subtract the positive and negative scores and then make a dataframe of the top 10 players.\n",
        "\n",
        "Remember, for more complex projects, we will normalize all columns to have a value between 0 and 1, which will not be the case for this project since columns like `dribble` have values like 83, 67 and so on."
      ],
      "metadata": {
        "id": "-1rXgrue85VV"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#normalizing the data and storing them in new columns\n",
        "ucl['assists_normalized'] = ucl['assists']/ucl['match_played']\n",
        "ucl['corner_taken_normalized'] = ucl['corner_taken']/ucl['match_played']\n",
        "ucl['dribbles_normalized'] = ucl['dribbles']/ucl['match_played']\n",
        "ucl['goals_normalized'] = ucl['goals']/ucl['match_played']\n",
        "ucl['offsides_normalized'] = ucl['offsides']/ucl['match_played']\n",
        "ucl['penalties_normalized'] = ucl['penalties']/ucl['match_played']"
      ],
      "metadata": {
        "id": "OKszPGlUoj4C"
      },
      "execution_count": 13,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Calculate overall score\n",
        "ucl['score'] = ucl['assists_normalized'] + ucl['corner_taken_normalized'] + ucl['dribbles_normalized'] + ucl['goals_normalized'] - ucl['offsides_normalized'] - ucl['penalties_normalized']"
      ],
      "metadata": {
        "id": "e797uAQool8v"
      },
      "execution_count": 14,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#ranking the players\n",
        "top_10_players = ucl.nlargest(10, 'score')\n",
        "top_10_players.head(10)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 435
        },
        "id": "TJQwT7uToZqg",
        "outputId": "3f2d51b6-ca74-4dff-ea8e-b9a9b64e7ef6"
      },
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        player_name         club    position  assists  corner_taken  offsides  \\\n",
              "9             Coman       Bayern     Forward        3             4         4   \n",
              "0   Vinícius Júnior  Real Madrid     Forward        6             3         4   \n",
              "51   Moumi Ngamaleu   Young Boys  Midfielder        1             1         0   \n",
              "4            Mbappé        Paris     Forward        4             4         8   \n",
              "15           Mahrez    Man. City  Midfielder        2            30         5   \n",
              "2            Antony         Ajax     Forward        5             3         4   \n",
              "11       Bellingham     Dortmund  Midfielder        3             1         1   \n",
              "1              Sané       Bayern  Midfielder        6             3         3   \n",
              "19  Pedro Gonçalves  Sporting CP  Midfielder        2             7         0   \n",
              "16           Ziyech      Chelsea  Midfielder        2            23         0   \n",
              "\n",
              "    dribbles  match_played  goals  penalties  assists_normalized  \\\n",
              "9         59             9      2          0            0.333333   \n",
              "0         83            13      4          0            0.461538   \n",
              "51        34             6      1          0            0.166667   \n",
              "4         43             8      6          0            0.500000   \n",
              "15        28            12      7          2            0.166667   \n",
              "2         28             7      2          0            0.714286   \n",
              "11        24             6      1          0            0.500000   \n",
              "1         32            10      6          0            0.600000   \n",
              "19        10             5      4          1            0.400000   \n",
              "16        12             9      1          0            0.222222   \n",
              "\n",
              "    corner_taken_normalized  dribbles_normalized  goals_normalized  \\\n",
              "9                  0.444444             6.555556          0.222222   \n",
              "0                  0.230769             6.384615          0.307692   \n",
              "51                 0.166667             5.666667          0.166667   \n",
              "4                  0.500000             5.375000          0.750000   \n",
              "15                 2.500000             2.333333          0.583333   \n",
              "2                  0.428571             4.000000          0.285714   \n",
              "11                 0.166667             4.000000          0.166667   \n",
              "1                  0.300000             3.200000          0.600000   \n",
              "19                 1.400000             2.000000          0.800000   \n",
              "16                 2.555556             1.333333          0.111111   \n",
              "\n",
              "    offsides_normalized  penalties_normalized     score  \n",
              "9              0.444444              0.000000  7.111111  \n",
              "0              0.307692              0.000000  7.076923  \n",
              "51             0.000000              0.000000  6.166667  \n",
              "4              1.000000              0.000000  6.125000  \n",
              "15             0.416667              0.166667  5.000000  \n",
              "2              0.571429              0.000000  4.857143  \n",
              "11             0.166667              0.000000  4.666667  \n",
              "1              0.300000              0.000000  4.400000  \n",
              "19             0.000000              0.200000  4.400000  \n",
              "16             0.000000              0.000000  4.222222  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-e286e525-2e77-4968-9800-c635972b593f\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>player_name</th>\n",
              "      <th>club</th>\n",
              "      <th>position</th>\n",
              "      <th>assists</th>\n",
              "      <th>corner_taken</th>\n",
              "      <th>offsides</th>\n",
              "      <th>dribbles</th>\n",
              "      <th>match_played</th>\n",
              "      <th>goals</th>\n",
              "      <th>penalties</th>\n",
              "      <th>assists_normalized</th>\n",
              "      <th>corner_taken_normalized</th>\n",
              "      <th>dribbles_normalized</th>\n",
              "      <th>goals_normalized</th>\n",
              "      <th>offsides_normalized</th>\n",
              "      <th>penalties_normalized</th>\n",
              "      <th>score</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>Coman</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Forward</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>59</td>\n",
              "      <td>9</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>0.333333</td>\n",
              "      <td>0.444444</td>\n",
              "      <td>6.555556</td>\n",
              "      <td>0.222222</td>\n",
              "      <td>0.444444</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>7.111111</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Vinícius Júnior</td>\n",
              "      <td>Real Madrid</td>\n",
              "      <td>Forward</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>83</td>\n",
              "      <td>13</td>\n",
              "      <td>4</td>\n",
              "      <td>0</td>\n",
              "      <td>0.461538</td>\n",
              "      <td>0.230769</td>\n",
              "      <td>6.384615</td>\n",
              "      <td>0.307692</td>\n",
              "      <td>0.307692</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>7.076923</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>51</th>\n",
              "      <td>Moumi Ngamaleu</td>\n",
              "      <td>Young Boys</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>34</td>\n",
              "      <td>6</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>5.666667</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>6.166667</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Mbappé</td>\n",
              "      <td>Paris</td>\n",
              "      <td>Forward</td>\n",
              "      <td>4</td>\n",
              "      <td>4</td>\n",
              "      <td>8</td>\n",
              "      <td>43</td>\n",
              "      <td>8</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "      <td>0.500000</td>\n",
              "      <td>0.500000</td>\n",
              "      <td>5.375000</td>\n",
              "      <td>0.750000</td>\n",
              "      <td>1.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>6.125000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>Mahrez</td>\n",
              "      <td>Man. City</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>2</td>\n",
              "      <td>30</td>\n",
              "      <td>5</td>\n",
              "      <td>28</td>\n",
              "      <td>12</td>\n",
              "      <td>7</td>\n",
              "      <td>2</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>2.500000</td>\n",
              "      <td>2.333333</td>\n",
              "      <td>0.583333</td>\n",
              "      <td>0.416667</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>5.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Antony</td>\n",
              "      <td>Ajax</td>\n",
              "      <td>Forward</td>\n",
              "      <td>5</td>\n",
              "      <td>3</td>\n",
              "      <td>4</td>\n",
              "      <td>28</td>\n",
              "      <td>7</td>\n",
              "      <td>2</td>\n",
              "      <td>0</td>\n",
              "      <td>0.714286</td>\n",
              "      <td>0.428571</td>\n",
              "      <td>4.000000</td>\n",
              "      <td>0.285714</td>\n",
              "      <td>0.571429</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>4.857143</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>Bellingham</td>\n",
              "      <td>Dortmund</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>24</td>\n",
              "      <td>6</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0.500000</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>4.000000</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>0.166667</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>4.666667</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Sané</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>6</td>\n",
              "      <td>3</td>\n",
              "      <td>3</td>\n",
              "      <td>32</td>\n",
              "      <td>10</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "      <td>0.600000</td>\n",
              "      <td>0.300000</td>\n",
              "      <td>3.200000</td>\n",
              "      <td>0.600000</td>\n",
              "      <td>0.300000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>4.400000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>Pedro Gonçalves</td>\n",
              "      <td>Sporting CP</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>2</td>\n",
              "      <td>7</td>\n",
              "      <td>0</td>\n",
              "      <td>10</td>\n",
              "      <td>5</td>\n",
              "      <td>4</td>\n",
              "      <td>1</td>\n",
              "      <td>0.400000</td>\n",
              "      <td>1.400000</td>\n",
              "      <td>2.000000</td>\n",
              "      <td>0.800000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.200000</td>\n",
              "      <td>4.400000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>Ziyech</td>\n",
              "      <td>Chelsea</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>2</td>\n",
              "      <td>23</td>\n",
              "      <td>0</td>\n",
              "      <td>12</td>\n",
              "      <td>9</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "      <td>0.222222</td>\n",
              "      <td>2.555556</td>\n",
              "      <td>1.333333</td>\n",
              "      <td>0.111111</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000000</td>\n",
              "      <td>4.222222</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-e286e525-2e77-4968-9800-c635972b593f')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-e286e525-2e77-4968-9800-c635972b593f button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-e286e525-2e77-4968-9800-c635972b593f');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-02dcf5ed-8573-4642-8cf2-b1e04357a076\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-02dcf5ed-8573-4642-8cf2-b1e04357a076')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-02dcf5ed-8573-4642-8cf2-b1e04357a076 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#cleaning the column\n",
        "print(top_10_players.columns)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "bU2cahIJo84-",
        "outputId": "6cad1f6e-9dfa-4784-c8b4-629f75dfe147"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Index(['player_name', 'club', 'position', 'assists', 'corner_taken',\n",
            "       'offsides', 'dribbles', 'match_played', 'goals', 'penalties',\n",
            "       'assists_normalized', 'corner_taken_normalized', 'dribbles_normalized',\n",
            "       'goals_normalized', 'offsides_normalized', 'penalties_normalized',\n",
            "       'score'],\n",
            "      dtype='object')\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#delete\n",
        "del_col = ['assists', 'corner_taken',\n",
        "       'offsides', 'dribbles', 'match_played', 'goals', 'penalties',\n",
        "       'assists_normalized', 'corner_taken_normalized', 'dribbles_normalized',\n",
        "       'goals_normalized', 'offsides_normalized', 'penalties_normalized']\n",
        "top_10_players = top_10_players.drop(columns = del_col)"
      ],
      "metadata": {
        "id": "N2hwWcwvpDg-"
      },
      "execution_count": 17,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "top_10_players.head(10)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 363
        },
        "id": "HtP7mrFJpP9z",
        "outputId": "4cb62487-4865-4c25-bddd-086e3e5a703a"
      },
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "        player_name         club    position     score\n",
              "9             Coman       Bayern     Forward  7.111111\n",
              "0   Vinícius Júnior  Real Madrid     Forward  7.076923\n",
              "51   Moumi Ngamaleu   Young Boys  Midfielder  6.166667\n",
              "4            Mbappé        Paris     Forward  6.125000\n",
              "15           Mahrez    Man. City  Midfielder  5.000000\n",
              "2            Antony         Ajax     Forward  4.857143\n",
              "11       Bellingham     Dortmund  Midfielder  4.666667\n",
              "1              Sané       Bayern  Midfielder  4.400000\n",
              "19  Pedro Gonçalves  Sporting CP  Midfielder  4.400000\n",
              "16           Ziyech      Chelsea  Midfielder  4.222222"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-660678a5-6d83-4c29-a64d-8006c51a6195\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>player_name</th>\n",
              "      <th>club</th>\n",
              "      <th>position</th>\n",
              "      <th>score</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>Coman</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Forward</td>\n",
              "      <td>7.111111</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Vinícius Júnior</td>\n",
              "      <td>Real Madrid</td>\n",
              "      <td>Forward</td>\n",
              "      <td>7.076923</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>51</th>\n",
              "      <td>Moumi Ngamaleu</td>\n",
              "      <td>Young Boys</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>6.166667</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Mbappé</td>\n",
              "      <td>Paris</td>\n",
              "      <td>Forward</td>\n",
              "      <td>6.125000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>Mahrez</td>\n",
              "      <td>Man. City</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>5.000000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Antony</td>\n",
              "      <td>Ajax</td>\n",
              "      <td>Forward</td>\n",
              "      <td>4.857143</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>Bellingham</td>\n",
              "      <td>Dortmund</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>4.666667</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Sané</td>\n",
              "      <td>Bayern</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>4.400000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>Pedro Gonçalves</td>\n",
              "      <td>Sporting CP</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>4.400000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>Ziyech</td>\n",
              "      <td>Chelsea</td>\n",
              "      <td>Midfielder</td>\n",
              "      <td>4.222222</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-660678a5-6d83-4c29-a64d-8006c51a6195')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-660678a5-6d83-4c29-a64d-8006c51a6195 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-660678a5-6d83-4c29-a64d-8006c51a6195');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-bbbcc6fb-fa7e-4614-9183-fbe19b4403dc\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-bbbcc6fb-fa7e-4614-9183-fbe19b4403dc')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-bbbcc6fb-fa7e-4614-9183-fbe19b4403dc button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 18
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Part B - Review of NumPy\n",
        "\n",
        "Let us now review numpy."
      ],
      "metadata": {
        "id": "jVxrL9Xd8uzi"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**1. Create a 2-dimensional NumPy array with 3 rows and 4 columns, filled with random integers between 0 and 100 (inclusive).**"
      ],
      "metadata": {
        "id": "NSm0sKHH8y31"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "array_2d = np.random.randint(0,101,(3,4))\n",
        "print(array_2d)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "wP8aUQ_s8yFp",
        "outputId": "4118d1c1-b617-4850-f264-047a79a8105c"
      },
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[[ 23  24  38  91]\n",
            " [100  92  80  54]\n",
            " [ 76  98  64  31]]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**2. Generate an array of 20 random integers between 1 and 100 (inclusive). Calculate the mean, median, and mode of the generated array. Create a new array containing 100 random samples from a standard normal distribution (mean=0, std=1).**"
      ],
      "metadata": {
        "id": "mSo-9GZV87g1"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "random_integers = np.random.randint(1,101, 20)\n",
        "\n",
        "mean_sample = np.mean(random_integers)\n",
        "median_sample = np.median(random_integers)\n",
        "unique_values, counts = np.unique(random_integers, return_counts = True)\n",
        "mode_sample = unique_values[counts.argmax()]\n",
        "\n",
        "standard_normal_samples = np.random.normal(0,1,100)\n",
        "\n",
        "print(random_integers)\n",
        "print(\"Mean:\", mean_sample)\n",
        "print(\"Median:\", median_sample)\n",
        "print(\"Mode:\", mode_sample)\n",
        "print(standard_normal_samples[:10])\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "joYkt9-k89ry",
        "outputId": "2db74cfe-adf5-46af-a4db-6894c7ab634b"
      },
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[ 21  62  55  50  69  30  47  15  75   6  37  93  94  58  39  26 100  40\n",
            "  92  50]\n",
            "Mean: 52.95\n",
            "Median: 50.0\n",
            "Mode: 50\n",
            "[ 0.04254721 -1.38797692 -0.23088044  0.27202176  0.93998854  1.43665346\n",
            " -0.47827033 -1.07138487  1.03319647 -0.78915187]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**3. Create an array of 10 unique integers between 1 and 20 (inclusive) Sample 5 random integers from this array, allowing for replacement. Calculate the mean and standard deviation of the sampled integers.**"
      ],
      "metadata": {
        "id": "nOSnUcsh9AIB"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "unique_integers = np.unique(np.random.randint(1,21,10))\n",
        "\n",
        "sample_with_replacement = np.random.choice(unique_integers, 5, replace = True)\n",
        "\n",
        "mean_sample_replacement = np.mean(sample_with_replacement)\n",
        "std_dev_sample_replacement = np.std(sample_with_replacement)\n",
        "\n",
        "print(unique_integers)\n",
        "print(sample_with_replacement)\n",
        "print(\"Mean:\", mean_sample_replacement)\n",
        "print(\"Standard Deviation:\", std_dev_sample_replacement)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iMPWjo969CkJ",
        "outputId": "82d5d471-0c18-4eef-945c-e4e744cddb31"
      },
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[ 1  3  4  5  6 13 14 16]\n",
            "[14  6  3  4  3]\n",
            "Mean: 6.0\n",
            "Standard Deviation: 4.147288270665544\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**4. Create an array of 15 random integers between 1 and 10 (inclusive). Sort the array in ascending order. Find the unique values in the sorted array.**"
      ],
      "metadata": {
        "id": "UKdfUmnV9HXO"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "array_15_random = np.random.randint(1,11,15)\n",
        "\n",
        "sorted_array = np.sort(array_15_random)\n",
        "\n",
        "unique_sorted_values = np.unique(sorted_array)\n",
        "\n",
        "print(array_15_random)\n",
        "print(sorted_array)\n",
        "print(unique_sorted_values)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Rb3FfBRL9H2h",
        "outputId": "28074627-19d8-400c-a534-1ef40db80222"
      },
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[ 9  8  6 10  1  6  7  7  1  7  9  8  7  7  9]\n",
            "[ 1  1  6  6  7  7  7  7  7  8  8  9  9  9 10]\n",
            "[ 1  6  7  8  9 10]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Part C - Sampling and Hypothesis testing\n",
        "\n",
        "In this part, we will work with the `tips` dataset that can be downloaded from the `seaborn` library.\n",
        "\n",
        "The \"tips\" dataset contains information about restaurant tips, including total bills, tips, and other relevant details."
      ],
      "metadata": {
        "id": "gFjCQNBnfDZX"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#downloading the dataset into a dataframe\n",
        "tips = sns.load_dataset('tips')\n",
        "tips.head()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "rcg4FQdgwVQO",
        "outputId": "ec0583c3-da30-48ce-fd49-b56d70bfc480"
      },
      "execution_count": 23,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   total_bill   tip     sex smoker  day    time  size\n",
              "0       16.99  1.01  Female     No  Sun  Dinner     2\n",
              "1       10.34  1.66    Male     No  Sun  Dinner     3\n",
              "2       21.01  3.50    Male     No  Sun  Dinner     3\n",
              "3       23.68  3.31    Male     No  Sun  Dinner     2\n",
              "4       24.59  3.61  Female     No  Sun  Dinner     4"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-190e1aee-7e18-43ca-8639-ae2d7e873361\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>total_bill</th>\n",
              "      <th>tip</th>\n",
              "      <th>sex</th>\n",
              "      <th>smoker</th>\n",
              "      <th>day</th>\n",
              "      <th>time</th>\n",
              "      <th>size</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>16.99</td>\n",
              "      <td>1.01</td>\n",
              "      <td>Female</td>\n",
              "      <td>No</td>\n",
              "      <td>Sun</td>\n",
              "      <td>Dinner</td>\n",
              "      <td>2</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>10.34</td>\n",
              "      <td>1.66</td>\n",
              "      <td>Male</td>\n",
              "      <td>No</td>\n",
              "      <td>Sun</td>\n",
              "      <td>Dinner</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>21.01</td>\n",
              "      <td>3.50</td>\n",
              "      <td>Male</td>\n",
              "      <td>No</td>\n",
              "      <td>Sun</td>\n",
              "      <td>Dinner</td>\n",
              "      <td>3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>23.68</td>\n",
              "      <td>3.31</td>\n",
              "      <td>Male</td>\n",
              "      <td>No</td>\n",
              "      <td>Sun</td>\n",
              "      <td>Dinner</td>\n",
              "      <td>2</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>24.59</td>\n",
              "      <td>3.61</td>\n",
              "      <td>Female</td>\n",
              "      <td>No</td>\n",
              "      <td>Sun</td>\n",
              "      <td>Dinner</td>\n",
              "      <td>4</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-190e1aee-7e18-43ca-8639-ae2d7e873361')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-190e1aee-7e18-43ca-8639-ae2d7e873361 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-190e1aee-7e18-43ca-8639-ae2d7e873361');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-5be367d3-7746-423a-8ac0-72da0fab7eb7\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-5be367d3-7746-423a-8ac0-72da0fab7eb7')\"\n",
              "            title=\"Suggest charts.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-5be367d3-7746-423a-8ac0-72da0fab7eb7 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(tips.shape)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2rPv5olHtHv7",
        "outputId": "9a85e329-8c61-4308-f21b-b3659cbc6191"
      },
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "(244, 7)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**1. You want to analyze the tipping behavior of a sample of customers. Perform simple random sampling to select a sample of 30 customers from the \"tips\" dataset. Calculate the sample mean and sample variance of the \"tip\" column for this sample.**"
      ],
      "metadata": {
        "id": "kqi0F-DZw3tX"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "sample_size = 30\n",
        "sample = tips.sample(n = 30, random_state = 42)"
      ],
      "metadata": {
        "id": "EEHL4DaOw3C6"
      },
      "execution_count": 25,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "sample_mean = sample['tip'].mean()\n",
        "sample_variance = sample['tip'].var()"
      ],
      "metadata": {
        "id": "9wcN7RZ6x4XM"
      },
      "execution_count": 26,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "print(\"Sample Mean:\", sample_mean)\n",
        "print(\"Sample Variance:\", sample_variance)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "E25a6rpsx6rB",
        "outputId": "77c4f0c6-0b57-46af-f601-0fcabb7bd0fd"
      },
      "execution_count": 27,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Sample Mean: 2.832333333333333\n",
            "Sample Variance: 1.3560667816091954\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**2. You suspect that the tipping behavior varies by the day of the week. Perform stratified sampling to select a sample of 20 customers from each day (Thursday, Friday, Saturday, and Sunday) from the \"tips\" dataset. Calculate the sample mean and sample variance of the \"tip\" column for each stratum.**"
      ],
      "metadata": {
        "id": "SScNR11Lwxb5"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#initialize an empty list to store the sample means and variances for each stratum\n",
        "stratum_sample_statistics = []"
      ],
      "metadata": {
        "id": "75C1J4Hbyv_U"
      },
      "execution_count": 29,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#sampling\n",
        "days_of_week = ['Thur', 'Fri', 'Sat', 'Sun']\n",
        "sample_size_per_stratum = 20\n",
        "\n",
        "for day in days_of_week:\n",
        "    stratum = tips[tips['day'] == day]\n",
        "    stratum_sample = stratum.sample(n=20, random_state = 42, replace = True)\n",
        "    stratum_mean = stratum_sample['tip'].mean()\n",
        "    stratum_variance = stratum_sample['tip'].var()\n",
        "    stratum_sample_statistics.append({\n",
        "        'Day': day,\n",
        "        'Sample Mean': stratum_mean,\n",
        "        'Sample Variance': stratum_variance\n",
        "    })"
      ],
      "metadata": {
        "id": "b4YcEMR0y0SY"
      },
      "execution_count": 30,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "for stats in stratum_sample_statistics:\n",
        "    print(f\"Day: {stats['Day']}, Sample Mean: {stats['Sample Mean']}, Sample Variance: {stats['Sample Variance']}\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iGEV1lldwxPN",
        "outputId": "a7196c9b-41c4-454d-c369-0dcfc93738af"
      },
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Day: Thur, Sample Mean: 2.894, Sample Variance: 1.6887410526315791\n",
            "Day: Fri, Sample Mean: 2.7265, Sample Variance: 1.1247186842105263\n",
            "Day: Sat, Sample Mean: 3.2269999999999994, Sample Variance: 3.2082957894736843\n",
            "Day: Sun, Sample Mean: 2.9415, Sample Variance: 0.9116028947368421\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**3. You want to test whether the average total bill in the dataset is significantly different (greater) from $20. Perform a one-tail t-test to answer this question.**"
      ],
      "metadata": {
        "id": "paeQR9zizE3O"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Null hypothesis (H0) = bill = 20\n",
        "# Alternative hypothesis (H1): the average total bill in the dataset is significantly different (greater) from $20\n",
        "\n",
        "alpha = 0.05\n",
        "\n",
        "# We will use a one-tail t-test because we are testing if the average is greater than $20.\n",
        "t_statistic, p_value = stats.ttest_1samp(tips['total_bill'], popmean = 20, alternative = 'greater')\n",
        "\n",
        "if p_value < alpha:\n",
        "    print(\"Reject the null hypothesis. The average total bill is significantly greater than $20.\")\n",
        "else:\n",
        "    print(\"Fail to reject the null hypothesis. There is no significant evidence that the average total bill is greater than $20.\")\n",
        "\n",
        "print(\"t-statistic:\", t_statistic)\n",
        "print(\"p-value:\", p_value)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UT2KgHTUzPkz",
        "outputId": "2afe831b-fd15-49a0-d340-291774b84ef1"
      },
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Fail to reject the null hypothesis. There is no significant evidence that the average total bill is greater than $20.\n",
            "t-statistic: -0.37559294451919506\n",
            "p-value: 0.646226403218664\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**4. You suspect that there is a significant difference in the total bill between lunch and dinner. Perform a two-tail t-test to test this hypothesis.**"
      ],
      "metadata": {
        "id": "9QRGjc1PzgLU"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#Separate the total bill data into two groups: lunch and dinner\n",
        "total_bill_lunch = tips[tips['time']=='Lunch']['total_bill']\n",
        "total_bill_dinner = tips[tips['time']=='Dinner']['total_bill']\n",
        "\n",
        "#Define the null and alternative hypotheses\n",
        "# Null hypothesis (H0): bills lunch = bill dinner\n",
        "# Alternative hypothesis (H1): significant difference between the bills of lunch and dinner\n",
        "\n",
        "alpha = 0.05\n",
        "\n",
        "t_statistic, p_value = stats.ttest_ind(total_bill_lunch, total_bill_dinner, equal_var = False)\n",
        "\n",
        "if p_value < alpha:\n",
        "    print(\"Reject the null hypothesis. There is a significant difference in total bill between lunch and dinner.\")\n",
        "else:\n",
        "    print(\"Fail to reject the null hypothesis. There is no significant difference in total bill between lunch and dinner.\")\n",
        "\n",
        "print(\"t-statistic:\", t_statistic)\n",
        "print(\"p-value:\", p_value)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "pA20ZP6hzkJV",
        "outputId": "df968596-094e-4e93-d5d2-49cab568d7a4"
      },
      "execution_count": 34,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Reject the null hypothesis. There is a significant difference in total bill between lunch and dinner.\n",
            "t-statistic: -3.122986183296264\n",
            "p-value: 0.0021665735148038933\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "**5. You are interested in estimating the mean total bill amount for all the restaurant visits in the \"tips\" dataset. Create a 95% confidence interval for this estimate.**\n",
        "\n",
        "a. Calculate the sample mean and sample standard deviation for the total bill amounts."
      ],
      "metadata": {
        "id": "vmyPGGUqt1pN"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "sample_mean = tips['total_bill'].mean()\n",
        "sample_std = tips['total_bill'].std()\n",
        "\n",
        "print(\"Sample Mean:\", sample_mean)\n",
        "print(\"Sample Standard Deviation:\", sample_std)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Sj2NaZ6Oz4lU",
        "outputId": "2258823a-7860-447c-fd13-c75c69dac859"
      },
      "execution_count": 35,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Sample Mean: 19.78594262295082\n",
            "Sample Standard Deviation: 8.902411954856856\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "b. Determine the margin of error for a 95% confidence interval.\n",
        "\n",
        "The margin of error for a 95% confidence interval can be determined using the formula:\n",
        "\n",
        "Margin of Error = (Critical Value) * (Standard Deviation / √Sample Size)"
      ],
      "metadata": {
        "id": "iuNXkK-Vz_zE"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Set the confidence level and find the critical value (z-value for a normal distribution)\n",
        "confidence_level = 0.95\n",
        "alpha = 1 - confidence_level\n",
        "critical_value = stats.norm.pdf(1-alpha/2)\n",
        "\n",
        "# Set the sample size\n",
        "sample_size = len(tips)\n",
        "\n",
        "# Calculate the margin of error\n",
        "margin_of_error = critical_value * (sample_std/np.sqrt(sample_size))\n",
        "\n",
        "print(\"Margin of Error:\", margin_of_error)\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "pD0dKKR2z4jG",
        "outputId": "cb2262fc-5eaf-4c4e-cc09-ccae5f7b6b70"
      },
      "execution_count": 37,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Margin of Error: 0.14135046577410887\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "c. Calculate the lower and upper bounds of the confidence interval."
      ],
      "metadata": {
        "id": "IBERLwiuAwIl"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Calculate the lower and upper bounds of the confidence interval\n",
        "lower_bound = sample_mean - margin_of_error\n",
        "upper_bound = sample_mean + margin_of_error\n",
        "\n",
        "print(\"Confidence Interval (95%): [{:.2f}, {:.2f}]\".format(lower_bound, upper_bound))\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "fdfJ-FaTz4g8",
        "outputId": "5e130e1e-362a-4212-91b5-cdaa8261f7e4"
      },
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Confidence Interval (95%): [19.64, 19.93]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "PU9x__AgNYOb"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}