{ "cells": [ { "cell_type": "markdown", "id": "2a5ff1be-2da3-4ce2-9a6c-9f171f1ff6c1", "metadata": { "id": "2a5ff1be-2da3-4ce2-9a6c-9f171f1ff6c1" }, "source": [ "# DS 122 Homework 2 Computational\n", "\n", "**Due Sep. 27th**\n", "\n", "**Full credit is 50 points (With Bonus Question: 55 Points)**\n", "\n", "**Name: Xiang Fu**\n", "\n", "**BUID: U69445651**\n", "\n", "Most homeworks will involve “analytical” questions, and many will involve “computational” questions. This homework involves analytical and computational questions.\n", "\n", "**NOTE**\n", "\n", "- You are advised not to use ChatGPT or any other LLM to complete the homeworks. Try the questions on your own unless otherwise stated in the question.\n", "\n", "- Try to answer the questions in detail. If you do not get the correct answer, we will take your steps (the process you follow to solve the question) into consideration, which will help you earn partial points.\n", "\n", "- Coding questions might seem a little daunting at first, but if you go through them you will notice that a lot of the answers are directly available in the notebook. They are more for your understanding than for testing. If you are unable to find a solution at first, please try reading the documentation and your class notes. We are always available during our office hours in case you have doubts regarding a topic.\n", "\n", "**SUBMISSION GUIDELINES**\n", "\n", "- For coding questions, please edit the Jupyter notebook itself in the space provided to input your answer. You can choose to create a new cell to enter your code so as not to lose the sample output.\n", "\n", "- Final submissions should contain both your code (Jupyter Notebook) as well as mathematical files (Scanned or Typed PDF). You can select more than one file while uploading during submission. 
Please try to use the following naming convention for your submissions: **{FirstName}\\_\\{LastName}\\_\\{BUID}\\_\\{analytical/computational}.zip**\n", "\n", "- This is the coding part of the homework. Please submit it to the HW2 Computation Part on Gradescope as a Python notebook. The first part is to be completed directly on Gradescope.\n", "\n" ] }, { "cell_type": "markdown", "id": "f039bb78-5200-4bce-bdd3-f04f68fe344e", "metadata": { "id": "f039bb78-5200-4bce-bdd3-f04f68fe344e" }, "source": [ "## Computational\n", "\n", "- Add your answers in the same cell as the code, or add another cell by copy-pasting the existing cell.\n", "- Outputs from the answer key have been left as they are for your reference. My suggestion is to copy the code into a new cell and check that its output is consistent with the reference output." ] }, { "cell_type": "markdown", "id": "7061ccc5-e39c-4ec5-84b9-435fcfd54586", "metadata": { "id": "7061ccc5-e39c-4ec5-84b9-435fcfd54586" }, "source": [ "#### Problem E - Sampling\n", "**20 points**" ] }, { "cell_type": "markdown", "id": "cad0080c-fd41-472b-94cc-558f34221e98", "metadata": { "id": "cad0080c-fd41-472b-94cc-558f34221e98" }, "source": [ "**Part a - 15 points**\n", "\n", "Consider an office with a three-level hierarchy (Associates, Senior Associates and Partners).\n", "The salary ranges for these groups are as follows:\n", "- Associates: [40000, 49999]\n", "- Senior Associates: [50000, 99999]\n", "- Partners: [100000, 150000]" ] }, { "cell_type": "code", "execution_count": 1, "id": "00920544-0dc8-4d90-b00f-8f2db81451d0", "metadata": { "id": "00920544-0dc8-4d90-b00f-8f2db81451d0" }, "outputs": [], "source": [ "# importing libraries\n", "import numpy as np # TODO: Import Numpy" ] }, { "cell_type": "code", "execution_count": 2, "id": "7eae7099-bbb0-452f-8472-56801694fe53", "metadata": { "id": "7eae7099-bbb0-452f-8472-56801694fe53" }, "outputs": [], "source": [ "# creating a simulated dataset of salaries (in USD) for a group of 
employees in a large corporation, based on job levels\n", "assoc_salaries = np.random.choice(np.arange(40000, 50000), 5000, replace=True) # TODO Enter range for salaries and fill the functions\n", "senior_assoc_salaries = np.random.choice(np.arange(50000, 100000), 3000, replace=True) # TODO Enter range for salaries and fill the functions\n", "partner_salaries = np.random.choice(np.arange(100000, 150001), 2000, replace=True) # TODO Enter range for salaries and fill the functions\n", "all_salaries = np.concatenate([assoc_salaries, senior_assoc_salaries, partner_salaries]) # TODO Concatenate the salaries\n", "\n", "# What is the second parameter being passed to arange function, what is its significance (Just for your understanding, not scored)\n", "\n", "# The second parameter being passed to the np.arange function is the 'stop' value.\n", "# The function generates values starting from the 'start' value up to, but not including, the 'stop' value.\n", "# So the significance of this parameter is to define the upper limit of the range of values being generated.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "6e26c205-0cc2-46ca-9ac8-4f5abe56ab2a", "metadata": { "id": "6e26c205-0cc2-46ca-9ac8-4f5abe56ab2a" }, "outputs": [], "source": [ "assoc_sample = np.random.choice(assoc_salaries, 100, replace=False) # TODO: Write code to sample 100 associate salaries\n", "senior_assoc_sample = np.random.choice(senior_assoc_salaries, 80, replace=False) # TODO: Write code to sample 80 senior associate salaries\n", "partner_sample = np.random.choice(partner_salaries, 50, replace=False) # TODO: Write code to sample 50 partner salaries" ] }, { "cell_type": "markdown", "id": "fe7df61e-c751-4a9c-878f-8e374a584147", "metadata": { "id": "fe7df61e-c751-4a9c-878f-8e374a584147" }, "source": [ "### This part is only to explain real world scenarios, it is already done for you. 
Just run the cell below and you should be good\n", "\n", "##### Stratified sampling is a method of sampling from a population that can be partitioned into subpopulations or \"strata\". Each stratum is relatively homogeneous internally, but there is significant variability between the different strata. The goal of stratified sampling is to ensure that each subgroup is adequately represented in the overall sample drawn from the population." ] }, { "cell_type": "code", "execution_count": 4, "id": "c1251106-cdbb-46a2-b327-dab32c7faaa5", "metadata": { "id": "c1251106-cdbb-46a2-b327-dab32c7faaa5" }, "outputs": [], "source": [ "stratified_sample = np.concatenate([assoc_sample, senior_assoc_sample, partner_sample])" ] }, { "cell_type": "code", "execution_count": 5, "id": "2854c2cc-cb5b-4da2-99fd-19e619529fd0", "metadata": { "id": "2854c2cc-cb5b-4da2-99fd-19e619529fd0" }, "outputs": [], "source": [ "# calculating mean and standard deviation of the stratified sample\n", "# note: np.std uses the population formula (ddof=0) by default\n", "\n", "sample_mean = np.mean(stratified_sample) # TODO: find mean of the stratified sample\n", "sample_std_dev = np.std(stratified_sample) # TODO: find standard deviation of the stratified sample" ] }, { "cell_type": "code", "source": [ "# displaying the answers\n", "print(sample_mean)\n", "print(sample_std_dev)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "F_9vDnXwHbcY", "outputId": "ab7e133c-e578-4e3b-ce1d-18fdfd5c6f64" }, "id": "F_9vDnXwHbcY", "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "71808.32608695653\n", "31586.165795538283\n" ] } ] }, { "cell_type": "markdown", "id": "94df6431-df7c-4651-b55b-efe0ad4132e5", "metadata": { "id": "94df6431-df7c-4651-b55b-efe0ad4132e5" }, "source": [ "#### Let us see whether stratifying actually made a difference: we shall find the mean and standard deviation of the combined salaries without using the strata information." ] }, { "cell_type": "code", "execution_count": 7, "id": 
"c434c1b1-636c-4fc6-90f7-58bc79e0f9ed", "metadata": { "id": "c434c1b1-636c-4fc6-90f7-58bc79e0f9ed" }, "outputs": [], "source": [ "# Calculating the mean and standard deviation of the combined salaries without stratification\n", "\n", "non_strata_sample_mean = np.mean(all_salaries)\n", "non_strata_sample_std = np.std(all_salaries)" ] }, { "cell_type": "code", "execution_count": 8, "id": "dded47b4-1574-4260-94d0-25e966e0287c", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dded47b4-1574-4260-94d0-25e966e0287c", "outputId": "0857b8be-7e04-4fe4-901e-f7fd5c1c2eb8" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "69981.4971\n", "32045.23867445508\n" ] } ], "source": [ "print(non_strata_sample_mean)\n", "print(non_strata_sample_std)" ] }, { "cell_type": "markdown", "id": "8df3c048-f97d-425d-b9ea-ec86fe2fbfb1", "metadata": { "id": "8df3c048-f97d-425d-b9ea-ec86fe2fbfb1" }, "source": [ "##### Try to think which method would yield better results in a real world scenario, it is a minor difference in this example but in real world cases small changes and methods like this can make major difference.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "5a56ec8d-01e5-48dd-b868-9ba3c46cca4c", "metadata": { "id": "5a56ec8d-01e5-48dd-b868-9ba3c46cca4c" }, "source": [ "**Part b - 3 points**\n", "\n", "Select 100 random samples of size 20 and store it in 'random_samples'. Use numpy to create samples." 
] }, { "cell_type": "code", "execution_count": 9, "id": "a6b2f162-053b-4fc8-954f-0e318a3d7242", "metadata": { "id": "a6b2f162-053b-4fc8-954f-0e318a3d7242" }, "outputs": [], "source": [ "# finding the shape of the stratified sample; run the cell as it is\n", "n = stratified_sample.shape\n", "\n", "# declaring an empty list to append samples into\n", "data_samples = []" ] }, { "cell_type": "code", "execution_count": 10, "id": "c48cb34b-c105-444d-a3df-5ece08d3f375", "metadata": { "id": "c48cb34b-c105-444d-a3df-5ece08d3f375" }, "outputs": [], "source": [ "'''\n", "Write your code by replacing the comment below\n", "'''\n", "\n", "def random_sampling(df, no_of_samples, sample_size):\n", "    random_samples = []\n", "    for i in range(no_of_samples): # Enter the range for the no_of_samples\n", "        # Randomly sampling without replacement\n", "        sample = np.random.choice(df, size=sample_size, replace=False) # TODO: Randomly choose with size as sample_size\n", "        random_samples.append(sample)\n", "    random_samples = np.array(random_samples)\n", "    return random_samples # TODO: Return the random samples" ] }, { "cell_type": "code", "execution_count": 11, "id": "c7ba3fbf-ec4f-41f1-a03c-06900db415d5", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "c7ba3fbf-ec4f-41f1-a03c-06900db415d5", "outputId": "f700bdbc-f4e1-454a-c512-19e6f2172972" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([[129200,  46594,  61562,  89662,  41137,  43827, 137671,  59101,\n", "        104246,  58873,  41994, 106482,  62396,  75567,  59285,  50535,\n", "         46440,  43716,  67300, 133856],\n", "       [115355,  49093,  71452,  66876, 115204, 130944, 120491,  59012,\n", "        149283, 113558,  42254,  41137,  47104,  70163,  61658,  41531,\n", "         55472, 113975,  87867, 107820]])" ] }, "metadata": {}, "execution_count": 11 } ], "source": [ "# calling the function for the data, run the cell as it is\n", "data_samples = random_sampling(stratified_sample, 100, 20)\n", "# printing 2 examples of random samples\n", 
"data_samples[:2]" ] }, { "cell_type": "code", "execution_count": 12, "id": "52950b7f-0095-4c11-a82c-7dbcfc7482aa", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "id": "52950b7f-0095-4c11-a82c-7dbcfc7482aa", "outputId": "98b1673e-2fb1-4efc-e2b9-21187da53f86" }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "<Figure size 640x480 with 1 Axes>" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiMAAAGdCAYAAADAAnMpAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAoAklEQVR4nO3df1RU953/8df4g0FXwZ8wiGgwGPyFv6sZ04rdkCDLycp2j2tdtxirds3iWa1ZTMmm8Ws8Lp5mrfFU6482ht01lMTW4G5qdAmWuolohEgimtKYuEIsg9mooCZBA5/vHzlOMvJzEPzI8Hycc/+4n/v53Pu+H2eur3O5M+MwxhgBAABY0s12AQAAoGsjjAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwqoftAlqjvr5ef/rTn9S3b185HA7b5QAAgFYwxujKlSsaMmSIunVr+v5Hpwgjf/rTnxQVFWW7DAAA0AYVFRUaOnRok9s7RRjp27evpC9PJiQkxHI1AACgNWpqahQVFeX9f7wpnSKM3PzTTEhICGEEAIBOpqVHLHiAFQAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFbdVhjZsGGDHA6HVq5c2Wy/PXv2aNSoUQoODlZcXJz2799/O4cFAAABpM1h5Pjx49qxY4fGjx/fbL8jR45o/vz5Wrx4sU6cOKGUlBSlpKSotLS0rYcGAAABpE1h5OrVq1qwYIF+8YtfqH///s323bx5s2bPnq309HSNHj1a69at0+TJk7Vly5Y2FQwAAAJLm8JIWlqakpOTlZCQ0GLfwsLCBv0SExNVWFjY5Jja2lrV1NT4LAAAIDD18HdATk6O3n77bR0/frxV/T0ej8LDw33awsPD5fF4mhyTmZmptWvX+lsaAMvWNvIz4WuMsVBJ67W15lvH3e3n2VaNzc+tAvXccef4dWekoqJCK1as0Isvvqjg4OCOqkkZGRmqrq72LhUVFR12LAAAYJdfd0aKi4t14cIFTZ482dtWV1enw4cPa8uWLaqtrVX37t19xrhcLlVVVfm0VVVVyeVyNXkcp9Mpp9PpT2kAAKCT8uvOyIMPPqiTJ0+qpKTEu0ydOlULFixQSUlJgyAiSW63W/n5+T5teXl5crvdt1c5AAAICH7dGenbt6/GjRvn0/Znf/ZnGjhwoLc9NTVVkZGRyszMlCStWLFC8fHx2rhxo5KTk5WTk6OioiLt3LmznU4BAAB0Zu3+Dazl5eWqrKz0rs+YMUPZ2dnauXOnJkyYoF//+tfKzc1tEGoAAEDX5PenaW5VUFDQ7LokzZ07V3Pnzr3dQwEAgADEb9MAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAM
AqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAq/wKI9u2bdP48eMVEhKikJAQud1uvfbaa032z8rKksPh8FmCg4Nvu2gAABA4evjTeejQodqwYYNGjhwpY4z+7d/+TXPmzNGJEyc0duzYRseEhISorKzMu+5wOG6vYgAAEFD8CiOPPPKIz/r69eu1bds2HT16tMkw4nA45HK52l4hAAAIaG1+ZqSurk45OTm6du2a3G53k/2uXr2q4cOHKyoqSnPmzNGpU6da3Hdtba1qamp8FgAAEJj8DiMnT55Unz595HQ6tWzZMr3yyisaM2ZMo31jY2O1a9cu7du3T7t371Z9fb1mzJihjz76qNljZGZmKjQ01LtERUX5WyYAAOgk/A4jsbGxKikp0bFjx/TYY49p4cKFOn36dKN93W63UlNTNXHiRMXHx2vv3r0aPHiwduzY0ewxMjIyVF1d7V0qKir8LRMAAHQSfj0zIklBQUGKiYmRJE2ZMkXHjx/X5s2bWwwYktSzZ09NmjRJZ86cabaf0+mU0+n0tzQAANAJ3fb3jNTX16u2trZVfevq6nTy5ElFRETc7mEBAECA8OvOSEZGhpKSkjRs2DBduXJF2dnZKigo0MGDByVJqampioyMVGZmpiTpmWee0f3336+YmBhdvnxZzz77rM6dO6clS5a0/5kAAIBOya8wcuHCBaWmpqqyslKhoaEaP368Dh48qIceekiSVF5erm7dvrrZcunSJS1dulQej0f9+/fXlClTdOTIkSYfeAUAAF2PX2Hk+eefb3Z7QUGBz/qmTZu0adMmv4sCAABdB79NAwAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKzyK4xs27ZN48ePV0hIiEJCQuR2u/Xaa681O2bPnj0aNWqUgoODFRcXp/37999WwQAAILD4FUaGDh2qDRs2qLi4WEVFRfrzP/9zzZkzR6dOnWq0/5EjRzR//nwtXrxYJ06cUEpKilJSUlRaWtouxQMAgM7PrzDyyCOP6C/+4i80cuRI3XfffVq/fr369Omjo0ePNtp/8+bNmj17ttLT0zV69GitW7dOkydP1pYtW9qleAAA0Pm1+ZmRuro65eTk6Nq1a3K73Y32KSwsVEJCgk9bYmKiCgsLm913bW2tampqfBYAABCYevg74OTJk3K73fr888/Vp08fvfLKKxozZkyjfT0ej8LDw33awsPD5fF4mj1GZmam1q5d629pACStdTh81tcY02F97iSb9dx67M6gsZpt/xsCTfH7zkhsbKxKSkp07NgxPfbYY1q4cKFOnz7drkVlZGSourrau1RUVLTr/gEAwN3D7zsjQUFBiomJkSRNmTJFx48f1+bNm7Vjx44GfV0ul6qqqnzaqqqq5HK5mj2G0+mU0+n0tzQAANAJ3fb3jNTX16u2trbRbW63W/n5+T5teXl5TT5jAgAAuh6/7oxkZGQoKSlJw4YN05UrV5Sdna2CggIdPHhQkpSamqrIyEhlZmZKklasWKH4+Hht3LhRycnJysnJUVFRkXbu3Nn+ZwIAADolv8LIhQsXlJqaqsrKSoWGhmr8+PE6ePCgHnroIUlSeXm5unX76mbLjBkzlJ2draeeekpPPvmkRo4cqdzcXI0bN659zwIAAHRafoWR559/vtntBQUFDdrmzp2ruXPn+lUUAADoOvhtGgAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAF
YRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGCVX2EkMzNT3/jGN9S3b1+FhYUpJSVFZWVlzY7JysqSw+HwWYKDg2+raAAAEDj8CiO///3vlZaWpqNHjyovL083btzQww8/rGvXrjU7LiQkRJWVld7l3Llzt1U0AAAIHD386XzgwAGf9aysLIWFham4uFgzZ85scpzD4ZDL5WpbhQAAIKDd1jMj1dXVkqQBAwY02+/q1asaPny4oqKiNGfOHJ06darZ/rW1taqpqfFZAABAYGpzGKmvr9fKlSv1wAMPaNy4cU32i42N1a5du7Rv3z7t3r1b9fX1mjFjhj766KMmx2RmZio0NNS7REVFtbVMAABwl2tzGElLS1NpaalycnKa7ed2u5WamqqJEycqPj5ee/fu1eDBg7Vjx44mx2RkZKi6utq7VFRUtLVMAABwl/PrmZGbli9frldffVWHDx/W0KFD/Rrbs2dPTZo0SWfOnGmyj9PplNPpbEtpAACgk/HrzogxRsuXL9crr7yiQ4cOKTo62u8D1tXV6eTJk4qIiPB7LAAACDx+3RlJS0tTdna29u3bp759+8rj8UiSQkND1atXL0lSamqqIiMjlZmZKUl65plndP/99ysmJkaXL1/Ws88+q3PnzmnJkiXtfCoAAKAz8iuMbNu2TZI0a9Ysn/YXXnhBjz76qCSpvLxc3bp9dcPl0qVLWrp0qTwej/r3768pU6boyJEjGjNmzO1VDgAAAoJfYcQY02KfgoICn/VNmzZp06ZNfhUFAAC6Dn6bBgAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFjlVxjJzMzUN77xDfXt21dhYWFKSUlRWVlZi+P27NmjUaNGKTg4WHFxcdq/f3+bCwYAAIHFrzDy+9//XmlpaTp69Kjy8vJ048YNPfzww7p27VqTY44cOaL58+dr8eLFOnHihFJSUpSSkqLS0tLbLh4AAHR+PfzpfODAAZ/1rKwshYWFqbi4WDNnzmx0zObNmzV79mylp6dLktatW6e8vDxt2bJF27dvb2PZAAAgUNzWMyPV1dWSpAEDBjTZp7CwUAkJCT5tiYmJKiwsbHJMbW2tampqfBYAABCY/Loz8nX19fVauXKlHnjgAY0bN67Jfh6PR+Hh4T5t4eHh8ng8TY7JzMzU2rVr21raXWutw9GgbY0xFiq5O9w6H43NRWNzdqs7OYet+TfsDDV3FW19z91tc9Zer6nWvOfay508Vnu52967jemM89oabb4zkpaWptLSUuXk5LRnPZKkjIwMVVdXe5eKiop2PwYAALg7tOnOyPLly/Xqq6/q8OHDGjp0aLN9XS6XqqqqfNqqqqrkcrmaHON0OuV0OttSGgAA6GT8ujNijNHy5cv1yiuv6NChQ4qOjm5xjNvtVn5+vk9bXl6e3G63f5UCAICA5NedkbS0NGVnZ2vfvn3q27ev97mP0NBQ9erVS5KUmpqqyMhIZWZmSpJWrFih+Ph4bdy4UcnJycrJyVFRUZF27tzZzqcCAAA6I7/ujGzbtk3V1dWaNWuWIiIivMtLL73k7VNeXq7Kykrv+owZM5Sdna2dO3dqwoQJ+vWvf63c3NxmH3oFAABdh193RkwrntotKCho0DZ37lzNnTvXn0MBAIAugt+mAQ
AAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFb5HUYOHz6sRx55REOGDJHD4VBubm6z/QsKCuRwOBosHo+nrTUDAIAA4ncYuXbtmiZMmKCtW7f6Na6srEyVlZXeJSwszN9DAwCAANTD3wFJSUlKSkry+0BhYWHq16+f3+MAAEBgu2PPjEycOFERERF66KGH9Oabbzbbt7a2VjU1NT4LAAAITB0eRiIiIrR9+3b95je/0W9+8xtFRUVp1qxZevvtt5sck5mZqdDQUO8SFRXV0WUCAABL/P4zjb9iY2MVGxvrXZ8xY4Y++OADbdq0Sf/xH//R6JiMjAytWrXKu15TU0MgAQAgQHV4GGnMtGnT9MYbbzS53el0yul03sGKAACALVa+Z6SkpEQRERE2Dg0AAO4yft8ZuXr1qs6cOeNdP3v2rEpKSjRgwAANGzZMGRkZOn/+vP793/9dkvTcc88pOjpaY8eO1eeff65f/vKXOnTokP77v/+7/c4CAAB0Wn6HkaKiIn3729/2rt98tmPhwoXKyspSZWWlysvLvduvX7+uxx9/XOfPn1fv3r01fvx4vf766z77AAAAXZffYWTWrFkyxjS5PSsry2d99erVWr16td+FAQCAroHfpgEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABW+R1GDh8+rEceeURDhgyRw+FQbm5ui2MKCgo0efJkOZ1OxcTEKCsrqw2lAgCAQOR3GLl27ZomTJigrVu3tqr/2bNnlZycrG9/+9sqKSnRypUrtWTJEh08eNDvYgEAQODp4e+ApKQkJSUltbr/9u3bFR0drY0bN0qSRo8erTfeeEObNm1SYmKiv4cHAAABpsOfGSksLFRCQoJPW2JiogoLC5scU1tbq5qaGp8FAAAEJr/vjPjL4/EoPDzcpy08PFw1NTX67LPP1KtXrwZjMjMztXbt2o4uTZK01uFo0LbGmA7bt839ttd53aq95rCt59Wa47dm3x01P41pr3/D9nqNtefct2VMW86rra+7jnpftsbddr2x/d7tjO7knHXkNenW49/J619j7spP02RkZKi6utq7VFRU2C4JAAB0kA6/M+JyuVRVVeXTVlVVpZCQkEbvikiS0+mU0+ns6NIAAMBdoMPvjLjdbuXn5/u05eXlye12d/ShAQBAJ+B3GLl69apKSkpUUlIi6cuP7paUlKi8vFzSl39iSU1N9fZftmyZPvzwQ61evVp/+MMf9POf/1wvv/yyfvjDH7bPGQAAgE7N7zBSVFSkSZMmadKkSZKkVatWadKkSXr66aclSZWVld5gIknR0dH67W9/q7y8PE2YMEEbN27UL3/5Sz7WCwAAJLXhmZFZs2bJNPPUbWPfrjpr1iydOHHC30MBAIAu4K78NA0AAOg6CCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAA
CsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAq9oURrZu3ap77rlHwcHBmj59ut56660m+2ZlZcnhcPgswcHBbS4YAAAEFr/DyEsvvaRVq1ZpzZo1evvttzVhwgQlJibqwoULTY4JCQlRZWWldzl37txtFQ0AAAKH32Hkpz/9qZYuXapFixZpzJgx2r59u3r37q1du3Y1OcbhcMjlcnmX8PDw2yoaAAAEDr/CyPXr11VcXKyEhISvdtCtmxISElRYWNjkuKtXr2r48OGKiorSnDlzdOrUqWaPU1tbq5qaGp8FAAAEJr/CyP/93/+prq6uwZ2N8PBweTyeRsfExsZq165d2rdvn3bv3q36+nrNmDFDH330UZPHyczMVGhoqHeJioryp0wAANCJdPinadxut1JTUzVx4kTFx8dr7969Gjx4sHbs2NHkmIyMDFVXV3uXioqKji4TAABY0sOfzoMGDVL37t1VVVXl015VVSWXy9WqffTs2VOTJk3SmTNnmuzjdDrldDr9KQ0AAHRSft0ZCQoK0pQpU5Sfn+9tq6+vV35+vtxud6v2UVdXp5MnTyoiIsK/SgEAQEDy686IJK1atUoLFy7U1KlTNW3aND333HO6du2aFi1aJElKTU1VZGSkMjMzJUnPPPOM7r//fsXExOjy5ct69tlnde7cOS1ZsqR9zwQAAHRKfoeRefPm6eOPP9bTTz8tj8ejiRMn6sCBA96HWsvLy9Wt21c3XC5duqSlS5fK4/Gof//+mjJlio4cOaIxY8a031kAAIBOy+8wIknLly/X8uXLG91WUFDgs75p0yZt2rSpLYcBAABdAL9NAwAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKxqUxjZunWr7rnnHgUHB2v69Ol66623mu2/Z88ejRo1SsHBwYqLi9P+/fvbVCwAAAg8foeRl156SatWrdKaNWv09ttva8KECUpMTNSFCxca7X/kyBHNnz9fixcv1okTJ5SSkqKUlBSVlpbedvEAAKDz8zuM/PSnP9XSpUu1aNEijRkzRtu3b1fv3r21a9euRvtv3rxZs2fPVnp6ukaPHq1169Zp8uTJ2rJly20XDwAAOr8e/nS+fv26iouLlZGR4W3r1q2bEhISVFhY2OiYwsJCrVq1yqctMTFRubm5TR6ntrZWtbW13vXq6mpJUk1NjT/ltsrnjbS113Ea23d7HKs1+22vY7VGW+ewrefRGrcev73+LVpzrnfbeXXUsVt7/NbU3F7ndbfNz63uxvfKrTpqDjvq+tOe7uS1ta3XpFvHtXVe22s/Lbm5X2NM8x2NH86fP28kmSNHjvi0p6enm2nTpjU6pmfPniY7O9unbevWrSYsLKzJ46xZs8ZIYmFhYWFhYQmApaKiotl84dedkTslIyPD525KfX29Ll68qIEDB8rhcFipqaamRlFRUaqoqFBISIiVGu4WzIUv5uMrzIUv5uMrzIWvrjIfxhhduXJFQ4YMabafX2Fk0KBB6t69u6qqqnzaq6qq5HK5Gh3jcrn86i9JTqdTTqfTp61fv37+lNphQkJCAvqF4w/mwhfz8RXmwhfz8RXmwldXmI/Q0NAW+/j1AGtQUJCmTJmi/Px8b1t9fb3y8/PldrsbHeN2u336S1JeXl6T/QEAQNfi959pVq1apYULF2rq1KmaNm2annvuOV27dk2LFi2SJKWmpioyMlKZmZmSpBUrVig+Pl4bN25UcnKycnJyVFRUpJ07d7bvmQAAgE
7J7zAyb948ffzxx3r66afl8Xg0ceJEHThwQOHh4ZKk8vJydev21Q2XGTNmKDs7W0899ZSefPJJjRw5Urm5uRo3blz7ncUd4HQ6tWbNmgZ/PuqKmAtfzMdXmAtfzMdXmAtfzIcvhzEtfd4GAACg4/DbNAAAwCrCCAAAsIowAgAArCKMAAAAqwI2jPy///f/5HA4fJZRo0Z5t8+aNavB9mXLlvnso7y8XMnJyerdu7fCwsKUnp6uL774wqdPQUGBJk+eLKfTqZiYGGVlZTWoZevWrbrnnnsUHBys6dOn66233uqQc27O+fPn9Xd/93caOHCgevXqpbi4OBUVFXm3G2P09NNPKyIiQr169VJCQoLef/99n31cvHhRCxYsUEhIiPr166fFixfr6tWrPn3effddfetb31JwcLCioqL0k5/8pEEte/bs0ahRoxQcHKy4uDjt37+/Y066GS3Nx6OPPtrg9TF79myffQTKfNxzzz0NztXhcCgtLU2S9PnnnystLU0DBw5Unz599Nd//dcNvsgwUN4rLc1FV7tu1NXV6cc//rGio6PVq1cv3XvvvVq3bp3P74x0lWtHa+aiK1032l1Lv0fTWa1Zs8aMHTvWVFZWepePP/7Yuz0+Pt4sXbrUZ3t1dbV3+xdffGHGjRtnEhISzIkTJ8z+/fvNoEGDTEZGhrfPhx9+aHr37m1WrVplTp8+bX72s5+Z7t27mwMHDnj75OTkmKCgILNr1y5z6tQps3TpUtOvXz9TVVV1ZybCGHPx4kUzfPhw8+ijj5pjx46ZDz/80Bw8eNCcOXPG22fDhg0mNDTU5Obmmnfeecf85V/+pYmOjjafffaZt8/s2bPNhAkTzNGjR83//M//mJiYGDN//nzv9urqahMeHm4WLFhgSktLza9+9SvTq1cvs2PHDm+fN99803Tv3t385Cc/MadPnzZPPfWU6dmzpzl58uSdmQzTuvlYuHChmT17ts/r4+LFiz77CZT5uHDhgs955uXlGUnmd7/7nTHGmGXLlpmoqCiTn59vioqKzP33329mzJjhHR9I75WW5qIrXTeMMWb9+vVm4MCB5tVXXzVnz541e/bsMX369DGbN2/29ukq147WzEVXum60t4AOIxMmTGhye3x8vFmxYkWT2/fv32+6detmPB6Pt23btm0mJCTE1NbWGmOMWb16tRk7dqzPuHnz5pnExETv+rRp00xaWpp3va6uzgwZMsRkZmb6eUZt98QTT5hvfvObTW6vr683LpfLPPvss962y5cvG6fTaX71q18ZY4w5ffq0kWSOHz/u7fPaa68Zh8Nhzp8/b4wx5uc//7np37+/d35uHjs2Nta7/jd/8zcmOTnZ5/jTp083f//3f397J+mHlubDmC8vKnPmzGlyeyDNx61WrFhh7r33XlNfX28uX75sevbsafbs2ePd/t577xlJprCw0BgTWO+VW319LozpWtcNY4xJTk423//+933avvOd75gFCxYYY7rWtaOluTCma183blfA/plGkt5//30NGTJEI0aM0IIFC1ReXu6z/cUXX9SgQYM0btw4ZWRk6NNPP/VuKywsVFxcnPfL3CQpMTFRNTU1OnXqlLdPQkKCzz4TExNVWFgoSbp+/bqKi4t9+nTr1k0JCQnePnfCf/7nf2rq1KmaO3euwsLCNGnSJP3iF7/wbj979qw8Ho9PnaGhoZo+fbq3zsLCQvXr109Tp0719klISFC3bt107Ngxb5+ZM2cqKCjI2ycxMVFlZWW6dOmSt09zc3YntDQfNxUUFCgsLEyxsbF67LHH9Mknn3i3BdJ8fN3169e1e/duff/735fD4VBxcbFu3LjhU+OoUaM0bNgwn9dGoLxXvu7Wubipq1w3pC+/tDI/P19//OMfJUnvvPOO3njjDSUlJUnqWteOlubipq543WgPd+Wv9raH6dOnKysrS7GxsaqsrNTatWv1rW99S6Wlperbt6/+9m//VsOHD9eQIUP07rvv6oknnl
BZWZn27t0rSfJ4PD4XFEnedY/H02yfmpoaffbZZ7p06ZLq6uoa7fOHP/yho069gQ8//FDbtm3TqlWr9OSTT+r48eP6x3/8RwUFBWnhwoXe82mszq+fa1hYmM/2Hj16aMCAAT59oqOjG+zj5rb+/fs3OWc393EntDQfkjR79mx95zvfUXR0tD744AM9+eSTSkpKUmFhobp37x5Q8/F1ubm5unz5sh599FFJX9YZFBTU4Icqb31tBMp75etunQtJXeq6IUk/+tGPVFNTo1GjRql79+6qq6vT+vXrtWDBAknqUteOluZC6rrXjfYQsGHk62l1/Pjxmj59uoYPH66XX35Zixcv1g9+8APv9ri4OEVEROjBBx/UBx98oHvvvddGyR2mvr5eU6dO1b/8y79IkiZNmqTS0lJt377d+59vV9Ka+fjud7/r7R8XF6fx48fr3nvvVUFBgR588EErdd8Jzz//vJKSklr8ue+uoLG56ErXDUl6+eWX9eKLLyo7O1tjx45VSUmJVq5cqSFDhnS5a0dr5qKrXjfaQ0D/mebr+vXrp/vuu09nzpxpdPv06dMlybvd5XI1+MTAzXWXy9Vsn5CQEPXq1UuDBg1S9+7dG+1zcx93QkREhMaMGePTNnr0aO+frW7W0lydLpdLFy5c8Nn+xRdf6OLFiy3Ox9eP0VSfu2k+GjNixAgNGjTI5/URKPNx07lz5/T6669ryZIl3jaXy6Xr16/r8uXLPn1vfW0EynvlpsbmojGBfN2QpPT0dP3oRz/Sd7/7XcXFxel73/uefvjDH3p/CLUrXTtamovGdIXrRnvpMmHk6tWr+uCDDxQREdHo9pKSEknybne73Tp58qTPCycvL08hISHe/8jcbrfy8/N99pOXlye32y1JCgoK0pQpU3z61NfXKz8/39vnTnjggQdUVlbm0/bHP/5Rw4cPlyRFR0fL5XL51FlTU6Njx45563S73bp8+bKKi4u9fQ4dOqT6+nrvBdntduvw4cO6ceOGt09eXp5iY2PVv39/b5/m5uxOaGk+GvPRRx/pk08+8Xl9BMp83PTCCy8oLCxMycnJ3rYpU6aoZ8+ePjWWlZWpvLzc57URKO+Vmxqbi8YE8nVDkj799FOfHz6VpO7du6u+vl5S17p2tDQXjekK1412Y/sJ2o7y+OOPm4KCAnP27Fnz5ptvmoSEBDNo0CBz4cIFc+bMGfPMM8+YoqIic/bsWbNv3z4zYsQIM3PmTO/4mx/Re/jhh01JSYk5cOCAGTx4cKMf0UtPTzfvvfee2bp1a6Mf0XM6nSYrK8ucPn3a/OAHPzD9+vXzedq+o7311lumR48eZv369eb99983L774oundu7fZvXu3t8+GDRtMv379zL59+8y7775r5syZ0+jH8yZNmmSOHTtm3njjDTNy5Eifj6RdvnzZhIeHm+9973umtLTU5OTkmN69ezf4SFqPHj3Mv/7rv5r33nvPrFmz5o5/JK2l+bhy5Yr5p3/6J1NYWGjOnj1rXn/9dTN58mQzcuRI8/nnnwfcfBjz5ac1hg0bZp544okG25YtW2aGDRtmDh06ZIqKiozb7TZut9u7PZDeK8Y0PRdd7bphzJefDomMjPR+nHXv3r1m0KBBZvXq1d4+XeXa0dJcdMXrRnsK2DAyb948ExERYYKCgkxkZKSZN2+e93skysvLzcyZM82AAQOM0+k0MTExJj093ef7Aowx5n//939NUlKS6dWrlxk0aJB5/PHHzY0bN3z6/O53vzMTJ040QUFBZsSIEeaFF15oUMvPfvYzM2zYMBMUFGSmTZtmjh492mHn3ZT/+q//MuPGjTNOp9OMGjXK7Ny502d7fX29+fGPf2zCw8ON0+k0Dz74oCkrK/Pp88knn5j58+ebPn36mJCQELNo0SJz5coVnz7vvPOO+eY3v2mcTqeJjIw0GzZsaFDLyy+/bO677z4TFBRkxo4da37729+2/wm3oLn5+PTTT83DDz9sBg8ebHr27G
mGDx9uli5d2uA/gkCaj4MHDxpJDf7NjTHms88+M//wD/9g+vfvb3r37m3+6q/+ylRWVvr0CaT3SlNz0RWvGzU1NWbFihVm2LBhJjg42IwYMcL88z//s8/HTrvKtaOlueiK14325DDma18fBwAAcId1mWdGAADA3YkwAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwKr/D5+iD3NM0COsAAAAAElFTkSuQmCC\n" }, "metadata": {} } ], "source": [ "# plotting the means of random samples on bar graph, run as it is\n", "\n", "import matplotlib.pyplot as plt\n", "x_bar = data_samples.mean(axis=1)\n", "plt.hist(x_bar, bins=100, color = 'maroon');" ] }, { "cell_type": "markdown", "id": "05ef02cd-0dca-4732-a524-838b894554f8", "metadata": { "id": "05ef02cd-0dca-4732-a524-838b894554f8" }, "source": [ "#### Part C - Sampling on a dataset - 2 points\n", "\n", "Now that you have understood what sampling is, use pandas to load the data and the function defined above to draw samples from 'Pokemon.csv'." ] }, { "cell_type": "code", "execution_count": 13, "id": "61c5db0f-20e7-4287-b695-b822bffbcad5", "metadata": { "id": "61c5db0f-20e7-4287-b695-b822bffbcad5" }, "outputs": [], "source": [ "# importing libraries\n", "import pandas as pd" ] }, { "cell_type": "code", "source": [ "import os\n", "from google.colab import files\n", "\n", "def file_upload(allowed_file_types=None, max_size_mb=None):\n", " \"\"\"\n", " A File uploader for Google Colab.\n", "\n", " Parameters:\n", " - allowed_file_types: List of allowed file extensions. If None, all file types are accepted.\n", " - max_size_mb: Maximum allowed file size in MB. 
If None, all sizes are accepted.\n", "\n", " Returns:\n", " - Path to the saved uploaded file.\n", " \"\"\"\n", "\n", " # Upload the files\n", " uploaded_files = files.upload()\n", "\n", " # If no files uploaded, return None\n", " if not uploaded_files:\n", " print(\"No files uploaded.\")\n", " return None\n", "\n", " # Filter the uploaded files based on the criteria\n", " for file_name, file_content in uploaded_files.items():\n", "\n", " # Check file type\n", " if allowed_file_types:\n", " ext = os.path.splitext(file_name)[1]\n", " if ext not in allowed_file_types:\n", " print(f\"File {file_name} has an invalid file type {ext}. Skipping.\")\n", " continue\n", "\n", " # Check file size\n", " if max_size_mb:\n", " file_size_mb = len(file_content) / (1024 * 1024)\n", " if file_size_mb > max_size_mb:\n", " print(f\"File {file_name} exceeds the max size of {max_size_mb}MB. Skipping.\")\n", " continue\n", "\n", " # Save the valid uploaded file to the local environment and return its path\n", " with open(file_name, 'wb') as f:\n", " f.write(file_content)\n", " print(f\"File {file_name} uploaded and saved successfully!\")\n", " return file_name\n", "\n", " return None\n" ], "metadata": { "id": "AO5E3U_gKNHb" }, "id": "AO5E3U_gKNHb", "execution_count": 14, "outputs": [] }, { "cell_type": "code", "execution_count": 15, "id": "dff7da41-21a6-4457-8b66-c237565d6e65", "metadata": { "id": "dff7da41-21a6-4457-8b66-c237565d6e65", "colab": { "base_uri": "https://localhost:8080/", "height": 95 }, "outputId": "4953d2d7-db5c-405d-a126-47ee60c0af5b" }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "<IPython.core.display.HTML object>" ], "text/html": [ "\n", " <input type=\"file\" id=\"files-772c6608-a205-411b-96ba-333cb96bd622\" name=\"files[]\" multiple disabled\n", " style=\"border:none\" />\n", " <output id=\"result-772c6608-a205-411b-96ba-333cb96bd622\">\n", " Upload widget is only available when the cell has been executed in the\n", " current browser 
session. Please rerun this cell to enable.\n", " </output>\n", " <script>// Copyright 2017 Google LLC\n", "//\n", "// Licensed under the Apache License, Version 2.0 (the \"License\");\n", "// you may not use this file except in compliance with the License.\n", "// You may obtain a copy of the License at\n", "//\n", "// http://www.apache.org/licenses/LICENSE-2.0\n", "//\n", "// Unless required by applicable law or agreed to in writing, software\n", "// distributed under the License is distributed on an \"AS IS\" BASIS,\n", "// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "// See the License for the specific language governing permissions and\n", "// limitations under the License.\n", "\n", "/**\n", " * @fileoverview Helpers for google.colab Python module.\n", " */\n", "(function(scope) {\n", "function span(text, styleAttributes = {}) {\n", " const element = document.createElement('span');\n", " element.textContent = text;\n", " for (const key of Object.keys(styleAttributes)) {\n", " element.style[key] = styleAttributes[key];\n", " }\n", " return element;\n", "}\n", "\n", "// Max number of bytes which will be uploaded at a time.\n", "const MAX_PAYLOAD_SIZE = 100 * 1024;\n", "\n", "function _uploadFiles(inputId, outputId) {\n", " const steps = uploadFilesStep(inputId, outputId);\n", " const outputElement = document.getElementById(outputId);\n", " // Cache steps on the outputElement to make it available for the next call\n", " // to uploadFilesContinue from Python.\n", " outputElement.steps = steps;\n", "\n", " return _uploadFilesContinue(outputId);\n", "}\n", "\n", "// This is roughly an async generator (not supported in the browser yet),\n", "// where there are multiple asynchronous steps and the Python side is going\n", "// to poll for completion of each step.\n", "// This uses a Promise to block the python side on completion of each step,\n", "// then passes the result of the previous step as the input to the next step.\n", "function 
_uploadFilesContinue(outputId) {\n", " const outputElement = document.getElementById(outputId);\n", " const steps = outputElement.steps;\n", "\n", " const next = steps.next(outputElement.lastPromiseValue);\n", " return Promise.resolve(next.value.promise).then((value) => {\n", " // Cache the last promise value to make it available to the next\n", " // step of the generator.\n", " outputElement.lastPromiseValue = value;\n", " return next.value.response;\n", " });\n", "}\n", "\n", "/**\n", " * Generator function which is called between each async step of the upload\n", " * process.\n", " * @param {string} inputId Element ID of the input file picker element.\n", " * @param {string} outputId Element ID of the output display.\n", " * @return {!Iterable<!Object>} Iterable of next steps.\n", " */\n", "function* uploadFilesStep(inputId, outputId) {\n", " const inputElement = document.getElementById(inputId);\n", " inputElement.disabled = false;\n", "\n", " const outputElement = document.getElementById(outputId);\n", " outputElement.innerHTML = '';\n", "\n", " const pickedPromise = new Promise((resolve) => {\n", " inputElement.addEventListener('change', (e) => {\n", " resolve(e.target.files);\n", " });\n", " });\n", "\n", " const cancel = document.createElement('button');\n", " inputElement.parentElement.appendChild(cancel);\n", " cancel.textContent = 'Cancel upload';\n", " const cancelPromise = new Promise((resolve) => {\n", " cancel.onclick = () => {\n", " resolve(null);\n", " };\n", " });\n", "\n", " // Wait for the user to pick the files.\n", " const files = yield {\n", " promise: Promise.race([pickedPromise, cancelPromise]),\n", " response: {\n", " action: 'starting',\n", " }\n", " };\n", "\n", " cancel.remove();\n", "\n", " // Disable the input element since further picks are not allowed.\n", " inputElement.disabled = true;\n", "\n", " if (!files) {\n", " return {\n", " response: {\n", " action: 'complete',\n", " }\n", " };\n", " }\n", "\n", " for (const file of files) 
{\n", " const li = document.createElement('li');\n", " li.append(span(file.name, {fontWeight: 'bold'}));\n", " li.append(span(\n", " `(${file.type || 'n/a'}) - ${file.size} bytes, ` +\n", " `last modified: ${\n", " file.lastModifiedDate ? file.lastModifiedDate.toLocaleDateString() :\n", " 'n/a'} - `));\n", " const percent = span('0% done');\n", " li.appendChild(percent);\n", "\n", " outputElement.appendChild(li);\n", "\n", " const fileDataPromise = new Promise((resolve) => {\n", " const reader = new FileReader();\n", " reader.onload = (e) => {\n", " resolve(e.target.result);\n", " };\n", " reader.readAsArrayBuffer(file);\n", " });\n", " // Wait for the data to be ready.\n", " let fileData = yield {\n", " promise: fileDataPromise,\n", " response: {\n", " action: 'continue',\n", " }\n", " };\n", "\n", " // Use a chunked sending to avoid message size limits. See b/62115660.\n", " let position = 0;\n", " do {\n", " const length = Math.min(fileData.byteLength - position, MAX_PAYLOAD_SIZE);\n", " const chunk = new Uint8Array(fileData, position, length);\n", " position += length;\n", "\n", " const base64 = btoa(String.fromCharCode.apply(null, chunk));\n", " yield {\n", " response: {\n", " action: 'append',\n", " file: file.name,\n", " data: base64,\n", " },\n", " };\n", "\n", " let percentDone = fileData.byteLength === 0 ?\n", " 100 :\n", " Math.round((position / fileData.byteLength) * 100);\n", " percent.textContent = `${percentDone}% done`;\n", "\n", " } while (position < fileData.byteLength);\n", " }\n", "\n", " // All done.\n", " yield {\n", " response: {\n", " action: 'complete',\n", " }\n", " };\n", "}\n", "\n", "scope.google = scope.google || {};\n", "scope.google.colab = scope.google.colab || {};\n", "scope.google.colab._files = {\n", " _uploadFiles,\n", " _uploadFilesContinue,\n", "};\n", "})(self);\n", "</script> " ] }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Saving pokemon.csv to pokemon.csv\n", "File pokemon.csv uploaded and 
saved successfully!\n" ] } ], "source": [ "import pandas as pd\n", "\n", "file_path = file_upload(allowed_file_types=['.csv'])\n", "if file_path:\n", " df = pd.read_csv(file_path)\n", "\n", "\n", "# loading the dataset into a pandas dataframe\n", "df = pd.read_csv(\"pokemon.csv\") # TODO: Read CSV named pokemon.csv" ] }, { "cell_type": "code", "execution_count": 16, "id": "da384329-7704-487f-828a-2cd585fc7e27", "metadata": { "id": "da384329-7704-487f-828a-2cd585fc7e27", "outputId": "03a94799-1084-40a2-dd8b-4092922708a4", "colab": { "base_uri": "https://localhost:8080/", "height": 206 } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " # Name Type 1 Type 2 Total HP Attack Defense \\\n", "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", "1 2 Ivysaur Grass Poison 405 60 62 63 \n", "2 3 Venusaur Grass Poison 525 80 82 83 \n", "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", "4 4 Charmander Fire NaN 309 39 52 43 \n", "\n", " Sp. Atk Sp. Def Speed Generation Legendary \n", "0 65 65 45 1 False \n", "1 80 80 60 1 False \n", "2 100 100 80 1 False \n", "3 122 120 80 1 False \n", "4 60 50 65 1 False " ] }, "metadata": {}, "execution_count": 16 } ], "source": [ "# printing first 5 rows of the data\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 17, "id": "9f743240-9886-4a89-82bf-72ba1693abb0", "metadata": { "id": "9f743240-9886-4a89-82bf-72ba1693abb0", "outputId": "bd9336c0-0378-460e-b373-c008f1153062", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(800, 13)" ] }, "metadata": {}, "execution_count": 17 } ], "source": [ "# printing the shape of the data\n", "df.shape # TODO: Print the shape of the dataframe" ] }, { "cell_type": "markdown", "id": "fc57aaaa-2f27-43c5-acd4-7f1b538a9927", "metadata": { "id": "fc57aaaa-2f27-43c5-acd4-7f1b538a9927" }, "source": [ "Using 'random_sampling()' defined above, draw 200 samples of size 50 and store them in a list called 'pokemon_samples'." 
] }, { "cell_type": "code", "source": [ "pokemon_samples = random_sampling(df['Total'], 200, 50) # TODO: Draw 200 samples of size 50\n", "pokemon_samples.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "FCz1g6oI-jGx", "outputId": "0bb53454-ada2-4528-8fa3-339da08bae68" }, "id": "FCz1g6oI-jGx", "execution_count": 18, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(200, 50)" ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "code", "execution_count": 19, "id": "f5e671f9-7f07-4f2a-9ce5-f03ca4f6e4cf", "metadata": { "id": "f5e671f9-7f07-4f2a-9ce5-f03ca4f6e4cf", "outputId": "e87cdf02-ab9f-4b87-c07b-5406fdf32f26", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([285, 600, 505, 325, 308, 400, 410, 395, 210, 405, 418, 700, 479,\n", " 305, 580, 365, 535, 530, 500, 540, 455, 290, 303, 340, 468, 405,\n", " 330, 380, 250, 465, 405, 210, 600, 425, 625, 475, 540, 525, 770,\n", " 525, 490, 395, 510, 195, 390, 320, 423, 253, 413, 600])" ] }, "metadata": {}, "execution_count": 19 } ], "source": [ "# printing the first sample\n", "pokemon_samples[0]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "colab": { "provenance": [], "gpuType": "T4" }, "accelerator": "GPU" }, "nbformat": 4, "nbformat_minor": 5 }