{ "cells": [ { "cell_type": "markdown", "id": "2a5ff1be-2da3-4ce2-9a6c-9f171f1ff6c1", "metadata": {}, "source": [ "# DS 122 Homework 2 Computational\n", "\n", "**Due Sep. 27th**\n", "\n", "**Full credit is 50 points (With Bonus Question: 55 Points)**\n", "\n", "**Name:**\n", "\n", "**BUID:**\n", "\n", "Most homeworks will involve “analytical” questions, and many will involve “computational” questions. This homework involves analytical and computational questions.\n", "\n", "**NOTE**\n", "\n", "- You are advised not to use ChatGPT or any other LLM to complete the homeworks. Try the questions on your own unless otherwise stated in the question.\n", "\n", "- Try to answer the questions in detail. If you do not get the correct answer, we will take into consideration the steps (the process you take to solve the question), which can help you earn partial points.\n", "\n", "- Coding questions might seem a little daunting at first, but if you go through them you will notice that a lot of the answers are directly available in the notebook. They are more for your understanding than for testing. If you are unable to find a solution at first, please try reading the documentation and your class notes. We are always available during our office hours in case you have doubts regarding a topic.\n", "\n", "**SUBMISSION GUIDELINES**\n", "\n", "- For coding questions, please edit the Jupyter notebook itself in the space provided to input your answer. You can choose to create a new cell to enter your code so as to not lose the sample output. \n", "\n", "- Final submissions should contain both your code (Jupyter Notebook) and your mathematical files (Scanned or Typed PDF). You can select more than one file while uploading during submission. 
Please try to use the following naming convention for your submissions **{FirstName}\\_\\{LastName}\\_\\{BUID}\\_\\{analytical/computational}.zip**\n", "\n", "- This is the coding part of the homework; please submit it in the HW2 Computation Part on Gradescope as a Python notebook. The first part is to be completed directly on Gradescope. \n", "\n" ] }, { "cell_type": "markdown", "id": "f039bb78-5200-4bce-bdd3-f04f68fe344e", "metadata": {}, "source": [ "## Computational\n", "\n", "- Add your answers in the same cell as the code, or add another cell by copying and pasting the existing cell\n", "- Outputs from the answer key have been left as they are for your reference. My personal suggestion would be to create a new cell with the same code copied and compare your output against the reference output. " ] }, { "cell_type": "markdown", "id": "7061ccc5-e39c-4ec5-84b9-435fcfd54586", "metadata": {}, "source": [ "#### Problem E - Sampling\n", "**20 points**" ] }, { "cell_type": "markdown", "id": "cad0080c-fd41-472b-94cc-558f34221e98", "metadata": {}, "source": [ "**Part a - 15 points**\n", "\n", "Consider an office with a 3-level hierarchy (Associates, Senior Associates and Partners) \n", "Salary ranges for these groups are as follows:\n", "- Associates are [40000, 49999]\n", "- Senior Associates are [50000, 99999]\n", "- Partners are [100000, 150000]" ] }, { "cell_type": "code", "execution_count": null, "id": "00920544-0dc8-4d90-b00f-8f2db81451d0", "metadata": {}, "outputs": [], "source": [ "#importing libraries\n", "import numpy as ____ #TODO: Import Numpy" ] }, { "cell_type": "code", "execution_count": 1, "id": "7eae7099-bbb0-452f-8472-56801694fe53", "metadata": {}, "outputs": [], "source": [ "#creating a simulated dataset of salaries (in USD) for a group of employees in a large corporation, based on job levels\n", "assoc_salaries = np.random._____(np.arange(___, 49999), 5000, replace=True) #TODO Enter range for salaries and fill the functions\n", "senior_assoc_salaries = 
np.____.choice(np.arange(___, 99999), 3000, replace=True) #TODO Enter range for salaries and fill the functions\n", "partner_salaries = np.random.choice(np.____(100000, ___), 2000, replace=True) #TODO Enter range for salaries and fill the functions\n", "all_salaries = np.concatenate([____, ____, ____]) #TODO Concatenate the salaries\n", "\n", "# What is the second parameter passed to the arange function, and what is its significance? (Just for your understanding, not scored)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6e26c205-0cc2-46ca-9ac8-4f5abe56ab2a", "metadata": {}, "outputs": [], "source": [ "assoc_sample = np.random.choice(___, 100, replace=False) # TODO: Write code to sample 100 associate salaries\n", "senior_assoc_sample = np.random.choice(senior_assoc_salaries, ____, replace=False) # TODO: Write code to sample 80 senior associate salaries\n", "partner_sample = np.____.____(partner_salaries, 50, replace=False) #TODO: Write code to sample 50 partner salaries" ] }, { "cell_type": "markdown", "id": "fe7df61e-c751-4a9c-878f-8e374a584147", "metadata": {}, "source": [ "### This part is only to explain real-world scenarios; it is already done for you. Just run the cell below and you should be good\n", "\n", "##### Stratified sampling is a method of sampling from a population that can be partitioned into subpopulations or \"strata\". Each stratum is internally homogeneous, or similar, but there's significant variability between the different strata. The goal of stratified sampling is to ensure that each subgroup is adequately represented within the whole sample collection of a population." 
] }, { "cell_type": "code", "execution_count": 6, "id": "c1251106-cdbb-46a2-b327-dab32c7faaa5", "metadata": {}, "outputs": [], "source": [ "stratified_sample = np.concatenate([assoc_sample, senior_assoc_sample, partner_sample])" ] }, { "cell_type": "code", "execution_count": null, "id": "2854c2cc-cb5b-4da2-99fd-19e619529fd0", "metadata": {}, "outputs": [], "source": [ "#calculating mean and standard deviation of the stratified sample\n", "\n", "sample_mean = np.___(stratified_sample) #TODO: find mean of the stratified sample\n", "sample_std_dev = np.___(stratified_sample) #TODO: find standard deviation of the stratified sample" ] }, { "cell_type": "code", "execution_count": 8, "id": "8cb45d42-ae6f-466c-9e06-f08c3d077166", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "72364.31739130434\n", "32099.109718104566\n" ] } ], "source": [ "#displaying the answers\n", "print(sample_mean)\n", "print(sample_std_dev)" ] }, { "cell_type": "markdown", "id": "94df6431-df7c-4651-b55b-efe0ad4132e5", "metadata": {}, "source": [ "#### Let us see whether doing this was actually useful: we shall find the mean and standard deviation of the combined salaries without the stratified information" ] }, { "cell_type": "code", "execution_count": 9, "id": "c434c1b1-636c-4fc6-90f7-58bc79e0f9ed", "metadata": {}, "outputs": [], "source": [ "non_strata_sample_mean = np.mean(all_salaries)\n", "non_strata_sample_std = np.std(all_salaries)" ] }, { "cell_type": "code", "execution_count": 10, "id": "dded47b4-1574-4260-94d0-25e966e0287c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "69873.8334\n", "31976.06361167435\n" ] } ], "source": [ "print(non_strata_sample_mean)\n", "print(non_strata_sample_std)" ] }, { "cell_type": "markdown", "id": "8df3c048-f97d-425d-b9ea-ec86fe2fbfb1", "metadata": {}, "source": [ "##### Try to think about which method would yield better results in a real-world scenario. 
The difference is minor in this example, but in real-world cases small changes and methods like this can make a major difference. \n", "\n", "---" ] }, { "cell_type": "markdown", "id": "5a56ec8d-01e5-48dd-b868-9ba3c46cca4c", "metadata": {}, "source": [ "**Part b - 3 points**\n", "\n", "Select 100 random samples of size 20 and store them in 'random_samples'. Use NumPy to create the samples." ] }, { "cell_type": "code", "execution_count": 11, "id": "a6b2f162-053b-4fc8-954f-0e318a3d7242", "metadata": {}, "outputs": [], "source": [ "#finding out the shape of the stratified sample, run the cell as it is\n", "n = stratified_sample.shape\n", "\n", "#declaring an empty list to append samples into\n", "data_samples = []" ] }, { "cell_type": "code", "execution_count": null, "id": "c48cb34b-c105-444d-a3df-5ece08d3f375", "metadata": {}, "outputs": [], "source": [ "'''\n", "Write your code by filling in the blanks below\n", "'''\n", "\n", "def random_sampling(df,no_of_samples, sample_size):\n", "    random_samples = []\n", "    for i in range(_____): # Enter the range for the no_of_samples\n", "        # Randomly sampling without replacement\n", "        sample = np.random.choice(df, size=______, replace=False) #TODO: Randomly choose with size as sample_size\n", "        random_samples.append(sample)\n", "    random_samples = np.array(random_samples)\n", "    return _______ #TODO: Return the random samples" ] }, { "cell_type": "code", "execution_count": 13, "id": "c7ba3fbf-ec4f-41f1-a03c-06900db415d5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 45794, 48045, 78631, 50829, 125038, 41802, 41351, 117584,\n", " 81799, 44358, 69647, 63526, 64875, 46718, 45680, 42537,\n", " 44267, 40114, 102471, 56731],\n", " [ 66224, 40316, 44564, 87694, 40285, 120883, 44129, 65465,\n", " 42537, 137922, 76142, 121365, 84585, 79803, 47886, 66789,\n", " 46523, 47779, 43200, 147905]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#calling the function for the data, run the cell as it is\n", "data_samples = 
random_sampling(stratified_sample, 100, 20)\n", "#printing 2 examples of random samples\n", "data_samples[:2]" ] }, { "cell_type": "code", "execution_count": 14, "id": "52950b7f-0095-4c11-a82c-7dbcfc7482aa", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhYAAAGdCAYAAABO2DpVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAZKUlEQVR4nO3dfZDVVf3A8c/KwxVx2RRFngkt82GFDEoxMx+KJDUbp1JSwsxmbEAlKpVsBjFt+afGZpwoHcfGUGGc1MwUhRLMeBAhC7UUR0ZWBSnFXcS8CJzfH7/hyoVFuOtZlt37es3cP+73nnu/5x4Oy3vu7vKtSSmlAADIYL/2ngAA0HkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyKbr3j7h1q1b47XXXova2tqoqanZ26cHAFohpRQbNmyI/v37x3777fpzib0eFq+99loMGjRob58WAMigsbExBg4cuMvH93pY1NbWRsT/T6xXr157+/QAQCs0NzfHoEGDSv+O78peD4tt3/7o1auXsACADmZ3P8bghzcBgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkE1FYXHddddFTU1N2a1v375tNTcAoIOp+Fohxx57bMybN690v0uXLlknBAB0XBWHRdeuXX1KAQC0qOKfsVi5cmX0798/hg4dGhdccEG89NJLHzi+WCxGc3Nz2Q0A6Jwq+sTihBNOiDvuuCOOPPLIeP311+OGG26Ik046KZ599tno3bt3i89paGiIadOmZZkskMe0Fi57PDWldnsdoPOoSan1XwU2btwYRxxxRFx11VUxefLkFscUi8UoFoul+83NzTFo0KBoamqKXr16tfbUwIcgLIBKNTc3R11d3W7//a74Zyy217NnzzjuuONi5cqVuxxTKBSiUCh8mNMAAB3Eh/p/LIrFYvzrX/+Kfv365ZoPANCBVRQWP/zhD2PBggWxatWqWLJkSXzta1+L5ubmGD9+fFvNDwDoQCr6Vsgrr7wSY8eOjf/+979x6KGHxoknnhiLFy+OIUOGtNX8AIAOpKKwmDVrVlvNAwDoBFwrBADIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBk86HCoqGhIWpqamLSpEmZpgMAdGStDoulS5fGLbfcEsOGDcs5HwCgA2tVWLz99ttx4YUXxq233hoHHXRQ7jkBAB1Uq8JiwoQJcdZZZ8UXvvCF3Y4tFovR3NxcdgMAOqeulT5h1qxZsXz58li6dOkejW9oaIhp06ZVPDGgc5hWU7PTsakptcNMgL2hok8sGhsb48orr4yZM2fG/vvvv0fPmTJlSjQ1NZVujY2NrZooALDvq+gTi2XLlsW6detixIgRpWNbtmyJxx9/PG6++
eYoFovRpUuXsucUCoUoFAp5ZgsA7NMqCoszzjgjVqxYUXbs29/+dhx11FFx9dVX7xQVAEB1qSgsamtro76+vuxYz549o3fv3jsdBwCqj/95EwDIpuLfCtnR/PnzM0wDAOgMfGIBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQTUVhMWPGjBg2bFj06tUrevXqFaNGjYqHH364reYGAHQwFYXFwIEDY/r06fHUU0/FU089Faeffnqce+658eyzz7bV/ACADqRrJYPPOeecsvs33nhjzJgxIxYvXhzHHnts1okBAB1PRWGxvS1btsQ999wTGzdujFGjRu1yXLFYjGKxWLrf3Nzc2lMCAPu4isNixYoVMWrUqHj33XfjwAMPjPvuuy+OOeaYXY5vaGiIadOmfahJQqWm1dSU3Z+aUruduyU557Mn73VP5rQnz2mrddzba7avnR86k4p/K+QTn/hEPP3007F48eL43ve+F+PHj4/nnntul+OnTJkSTU1NpVtjY+OHmjAAsO+q+BOL7t27x8c+9rGIiBg5cmQsXbo0fvnLX8ZvfvObFscXCoUoFAofbpYAQIfwof8fi5RS2c9QAADVq6JPLH784x/HmDFjYtCgQbFhw4aYNWtWzJ8/P+bMmdNW8wMAOpCKwuL111+PcePGxZo1a6Kuri6GDRsWc+bMiS9+8YttNT8AoAOpKCxuu+22tpoHANAJuFYIAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANhWFRUNDQ3z605+O2tra6NOnT3z1q1+N559/vq3mBgB0MBWFxYIFC2LChAmxePHimDt3bmzevDlGjx4dGzdubKv5AQAdSNdKBs+ZM6fs/u233x59+vSJZcuWxSmnnJJ1YgBAx1NRWOyoqakpIiIOPvjgXY4pFotRLBZL95ubmz/MKQGAfVirwyKlFJMnT46TTz456uvrdzmuoaEhpk2b1trTVGRaTc1Ox6amtFfO3ZJ9bT57056897YcQ+X2ZB33tbXecT578vdrX3sPuVTz1xv2La3+rZCJEyfGP//5z7j77rs/cNyUKVOiqampdGtsbGztKQGAfVyrPrG4/PLL44EHHojHH388Bg4c+IFjC4VCFAqFVk0OAOhYKgqLlFJcfvnlcd9998X8+fNj6NChbTUvAKADqigsJkyYEHfddVf84Q9/iNra2li7dm1ERNTV1UWPHj3aZIIAQMdR0c9YzJgxI5qamuLUU0+Nfv36lW6zZ89uq/kBAB1Ixd8KAQDYFdcKAQCyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAI
BthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZVBwWjz/+eJxzzjnRv3//qKmpifvvv78NpgUAdEQVh8XGjRtj+PDhcfPNN7fFfACADqxrpU8YM2ZMjBkzpi3mAgB0cBWHRaWKxWIUi8XS/ebm5rY+JQDQTto8LBoaGmLatGltfZoPZVpNTdn9qSllGbMn59oTLZ2rNa+b63VaK9drt+Z12vJ9tca+Np+9ra3+DNt7XXP9PdyT123N67RWrq+RtI19be3b/LdCpkyZEk1NTaVbY2NjW58SAGgnbf6JRaFQiEKh0NanAQD2Af4fCwAgm4o/sXj77bfjxRdfLN1ftWpVPP3003HwwQfH4MGDs04OAOhYKg6Lp556Kk477bTS/cmTJ0dExPjx4+O3v/1ttokBAB1PxWFx6qmnRvLTvgBAC/yMBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIBthAQBkIywAgGyEBQCQjbAAALIRFgBANsICAMhGWAAA2QgLACAbYQEAZCMsAIBshAUAkI2wAACyERYAQDatCotf/epXMXTo0Nh///1jxIgR8de//jX3vACADqjisJg9e3ZMmjQprr322vj73/8en/vc52LMmDGxevXqtpgfANCBVBwWv/jFL+I73/lOXHrppXH00UfHTTfdFIMGDYoZM2a0xfwAgA6kayWDN23aFMuWLYtrrrmm7Pjo0aNj4cKFLT6nWCxGsVgs3W9qaoqIiObm5krnulvvtnBsT86z4/Naek5rxuTSmvfQlq/TlnacY3vPpyPam3tzT87fWf4M9+bfwz153T1Z51xfZ3N9jaRt7K213/a6KaUPHpgq8Oqrr6aISH/729/Kjt94443pyCOPbPE5U6dOTRHh5ubm5ubm1glujY2NH9gKFX1isU1NTU3Z/ZTSTse2mTJlSkyePLl0f+vWrfHmm29G7969d/mc1mhubo5BgwZFY2Nj9OrVK9vrdlTWo5z1eJ+1KGc93mctylmPciml2LBhQ/Tv3/8Dx1UUFoccckh06dIl1q5dW3Z83bp1cdhhh7X4nEKhEIVCoezYRz7ykUpOW5FevXrZANuxHuWsx/usRTnr8T5rUc56vK+urm63Yyr64c3u3bvHiBEjYu7cuWXH586dGyeddFJlswMAOp2KvxUyefLkGDduXIwcOTJGjRoVt9xyS6xevTouu+yytpgfANCBVBwW559/frzxxhtx/fXXx5o1a6K+vj4eeuihGDJkSFvMb48VCoWYOnXqTt92qVbWo5z1eJ+1KGc93mctylmP1qlJu/29EQCAPeNaIQBANsICAMhGWAAA2QgLACCbdg2LV199NS666KLo3bt3HHDAAfHJT34yli1bVno8pRTXXXdd9O/fP3r06BGnnnpqPPvss2WvUSwW4/LLL49DDjkkevbsGV/5ylfilVdeKRuzfv36GDduXNTV1UVdXV2MGzcu3nrrrbIxq1evjnPOOSd69uwZhxxySFxxxRWxa
dOmNnvvLdndelx88cVRU1NTdjvxxBPLXqOzrMdHP/rRnd5rTU1NTJgwISKqa2/sbi2qaV9ERGzevDl+8pOfxNChQ6NHjx5x+OGHx/XXXx9bt24tjamW/bEna1Ft+2PDhg0xadKkGDJkSPTo0SNOOumkWLp0aenxatkb7aqSa4Xk9Oabb6YhQ4akiy++OC1ZsiStWrUqzZs3L7344oulMdOnT0+1tbXp97//fVqxYkU6//zzU79+/VJzc3NpzGWXXZYGDBiQ5s6dm5YvX55OO+20NHz48LR58+bSmDPPPDPV19enhQsXpoULF6b6+vp09tlnlx7fvHlzqq+vT6eddlpavnx5mjt3burfv3+aOHHi3lmMtGfrMX78+HTmmWemNWvWlG5vvPFG2et0lvVYt25d2fucO3duioj02GOPpZSqa2/sbi2qaV+klNINN9yQevfunR588MG0atWqdM8996QDDzww3XTTTaUx1bI/9mQtqm1/fOMb30jHHHNMWrBgQVq5cmWaOnVq6tWrV3rllVdSStWzN9pTu4XF1VdfnU4++eRdPr5169bUt2/fNH369NKxd999N9XV1aVf//rXKaWU3nrrrdStW7c0a9as0phXX3017bfffmnOnDkppZSee+65FBFp8eLFpTGLFi1KEZH+/e9/p5RSeuihh9J+++2XXn311dKYu+++OxUKhdTU1JTnDe/G7tYjpf//AnHuuefu8vHOtB47uvLKK9MRRxyRtm7dWnV7Y0fbr0VK1bcvzjrrrHTJJZeUHTvvvPPSRRddlFKqrq8du1uLlKprf7zzzjupS5cu6cEHHyw7Pnz48HTttddW1d5oT+32rZAHHnggRo4cGV//+tejT58+cfzxx8ett95aenzVqlWxdu3aGD16dOlYoVCIz3/+86VLtC9btizee++9sjH9+/eP+vr60phFixZFXV1dnHDCCaUxJ554YtTV1ZWNqa+vL7uwype+9KUoFotl34poS7tbj23mz58fffr0iSOPPDK++93vxrp160qPdab12N6mTZti5syZcckll0RNTU3V7Y3t7bgW21TTvjj55JPjz3/+c7zwwgsREfGPf/wjnnjiifjyl78cEdX1tWN3a7FNteyPzZs3x5YtW2L//fcvO96jR4944oknqmpvtKd2C4uXXnopZsyYER//+MfjkUceicsuuyyuuOKKuOOOOyIiShc62/HiZocddljpsbVr10b37t3joIMO+sAxffr02en8ffr0KRuz43kOOuig6N69+04XXGsru1uPiIgxY8bEnXfeGX/5y1/i5z//eSxdujROP/30KBaLpffRWdZje/fff3+89dZbcfHFF5fmF1E9e2N7O65FRPXti6uvvjrGjh0bRx11VHTr1i2OP/74mDRpUowdO7Y0z4jq2B+7W4uI6toftbW1MWrUqPjpT38ar732WmzZsiVmzpwZS5YsiTVr1lTV3mhPrbpseg5bt26NkSNHxs9+9rOIiDj++OPj2WefjRkzZsS3vvWt0rhKLtG+qzEtjW/NmLa0J+tx/vnnl8bX19fHyJEjY8iQIfGnP/0pzjvvvF2+dkdcj+3ddtttMWbMmJ0u1Vste2N7La1Fte2L2bNnx8yZM+Ouu+6KY489Np5++umYNGlS9O/fP8aPH7/LuXbG/bEna1Ft++N3v/tdXHLJJTFgwIDo0qVLfOpTn4pvfvObsXz58l3OszPujfbUbp9Y9OvXL4455piyY0cffXSsXr06IiL69u0bEbFT2W1/ifa+ffvGpk2bYv369R845vXXX9/p/P/5z3/Kxux4nvXr18d77723y8vB57a79djVc4YMGRIrV66MiM61Htu8/PLLMW/evLj00ktLx6ptb2zT0lq0pLPvix/96EdxzTXXxAUXXBDHHXdcjBs3Lr7//e9HQ0NDaZ4R1bE/drcWLens++OII46IBQsWxNtvvx2NjY3x5JNPxnvvvRdDhw6tqr3RntotLD772c/G888/X
3bshRdeKF3MbNsm2P4S7Zs2bYoFCxaULtE+YsSI6NatW9mYNWvWxDPPPFMaM2rUqGhqaoonn3yyNGbJkiXR1NRUNuaZZ56JNWvWlMY8+uijUSgUYsSIEZnfect2tx4teeONN6KxsTH69esXEZ1rPba5/fbbo0+fPnHWWWeVjlXb3timpbVoSWffF++8807st1/5l64uXbqUfsWymvbH7taiJZ19f2zTs2fP6NevX6xfvz4eeeSROPfcc6tqb7SrvfZjojt48sknU9euXdONN96YVq5cme688850wAEHpJkzZ5bGTJ8+PdXV1aV77703rVixIo0dO7bFXwsaOHBgmjdvXlq+fHk6/fTTW/y1oGHDhqVFixalRYsWpeOOO67FXws644wz0vLly9O8efPSwIED9+qvBe1uPTZs2JB+8IMfpIULF6ZVq1alxx57LI0aNSoNGDCgU65HSilt2bIlDR48OF199dU7PVZNeyOlXa9FNe6L8ePHpwEDBpR+xfLee+9NhxxySLrqqqtKY6plf+xuLapxf8yZMyc9/PDD6aWXXkqPPvpoGj58ePrMZz6TNm3alFKqnr3RntotLFJK6Y9//GOqr69PhUIhHXXUUemWW24pe3zr1q1p6tSpqW/fvqlQKKRTTjklrVixomzM//73vzRx4sR08MEHpx49eqSzzz47rV69umzMG2+8kS688MJUW1ubamtr04UXXpjWr19fNubll19OZ511VurRo0c6+OCD08SJE9O7777bJu97Vz5oPd555500evTodOihh6Zu3bqlwYMHp/Hjx+/0XjvTejzyyCMpItLzzz+/02PVtjd2tRbVuC+am5vTlVdemQYPHpz233//dPjhh6drr702FYvF0phq2R+7W4tq3B+zZ89Ohx9+eOrevXvq27dvmjBhQnrrrbdKj1fL3mhPLpsOAGTjWiEAQDbCAgDIRlgAANkICwAgG2EBAGQjLACAbIQFAJCNsAAAshEWAEA2wgIAyEZYAADZCAsAIJv/Axtjl96/Mvn2AAAAAElFTkSuQmCC\n", "text/plain": [ "<Figure size 640x480 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#plotting the means of random samples on bar graph, run as it is\n", "\n", "import matplotlib.pyplot as plt\n", "x_bar = data_samples.mean(axis=1)\n", "plt.hist(x_bar, bins=100, color = 'maroon');" ] }, { "cell_type": "markdown", "id": "05ef02cd-0dca-4732-a524-838b894554f8", "metadata": {}, "source": [ "#### Part c - Sampling on a dataset - 2 points\n", "\n", "Now that you have understood what sampling is, use pandas to load the data and the function defined above to draw samples from 'Pokemon.csv'." 
] }, { "cell_type": "code", "execution_count": 15, "id": "61c5db0f-20e7-4287-b695-b822bffbcad5", "metadata": {}, "outputs": [], "source": [ "#importing libraries\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "id": "dff7da41-21a6-4457-8b66-c237565d6e65", "metadata": {}, "outputs": [], "source": [ "#loading the dataset into a pandas dataframe\n", "df = pd.read_csv(_____) #TODO: Read CSV named pokemon.csv" ] }, { "cell_type": "code", "execution_count": null, "id": "da384329-7704-487f-828a-2cd585fc7e27", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>#</th>\n", " <th>Name</th>\n", " <th>Type 1</th>\n", " <th>Type 2</th>\n", " <th>Total</th>\n", " <th>HP</th>\n", " <th>Attack</th>\n", " <th>Defense</th>\n", " <th>Sp. Atk</th>\n", " <th>Sp. 
Def</th>\n", " <th>Speed</th>\n", " <th>Generation</th>\n", " <th>Legendary</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " <td>Bulbasaur</td>\n", " <td>Grass</td>\n", " <td>Poison</td>\n", " <td>318</td>\n", " <td>45</td>\n", " <td>49</td>\n", " <td>49</td>\n", " <td>65</td>\n", " <td>65</td>\n", " <td>45</td>\n", " <td>1</td>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>Ivysaur</td>\n", " <td>Grass</td>\n", " <td>Poison</td>\n", " <td>405</td>\n", " <td>60</td>\n", " <td>62</td>\n", " <td>63</td>\n", " <td>80</td>\n", " <td>80</td>\n", " <td>60</td>\n", " <td>1</td>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>3</td>\n", " <td>Venusaur</td>\n", " <td>Grass</td>\n", " <td>Poison</td>\n", " <td>525</td>\n", " <td>80</td>\n", " <td>82</td>\n", " <td>83</td>\n", " <td>100</td>\n", " <td>100</td>\n", " <td>80</td>\n", " <td>1</td>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>3</td>\n", " <td>VenusaurMega Venusaur</td>\n", " <td>Grass</td>\n", " <td>Poison</td>\n", " <td>625</td>\n", " <td>80</td>\n", " <td>100</td>\n", " <td>123</td>\n", " <td>122</td>\n", " <td>120</td>\n", " <td>80</td>\n", " <td>1</td>\n", " <td>False</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>4</td>\n", " <td>Charmander</td>\n", " <td>Fire</td>\n", " <td>NaN</td>\n", " <td>309</td>\n", " <td>39</td>\n", " <td>52</td>\n", " <td>43</td>\n", " <td>60</td>\n", " <td>50</td>\n", " <td>65</td>\n", " <td>1</td>\n", " <td>False</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " # Name Type 1 Type 2 Total HP Attack Defense \\\n", "0 1 Bulbasaur Grass Poison 318 45 49 49 \n", "1 2 Ivysaur Grass Poison 405 60 62 63 \n", "2 3 Venusaur Grass Poison 525 80 82 83 \n", "3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n", "4 4 Charmander Fire NaN 309 39 52 43 \n", "\n", " Sp. Atk Sp. 
Def Speed Generation Legendary \n", "0 65 65 45 1 False \n", "1 80 80 60 1 False \n", "2 100 100 80 1 False \n", "3 122 120 80 1 False \n", "4 60 50 65 1 False " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#printing first 5 rows of the data\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "9f743240-9886-4a89-82bf-72ba1693abb0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(800, 13)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#printing the shape of the data\n", "df._____ #TODO: Print the shape of the dataframe" ] }, { "cell_type": "markdown", "id": "fc57aaaa-2f27-43c5-acd4-7f1b538a9927", "metadata": {}, "source": [ "Using 'random_sampling()' defined above, draw 200 samples of size 50 and store them in a list called 'pokemon_samples'." ] }, { "cell_type": "code", "execution_count": null, "id": "839b84ba-8593-4ec7-a599-a6791226f991", "metadata": {}, "outputs": [], "source": [ "pokemon_samples = random_sampling(df['Total'], ____, ____) #TODO: Draw 200 samples of size 50" ] }, { "cell_type": "code", "execution_count": null, "id": "f5e671f9-7f07-4f2a-9ce5-f03ca4f6e4cf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([475, 318, 634, 380, 600, 485, 305, 525, 500, 540, 510, 435, 405,\n", " 495, 370, 500, 310, 345, 555, 494, 680, 450, 413, 370, 500, 270,\n", " 362, 319, 507, 485, 205, 680, 330, 600, 300, 442, 485, 490, 515,\n", " 600, 680, 460, 700, 555, 534, 580, 428, 410, 590, 288])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#printing the first sample\n", "pokemon_samples[0]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": 
"python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }