Merge branch 'The-Marcy-Lab-School:main' into Code7draft

1xcel · web-flow · commit 676c355d2e31 · 2025-06-11T16:53:13.000-04:00
diff --git a/Mod1/M1L7-DataManipulation_STUDENT (1).ipynb b/Mod1/M1L7-DataManipulation_STUDENT (1).ipynb
@@ -0,0 +1,221 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# M1L7 Data Challenge:  Data Manipulation \n",
+    "\n",
+    " We'll continue to work with UFO sighting data.\n",
+    "\n",
+    "### **Dataset:** [UFO Sightings](https://www.kaggle.com/datasets/jonwright13/ufo-sightings-around-the-world-better?resource=download) -- This is also in your data folder \n",
+    "\n",
+    "### **Objectives:**\n",
+    "\n",
+    "- Use string methods to manipulate data \n",
+    "- Filter Data \n",
+    "- Work more with dates in Python\n",
+    "\n",
+    "\n",
+    "\n",
+    "**Let's get started!**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 1:  Import Pandas & Numpy"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Import pandas and NumPy \n",
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 2: Load the dataset (csv file stored in the data folder) into a Pandas DataFrame called `ufo`\n",
+    "\n",
+    "- The file is called `ufo-sightings.csv`\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ufo = None\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 3: Explore the Data\n",
+    "\n",
+    "Use any method(s) of your choice to look at the data and explore it \n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 4:  Clean the `UFO_shape` column \n",
+    "- Make the column all uppercase \n",
+    "- Strip off any leading and trailing spaces \n",
+    "\n",
+    "Even if you can't see any leading or trailing spaces, it is still good practice to strip them, because stray whitespace is easy to miss by eye.\n",
+    "\n",
+    "Hint:  You will use both `str.upper()` and `str.strip()` -- you can do it in one step or two separate steps "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 5:  Use `pd.crosstab` to count the number of sightings of each shape by season\n",
+    "\n",
+    "- Add a comment describing one main takeaway from the output "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None\n",
+    "#Add comment here:  "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell without changes before moving on to step 6!\n",
+    "\n",
+    "ufo['Date_time'] = pd.to_datetime(ufo['Date_time'], format=\"%Y-%m-%d %H:%M:%S\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 6:  Filter the data where the region is equal to `New York`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 7:  Get the most recent `Date_time` that a UFO was sighted in New York \n",
+    "\n",
+    "Hint:  Make sure you saved your filtered data from Step 6 to a new DataFrame variable. You can call `.max()` on a column to get its maximum value.\n",
+    "\n",
+    "You are using the `Date_time` column for this question"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Above and Beyond (AAB)  -- OPTIONAL\n",
+    "\n",
+    "### Question 1:  How many days have passed between the first UFO sighting in NY and the most recent sighting in NY based on this data?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Question 2:  Filter the data where `UFO_shape` is `UNKNOWN` and the Region is `New York` "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python (learn-env)",
+   "language": "python",
+   "name": "learn-env"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/Mod1/lecture_code_alongs/M1L7-DataManipulation_STUDENT.ipynb b/Mod1/lecture_code_alongs/M1L7-DataManipulation_STUDENT.ipynb
@@ -0,0 +1,161 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# M1L7 Data Types, Dates, Strings \n",
+    "\n",
+    " We'll be working with UFO sighting data.\n",
+    "\n",
+    "### **Dataset:** [UFO Sightings](https://www.kaggle.com/datasets/jonwright13/ufo-sightings-around-the-world-better?resource=download) -- This is also in your data folder \n",
+    "\n",
+    "### **Objectives:**\n",
+    "\n",
+    "- Change an object to a datetime object \n",
+    "- Use string methods to manipulate data \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 1:  Import pandas and numpy "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#Import packages \n",
+    "\n",
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 2:  Load in the data and save it as `ufo`\n",
+    "\n",
+    "- The dataset is named `ufo-sightings.csv`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ufo = None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 3: Check column data types and the head of the data -- do the values and types make sense?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "None"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 4:  Convert the `Date_time` column to datetime \n",
+    "\n",
+    "- Even though we have columns for year, month, and hour, we still want to convert `Date_time` to a datetime object \n",
+    "- Dates can come in many formats, so we will specify this one: `'%Y-%m-%d %H:%M:%S'`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ufo['Date_time'] = None"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#Run this to see if the update worked \n",
+    "ufo.info()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 5:  Make the `Description` column all lowercase \n",
+    "\n",
+    "- Think about why we would want the text to be all lowercase \n",
+    "\n",
+    "**Instructor Notes**\n",
+    "Feel free to talk about text analytics, LLMs, or a simple case such as state names stored in inconsistent cases when you want to run aggregations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ufo['Description'] = None\n",
+    "print(ufo['Description'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Step 6:  Replace spaces with underscores in the `Encounter_Duration` column\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ufo['Encounter_Duration'] = None\n",
+    "print(ufo['Encounter_Duration'])"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python (learn-env)",
+   "language": "python",
+   "name": "learn-env"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}