
Commit 676c355

Merge branch 'The-Marcy-Lab-School:main' into Code7draft
2 parents caf55d2 + 7d809da commit 676c355

2 files changed (+382 -0 lines)

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# M1L7 Data Challenge: Data Manipulation \n",
8+
"\n",
9+
" We'll continue to work with UFO sighting data.\n",
10+
"\n",
11+
"### **Dataset:** [UFO Sightings](https://www.kaggle.com/datasets/jonwright13/ufo-sightings-around-the-world-better?resource=download) -- This is also in your data folder \n",
12+
"\n",
13+
"### **Objectives:**\n",
14+
"\n",
15+
"- Use string methods to manipulate data \n",
16+
"- Filter Data \n",
17+
"- Work more with dates in Python\n",
18+
"\n",
19+
"\n",
20+
"\n",
21+
"**Let's get started!**"
22+
]
23+
},
24+
{
25+
"cell_type": "markdown",
26+
"metadata": {},
27+
"source": [
28+
"### Step 1: Import pandas & NumPy"
29+
]
30+
},
31+
{
32+
"cell_type": "code",
33+
"execution_count": null,
34+
"metadata": {},
35+
"outputs": [],
36+
"source": [
37+
"# Import pandas and NumPy \n",
38+
"None"
39+
]
40+
},
41+
{
42+
"cell_type": "markdown",
43+
"metadata": {},
44+
"source": [
45+
"### Step 2: Load the dataset (a CSV file stored in the data folder) into a pandas DataFrame called `ufo`\n",
46+
"\n",
47+
"- The file is called `ufo-sightings.csv`\n"
48+
]
49+
},
50+
{
51+
"cell_type": "code",
52+
"execution_count": null,
53+
"metadata": {},
54+
"outputs": [],
55+
"source": [
56+
"ufo = None\n"
57+
]
58+
},
59+
{
60+
"cell_type": "markdown",
61+
"metadata": {},
62+
"source": [
63+
"### Step 3: Explore the Data\n",
64+
"\n",
65+
"Use any method(s) of your choice to look at the data and explore it \n"
66+
]
67+
},
68+
{
69+
"cell_type": "code",
70+
"execution_count": null,
71+
"metadata": {},
72+
"outputs": [],
73+
"source": [
74+
"None"
75+
]
76+
},
77+
{
78+
"cell_type": "markdown",
79+
"metadata": {},
80+
"source": [
81+
"### Step 4: Clean the `UFO_shape` column \n",
82+
"- Make the column all uppercase \n",
83+
"- Strip off any leading and trailing spaces \n",
84+
"\n",
85+
"Even if you can't see any extra spaces with the naked eye, it is still good practice to trim leading and trailing whitespace.\n",
86+
"\n",
87+
"Hint: You will use both `.str.upper()` and `.str.strip()` -- you can chain them in one step or use two separate steps "
88+
]
89+
},
90+
{
91+
"cell_type": "code",
92+
"execution_count": null,
93+
"metadata": {},
94+
"outputs": [],
95+
"source": [
96+
"None"
97+
]
98+
},
99+
{
100+
"cell_type": "markdown",
101+
"metadata": {},
102+
"source": [
103+
"### Step 5: Use `pd.crosstab` to count the sightings of each shape by season\n",
104+
"\n",
105+
"- Add a comment describing one main takeaway from the output "
106+
]
107+
},
108+
{
109+
"cell_type": "code",
110+
"execution_count": null,
111+
"metadata": {},
112+
"outputs": [],
113+
"source": [
114+
"None\n",
115+
"#Add comment here: "
116+
]
117+
},
118+
{
119+
"cell_type": "code",
120+
"execution_count": null,
121+
"metadata": {},
122+
"outputs": [],
123+
"source": [
124+
"# Run this cell without changes before moving on to Step 6!\n",
125+
"\n",
126+
"ufo['Date_time'] = pd.to_datetime(ufo['Date_time'], format=\"%Y-%m-%d %H:%M:%S\")"
127+
]
128+
},
129+
{
130+
"cell_type": "markdown",
131+
"metadata": {},
132+
"source": [
133+
"### Step 6: Filter the data to the rows where `Region` is equal to `New York`"
134+
]
135+
},
136+
{
137+
"cell_type": "code",
138+
"execution_count": null,
139+
"metadata": {},
140+
"outputs": [],
141+
"source": [
142+
"None"
143+
]
144+
},
145+
{
146+
"cell_type": "markdown",
147+
"metadata": {},
148+
"source": [
149+
"### Step 7: Get the most recent `Date_time` at which a UFO was sighted in New York \n",
150+
"\n",
151+
"Hint: Make sure you saved your filtered data from Step 6 to a new DataFrame variable. You can call `.max()` on a column to get that column's maximum value.\n",
152+
"\n",
153+
"You are using the `Date_time` column for this question"
154+
]
155+
},
156+
{
157+
"cell_type": "code",
158+
"execution_count": null,
159+
"metadata": {},
160+
"outputs": [],
161+
"source": [
162+
"None"
163+
]
164+
},
165+
{
166+
"cell_type": "markdown",
167+
"metadata": {},
168+
"source": [
169+
"## Above and Beyond (AAB) -- OPTIONAL\n",
170+
"\n",
171+
"### Question 1: How many days have passed between the first UFO sighting in NY and the most recent sighting in NY based on this data?"
172+
]
173+
},
174+
{
175+
"cell_type": "code",
176+
"execution_count": null,
177+
"metadata": {},
178+
"outputs": [],
179+
"source": [
180+
"None"
181+
]
182+
},
183+
{
184+
"cell_type": "markdown",
185+
"metadata": {},
186+
"source": [
187+
"### Question 2: Filter the data to the rows where `UFO_shape` is `UNKNOWN` and `Region` is `New York` "
188+
]
189+
},
190+
{
191+
"cell_type": "code",
192+
"execution_count": null,
193+
"metadata": {},
194+
"outputs": [],
195+
"source": [
196+
"None"
197+
]
198+
}
199+
],
200+
"metadata": {
201+
"kernelspec": {
202+
"display_name": "Python (learn-env)",
203+
"language": "python",
204+
"name": "learn-env"
205+
},
206+
"language_info": {
207+
"codemirror_mode": {
208+
"name": "ipython",
209+
"version": 3
210+
},
211+
"file_extension": ".py",
212+
"mimetype": "text/x-python",
213+
"name": "python",
214+
"nbconvert_exporter": "python",
215+
"pygments_lexer": "ipython3",
216+
"version": "3.12.4"
217+
}
218+
},
219+
"nbformat": 4,
220+
"nbformat_minor": 2
221+
}
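The challenge steps above (Steps 4 through 7 plus the two optional AAB questions) can be sketched with pandas as below. This is a minimal illustration, not an official solution: it runs on a few hypothetical rows instead of `ufo-sightings.csv`, and it assumes the dataset has `UFO_shape`, `Region`, `Date_time`, and a `Season` column (the `Season` name is an assumption, since the notebook only says "by season").

```python
import pandas as pd

# Hypothetical toy rows standing in for ufo-sightings.csv. UFO_shape, Region and
# Date_time appear in the notebook; the Season column name is an assumption.
ufo = pd.DataFrame({
    "UFO_shape": [" light", "Circle ", "light", "unknown", " Unknown "],
    "Season": ["Summer", "Winter", "Summer", "Fall", "Summer"],
    "Region": ["New York", "Texas", "New York", "New York", "New York"],
    "Date_time": [
        "2001-07-04 21:30:00",
        "1999-01-15 03:00:00",
        "2010-08-01 22:15:00",
        "1995-10-31 20:00:00",
        "2012-06-21 23:45:00",
    ],
})

# Step 4: uppercase, then trim leading/trailing whitespace (chained in one step)
ufo["UFO_shape"] = ufo["UFO_shape"].str.upper().str.strip()

# Step 5: crosstab counts how many rows fall in each shape/season combination
shape_by_season = pd.crosstab(ufo["UFO_shape"], ufo["Season"])
print(shape_by_season)

# Parse Date_time with the same explicit format the notebook uses
ufo["Date_time"] = pd.to_datetime(ufo["Date_time"], format="%Y-%m-%d %H:%M:%S")

# Step 6: a boolean mask keeps only New York rows; save it to a new variable
ny = ufo[ufo["Region"] == "New York"]

# Step 7: .max() on a datetime column returns the latest Timestamp
latest_ny = ny["Date_time"].max()
print(latest_ny)

# AAB Q1: Timestamp minus Timestamp is a Timedelta; .days gives whole days
days_between = (ny["Date_time"].max() - ny["Date_time"].min()).days
print(days_between)

# AAB Q2: combine conditions with &, each wrapped in its own parentheses
ny_unknown = ufo[(ufo["UFO_shape"] == "UNKNOWN") & (ufo["Region"] == "New York")]
print(ny_unknown)
```

Each condition in the compound filter needs its own parentheses because `&` binds more tightly than `==` in Python.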
Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# M1L7 Data Types, Dates, Strings \n",
8+
"\n",
9+
" We'll be working with UFO sighting data.\n",
10+
"\n",
11+
"### **Dataset:** [UFO Sightings](https://www.kaggle.com/datasets/jonwright13/ufo-sightings-around-the-world-better?resource=download) -- This is also in your data folder \n",
12+
"\n",
13+
"### **Objectives:**\n",
14+
"\n",
15+
"- Convert an object (string) column to a datetime column \n",
16+
"- Use string methods to manipulate data \n"
17+
]
18+
},
19+
{
20+
"cell_type": "markdown",
21+
"metadata": {},
22+
"source": [
23+
"### Step 1: Import pandas and NumPy "
24+
]
25+
},
26+
{
27+
"cell_type": "code",
28+
"execution_count": null,
29+
"metadata": {},
30+
"outputs": [],
31+
"source": [
32+
"#Import packages \n",
33+
"\n",
34+
"None"
35+
]
36+
},
37+
{
38+
"cell_type": "markdown",
39+
"metadata": {},
40+
"source": [
41+
"### Step 2: Load in the data and save it as `ufo`\n",
42+
"\n",
43+
"- The dataset is named `ufo-sightings.csv`"
44+
]
45+
},
46+
{
47+
"cell_type": "code",
48+
"execution_count": null,
49+
"metadata": {},
50+
"outputs": [],
51+
"source": [
52+
"ufo = None"
53+
]
54+
},
55+
{
56+
"cell_type": "markdown",
57+
"metadata": {},
58+
"source": [
59+
"### Step 3: Check the column data types and the head of the data -- do the data and types make sense?"
60+
]
61+
},
62+
{
63+
"cell_type": "code",
64+
"execution_count": null,
65+
"metadata": {},
66+
"outputs": [],
67+
"source": [
68+
"None"
69+
]
70+
},
71+
{
72+
"cell_type": "markdown",
73+
"metadata": {},
74+
"source": [
75+
"### Step 4: Convert the `Date_time` column to datetime \n",
76+
"\n",
77+
"- Even though we already have columns for year, month, and hour, we still want to convert `Date_time` to a datetime object \n",
78+
"- Dates can come in many formats, so we will use this format: `'%Y-%m-%d %H:%M:%S'`"
79+
]
80+
},
81+
{
82+
"cell_type": "code",
83+
"execution_count": null,
84+
"metadata": {},
85+
"outputs": [],
86+
"source": [
87+
"ufo['Date_time'] = None"
88+
]
89+
},
90+
{
91+
"cell_type": "code",
92+
"execution_count": null,
93+
"metadata": {},
94+
"outputs": [],
95+
"source": [
96+
"#Run this to see if the update worked \n",
97+
"ufo.info()"
98+
]
99+
},
100+
{
101+
"cell_type": "markdown",
102+
"metadata": {},
103+
"source": [
104+
"### Step 5: Make the `Description` column all lowercase \n",
105+
"\n",
106+
"- Think about why we would want text to be all lowercase \n",
107+
"\n",
108+
"**Instructor Notes**\n",
109+
"Feel free to talk about text analytics or LLMs, or a simple case like state names appearing in different cases, which splits one state into several groups when you aggregate"
110+
]
111+
},
112+
{
113+
"cell_type": "code",
114+
"execution_count": null,
115+
"metadata": {},
116+
"outputs": [],
117+
"source": [
118+
"ufo['Description'] = None\n",
119+
"print(ufo['Description'])"
120+
]
121+
},
122+
{
123+
"cell_type": "markdown",
124+
"metadata": {},
125+
"source": [
126+
"### Step 6: Replace spaces with underscores in the `Encounter_Duration` column\n"
127+
]
128+
},
129+
{
130+
"cell_type": "code",
131+
"execution_count": null,
132+
"metadata": {},
133+
"outputs": [],
134+
"source": [
135+
"ufo['Encounter_Duration'] = None\n",
136+
"print(ufo['Encounter_Duration'])"
137+
]
138+
}
139+
],
140+
"metadata": {
141+
"kernelspec": {
142+
"display_name": "Python (learn-env)",
143+
"language": "python",
144+
"name": "learn-env"
145+
},
146+
"language_info": {
147+
"codemirror_mode": {
148+
"name": "ipython",
149+
"version": 3
150+
},
151+
"file_extension": ".py",
152+
"mimetype": "text/x-python",
153+
"name": "python",
154+
"nbconvert_exporter": "python",
155+
"pygments_lexer": "ipython3",
156+
"version": "3.12.4"
157+
}
158+
},
159+
"nbformat": 4,
160+
"nbformat_minor": 2
161+
}
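The lesson steps above (Steps 3 through 6) can be sketched as follows. This is a hedged illustration on two hypothetical rows, not the real dataset; only the column names and the date format string are taken from the notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for ufo-sightings.csv; only the column
# names and the date format string are taken from the notebook.
ufo = pd.DataFrame({
    "Date_time": ["2001-07-04 21:30:00", "1999-01-15 03:00:00"],
    "Description": ["Bright LIGHT over the Lake", "Silver Disc hovering"],
    "Encounter_Duration": ["5 minutes", "1 hour"],
})

# Step 3: before conversion, Date_time holds plain strings, not datetimes
print(ufo.dtypes)

# Step 4: parse with an explicit format; a value that does not match the
# format raises a ValueError instead of being silently misread
ufo["Date_time"] = pd.to_datetime(ufo["Date_time"], format="%Y-%m-%d %H:%M:%S")

# Step 5: lowercase the text so "LIGHT" and "light" compare as equal
ufo["Description"] = ufo["Description"].str.lower()

# Step 6: replace each space with an underscore (literal match, no regex)
ufo["Encounter_Duration"] = ufo["Encounter_Duration"].str.replace(" ", "_", regex=False)

ufo.info()
```

Passing `regex=False` makes `.str.replace` treat the space as a literal character regardless of the pandas version's default.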
