Merged
20 changes: 20 additions & 0 deletions assignments/r-tidy-data.md
@@ -0,0 +1,20 @@
---
layout: page
element: assignment
title: Tidy Data
language: R
exercises: ['Improving Messy Data', 'Data entry validation in Excel', 'Clean Up Untidy Data', 'Tree Biomass', 'Check That Your Code Runs']
points: [20, 20, 20, 30, 10]
---

### Learning Objectives

> Following this assignment students should be able to:
>
> - understand the basic rules of tidy data
> - implement quality control for data entry in spreadsheets
> - know how to make messy data tidy using tidyr

{% include reading.html %}

{% include assignment.html %}
20 changes: 20 additions & 0 deletions exercises/Qaqc-data-entry-validation-in-excel-R.md
@@ -0,0 +1,20 @@
---
layout: exercise
topic: QAQC
title: Data entry validation in Excel
language: R
---

Create a spreadsheet in Excel for data entry. It should have five columns: Date, Site, Species, Mass, and Length.

Set the following data validation criteria to prevent invalid data from being entered:

1. The Date column should be set so that it doesn't convert dates to other formats.
2. Use data validation so that Site can only be one of the following: `A1`, `A2`, `B1`, `B2`. Set the error message for this validation rule to state what the valid values are.
3. Use data validation so that Species can only be one of the following: `Dipodomys spectabilis`, `Dipodomys ordii`, `Dipodomys merriami`. Set the error message for this validation rule to state what the valid values are.
4. Use data validation so that Mass can only be a decimal greater than or equal to zero but less than or equal to 500. Set the error message for this validation rule to state what the valid values are.
5. Use data validation so that Length can only be an integer (i.e., a whole number) between 1 and 10. Set the error message for this validation rule to state what the valid values are.

Check that the validation rules and data formatting are working, but do not include any entered data in the final file.

Save this file as `data_entry_form.xlsx`.
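
The same rules can also be re-checked in R once data has been entered and exported. The following is only a minimal sketch in base R, not part of the exercise: the helper name `check_entries` and the example rows are hypothetical, and the Date formatting rule is not covered because it depends on how the spreadsheet is exported.

```r
# Allowed values taken from the validation rules in this exercise.
valid_sites <- c("A1", "A2", "B1", "B2")
valid_species <- c("Dipodomys spectabilis", "Dipodomys ordii", "Dipodomys merriami")

# Hypothetical helper: returns one logical column per rule plus an overall flag.
check_entries <- function(entries) {
  site_ok <- entries$Site %in% valid_sites
  species_ok <- entries$Species %in% valid_species
  # Mass: decimal between 0 and 500 inclusive; missing values fail.
  mass_ok <- !is.na(entries$Mass) & entries$Mass >= 0 & entries$Mass <= 500
  # Length: whole number between 1 and 10 inclusive; missing values fail.
  length_ok <- !is.na(entries$Length) &
    entries$Length == round(entries$Length) &
    entries$Length >= 1 & entries$Length <= 10
  data.frame(site_ok, species_ok, mass_ok, length_ok,
             all_ok = site_ok & species_ok & mass_ok & length_ok)
}

# Made-up rows for illustration only: the second has a bad Site and Mass,
# the third has a non-integer Length.
entries <- data.frame(
  Site = c("A1", "C3", "B2"),
  Species = c("Dipodomys ordii", "Dipodomys merriami", "Dipodomys spectabilis"),
  Mass = c(42.5, 510, 120),
  Length = c(5, 7, 2.5)
)

result <- check_entries(entries)
print(result)
```

Reporting each rule in its own column makes it easier to see which rule a row violates, much like the per-column error messages in the spreadsheet.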
15 changes: 15 additions & 0 deletions exercises/Tidy-data-clean-up-R.md
@@ -0,0 +1,15 @@
---
layout: exercise
topic: Tidy Data
title: Clean Up Untidy Data
language: R
---

A lot of real data isn't very tidy, mostly because most scientists aren't taught
how to structure their data in a way that is easy to analyze.

[Download an untidy version](https://ndownloader.figshare.com/files/24469424)
of some of the Portal Project data, which includes information on the date, species
identification, weight and sampling plot for some small mammals.

Convert the data into a more tidy format.
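
As a rough illustration of one common tidying step, the sketch below uses `pivot_longer()` from tidyr to turn several value columns into one row per observation. The toy table and its column names (`plot_1`, `plot_2`, `species`) are assumptions made up for this example; the downloaded file may be laid out differently and may need other steps as well.

```r
library(tidyr)

# Hypothetical wide table: one weight column per plot (not the real Portal layout).
untidy <- data.frame(
  date = c("2020-01-05", "2020-01-06"),
  species = c("DM", "DO"),
  plot_1 = c(40, 52),
  plot_2 = c(44, NA)
)

# Gather the plot columns into a single `plot` column and a single `weight`
# column, stripping the "plot_" prefix and dropping empty cells.
tidy <- pivot_longer(
  untidy,
  cols = c(plot_1, plot_2),
  names_to = "plot",
  names_prefix = "plot_",
  values_to = "weight",
  values_drop_na = TRUE
)

print(tidy)
```

In the result each row holds a single weight measurement, and the plot is stored as a value in its own column instead of being encoded in the column names.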
25 changes: 25 additions & 0 deletions exercises/Tidy-data-improving-messy-data-R.md
@@ -0,0 +1,25 @@
---
layout: exercise
topic: Tidy Data
title: Improving Messy Data
language: R
---

A lot of real data isn't very tidy, mostly because most scientists aren't taught
how to structure their data in a way that is easy to analyze.

[Download an untidy version]({{ site.baseurl }}/data/untidy-portal-data.xlsx)
of some of the Portal Project data, which includes information on the site, date,
species identification, weight and sampling plot (within the site) for some small mammals.

Think about what could be improved about this data and write down answers to the following questions:

1. Describe five things about this data that are not tidy and how you could
fix each of those issues.

2. Could this data easily be imported into a programming language or a
database in its current form?

3. Do you think it's a good idea to enter the data like this and clean it up
later, or to have a good data structure for analysis by the time data is
being entered? Why?
11 changes: 11 additions & 0 deletions lectures/R-tidy-data.md
@@ -0,0 +1,11 @@
---
layout: page
element: lecture
title: Tidy Data
language: R
---

1. [Accessing Excel Online at UF]({{ site.baseurl }}/materials/excel-online-uf)
2. [Tidy Data]({{ site.baseurl }}/materials/tidy-data)
3. [Data Entry]({{ site.baseurl }}/materials/data-entry)
4. [tidyr]({{ site.baseurl }}/materials/tidyr)
7 changes: 4 additions & 3 deletions schedule.md
@@ -12,9 +12,10 @@ assignments:
"Making Choices",
"Repeating Things 1",
"Repeating Things 2",
-"Class Choice",
-"Class Choice",
-"Class Choice",
+"Spatial Data 1",
+"Spatial Data 2",
+"AI Coding Assistance/Assistants",
+"Tidy Data",
"Fall Break",
"Class Choice",
]