forked from rdpeng/RepData_PeerAssessment1
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPA1_template.Rmd
128 lines (92 loc) · 3.94 KB
/
PA1_template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
title: "Reproducible Research: Peer Assessment 1"
output:
html_document:
keep_md: true
---
## Loading and preprocessing the data
```{r warning = FALSE}
library(ggplot2)
```
To load and process the data, I used the read.csv() method specifing the type of data for each column:
```{r}
myData <- read.csv("activity.csv", colClasses = c("integer", "Date", "integer"))
```
## What is mean total number of steps taken per day?
To calculate the total steps taken per day I use te tapply method and compute a histogram of it:
```{r warning = FALSE}
StepsPerDay <- tapply(myData$steps, myData$date, sum)
qplot(StepsPerDay, binwidth = 1000, xlab = "Steps taken per Day", main = "Steps taken per Day Histogram")
```
Next I calculate the mean and median of the total number of steps taken per day:
```{r}
meanSteps <- mean(StepsPerDay, na.rm = T)
medianSteps <- median(StepsPerDay, na.rm = T)
```
The mean is: `r as.character(round(meanSteps, 2))`
The median is: `r medianSteps`
## What is the average daily activity pattern?
To calculate the average daily activity pattern I use the aggregate method to create a new dataframe with the mean steps by minute interval and plot the pattern using the ggplot system:
```{r}
meanStepsbyInterval <- aggregate(x=list(steps=myData$steps), by=list(interval=myData$interval),FUN=mean, na.rm=TRUE)
ggplot(meanStepsbyInterval, aes(x=interval, y=steps)) +
geom_line() +
ggtitle("Average Daily Activity Pattern") +
xlab("minute interval") +
ylab("average number of steps")
maxStepsInterval <- meanStepsbyInterval$interval[meanStepsbyInterval$steps == max(meanStepsbyInterval$steps)]
```
The `r maxStepsInterval` minute interval has the maximun number of steps taken in average with `r round(max(meanStepsbyInterval$steps), 2)` steps
## Imputing missing values
First I calculate the total number of NAs:
```{r}
totalNas <- sum(is.na(myData$steps))
```
The total number of rows with NAs values is `r totalNas`
Filling the NAs:
```{r}
fillNAs <- function(interval, steps){
newValue <- NA
if (is.na(steps))
newValue <- as.integer(meanStepsbyInterval$steps[meanStepsbyInterval$interval == interval])
else
newValue <- steps
return(newValue)
}
myNewData <- myData
myNewData$steps <- mapply(FUN=fillNAs, myNewData$interval, myNewData$steps)
```
Calculate histogram:
```{r}
NewStepsPerDay <- tapply(myNewData$steps, myNewData$date, sum)
qplot(NewStepsPerDay, binwidth = 1000, xlab = "Steps taken per Day", main = "Steps taken per Day Histogram")
```
Next I calculate the mean and median of the total number of steps taken per day:
```{r}
NewMeanSteps <- mean(NewStepsPerDay, na.rm = T)
NewMedianSteps <- median(NewStepsPerDay, na.rm = T)
```
The mean is: `r as.character(round(NewMeanSteps, 2))`
The median is: `r NewMedianSteps`
The values do differ from the calculated above, this is mainly because in the first calculation all the NAs where ignored, therefore, the total amount of data is less compared to the data with the filled NAs.
## Are there differences in activity patterns between weekdays and weekends?
First I create a new factor variable by using the sapply method ann the funtion weekday below:
```{r}
weekday <- function(day){
if (day == "Saturday" | day == "Sunday")
return("weekend")
else
return("weekday")
}
myNewData$day <- factor(sapply(weekdays(myNewData$date), FUN=weekday))
```
Then I create a new dataframe with the mean steps by minute interval and week or weekend day and plot the pattern using the ggplot system in two panels:
```{r}
newMeanStepsbyInterval <- aggregate(x=list(steps=myNewData$steps), by=list(interval=myNewData$interval, day=myNewData$day),FUN=mean, na.rm=TRUE)
ggplot(newMeanStepsbyInterval, aes(x=interval, y=steps)) +
geom_line() + facet_grid(day~.) +
ggtitle("Average Daily Activity Pattern") +
xlab("minute interval") +
ylab("average number of steps")
```
In the above plot we can see that the patterns vary slightly between weekdays and weekend days.