Sync up complete incomplete #91

Open
wants to merge 3 commits into
base: main
16 changes: 8 additions & 8 deletions session1/intro_to_r_training.Rmd
@@ -494,7 +494,7 @@ The format is dataframe name, $, variable name. Note that a vector is returned.



-All variables have an associated class. The class will determine what calculations are possible with them and how R should treat them. So far, our dataset offenders has variables of three different classes; integer, number, and character. Other useful types are factor, logical and date.
+All variables have an associated class. The class will determine what calculations are possible with them and how R should treat them. So far, our dataset iris has variables of three different classes; integer, number, and character. Other useful types are factor, logical and date.



@@ -510,7 +510,7 @@ class(iris$Sepal.Length)



-It's possible to coerce variables from one class to another. We can change the Sepal.Length variable in the offenders dataset to be an integer variable as follows:
+It's possible to coerce variables from one class to another. We can change the Sepal.Length variable in the iris dataset to be an integer variable as follows:
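
A minimal sketch of the coercion this paragraph describes, assuming only base R and the built-in `iris` dataset. The copy name `iris_int` is hypothetical and is used here only so the original data frame is left unchanged; the training material itself may overwrite `iris` directly.

```r
# Work on a copy so the built-in dataset stays intact
# (iris_int is a hypothetical name, not taken from the training material)
iris_int <- iris

# Sepal.Length is stored as a numeric (double) vector
class(iris_int$Sepal.Length)

# as.integer() truncates towards zero, so the decimal part is dropped
iris_int$Sepal.Length <- as.integer(iris_int$Sepal.Length)
class(iris_int$Sepal.Length)
```

Because the decimal part is discarded rather than rounded, this kind of coercion loses information and is usually worth doing on a copy first.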



@@ -546,7 +546,7 @@ iris

Another common class is factors.

-Factors are for categorical variables involving different levels. So for example, in the dataset 'iris', there are 3 levels of Species: setosa, versicolor, virginica. We can see this now when looking at the environment tab (after clicking the arrow to the left of offenders) and also the order from using the following command:
+Factors are for categorical variables involving different levels. So for example, in the dataset 'iris', there are 3 levels of Species: setosa, versicolor, virginica. We can see this now when looking at the environment tab (after clicking the arrow to the left of iris) and also the order from using the following command:
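
The command referred to here is not visible in this hunk. One way to inspect a factor and the order of its levels, sketched with base R only and assuming the unmodified built-in `iris` dataset, would be:

```r
# Species is stored as a factor in the built-in iris dataset
class(iris$Species)

# levels() returns the category labels in their stored order
species_levels <- levels(iris$Species)
species_levels

# nlevels() counts the distinct categories
nlevels(iris$Species)
```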



@@ -638,6 +638,7 @@ iris$Species <- as.factor(iris$Species)
## 3 Data wrangling and 'group by' calculations



### 3.1 Filter

To start off with a simple data wrangling function; if you would like to produce statistics for a subset of rows or observations, a good function to use is filter() from the dplyr package.
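
As a rough illustration of `filter()`, assuming the dplyr package is installed and using the built-in `iris` dataset, the following keeps only the rows for one species. The object name `setosa_only` is hypothetical and does not come from the training material.

```r
library(dplyr)

# Keep only the rows where Species equals "setosa"
# (setosa_only is a hypothetical name chosen for this sketch)
setosa_only <- iris %>%
  filter(Species == "setosa")

# Compare the number of rows before and after filtering
nrow(iris)
nrow(setosa_only)
```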
@@ -780,6 +781,7 @@ sepal_length_average <- iris %>%

```


### 3.3 Select


@@ -820,7 +822,7 @@ iris_petals <- iris %>%



-We can rename variables using the dplyr function rename(). Let's amend our above coding in creating the 'iris_petals' dataset so that Petal.Length is just calles P.Length, and Petal.Width is P.Width.
+We can rename variables using the dplyr function rename(). Let's amend our above coding in creating the 'iris_petals' dataset so that Petal.Length is just called P.Length, and Petal.Width is P.Width.
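
A sketch of the renaming step, assuming dplyr is installed. The columns passed to `select()` are an assumption made for illustration, since the original definition of `iris_petals` is not shown in this hunk. `rename()` takes `new_name = old_name` pairs.

```r
library(dplyr)

# Assumed definition of iris_petals: species plus the two petal columns
iris_petals <- iris %>%
  select(Species, Petal.Length, Petal.Width) %>%
  # rename() uses the form new_name = old_name
  rename(P.Length = Petal.Length,
         P.Width = Petal.Width)

names(iris_petals)
```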



@@ -868,7 +870,7 @@ iris_petals <- iris_petals %>%



-Another useful function found in the dplyr package is if_else, which works in a similar way to if statements in Excel. This uses a logical statement to determine the output. The below code uses this to identify petals that are less than 2 cm long, the mutate function is used to add a variable in to the offenders dataset which is 1 if the petal is less than 2 cm and 0 if it is 2 cm or more.
+Another useful function found in the dplyr package is if_else, which works in a similar way to if statements in Excel. This uses a logical statement to determine the output. The below code uses this to identify petals that are less than 2 cm long, the mutate function is used to add a variable in to the iris dataset which is 1 if the petal is less than 2 cm and 0 if it is 2 cm or more.
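
The code this paragraph refers to is collapsed in this view. A minimal sketch of the same idea, assuming dplyr is installed, is shown below. The names `iris_flagged` and `short_petal` are hypothetical and chosen only for this illustration.

```r
library(dplyr)

# Add a 1/0 indicator: 1 when the petal is shorter than 2 cm, otherwise 0
# (iris_flagged and short_petal are hypothetical names)
iris_flagged <- iris %>%
  mutate(short_petal = if_else(Petal.Length < 2, 1, 0))

# Count how many rows fall into each group
table(iris_flagged$short_petal)
```

Note that `if_else()` is stricter than base R's `ifelse()`: the two possible outputs must be of compatible types, which is why both are given here as plain numbers.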



@@ -1011,7 +1013,7 @@ write_csv(iris_petals, path = "iris_petals.csv")



-This assumes by default that the columns are separated by a comma symbol. The data will be saved as a CSV in your working directory to a file called `iris_petals.csv`.
+This assumes by default that the columns are separated by a comma symbol. The data will be saved as a CSV in your working directory to a file called `iris_petals.csv`.
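
A hedged sketch of the export step, assuming the readr package is installed. Recent readr releases (1.4.0 onwards) name the destination argument `file` and treat `path`, as used in the hunk header above, as deprecated, so `file` is used here. Writing to a temporary directory rather than the working directory is a choice made only for this illustration.

```r
library(readr)

# Write to a temporary location so the sketch leaves no file behind
# in the working directory (out_file is a hypothetical name)
out_file <- file.path(tempdir(), "iris_petals.csv")

# write_csv() separates columns with commas and writes a header row
write_csv(iris, file = out_file)

# Read the file back to confirm the export round-trips
iris_check <- read_csv(out_file, show_col_types = FALSE)
dim(iris_check)
```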



@@ -1037,8 +1039,6 @@ This assumes by default that the columns are separated by a comma symbol. The da

There are lots of resources that can help you develop your R knowledge, but below are a few that are particularly helpful:



+ Scottish Government 'Good Coding Practices': https://github.com/DataScienceScotland/good_practices/blob/main/coding.md

+ DataCamp is a website which hosts multiple online courses that teach coding. Their 'Introduction to R' course is free to complete and provides a broader overview in the basic concepts for coding in R. A link to the course can be found here: https://www.datacamp.com/courses/free-introduction-to-r.
6 changes: 3 additions & 3 deletions session1/intro_to_r_training_incomplete.Rmd
@@ -494,7 +494,7 @@ The format is dataframe name, $, variable name. Note that a vector is returned.



-All variables have an associated class. The class will determine what calculations are possible with them and how R should treat them. So far, our dataset offenders has variables of three different classes; integer, number, and character. Other useful types are factor, logical and date.
+All variables have an associated class. The class will determine what calculations are possible with them and how R should treat them. So far, our dataset iris has variables of three different classes; integer, number, and character. Other useful types are factor, logical and date.



@@ -510,7 +510,7 @@ class(iris$Sepal.Length)



-It's possible to coerce variables from one class to another. We can change the Sepal.Length variable in the offenders dataset to be an integer variable as follows:
+It's possible to coerce variables from one class to another. We can change the Sepal.Length variable in the iris dataset to be an integer variable as follows:



@@ -546,7 +546,7 @@ iris

Another common class is factors.

-Factors are for categorical variables involving different levels. So for example, in the dataset 'iris', there are 3 levels of Species: setosa, versicolor, virginica. We can see this now when looking at the environment tab (after clicking the arrow to the left of offenders) and also the order from using the following command:
+Factors are for categorical variables involving different levels. So for example, in the dataset 'iris', there are 3 levels of Species: setosa, versicolor, virginica. We can see this now when looking at the environment tab (after clicking the arrow to the left of offender) and also the order from using the following command:



8 changes: 6 additions & 2 deletions session2/intro_to_R_session2.Rmd
@@ -216,7 +216,7 @@ time_series_plot3
```


-Now that we can see the plots, let's fit lines, to the points, and a straigh line for the trend
+Now that we can see the plots, let's fit lines, to the points, and a straight line for the trend

```{r TimeSeriesPlot4}

@@ -247,7 +247,11 @@ time_series_plot5

```

-Finally, we could perform the wrangling and plotting in a concise chunk
+Finally, we could perform the wrangling and plotting in a concise chunk
+
+Exercise:
+
+See if you can accomplish the same thing as the chunks above to produce the time_series_plot5 object in one chunk

```{r AllAnalysis}
#Import
2 changes: 1 addition & 1 deletion session2/intro_to_R_session2_incomplete.Rmd
@@ -252,7 +252,7 @@ time_series_plot5

```

-Finally, we could perform the wrangling and plotting in a concise chunk
+Finally, we could perform the wrangling and plotting in a concise chunk

Exercise:
