Commit e1d2add

Committed Jul 1, 2019
Merge branch 'master' of github.com:swcarpentry/r-novice-inflammation
2 parents b4e77f5 + 7aa4e9b commit e1d2add


1 file changed: +74 -26 lines changed


_episodes_rmd/13-supp-data-structures.Rmd

@@ -7,14 +7,18 @@ questions:
 - "What are the different data structures in R?"
 - "How do I access data within the various data structures?"
 objectives:
-- "Expose learners to the different data types in R."
+- "Expose learners to the different data types in R and show how these data
+types are used in data structures."
 - "Learn how to create vectors of different types."
 - "Be able to check the type of vector."
 - "Learn about missing data and other special values."
-- "Getting familiar with the different data structures (lists, matrices, data frames)."
+- "Get familiar with the different data structures (lists, matrices, data frames)."
 keypoints:
 - "R's basic data types are character, numeric, integer, complex, and logical."
-- "R's basic data structures include the vector, list, matrix, data frame, and factors."
+- "R's basic data structures include the vector, list, matrix, data frame, and
+factors. Some of these structures require that all members be of the same data
+type (e.g. vectors, matrices) while others permit multiple data types (e.g.
+lists, data frames)."
 - "Objects may have attributes, such as name, dimension, and class."
 source: Rmd
 ---
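
The new keypoint distinguishes structures whose members must share one data type from those that may mix types. A minimal R sketch (not taken from the lesson; all object names are illustrative) of how that difference shows up in practice:

```r
# c() builds an atomic vector, so mixed inputs are coerced to one common type.
mixed_vec <- c(1, "a", TRUE)
typeof(mixed_vec)            # "character"

# A matrix is an atomic vector with dimensions, so the same coercion applies.
mixed_mat <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(mixed_mat)            # "character"

# A list keeps the original type of each element.
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, typeof)   # "double" "character" "logical"

# A data frame is a list of equal-length columns; each column keeps its own type.
# stringsAsFactors = FALSE keeps text as character on R versions before 4.0.
mixed_df <- data.frame(num = c(1, 2), chr = c("a", "b"), lgl = c(TRUE, FALSE),
                       stringsAsFactors = FALSE)
sapply(mixed_df, typeof)     # num: "double", chr: "character", lgl: "logical"
```
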
@@ -24,27 +28,30 @@ source("../bin/chunk-options.R")
 knitr_fig_path("13-supp-data-structures-")
 ```
 
-### Understanding Basic Data Types in R
+### Understanding Basic Data Types and Data Structures in R
 
 To make the best of the R language, you'll need a strong understanding of the
-basic data types and data structures and how to operate on those.
+basic data types and data structures and how to operate on them.
 
-Very important to understand because these are the objects you will manipulate
-on a day-to-day basis in R. Dealing with object conversions is one of the most
-common sources of frustration for beginners.
+Data structures are very important to understand because these are the objects you
+will manipulate on a day-to-day basis in R. Dealing with object conversions is one
+of the most common sources of frustration for beginners.
 
 **Everything** in R is an object.
 
-R has 6 (although we will not discuss the raw class for this workshop) atomic
-vector types.
+R has 6 basic data types. (In addition to the five listed below, there is also
+*raw*, which will not be discussed in this workshop.)
 
 * character
 * numeric (real or decimal)
 * integer
 * logical
 * complex
 
-By *atomic*, we mean the vector only holds data of a single type.
+Elements of these data types may be combined to form data structures, such as
+atomic vectors. When we call a vector *atomic*, we mean that the vector only
+holds data of a single data type. Below are examples of atomic character vectors,
+numeric vectors, integer vectors, etc.
 
 * **character**: `"a"`, `"swc"`
 * **numeric**: `2`, `15.5`
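
The reworded text lists five basic data types plus *raw*. A short R sketch (not from the lesson; the example values are arbitrary) that checks each one with `typeof()`. Note that values described as *numeric* are reported by `typeof()` as `"double"`, while `class()` reports `"numeric"`:

```r
# One example value per basic data type discussed in the lesson.
examples <- list(
  character = "swc",
  numeric   = 15.5,
  integer   = 2L,     # the L suffix makes an integer literal
  logical   = TRUE,
  complex   = 1 + 4i
)
sapply(examples, typeof)
# character: "character", numeric: "double", integer: "integer",
# logical: "logical", complex: "complex"

class(15.5)           # "numeric"
typeof(as.raw(255))   # "raw" (the sixth type, not covered in the workshop)

# An atomic vector holds a single type: combining an integer with a double
# gives a double vector.
typeof(c(2L, 15.5))   # "double"
```
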
@@ -84,7 +91,7 @@ R has many __data structures__. These include
 * data frame
 * factors
 
-### Atomic Vectors
+### Vectors
 
 A vector is the most common and basic data structure in R and is pretty much the
 workhorse of R. Technically, vectors can be one of two types:
@@ -263,23 +270,52 @@ nchar("Software Carpentry")
 
 In R matrices are an extension of the numeric or character vectors. They are not
 a separate type of object but simply an atomic vector with dimensions; the
-number of rows and columns.
+number of rows and columns. As with atomic vectors, the elements of a matrix must
+be of the same data type.
 
 ```{r}
 m <- matrix(nrow = 2, ncol = 2)
 m
 dim(m)
 ```
 
-You can check that matrices are vectors with a class attribute of `matrix` by using `class()` and `typeof()`.
+You can check that matrices are vectors with a class attribute of `matrix` by using
+`class()` and `typeof()`.
 
 ```{r}
 m <- matrix(c(1:3))
 class(m)
 typeof(m)
 ```
 
-While `class()` shows that m is a matrix, `typeof()` shows that fundamentally the matrix is an integer vector.
+While `class()` shows that m is a matrix, `typeof()` shows that fundamentally the
+matrix is an integer vector.
+
+> ## Data types of matrix elements
+>
+> Consider the following matrix:
+>
+> ```{r matrix-typeof}
+> FOURS <- matrix(
+> c(4, 4, 4, 4),
+> nrow = 2,
+> ncol = 2)
+> ```
+>
+> Given that `typeof(FOURS[1])` returns `"double"`, what would you expect
+> `typeof(FOURS)` to return? How do you know this is the case even without
+> running this code?
+>
+> *Hint* Can matrices be composed of elements of different data types?
+>
+> > ## Solution
+> > We know that `typeof(FOURS)` will also return `"double"` since matrices
+> > are made of elements of the same data type. Note that you could do
+> > something like `as.character(FOURS)` if you needed the elements of `FOURS`
+> > *as characters*.
+>
+> {: .solution}
+{: .challenge}
 
 Matrices in R are filled column-wise.
 
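
A minimal R sketch (not from the lesson; `mixed`, `FOURS_chr`, and `m` are illustrative names) that checks the challenge answer and the column-wise fill rule. It also shows that `as.character()`, mentioned in the solution, returns a plain character vector: the matrix dimensions are dropped.

```r
# The matrix from the challenge: every element is a double, so is the matrix.
FOURS <- matrix(c(4, 4, 4, 4), nrow = 2, ncol = 2)
typeof(FOURS[1])   # "double"
typeof(FOURS)      # "double"

# Mixing types is not possible; all elements are coerced to one type.
mixed <- matrix(c(4, "four", 4, 4), nrow = 2)
typeof(mixed)      # "character"

# as.character() returns the elements as characters, but as a plain
# vector: the matrix dimensions are dropped.
FOURS_chr <- as.character(FOURS)
FOURS_chr          # "4" "4" "4" "4"
dim(FOURS_chr)     # NULL

# Matrices are filled column-wise: the first column receives 1 and 2.
m <- matrix(1:6, nrow = 2)
m[, 1]             # 1 2
m[1, ]             # 1 3 5
```
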
@@ -296,7 +332,8 @@ dim(m) <- c(2, 5)
 
 This takes a vector and transforms it into a matrix with 2 rows and 5 columns.
 
-Another way is to bind columns or rows using `cbind()` and `rbind()`.
+Another way is to bind rows or columns using `rbind()` and `cbind()` ("row bind"
+and "column bind", respectively).
 
 ```{r}
 x <- 1:3
@@ -305,7 +342,8 @@ cbind(x, y)
 rbind(x, y)
 ```
 
-You can also use the `byrow` argument to specify how the matrix is filled. From R's own documentation:
+You can also use the `byrow` argument to specify how the matrix is filled.
+From R's own documentation:
 
 ```{r}
 mdat <- matrix(c(1, 2, 3, 11, 12, 13),
@@ -315,7 +353,8 @@ mdat <- matrix(c(1, 2, 3, 11, 12, 13),
 mdat
 ```
 
-Elements of a matrix can be referenced by specifying the index along each dimension (e.g. "row" and "column") in single square brackets.
+Elements of a matrix can be referenced by specifying the index along each
+dimension (e.g. "row" and "column") in single square brackets.
 
 ```{r}
 mdat[2, 3]
@@ -398,22 +437,25 @@ names(xlist)
 > {: .solution}
 {: .challenge}
 
-Lists can be extremely useful inside functions. Because the functions in R are able to return only a single object, you can "staple" together lots
-of different kinds of results into a single object that a function can return.
+Lists can be extremely useful inside functions. Because the functions in R are
+able to return only a single object, you can "staple" together lots of different
+kinds of results into a single object that a function can return.
 
 A list does not print to the console like a vector. Instead, each element of the
 list starts on a new line.
 
 Elements are indexed by double brackets. Single brackets will still return
-a(nother) list. If the elements of a list are named, they can be referenced by the `$` notation (i.e. `xlist$data`).
+a(nother) list. If the elements of a list are named, they can be referenced by
+the `$` notation (e.g. `xlist$data`).
 
 
 ### Data Frame
 
 A data frame is a very important data type in R. It's pretty much the *de facto*
 data structure for most tabular data and what we use for statistics.
 
-A data frame is a *special type of list* where every element of the list has same length (i.e. data frame is a "rectangular" list).
+A data frame is a *special type of list* where every element of the list has the same
+length (i.e. a data frame is a "rectangular" list).
 
 Data frames can have additional attributes such as `rownames()`, which can be
 useful for annotating data, like `subject_id` or `sample_id`. But most of the
@@ -455,27 +497,33 @@ is.list(dat)
 class(dat)
 ```
 
-Because data frames are rectangular, elements of data frame can be referenced by specifying the row and the column index in single square brackets (similar to matrix).
+Because data frames are rectangular, elements of a data frame can be referenced by specifying
+the row and the column index in single square brackets (similar to a matrix).
 
 ```{r}
 dat[1, 3]
 ```
 
-As data frames are also lists, it is possible to refer to columns (which are elements of such list) using the list notation, i.e. either double square brackets or a `$`.
+As data frames are also lists, it is possible to refer to columns (which are elements of
+such a list) using the list notation, i.e. either double square brackets or a `$`.
 
 ```{r}
 dat[["y"]]
 dat$y
 ```
 
-The following table summarizes the one-dimensional and two-dimensional data structures in R in relation to diversity of data types they can contain.
+The following table summarizes the one-dimensional and two-dimensional data structures in
+R in relation to the diversity of data types they can contain.
 
 | Dimensions | Homogenous | Heterogeneous |
 | ------- | ---- | ---- |
 | 1-D | atomic vector | list |
 | 2-D | matrix | data frame |
 
-> Lists can contain elements that are themselves muti-dimensional (e.g. a lists can contain data frames or another type of objects). Lists can also contain elements of any length, therefore list do not necessarily have to be "rectangular". However in order for the list to qualify as a data frame, the lenghth of each element has to be the same.
+> Lists can contain elements that are themselves multi-dimensional (e.g. a list can contain
+> data frames or other types of objects). Lists can also contain elements of any length;
+> therefore, lists do not necessarily have to be "rectangular". However, in order for a list
+> to qualify as a data frame, the length of each element has to be the same.
 {: .callout}
 
 > ## Column Types in Data Frames
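
A minimal R sketch (not from the lesson) tying together the list and data frame points reworded in this commit: a function returning several results as one list, `[[`/`$` versus `[` indexing, and why a ragged list cannot become a data frame. The function `summarize_values()` and the objects `dat_demo` and `ragged` are hypothetical names used only for illustration.

```r
# Hypothetical helper: returns several results "stapled" into one list.
summarize_values <- function(x) {
  list(mean = mean(x), range = range(x), n = length(x))
}
res <- summarize_values(c(2, 4, 9))
res$mean            # 5
res[["range"]]      # 2 9
class(res["n"])     # "list": single brackets return a (sub)list

# A data frame is a list whose columns all have the same length.
dat_demo <- data.frame(id = 1:3, x = c(10, 20, 30), y = c("a", "b", "c"),
                       stringsAsFactors = FALSE)
is.list(dat_demo)   # TRUE
dat_demo[1, 3]      # "a" (row 1, column 3, as with a matrix)
identical(dat_demo[["y"]], dat_demo$y)   # TRUE (list-style column access)

# A plain list may hold elements of different lengths ...
ragged <- list(a = 1:3, b = 1:2)
lengths(ragged)     # a: 3, b: 2
# ... but converting it to a data frame fails, because the columns
# would not form a rectangle.
result <- tryCatch(as.data.frame(ragged), error = function(e) e)
inherits(result, "error")   # TRUE
```
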
