intro2geostats/lesson3.Rmd at master · guy-wafeu/intro2geostats · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
---
title: "Non-geostatistical prediction"
output:
  html_document:
    toc: true
    toc_float: true
    toc_collapsed: true
    toc_depth: 3
    number_sections: false
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


## Data

Load the packages and data we will use.

```{r n1, message=FALSE, results= "hide"}
rm(list=ls(all=TRUE))
unlink(".RData")
pacman::p_load(sp, rgdal,rgeos,tmap,raster)
shp <- readOGR("./data/soil_data_CIAT.shp")
plot(shp, pch = 19, col = "red")
```

View coordinate reference system and extents of the shapefile.

```{r n2}
crs(shp)
extent(shp)
```


## Thiessen polygons

- Thiessen polygons are formed to assign boundaries of the areas closest to each unique point. Therefore, for every point in a dataset, it has a corresponding Thiessen polygon.
- We will create Thiessen polygons for the soil data, then use the polygons to map the Clay at their corresponding locations.
- The *spatstat* package provides the functionality to produce Thiessen polygons via its dirichlet tessellation of spatial point patterns function (dirichlet()).
- We also need to first convert the data to a ppp (point pattern) object class. The *maptools* package will help us achive this via  as.ppp() function.

Load necessary packages and convert data to point pattern.

```{r n3, message=FALSE}
library(spatstat)  # Used for the dirichlet tessellation function
library(maptools)  # Used for conversion from SPDF to ppp
library(raster)    # Used to clip out thiessen polygons

# Create a tessellated surface
dat.pp <- as(dirichlet(as.ppp(shp)), "SpatialPolygons")
dat.pp <- as(dat.pp,"SpatialPolygons")

```

Define the projection to the ppp object.

```{r n4, message=FALSE}
proj4string(dat.pp) <- crs(shp)
plot(dat.pp, pch = 19)
```

The Polygons are extended beyond points boundary so get the boundary and clip the Thiesen polygon

```{r n4b, message=FALSE, results= "hide"}
bdy <- readOGR("./data/soil_boundary.shp")
dat.pp <- intersect(dat.pp, bdy)
plot(dat.pp, pch = 19)

```


Assign to each polygon the Clay information.

```{r n5, message=FALSE}
int.Z <- over(dat.pp, shp[,"Clay"], fn=mean, na.rm=TRUE)

```

Map Thiessen polygons and Clay sampled points.

```{r n6}
tm_shape(dat.pp) +  tm_borders(alpha=.5, col = "black") +
  tm_shape(shp) + tm_dots(col = "red", scale = 2.5, shape = 16)

```

Create a *SpatialPolygonsDataFrame* object with Clay content,

```{r n6a}
thiessen <- SpatialPolygonsDataFrame(dat.pp, int.Z)
```


then visualize our point data using the newly formed Thiessen polygons.

```{r n7}
tm_shape(thiessen) + tm_fill(col = "Clay", style = "quantile", palette = "Reds", title = "Clay content") + tm_borders(alpha=.3, col = "black") +
tm_shape(shp) + tm_dots(col = "black", scale = 2.5) +
tm_layout(legend.position = c("left", "top"),  legend.text.size = 1.05, legend.title.size = 1.2, frame = FALSE)

```


## Inverse Distance Weighting (IDW)

- IDW is a means of converting point data of numerical values into a continuous surface to visualise how the *data* may be distributed across space.
- The technique interpolates point data by using a weighted average of a variable from nearby points to predict the value of that variable for each location (de Smith et al, 2018).
- The weighting of the points is determined by their inverse distances drawing on Tobler's first law of geography.

Load necessary packages and define sample grid based on the extent of data.

```{r idw1, webgl=TRUE, message=FALSE}
unloadNamespace("spatstat")
pacman::p_load(gstat)
grid <-spsample(bdy, type = 'regular', n = 10000)
```

Perform Clay content interpolation using IDW fucntion in Gstat package.

```{r idw2}
idw <- idw(shp$Clay ~ 1, shp, newdata= grid)
```


Before we visualize the IDW result we transform the data into a data frame object. We can then rename the column headers.

```{r idw3}
idw.output = as.data.frame(idw)
names(idw.output)[1:3] <- c("long", "lat", "prediction")
```


Convert the output into a raster in order to visualize the output.

```{r idw4, message=FALSE}
pacman::p_load(raster)
# create spatial points data frame
spg <- idw.output
coordinates(spg) <- ~ long + lat
# coerce to SpatialPixelsDataFrame
gridded(spg) <- TRUE
# coerce to raster
raster_idw <- raster(spg)
# sets projection to British National Grid
projection(raster_idw) <- crs(shp)
```

we can quickly plot the raster to check if its okay.

```{r idw5}
plot(raster_idw, main= "Clay content")
```

Even 3D plots can be used to visualize the predictions using the `raster::persp()` function..

```{r idw6}
persp(raster_idw, col=terrain.colors(100, rev = TRUE))
# this package lets us create interactive 3d visualisations
pacman::p_load(rgl)
# we need to first convert the raster to a matrix object type
idw2 <- as.matrix(raster_idw)
persp3d(idw2, col = "red")
```

Now we are ready to plot the raster. We will use the functionality of the tmap package, using the tm_raster() function. The instructions below will create a smoothed surface. In the example we have created 100 colour breaks and turned the legend off, this will create a smoothed colour effect. You can experiment by changing the breaks (n) to 7 and turning the legend back on (legend.show) to make the predicted values easier to interpret.


```{r idw7, message=FALSE}
pacman::p_load(tmap)
tm_shape(raster_idw) + tm_raster("prediction", style = "quantile", n = 100, palette = "Reds", legend.show = FALSE) +
#Overlay the  boundary to provide some orientation
tm_shape(bdy) + tm_borders(alpha=1, col = "blue")
```


Overlay the original Clay data as this provides a good demonstration of how the IDW is distributed.

```{r idw8}
tm_shape(raster_idw) + tm_raster("prediction", style = "quantile", n = 100, palette = "-Reds",legend.show = FALSE) +
tm_shape(bdy) + tm_borders(alpha=1, col="black") +
tm_shape(shp) + tm_bubbles(size = "Clay", col = "Clay", palette = "-Purples", contrast=1, style = "quantile", legend.size.show = FALSE, title.col = "Clay content (%)") +
tm_layout(legend.position = c("left", "top"),  legend.text.size = 1.1, legend.title.size = 1.4, frame = FALSE, legend.bg.color = "white", legend.bg.alpha = 0.5)

```


## Optimizing IDW predictions

The choice of power function `idp` can be subjective. To fine-tune the choice of the power parameter, you can perform a **leave-one-out cross validation** routine to measure the error in the interpolated values.

```{r o1, message=FALSE, results='hide'}
IDW.out <- vector(length = length(shp))
for (i in 1:length(shp)) {
  IDW.out[i] <- idw(Clay ~ 1, shp[-i,], shp[i,], idp=2.0)$var1.pred
}

```

Plot the differences in a scatterplot to evaluate accuracy. If not sufficient you can change the `idp` parameter and evaluate again.

```{r o2}
OP <- par(pty="s", mar=c(4,3,0,0))
  plot(IDW.out ~ shp$Clay, asp=1, xlab="Observed", ylab="Predicted", pch=16,
       col=rgb(0,0,0,0.5))
  abline(lm(IDW.out ~ shp$Clay), col="red", lw=2,lty=2)
  abline(0,1)
par(OP)

```

We can also compute Root Mean Square Error (RMSE) can be computed from `IDW.out`.

```{r rmse}
sqrt( sum((idw.output$prediction - shp$Clay)^2) / length(shp$Clay))

```

## Cross-validation

We can also create a 95% confidence interval map of the interpolation model to illustrate uncertainity around it. Here we’ll use jackknife technique to estimate a confidence interval at each unsampled point. Frist create an interpolation surface.

```{r cv1}
img <- gstat::idw(Clay~1, shp, newdata=grid, idp=2.0)
n   <- length(shp)
Zi  <- matrix(nrow = length(img$var1.pred), ncol = n)

```

Leave out a point then perform interpolation (do this *n* times for each point).

```{r cv2, message=FALSE, results='hide'}
st <- stack()
for (i in 1:n){
  Z1 <- gstat::idw(Clay~1, shp[-i,], newdata=grid, idp=2.0)
  # coerce to SpatialPixelsDataFrame
  gridded(Z1) <- TRUE
  # coerce to raster
  st <- addLayer(st, raster(Z1, layer=1))
  # Calculated pseudo-value Z at j
  Zi[,i] <- n * img$var1.pred - (n-1) * Z1$var1.pred
}

```


Compute jackknife estimator of parameter *Z* at location j.

```{r cv3}
Zj <- as.matrix(apply(Zi, 1, sum, na.rm=T) / n )

# Compute (Zi* - Zj)^2
c1 <- apply(Zi,2,'-',Zj)            # Compute the difference
c1 <- apply(c1^2, 1, sum, na.rm=T ) # Sum the square of the difference

# Compute the confidence interval
CI <- sqrt( 1/(n*(n-1)) * c1)

# Create (CI / interpolated value) raster
gridded(img) <- TRUE
img.sig   <- img
img.sig$v <- CI /img$var1.pred

```

Clip the confidence raster to boundary and plot the map.

```{r cv4}
r <- raster(img.sig,  layer="v")
r.m <- mask(r, bdy)

# Plot the map
tm_shape(r.m) + tm_raster(n = 10, style = "kmeans", palette = "-Blues",title="95% confidence interval \n(Clay [%])") +
  tm_shape(shp) + tm_dots(size=0.2) +
  tm_legend(legend.outside=TRUE)
```


## References

De Smith, M. J., Goodchild, M. F., & Longley, P. (2018). [*Geospatial analysis: a comprehensive guide to principles, techniques and software tools*](https://www.spatialanalysisonline.com/HTML/index.html). Troubador publishing ltd.