# Numerical Linear Algebra {#algebra}
In your algebra courses you would write $Ax=b$ and solve $x=A^{-1}b$.
This is useful to understand the algebraic properties of $x$, but a computer would never recover $x$ that way.
Even the sample variance, $S^2(x)=(n-1)^{-1}\sum (x_i-\bar x )^2$, is not computed by naively evaluating its textbook formula, because of numerical-stability and speed considerations.
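To see why the formula matters, here is a small simulated demonstration: the algebraically equivalent one-pass identity $\sum x_i^2 - n\bar x^2$ loses all precision when the mean is large relative to the spread, while the two-pass formula above does not.

```{r}
x <- rnorm(1e6, mean = 1e8)  # unit variance, huge mean
n <- length(x)
one_pass <- (sum(x^2) - n * mean(x)^2) / (n - 1)  # catastrophic cancellation
two_pass <- sum((x - mean(x))^2) / (n - 1)        # the formula above
c(one_pass = one_pass, two_pass = two_pass, var = var(x))
```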
In this chapter, we discuss several ways a computer solves systems of linear equations, with their application to statistics, namely, to OLS problems.
## LU Factorization
```{definition, label="lu", name="LU Factorization"}
For some matrix $A$, the LU factorization is defined as
\begin{align}
A = L U
\end{align}
where $L$ is unit lower triangular and $U$ is upper triangular.
```
The LU factorization is essentially the matrix notation for the [Gaussian elimination](https://en.wikipedia.org/wiki/Gaussian_elimination) you did in your introductory algebra courses.
For a square $n \times n$ matrix, the LU factorization requires $n^3/3$ operations, and stores $n^2+n$ elements in memory.
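In R, this is the factorization behind `solve(A, b)`: the system is handed to LAPACK, which computes the (pivoted) LU factors and performs two triangular solves, without ever forming $A^{-1}$. A minimal sketch; the explicit factors assume the `Matrix` package (shipped with R):

```{r}
set.seed(1)
A <- matrix(rnorm(25), 5, 5)
b <- rnorm(5)

x <- solve(A, b)  # LU factorization + two triangular solves, not solve(A) %*% b

# Inspecting the factors themselves:
f <- Matrix::expand(Matrix::lu(A))            # list with P, L, U and A = P L U
max(abs(as.matrix(f$P %*% f$L %*% f$U) - A))  # ~ 0
```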
## Cholesky Factorization
```{definition, label="nonnegative", name="Non-Negative Matrix"}
A matrix $A$ is said to be _non-negative_ (also called _positive semi-definite_) if $x'Ax \geq 0$ for all $x$.
```
Seeing the matrix $A$ as a function, non-negative matrices can be thought of as functions that generalize the _squaring_ operation.
```{definition, label="cholesky", name="Cholesky Factorization"}
For some non-negative matrix $A$, the Cholesky factorization is defined as
\begin{align}
A = T' T
\end{align}
where $T$ is upper triangular with positive diagonal elements.
```
For obvious reasons, the Cholesky factorization is known as the _square root_ of a matrix.
Because Cholesky is less general than LU, it is also more efficient.
It can be computed in $n^3/6$ operations, and requires storing $n(n+1)/2$ elements.
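Base R's `chol()` returns exactly the factor in the definition above: an upper triangular matrix with $A=T'T$. A quick sketch (we call the factor `T1`, since `T` is reserved for `TRUE` in R):

```{r}
set.seed(1)
X <- matrix(rnorm(30), 10, 3)
A <- crossprod(X)            # X'X is non-negative (here positive definite)
T1 <- chol(A)                # upper triangular with positive diagonal
max(abs(crossprod(T1) - A))  # t(T1) %*% T1 recovers A, up to rounding
```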
## QR Factorization
```{definition, label="qr", name="QR Factorization"}
For some matrix $A$, the QR factorization is defined as
\begin{align}
A = Q R
\end{align}
where $Q$ is orthogonal and $R$ is upper triangular.
```
The QR factorization is very useful for solving the OLS problem, as we will see in Section \@ref(solving-ols).
The QR factorization takes $2n^3/3$ operations to compute.
Three major methods for computing the QR factorization exist. These rely on _Householder transformations_, _Givens transformations_, and a (modified) _Gram-Schmidt procedure_ [@gentle2012numerical].
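Base R exposes a Householder-based QR through `qr()`, with `qr.Q()` and `qr.R()` extracting the factors. A minimal sketch:

```{r}
set.seed(1)
A <- matrix(rnorm(20), 5, 4)
qrA <- qr(A)           # Householder QR
Q <- qr.Q(qrA)         # 5 x 4, orthonormal columns
R <- qr.R(qrA)         # 4 x 4, upper triangular
max(abs(Q %*% R - A))  # ~ 0 (compare against A[, qrA$pivot] if pivoting occurred)
```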
## Singular Value Factorization
```{definition, label="svd", name="SVD"}
For an arbitrary $n\times m$ matrix $A$, the _singular value decomposition_ (SVD) is defined as
\begin{align}
A = U \Sigma V'
\end{align}
where $U$ is an orthogonal $n \times n$ matrix, $V$ is an orthogonal $m \times m$ matrix, and $\Sigma$ is an $n \times m$ diagonal matrix with non-negative entries, the _singular values_.
```
The SVD factorization is very useful for algebraic analysis, but less so for computations.
This is because it is (typically) solved via the QR factorization.
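In R, `svd()` returns the three factors; a sketch of the reconstruction (note that `svd()` returns the _thin_ version, with `u` of dimension $n \times m$ when $n > m$):

```{r}
set.seed(1)
A <- matrix(rnorm(20), 5, 4)
s <- svd(A)                                 # list with u, d, v
max(abs(s$u %*% diag(s$d) %*% t(s$v) - A))  # ~ 0
```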
## Iterative Methods
The various matrix factorizations above may be used to solve a system of linear equations, and in particular, the OLS problem.
There is, however, a very different approach to solving systems of linear equations.
This approach relies on the fact that solutions of linear systems of equations can be cast as optimization problems: simply find the $x$ that minimizes $\Vert Ax-b \Vert$.
Some methods for solving (convex) optimization problems are reviewed in the Convex Optimization Chapter \@ref(convex).
For our purposes we merely note that the factorization approach was historically preferred (this is what the `lm` function and the LAPACK numerical libraries implement), whereas optimization approaches are now preferred, because they are more numerically stable and easier to parallelize.
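To illustrate the optimization view, here is a deliberately naive gradient-descent sketch for $\min_x \Vert Ax-b \Vert^2$. Practical iterative solvers, such as conjugate gradients, converge far faster; treat this only as a demonstration of the idea.

```{r}
gd_solve <- function(A, b, iters = 5000) {
  step <- 0.5 / norm(A, "2")^2  # 1 / Lipschitz constant of the gradient
  x <- rep(0, ncol(A))
  for (k in seq_len(iters)) {
    x <- x - step * 2 * crossprod(A, A %*% x - b)  # gradient of ||Ax - b||^2
  }
  drop(x)
}

set.seed(1)
A <- matrix(rnorm(25), 5, 5)
b <- rnorm(5)
max(abs(gd_solve(A, b) - solve(A, b)))  # small, and shrinking as iters grows
```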
## Solving the OLS Problem {#solving-ols}
Recalling the OLS problem in Eq. \@ref(eq:ols-matrix), we seek
\begin{align}
\hat \beta := \arg\min_\beta \Vert y-X\beta \Vert^2_2.
\end{align}
Differentiating the objective with respect to $\beta$ and equating to zero, any minimizer $\hat \beta$ must satisfy
\begin{align}
X'X \hat\beta = X'y.
(\#eq:normal-equations)
\end{align}
Eq. \@ref(eq:normal-equations) is known as the _normal equations_.
The normal equations are the link between the OLS problem and the matrix factorizations discussed above.
Substituting the QR factorization $X=QR$ into the normal equations, and using $Q'Q=I$, the system reduces to the triangular system
\begin{align*}
R_{(1:p,1:p)} \hat \beta = (Q'y)_{(1:p)},
\end{align*}
where $R_{(1:p,1:p)}$ denotes the upper $p \times p$ block of $R_{n\times p}$, whose remaining rows, $0_{(p+1:n,1:p)}$, are zero.
Since $R_{(1:p,1:p)}$ is upper triangular, $\hat \beta = R_{(1:p,1:p)}^{-1}(Q'y)_{(1:p)}$ is computed by back substitution, without ever forming $X'X$.
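A sketch comparing the routes on simulated data: `lm.fit()` (the workhorse under `lm()`) takes the QR route, while solving the normal equations directly via $X'X$ squares the condition number of the problem and is therefore less stable.

```{r}
set.seed(1)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- drop(X %*% c(1, 2, 3) + rnorm(n))

beta_qr     <- qr.coef(qr(X), y)                    # the QR route above
beta_normal <- solve(crossprod(X), crossprod(X, y)) # normal equations directly
beta_lm     <- coef(lm.fit(X, y))                   # what lm() does internally
max(abs(beta_qr - beta_normal))  # agree here; they differ for ill-conditioned X
```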
## Numerical Libraries for Linear Algebra
TODO.
### BLAS
#### OpenBLAS
#### Accelerate
For Apple computers.
#### AMD Optimizing CPU Libraries
For AMD processors.
#### Arm Performance Libraries
For ARM processors.
#### ATLAS
#### Intel Math Kernel Library (MKL)
For Intel processors.
#### nvBLAS
For nVIDIA GPUs.
### LAPACK
#### PLASMA
#### MAGMA
In the meantime, see:

- [a comparison of numerical libraries](https://en.wikipedia.org/wiki/Comparison_of_linear_algebra_libraries);
- [installing MKL in Ubuntu](http://dirk.eddelbuettel.com/blog/2018/04/15/#018_mkl_for_debian_ubuntu);
- [how to speed up linear algebra in R](https://www.r-bloggers.com/why-is-r-slow-some-explanations-and-mklopenblas-setup-to-try-to-fix-this/);
- [and another](https://www.r-bloggers.com/for-faster-r-use-openblas-instead-better-than-atlas-trivial-to-switch-to-on-ubuntu/);
- [installing OpenBLAS](https://gist.github.com/pachamaltese/e4b819ccf537d465a8d49e6d60252d89).
## Bibliographic Notes
For an excellent introduction to numerical algorithms in statistics, see @weihs2013foundations.
For an emphasis on numerical linear algebra, see @gentle2012numerical, and @golub2012matrix.
## Practice Yourself