# Numerical Linear Algebra {#algebra}
In your algebra courses you would write $Ax=b$ and solve $x=A^{-1}b$.
This is useful to understand the algebraic properties of $x$, but a computer would never recover $x$ that way.
Even the sample variance, $S^2(x)=(n-1)^{-1}\sum (x_i-\bar x )^2$, is not computed by naively evaluating its textbook formula, because of numerical-stability and speed considerations.
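To see why the formula matters, here is a small simulated demonstration: the algebraically equivalent one-pass identity $\sum x_i^2 - n\bar x^2$ loses all precision when the mean is large relative to the spread, while the two-pass formula above does not.

```{r}
x <- rnorm(1e6, mean = 1e8)  # unit variance, huge mean
n <- length(x)
one_pass <- (sum(x^2) - n * mean(x)^2) / (n - 1)  # catastrophic cancellation
two_pass <- sum((x - mean(x))^2) / (n - 1)        # the formula above
c(one_pass = one_pass, two_pass = two_pass, var = var(x))
```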
In this chapter, we discuss several ways a computer solves systems of linear equations, with their application to statistics, namely, to OLS problems.
## LU Factorization
```{definition, label="lu", name="LU Factorization"}
For some matrix $A$, the LU factorization is defined as
\begin{align}
A = L U
\end{align}
where $L$ is unit lower triangular and $U$ is upper triangular.
```
The LU factorization is essentially the matrix notation for the [Gaussian elimination](https://en.wikipedia.org/wiki/Gaussian_elimination) you did in your introductory algebra courses.
For a square $n \times n$ matrix, the LU factorization requires $n^3/3$ operations, and stores $n^2+n$ elements in memory.
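In R, this is the factorization behind `solve(A, b)`: the system is handed to LAPACK, which computes the (pivoted) LU factors and performs two triangular solves, without ever forming $A^{-1}$. A minimal sketch; the explicit factors assume the `Matrix` package (shipped with R):

```{r}
set.seed(1)
A <- matrix(rnorm(25), 5, 5)
b <- rnorm(5)

x <- solve(A, b)  # LU factorization + two triangular solves, not solve(A) %*% b

# Inspecting the factors themselves:
f <- Matrix::expand(Matrix::lu(A))            # list with P, L, U and A = P L U
max(abs(as.matrix(f$P %*% f$L %*% f$U) - A))  # ~ 0
```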
## Cholesky Factorization
```{definition, label="nonnegative", name="Non-Negative Matrix"}
A matrix $A$ is said to be _non-negative_ (also called _positive semi-definite_) if $x'Ax \geq 0$ for all $x$.
```
Seeing the matrix $A$ as a function, non-negative matrices can be thought of as functions that generalize the _squaring_ operation.
```{definition, label="cholesky", name="Cholesky Factorization"}
For some non-negative matrix $A$, the Cholesky factorization is defined as
\begin{align}
A = T' T
\end{align}
where $T$ is upper triangular with positive diagonal elements.
```
For obvious reasons, the Cholesky factorization is known as the _square root_ of a matrix.
Because Cholesky is less general than LU, it is also more efficient.
It can be computed in $n^3/6$ operations, and requires storing $n(n+1)/2$ elements.
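Base R's `chol()` returns exactly the factor in the definition above: an upper triangular matrix with $A=T'T$. A quick sketch (we call the factor `T1`, since `T` is reserved for `TRUE` in R):

```{r}
set.seed(1)
X <- matrix(rnorm(30), 10, 3)
A <- crossprod(X)            # X'X is non-negative (here positive definite)
T1 <- chol(A)                # upper triangular with positive diagonal
max(abs(crossprod(T1) - A))  # t(T1) %*% T1 recovers A, up to rounding
```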
## QR Factorization
```{definition, label="qr", name="QR Factorization"}
For some matrix $A$, the QR factorization is defined as
\begin{align}
A = Q R
\end{align}
where $Q$ is orthogonal and $R$ is upper triangular.
```
The QR factorization is very useful for solving the OLS problem, as we will see in Section \@ref(solving-ols).
The QR factorization takes $2n^3/3$ operations to compute.
Three major methods for computing the QR factorization exist. These rely on _Householder transformations_, _Givens transformations_, and a (modified) _Gram-Schmidt procedure_ [@gentle2012numerical].
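Base R exposes a Householder-based QR through `qr()`, with `qr.Q()` and `qr.R()` extracting the factors. A minimal sketch:

```{r}
set.seed(1)
A <- matrix(rnorm(20), 5, 4)
qrA <- qr(A)           # Householder QR
Q <- qr.Q(qrA)         # 5 x 4, orthonormal columns
R <- qr.R(qrA)         # 4 x 4, upper triangular
max(abs(Q %*% R - A))  # ~ 0 (compare against A[, qrA$pivot] if pivoting occurred)
```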
## Singular Value Factorization
```{definition, label="svd", name="SVD"}
For an arbitrary $n\times m$ matrix $A$, the _singular value decomposition_ (SVD) is defined as
\begin{align}
A = U \Sigma V'
\end{align}
where $U$ is an orthogonal $n \times n$ matrix, $V$ is an orthogonal $m \times m$ matrix, and $\Sigma$ is an $n \times m$ diagonal matrix with non-negative entries, the _singular values_.
```
The SVD factorization is very useful for algebraic analysis, but less so for computations.
This is because it is (typically) solved via the QR factorization.
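In R, `svd()` returns the three factors; a sketch of the reconstruction (note that `svd()` returns the _thin_ version, with `u` of dimension $n \times m$ when $n > m$):

```{r}
set.seed(1)
A <- matrix(rnorm(20), 5, 4)
s <- svd(A)                                 # list with u, d, v
max(abs(s$u %*% diag(s$d) %*% t(s$v) - A))  # ~ 0
```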
## Iterative Methods
The various matrix factorizations above may be used to solve a system of linear equations, and in particular, the OLS problem.
There is, however, a very different approach to solving systems of linear equations.
This approach relies on the fact that solutions of linear systems of equations can be cast as optimization problems: simply find the $x$ that minimizes $\Vert Ax-b \Vert$.
Some methods for solving (convex) optimization problems are reviewed in the Convex Optimization Chapter \@ref(convex).
For our purposes we merely note that the factorization approach was historically preferred (this is what the `lm` function and the LAPACK numerical libraries implement), whereas optimization approaches are now preferred, because they are more numerically stable and easier to parallelize.
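To illustrate the optimization view, here is a deliberately naive gradient-descent sketch for $\min_x \Vert Ax-b \Vert^2$. Practical iterative solvers, such as conjugate gradients, converge far faster; treat this only as a demonstration of the idea.

```{r}
gd_solve <- function(A, b, iters = 5000) {
  step <- 0.5 / norm(A, "2")^2  # 1 / Lipschitz constant of the gradient
  x <- rep(0, ncol(A))
  for (k in seq_len(iters)) {
    x <- x - step * 2 * crossprod(A, A %*% x - b)  # gradient of ||Ax - b||^2
  }
  drop(x)
}

set.seed(1)
A <- matrix(rnorm(25), 5, 5)
b <- rnorm(5)
max(abs(gd_solve(A, b) - solve(A, b)))  # small, and shrinking as iters grows
```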
## Solving the OLS Problem {#solving-ols}
Recalling the OLS problem in Eq. \@ref(eq:ols-matrix), we seek
\begin{align}
\hat \beta := \arg\min_\beta \Vert y-X\beta \Vert^2_2.
\end{align}
Differentiating the objective with respect to $\beta$ and equating to zero, any minimizer $\hat \beta$ must satisfy
\begin{align}
X'X \hat\beta = X'y.
(\#eq:normal-equations)
\end{align}
Eq. \@ref(eq:normal-equations) is known as the _normal equations_.
The normal equations are the link between the OLS problem and the matrix factorizations discussed above.
Substituting the QR factorization $X=QR$ into the normal equations, and using $Q'Q=I$, the system reduces to the triangular system
\begin{align*}
R_{(1:p,1:p)} \hat \beta = (Q'y)_{(1:p)},
\end{align*}
where $R_{(1:p,1:p)}$ denotes the upper $p \times p$ block of $R_{n\times p}$, whose remaining rows, $0_{(p+1:n,1:p)}$, are zero.
Since $R_{(1:p,1:p)}$ is upper triangular, $\hat \beta = R_{(1:p,1:p)}^{-1}(Q'y)_{(1:p)}$ is computed by back substitution, without ever forming $X'X$.
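A sketch comparing the routes on simulated data: `lm.fit()` (the workhorse under `lm()`) takes the QR route, while solving the normal equations directly via $X'X$ squares the condition number of the problem and is therefore less stable.

```{r}
set.seed(1)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- drop(X %*% c(1, 2, 3) + rnorm(n))

beta_qr     <- qr.coef(qr(X), y)                    # the QR route above
beta_normal <- solve(crossprod(X), crossprod(X, y)) # normal equations directly
beta_lm     <- coef(lm.fit(X, y))                   # what lm() does internally
max(abs(beta_qr - beta_normal))  # agree here; they differ for ill-conditioned X
```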
## Numerical Libraries for Linear Algebra
TODO.
### BLAS
#### OpenBLAS
#### Accelerate
For Apple computers.
#### AMD Optimizing CPU Libraries
For AMD processors.
#### Arm Performance Libraries
For ARM processors.
#### ATLAS
#### Intel Math Kernel Library (MKL)
For Intel processors.
#### nvBLAS
For nVIDIA GPUs.
### LAPACK
#### PLASMA
#### MAGMA
In the meantime, see:

- [a comparison of numerical libraries](https://en.wikipedia.org/wiki/Comparison_of_linear_algebra_libraries);
- [installing MKL in Ubuntu](http://dirk.eddelbuettel.com/blog/2018/04/15/#018_mkl_for_debian_ubuntu);
- [how to speed up linear algebra in R](https://www.r-bloggers.com/why-is-r-slow-some-explanations-and-mklopenblas-setup-to-try-to-fix-this/);
- [and another](https://www.r-bloggers.com/for-faster-r-use-openblas-instead-better-than-atlas-trivial-to-switch-to-on-ubuntu/);
- [installing OpenBLAS](https://gist.github.com/pachamaltese/e4b819ccf537d465a8d49e6d60252d89).
## Bibliographic Notes
For an excellent introduction to numerical algorithms in statistics, see @weihs2013foundations.
For an emphasis on numerical linear algebra, see @gentle2012numerical, and @golub2012matrix.
## Practice Yourself