
Commit 407c8ff

docs rewording and reformatting
1 parent 8b913be commit 407c8ff

File tree: 4 files changed, +51 -73 lines changed

docs/src/expressions.md

+11 -12
@@ -1,9 +1,9 @@
# Expressions

-With `StructuredOptimization.jl` you can easily create mathematical expressions.
-
-Firstly, [Variables](@ref) must be defined: various [Mappings](@ref) can then be applied
-following the application of [Functions and constraints](@ref) to create the `Term`s that define the optimization problem.
+With `StructuredOptimization.jl` you can easily create mathematical expressions.
+Firstly, [Variables](@ref) must be defined: various [Mappings](@ref) can then
+be applied following the application of [Functions and constraints](@ref) to
+create the `Term`s that define the optimization problem.

## Variables

@@ -12,9 +12,9 @@ following the application of [Functions and constraints](@ref) to create the `Te
```@docs
Variable
```
-!!! note
+!!! note

-    `StructuredOptimization.jl` supports complex variables. It is possible to create them by specifying the type
+    `StructuredOptimization.jl` supports complex variables. It is possible to create them by specifying the type
    `Variable(Complex{Float64}, 10)` or by initializing them with a complex array `Variable(randn(10)+im*randn(10))`.

### Utilities
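For illustration, the two constructors mentioned in the note, as a minimal runnable sketch (the length 10 is arbitrary):

```julia
using StructuredOptimization

# Typed construction: a zero-initialized complex variable of length 10.
x = Variable(Complex{Float64}, 10)

# Construction from an existing complex array.
z = Variable(randn(10) + im*randn(10))

~z  # access the underlying data array
```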
@@ -39,11 +39,11 @@ eltype

## Mappings

-As shown in the [Quick tutorial guide](@ref) it is possible to apply different mappings to the variables
-using a simple syntax.
+As shown in the [Quick tutorial guide](@ref) it is possible to apply different mappings to the variables
+using a simple syntax.

-Alternatively, as shown in [Multiplying expressions](@ref), it is possible to define the mappings using
-[`AbstractOperators.jl`](https://github.com/kul-forbes/ProximalAlgorithms.jl) and to apply them
+Alternatively, as shown in [Multiplying expressions](@ref), it is possible to define the mappings using
+[`AbstractOperators.jl`](https://github.com/kul-forbes/AbstractOperators.jl) and to apply them
to the variable (or expression) through multiplication.

### Basic mappings
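For illustration, the two styles side by side; `dct(x)` is the function syntax used elsewhere in these docs, while the `DCT` constructor is an assumed operator from `AbstractOperators.jl`:

```julia
using StructuredOptimization
using AbstractOperators

x = Variable(10)

# Function syntax: the mapping is applied directly to the variable.
ex1 = dct(x)

# Multiplication syntax: an AbstractOperators.jl operator times the variable
# (DCT(10) is assumed to build a 10-point discrete cosine transform).
ex2 = DCT(10)*x
```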
@@ -80,12 +80,11 @@ sigmoid

## Utilities

-It is possible to access the variables, mappings and displacement of an expression.
+It is possible to access the variables, mappings and displacement of an expression.
Notice that these commands work also for the `Term`s described in [Functions and constraints](@ref).

```@docs
variables
operator
displacement
```
-

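For illustration, a sketch of the three accessors on a simple affine expression; the comments state assumed return values:

```julia
using StructuredOptimization

x = Variable(5)
A, b = randn(3, 5), randn(3)
ex = A*x - b

variables(ex)     # the Variable(s) the expression depends on
operator(ex)      # the mapping applied to the variables
displacement(ex)  # the constant displacement (here assumed to be -b)
```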
docs/src/functions.md

+8 -8
@@ -1,8 +1,7 @@
# Functions and constraints

-Once an expression is created it is possible to create the `Term`s defining the optimization problem.
-
-These can consists of either [Smooth functions](@ref), [Nonsmooth functions](@ref), [Inequality constraints](@ref)
+Once an expression is created it is possible to create the `Term`s defining the optimization problem.
+These can consist of either [Smooth functions](@ref), [Nonsmooth functions](@ref), [Inequality constraints](@ref)
or [Equality constraints](@ref).

## Smooth functions
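For illustration, one term of each kind built from the same variable; the equality-constraint syntax on the last line is an assumption:

```julia
using StructuredOptimization

x = Variable(5)
A, y = randn(3, 5), randn(3)

t_smooth    = ls(A*x - y)       # smooth function: least squares
t_nonsmooth = 0.1*norm(x, 1)    # nonsmooth function: weighted l1 norm
t_ineq      = x >= 0.           # inequality constraint
t_eq        = norm(x, 2) == 1.  # equality constraint (assumed syntax)
```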
@@ -39,21 +38,22 @@ hingeloss

## Smoothing

-Sometimes the optimization problem might involve only non-smooth terms which do not lead to efficient proximal mappings. It is possible to *smooth* this terms by means of the *Moreau envelope*.
+Sometimes the optimization problem might involve non-smooth terms which
+do not have efficiently computable proximal mappings.
+It is possible to *smooth* these terms by means of the *Moreau envelope*.

```@docs
smooth
```

## Duality

-In some cases it is more convenient to solve the *dual problem* instead of the primal problem.
-
-It is possible to convert the primal problem into its dual form by means of the *convex conjugate*.
+In some cases it is more convenient to solve the *dual problem* instead
+of the primal problem. It is possible to convert a problem into its dual
+by means of the *convex conjugate*.

See the [Total Variation demo](https://github.com/kul-forbes/StructuredOptimization.jl/blob/master/demos/TotalVariationDenoising.ipynb) for an example of this procedure.

```@docs
conj
```
-

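For illustration, a sketch of both utilities; the second argument of `smooth` (the smoothing parameter) is assumed optional, see the docstrings above:

```julia
using StructuredOptimization

x = Variable(10)
A, y = randn(5, 10), randn(5)

# Moreau-envelope smoothing of a nonsmooth term:
t = smooth(norm(A*x - y, 1), 1.0)

# Convex conjugate of a term, used when solving the dual problem:
d = conj(norm(x, 1))
```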
docs/src/solvers.md

+9 -16
@@ -8,21 +8,17 @@

!!! note "Problem warm-starting"

-    By default *warm-starting* is always enabled.
-
-    For example, if two problems that utilize the same variables are solved consecutively,
+    By default *warm-starting* is always enabled.
+    For example, if two problems that utilize the same variables are solved consecutively,
    the second one will be automatically warm-started by the solution of the first one.
-
-    That is because the variables are always linked to their respective data vectors.
-
-    If one wants to avoid this, the optimization variables needs to be manually re-initialized
+    That is because the variables are always linked to their respective data vectors.
+    If one wants to avoid this, the optimization variables need to be manually re-initialized
    before solving the second problem, e.g. to a vector of zeros: `~x .= 0.0`.


## Specifying solver and options

As shown above it is possible to choose the type of algorithm and specify its options by creating a `Solver` object.
-
Currently, the following algorithms are supported:

* *Proximal Gradient (PG)* [[1]](http://www.mit.edu/~dimitrib/PTseng/papers/apgm.pdf), [[2]](http://epubs.siam.org/doi/abs/10.1137/080716542)
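For illustration, a sketch of resetting a variable between two consecutive solves and of passing a configured `Solver`; the `with` keyword follows the package README, while the `tol`/`maxit` option names are assumptions:

```julia
using StructuredOptimization

x = Variable(10)
A, y = randn(5, 10), randn(5)

@minimize ls(A*x - y) + norm(x, 1)    # first problem
~x .= 0.0                             # re-initialize to avoid warm-starting
@minimize ls(A*x - y) + 2*norm(x, 1)  # second problem, solved from scratch

# Selecting the algorithm and its options through a Solver object:
@minimize ls(A*x - y) + norm(x, 1) with PG(tol = 1e-6, maxit = 10_000)
```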
@@ -39,8 +35,7 @@ PANOC

## Build and solve

-The macro [`@minimize`](@ref) automatically parse and solve the problem.
-
+The macro [`@minimize`](@ref) automatically parses and solves the problem.
An alternative syntax is given by the functions [`problem`](@ref) and [`solve`](@ref).

```@docs
@@ -50,12 +45,10 @@ solve

It is important to stress that the `Solver` objects created using
the functions above ([`PG`](@ref), [`FPG`](@ref), etc.)
-specify only the type of algorithm to be used together with its options.
-
-The actual solver
-(namely the one of [`ProximalAlgorithms.jl`](https://github.com/kul-forbes/ProximalAlgorithms.jl))
-is constructed altogether with the problem formulation.
-
+specify only the type of algorithm to be used together with its options.
+The actual solver
+(namely the one from [`ProximalAlgorithms.jl`](https://github.com/kul-forbes/ProximalAlgorithms.jl))
+is constructed together with the problem formulation.
The problem parsing procedure can be separated from the solver application using the functions [`build`](@ref) and [`solve!`](@ref).

```@docs

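For illustration, a sketch of the macro-free workflow; the exact call signatures of `problem` and `solve` are assumptions to be checked against the docstrings above:

```julia
using StructuredOptimization

x = Variable(10)
A, y = randn(5, 10), randn(5)

# Parse the terms into a problem, then solve it with a chosen algorithm:
p = problem(ls(A*x - y), norm(x, 1) <= 1.0)
solve(p, PG())
```

The `build`/`solve!` pair splits this same workflow further, separating the parsing step from the solving step.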
docs/src/tutorial.md

+23 -37
@@ -18,9 +18,7 @@ The *least absolute shrinkage and selection operator* (LASSO) belongs to this cl
\underset{ \mathbf{x} }{\text{minimize}} \ \tfrac{1}{2} \| \mathbf{A} \mathbf{x} - \mathbf{y} \|^2 + \lambda \| \mathbf{x} \|_1.
```

-Here the squared norm $\tfrac{1}{2} \| \mathbf{A} \mathbf{x} - \mathbf{y} \|^2$ is a *smooth* function $f$ wherelse the $l_1$-norm is a *nonsmooth* function $g$.
-
-This problem can be solved with only few lines of code:
+Here the squared norm $\tfrac{1}{2} \| \mathbf{A} \mathbf{x} - \mathbf{y} \|^2$ is a *smooth* function $f$, whereas the $l_1$-norm is a *nonsmooth* function $g$. This problem can be solved with only a few lines of code:

```julia
julia> using StructuredOptimization
@@ -53,16 +51,12 @@ julia> ~x # inspect solution


It is possible to access the solution by typing `~x`.
-
By default variables are initialized by `Array`s of zeros.
-
Different initializations can be set during construction `x = Variable( [1.; 0.; ...] )` or by assignment `~x .= [1.; 0.; ...]`.

## Constrained optimization

-Constrained optimization is also encompassed by the [Standard problem formulation](@ref):
-
-for a nonempty set $\mathcal{S}$ the constraint of
+Constrained optimization is also encompassed by the [Standard problem formulation](@ref): for a nonempty set $\mathcal{S}$ the constraint of

```math
\begin{align*}
@@ -81,9 +75,7 @@ g(\mathbf{x}) = \delta_{\mathcal{S}} (\mathbf{x}) = \begin{cases}
```

to obtain the standard form. Constraints are treated as *nonsmooth functions*.
-
This conversion is automatically performed by `StructuredOptimization.jl`.
-
For example, the non-negative deconvolution problem:

```math
@@ -110,28 +102,24 @@ julia> @minimize ls(conv(x,h)-y) st x >= 0.
!!! note

    The convolution mapping was applied to the variable `x` using `conv`.
-
-    `StructuredOptimization.jl` provides a set of functions that can be used to apply
-    specific operators to variables and create mathematical expression.
-
-    The available functions can be found in [Mappings](@ref).
-
+    `StructuredOptimization.jl` provides a set of functions that can be
+    used to apply specific operators to variables and create mathematical
+    expressions. The available functions can be found in [Mappings](@ref).
    In general it is more convenient to use these functions instead of matrices,
-    as these functions apply efficient algorithms for the forward and adjoint mappings leading to
-    *matrix free optimization*.
+    as these functions apply efficient algorithms for the forward and adjoint
+    mappings leading to *matrix free optimization*.

## Using multiple variables

-It is possible to use multiple variables which are allowed to be matrices or even tensors.
-
-For example a non-negative matrix factorization problem:
+It is possible to use multiple variables which are allowed to be matrices or even tensors. For example, a non-negative matrix factorization problem:

```math
\begin{align*}
\underset{ \mathbf{X}_1, \mathbf{X}_2 }{\text{minimize}} \ & \tfrac{1}{2} \| \mathbf{X}_1 \mathbf{X}_2 - \mathbf{Y} \| \\
\text{subject to} \ & \mathbf{X}_1 \geq 0, \ \mathbf{X}_2 \geq 0,
\end{align*}
```
+
can be solved using the following code:

```julia
@@ -146,57 +134,55 @@ julia> @minimize ls(X1*X2-Y) st X1 >= 0., X2 >= 0.

## Limitations

-Currently `StructuredOptimization.jl` supports only *Proximal Gradient (aka Forward Backward) algorithms*, which require specific properties of the nonsmooth functions and costraint to be applicable.
-
-In particular, the nonsmooth functions must lead to an *efficiently computable proximal mapping*.
+Currently `StructuredOptimization.jl` supports only *proximal gradient algorithms* (i.e., *forward-backward splitting* based), which require specific properties of the nonsmooth functions and constraints to be applicable. In particular, the nonsmooth functions must have an *efficiently computable proximal mapping*.

If we express the nonsmooth function $g$ as the composition of
a function $\tilde{g}$ with a linear operator $A$:
+
```math
g(\mathbf{x}) =
\tilde{g}(A \mathbf{x})
```
-then a proximal mapping of $g$ is efficiently computable if it satisifies the following properties:

-1. the mapping $A$ must be a *tight frame* namely it must satisfy $A A^* = \mu Id$, where $\mu \geq 0$ and $A^*$ is the adjoint of $A$ and $Id$ is the identity operator.
+then the proximal mapping of $g$ is efficiently computable if either of the following holds:

-2. if $A$ is not a tight frame, than it must be possible write $g$ as a *separable* sum $g(\mathbf{x}) = \sum_j h_j (B_j \mathbf{x}_j)$ with $\mathbf{x}_j$ being a non-overlapping slices of $\mathbf{x}$ and $B_j$ being tight frames.
+1. Operator $A$ is a *tight frame*, namely it satisfies $A A^* = \mu Id$, where $\mu \geq 0$, $A^*$ is the adjoint of $A$, and $Id$ is the identity operator.

-Let us analyze these rules with a series of examples.
+2. Function $g$ is the *separable sum* $g(\mathbf{x}) = \sum_j h_j (B_j \mathbf{x}_j)$, where $\mathbf{x}_j$ are non-overlapping slices of $\mathbf{x}$, and $B_j$ are tight frames.

+Let us analyze these rules with a series of examples.
The LASSO example above satisfies the first rule:
+
```julia
julia> @minimize ls( A*x - y ) + λ*norm(x, 1)
-
```
-since the non-smooth function $\lambda \| \cdot \|_1$ is not composed with any operator (or equivalently is composed with $Id$ which is a tight frame).

+since the non-smooth function $\lambda \| \cdot \|_1$ is not composed with any operator (or equivalently is composed with $Id$, which is a tight frame).
Also the following problem would be accepted:
+
```julia
julia> @minimize ls( A*x - y ) + λ*norm(dct(x), 1)
-
```
-since the discrete cosine transform (DCT) is orthogonal and is therefore a tight frame.

-On the other hand, the following problem
+since the discrete cosine transform (DCT) is orthogonal and is therefore a tight frame. On the other hand, the following problem
+
```julia
julia> @minimize ls( A*x - y ) + λ*norm(x, 1) st x >= 1.0
-
```
+
cannot be solved through proximal gradient algorithms, since the second rule would be violated.
Here the constraint would be converted into an indicator function and the nonsmooth function $g$ can be written as the sum:

```math
g(\mathbf{x}) = \lambda \| \mathbf{x} \|_1 + \delta_{\mathcal{S}} (\mathbf{x})
```

-which is not separable.
+which is not separable. On the other hand this problem would be accepted:

-On the other hand this problem would be accepted:
```julia
julia> @minimize ls( A*x - y ) + λ*norm(x[1:div(n,2)], 1) st x[div(n,2)+1:n] >= 1.0
-
```
+
as now the optimization variables $\mathbf{x}$ are partitioned into non-overlapping groups.

!!! note

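For illustration, a self-contained version of the last accepted problem, with explicit (arbitrary) data sizes:

```julia
using StructuredOptimization

n, m = 10, 5
A, y = randn(m, n), randn(m)
λ = 1e-2
x = Variable(n)

# The nonsmooth term and the constraint act on non-overlapping
# slices of x, so the separable-sum rule (rule 2) applies:
@minimize ls(A*x - y) + λ*norm(x[1:div(n,2)], 1) st x[div(n,2)+1:n] >= 1.0
```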