Florian Oswald Sciences Po, 2019
1. Problem Specification
2. Initial Design
3. Optimization Procedure:
    a) Evaluate Performance
    b) Good?
        i. yes: final design
        ii. no:
            * Change design
            * go back to a)
We want to automate step 3.
We can define first- and second-order necessary conditions, FONC and SONC. The point of this definition is that these conditions are necessary, but not sufficient, for optimality.
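For an unconstrained problem with a smooth objective, they read (stated here for reference): $$ \nabla f(\mathbf{x}^*) = \mathbf{0} \quad \text{(FONC)}, \qquad \nabla^2 f(\mathbf{x}^*) \text{ positive semidefinite} \quad \text{(SONC)} $$ The saddle surface plotted below, $f(x,y) = x^2 + y^3$, satisfies both conditions at the origin and is still not a local minimum.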
using Plots
gr() # used to choose plotlyjs backend, but does not survive html export...
v=collect(range(-2,stop = 2, length = 30)) # values
mini = [x^2 + y^2 for x in v, y in v]
maxi = -mini # max is just negative min
saddle = [x^2 + y^3 for x in v, y in v];
surface(v,v,maxi,title="local max",fillalpha=0.5,leg=false,fillcolor=:heat)
surface(v,v,mini,title="local min",fillalpha=0.5,leg=false,fillcolor=:heat)
surface(v,v,saddle,title="saddle",fillalpha=0.7,leg=false,fillcolor=:heat)
A well-known test function for numerical optimization algorithms is the Rosenbrock banana function, introduced by Rosenbrock in 1960. It is defined by
$$ f(\mathbf{x}) = (1-x_1)^2 + 5(x_2-x_1^2)^2 $$
# let's get a picture of this
rosenbrock(x; a=1, b=5) = (a-x[1])^2 + b*(x[2] - x[1]^2)^2
x=y=collect(range(-2,stop = 2, length = 100)) # x and y axis
f = [rosenbrock([ix,iy]) for ix in x, iy in y] # f evaluations
# plotting
wireframe(x,y,f,linecolor=:grey)
surface!(x,y,f,fillcolor=:darkrainbow,colorbar=false)
using SymEngine
x = symbols("x");
f = x^2 + x/2 - sin(x)/x; diff(f, x)
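Higher-order derivatives follow by differentiating the symbolic result again; a small sketch (the name `g` is just illustrative):
g = diff(f, x)   # first derivative, itself a symbolic expression
diff(g, x)       # second derivative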
For example, let's compute it for $f(\mathbf{x}) = x_1 x_2$ at $\mathbf{x} = [2, 0]$ in direction $\mathbf{s}=[-1,-1]$
$$ \begin{aligned} \nabla f(\mathbf{x}) & = \left[ \frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2} \right] = [x_2, x_1] \\ \nabla_\mathbf{s} f(\mathbf{x}) & = \nabla f(\mathbf{x})^\top \mathbf{s} = \left[\begin{array}{cc} 0 & 2 \end{array}\right] \left[\begin{array}{c} -1 \\ -1 \end{array}\right] = -2 \end{aligned} $$
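A quick numerical sanity check of this directional derivative, as a sketch using a plain forward difference along $\mathbf{s}$ (the names `fdir`, `x0`, `s` and the step `h` are illustrative):
fdir(z) = z[1] * z[2]                 # f(x) = x1 * x2
x0 = [2.0, 0.0]                       # evaluation point
s  = [-1.0, -1.0]                     # direction
h  = 1e-6                             # small step along s
(fdir(x0 .+ h .* s) - fdir(x0)) / h   # ≈ -2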
The idea here is to literally take our definition for a derivative from above, and compute it for small $h$: $$ f'(x) \approx \underbrace{\frac{f(x+h)-f(x)}{h}}_{\text{forward diff}} \approx \underbrace{\frac{f(x+h/2)-f(x-h/2)}{h}}_{\text{central diff}} \approx \underbrace{\frac{f(x)-f(x-h)}{h}}_{\text{backward diff}}$$
eps()  # machine epsilon: the spacing between 1.0 and the next representable Float64
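The choice of $h$ matters: too large and the truncation error dominates, too small and floating-point cancellation takes over. A minimal hand-rolled sketch (the helper names `fdiff` and `cdiff` are illustrative):
fdiff(f, x, h) = (f(x + h) - f(x)) / h          # forward difference
cdiff(f, x, h) = (f(x + h/2) - f(x - h/2)) / h  # central difference
# compare errors against the exact derivative cos(1.0) for several step sizes
for h in (1e-1, 1e-4, 1e-8, 1e-12)
    ef = abs(fdiff(sin, 1.0, h) - cos(1.0))
    ec = abs(cdiff(sin, 1.0, h) - cos(1.0))
    println("h = $h: forward error = $ef, central error = $ec")
end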
# the Calculus.jl package implements finite differences
using Calculus
derivative(x->x^2,1.0) # standard signature of function
println("forward = $(Calculus.finite_difference(x->x^2,1.0,:forward))")
println("central = $(Calculus.finite_difference(x->x^2,1.0,:central))")
println("complex = $(Calculus.finite_difference(x->x^2,1.0,:complex))")
println("")
println("forward = $(Calculus.finite_difference( x->sin(x^2) ,π/2,:forward))")
println("central = $(Calculus.finite_difference( x->sin(x^2) ,π/2,:central))")
println("complex = $(Calculus.finite_difference( x->sin(x^2) ,π/2,:complex))")
# also can compute gradients for multidim functions
Calculus.gradient(x->x[1]^2 * exp(3x[2]),ones(2))
Calculus.hessian(x->x[1]^2 * exp(3x[2]),ones(2))
# there is another problem apart from numerical issues with small h:
f1 = function(x)
    println("evaluation of f1")
    x[1]^2 * exp(3x[2])
end
Calculus.gradient(f1,ones(2))
# for an f that is expensive to compute, this method quickly becomes infeasible.
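For instance, we can count how many times the function is evaluated for a single gradient call (a sketch; the counter `ncalls` and the name `f2` are illustrative):
ncalls = Ref(0)                                   # mutable counter
f2 = x -> (ncalls[] += 1; x[1]^2 * exp(3x[2]))    # same function, but counting calls
Calculus.gradient(f2, ones(2))
println("f2 was evaluated $(ncalls[]) times")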
Automatic differentiation (AD) uses the code that defines a function and performs elementary differentiation rules, after dissecting expressions via the chain rule:
$$\frac{d}{dx}f(g(x)) = \frac{df}{dg}\frac{dg}{dx}$$
Consider the function $f(x,y) = \ln(xy + \max(x,2))$. Let's get the partial derivative wrt $x$:
$$ \begin{aligned} \frac{\partial f}{\partial x} &= \frac{1}{xy + \max(x,2)} \frac{\partial}{\partial x}(xy + \max(x,2)) \\ &= \frac{1}{xy + \max(x,2)} \left[\frac{\partial(xy)}{\partial x} + \frac{\partial\max(x,2)}{\partial x} \right]\\ &= \frac{1}{xy + \max(x,2)} \left[\left(y\frac{\partial(x)}{\partial x} + x\frac{\partial(y)}{\partial x}\right) + \left(\mathbf{1}(2>x)\frac{\partial 2}{\partial x} + \mathbf{1}(2<x)\frac{\partial x}{\partial x} \right)\right] \\ &= \frac{1}{xy + \max(x,2)} \left[y + \mathbf{1}(2<x)\right] \end{aligned} $$where the indicator function $\mathbf{1}(r)=1$ if $r$ evaluates to true, 0 otherwise.
This can be illustrated in a call graph as below:
QUESTION: WHY HAVE WE NOT BEEN DOING THIS FOR EVER?! ANSWER: Because it was tedious.
What do you need to implement AD?
We need what are called dual numbers. These are similar to complex numbers, in that each number has two components: a standard value, and a derivative part.
This is what it takes to define a `Dual` number type in Julia:
struct Dual
    v   # value
    ∂   # derivative part
end
Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.∂ + b.∂)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.v*b.∂ + b.v*a.∂)   # product rule
Base.log(a::Dual) = Dual(log(a.v), a.∂/a.v)
function Base.max(a::Dual, b::Dual)
    v = max(a.v, b.v)
    ∂ = a.v > b.v ? a.∂ : a.v < b.v ? b.∂ : NaN
    return Dual(v, ∂)
end
function Base.max(a::Dual, b::Int)
    v = max(a.v, b)
    ∂ = a.v > b ? a.∂ : a.v < b ? 0 : NaN   # b is a constant, so its derivative is 0
    return Dual(v, ∂)
end
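As a quick check of the hand-rolled type, evaluate the example function at $(x,y)=(3,2)$, seeding the derivative component of $x$ with 1; the derivation above gives $\frac{y + \mathbf{1}(2<x)}{xy+\max(x,2)} = 1/3$ at this point (the names `xd`, `yd` are illustrative):
xd = Dual(3, 1)             # differentiate with respect to x: seed ∂ = 1
yd = Dual(2, 0)             # y is held fixed: seed ∂ = 0
log(xd*yd + max(xd, 2))     # Dual(log(9), 1/3): value and ∂f/∂x in one pass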
# ForwardDiff.jl is a julia package for ... Forward AD!
using ForwardDiff
x = ForwardDiff.Dual(3,1);
y = ForwardDiff.Dual(2,0);
log(x*y + max(x,2))
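In practice one rarely constructs `Dual` numbers by hand; ForwardDiff's higher-level API does the seeding for us. A short sketch for the same example function:
# gradient of f(x,y) = log(x*y + max(x,2)) at (3,2); both partials should be ≈ 1/3
ForwardDiff.gradient(z -> log(z[1]*z[2] + max(z[1], 2)), [3.0, 2.0])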
Expressions: Julia code is itself represented as data, by the `Expr` type:
mutable struct Expr <: Any
Fields:
    head :: Symbol
    args :: Array{Any,1}
    typ  :: Any
println("create an explicit expression by `quoting` it with `:`")
expr = :(x + y)
println("typeof(expr)=$(typeof(expr))")
println("\ncan evaluate an expression")
x = 2;y=3
println(eval(expr))
println("\nand we can pick it apart:")
println("expr.head=$(expr.head)")
println("expr.args=$(expr.args)")
# our example was
ex = :(log(x*y + max(x,2)))
# we can access every piece of the call graph, e.g.
println("the first elemt of args is $(ex.args[1])")
println("let's dump the entire callgraph")
dump(ex)
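The arguments are themselves expressions, so we can keep walking down the call graph (a short sketch):
ex.args[2]        # the nested sub-expression :(x * y + max(x, 2))
ex.args[2].args   # its pieces: [:+, :(x * y), :(max(x, 2))]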
Optimization packages in Julia (see JuliaOpt: http://www.juliaopt.org):
* NLopt.jl: Julia interface to the NLopt library of optimizers. One could use NLopt without constraints in an unconstrained problem.
* Roots.jl: Simple algorithms that find the zeros of a univariate function.
* Optim.jl: Optimization algorithms written in Julia; we use it below.
# let's optimize rosenbrock's function without any gradient/hessian info:
using Optim
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), BFGS())
function g!(G, x)
    G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    G[2] = 200.0 * (x[2] - x[1]^2)
end
function h!(H, x)
    H[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2
    H[1, 2] = -400.0 * x[1]
    H[2, 1] = -400.0 * x[1]
    H[2, 2] = 200.0
end
optimize(rosenbrock, g!, h!, zeros(2), Newton())
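The returned result object can be queried for the solution; a short sketch using Optim.jl's accessor functions:
res = optimize(rosenbrock, g!, h!, zeros(2), Newton())
Optim.minimizer(res)   # should be ≈ [1.0, 1.0]
Optim.minimum(res)     # should be ≈ 0.0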