Florian Oswald Sciences Po, 2019
1. Problem Specification
2. Initial Design
3. Optimization Procedure:
   a) Evaluate Performance
   b) Good?
      i. yes: final design
      ii. no:
         * Change design
         * Go back to a)
We want to automate step 3.
We can define first- and second-order necessary conditions, the FONC and SONC. Calling them necessary is meant to emphasize that these conditions are not sufficient for optimality.
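As a reminder, for an unconstrained problem with a twice differentiable objective $f$, the conditions at a candidate minimizer $\mathbf{x}^*$ read:
$$ \begin{aligned} \text{FONC:} &\quad \nabla f(\mathbf{x}^*) = 0 \\ \text{SONC:} &\quad \nabla^2 f(\mathbf{x}^*) \text{ is positive semidefinite} \end{aligned} $$
The saddle plotted below, $f(x_1,x_2) = x_1^2 + x_2^3$, satisfies both conditions at the origin and yet is not a local minimum, which is exactly why the conditions are only necessary.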
using Plots
gr() # used to choose plotlyjs backend, but does not survive html export...
v=collect(range(-2,stop = 2, length = 30)) # values
mini = [x^2 + y^2 for x in v, y in v]
maxi = -mini # max is just negative min
saddle = [x^2 + y^3 for x in v, y in v];
surface(v,v,maxi,title="local max",fillalpha=0.5,leg=false,fillcolor=:heat)
surface(v,v,mini,title="local min",fillalpha=0.5,leg=false,fillcolor=:heat)
surface(v,v,saddle,title="saddle",fillalpha=0.7,leg=false,fillcolor=:heat)
A well-known test function for numerical optimization algorithms is the banana function developed by Rosenbrock in 1960. It is defined by
$$ f(\mathbf{x}) = (1-x_1)^2 + 5(x_2-x_1^2)^2 $$
# let's get a picture of this
rosenbrock(x; a=1, b=5) = (a-x[1])^2 + b*(x[2] - x[1]^2)^2
x=y=collect(range(-2,stop = 2, length = 100)) # x and y axis
f = [rosenbrock([ix,iy]) for ix in x, iy in y] # f evaluations
# plotting
wireframe(x,y,f,linecolor=:grey)
surface!(x,y,f,fillcolor=:darkrainbow,colorbar=false)
using SymEngine
x = symbols("x");
f = x^2 + x/2 - sin(x)/x; diff(f, x)
1/2 + 2*x + sin(x)/x^2 - cos(x)/x
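The same symbolic approach extends to several variables: one `diff` call per variable gives a symbolic gradient. A minimal sketch for the banana function from above (the names `x1`, `x2` and `rosen` are mine):
using SymEngine
x1, x2 = symbols("x1"), symbols("x2")
rosen = (1 - x1)^2 + 5*(x2 - x1^2)^2        # same banana function as above (a=1, b=5)
grad  = [diff(rosen, x1), diff(rosen, x2)]   # symbolic gradient, one diff per variable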
For example, let's compute it for $f(\mathbf{x}) = x_1 x_2$ at $\mathbf{x} = [2, 0]$ in direction $\mathbf{s}=[-1,-1]$
$$ \begin{aligned} \nabla f(\mathbf{x}) & = \left[ \frac{\partial f(\mathbf{x})}{\partial x_1},\frac{\partial f(\mathbf{x})}{\partial x_2} \right] = [x_2,x_1 ] \\ \nabla_\mathbf{s} f(\mathbf{x}) & = \nabla f(\mathbf{x})^\top \mathbf{s} = \left[\begin{array}{cc}0& 2\end{array}\right] \left[\begin{array}{c}-1\\-1\end{array} \right] = -2 \end{aligned} $$
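A quick numerical check of this calculation (the hand-coded gradient function `∇f` is mine):
using LinearAlgebra
∇f(x) = [x[2], x[1]]        # gradient of f(x) = x₁x₂, derived by hand
s = [-1.0, -1.0]
dot(∇f([2.0, 0.0]), s)      # -2.0, as in the calculation above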
The idea here is to literally take our definition for a derivative from above, and compute it for small $h$: $$ f'(x) \approx \underbrace{\frac{f(x+h)-f(x)}{h}}_{\text{forward diff}} \approx \underbrace{\frac{f(x+h/2)-f(x-h/2)}{h}}_{\text{central diff}} \approx \underbrace{\frac{f(x)-f(x-h)}{h}}_{\text{backward diff}}$$
eps()   # machine epsilon: spacing between 1.0 and the next Float64
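Machine epsilon matters here because the choice of $h$ trades off truncation error (large $h$) against floating-point cancellation (tiny $h$). A hand-rolled sketch of the forward and central formulas (the helper names `fwd` and `cent` are my own):
# hand-rolled finite differences for f(x) = x^2 at x = 1 (true derivative = 2)
fwd(f, x, h)  = (f(x + h) - f(x)) / h
cent(f, x, h) = (f(x + h/2) - f(x - h/2)) / h
for h in (1e-4, 1e-8, 1e-12)
    fe = abs(fwd(x->x^2, 1.0, h) - 2.0)    # error of forward difference
    ce = abs(cent(x->x^2, 1.0, h) - 2.0)   # error of central difference
    println("h = $h: forward error = $fe, central error = $ce")
end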
# the Calculus.jl package implements finite differences
using Calculus
derivative(x->x^2,1.0) # standard signature of function
println("forward = $(Calculus.finite_difference(x->x^2,1.0,:forward))")
println("central = $(Calculus.finite_difference(x->x^2,1.0,:central))")
println("complex = $(Calculus.finite_difference(x->x^2,1.0,:complex))")
println("")
println("forward = $(Calculus.finite_difference( x->sin(x^2) ,π/2,:forward))")
println("central = $(Calculus.finite_difference( x->sin(x^2) ,π/2,:central))")
println("complex = $(Calculus.finite_difference( x->sin(x^2) ,π/2,:complex))")
forward = 2.000000014901161
central = 1.9999999999829379
complex = 2.0

forward = -2.45424963163794
central = -2.4542495409833656
complex = -2.4542495411512917
# also can compute gradients for multidim functions
Calculus.gradient(x->x[1]^2 * exp(3x[2]),ones(2))
Calculus.hessian(x->x[1]^2 * exp(3x[2]),ones(2))
2×2 Array{Float64,2}:
  40.171  120.513
 120.513  180.77
# there is another problem apart from numerical issues with small h:
f1 = function(x)
println("evaluation of f1")
x[1]^2 * exp(3x[2])
end
Calculus.gradient(f1,ones(2))
# for an f that is expensive to compute, this method quickly becomes infeasible.
evaluation of f1
evaluation of f1
evaluation of f1
evaluation of f1
2-element Array{Float64,1}:
 40.17107384604091
 60.25661077199484
Automatic differentiation (AD) operates on the code that defines a function and applies elementary differentiation rules, after dissecting expressions via the chain rule:
$$\frac{d}{dx}f(g(x)) = \frac{df}{dg}\frac{dg}{dx}$$
Consider the function $f(x,y) = \ln(xy + \max(x,2))$. Let's get the partial derivative wrt $x$:
$$ \begin{aligned} \frac{\partial f}{\partial x} &= \frac{1}{xy + \max(x,2)} \frac{\partial}{\partial x}(xy + \max(x,2)) \\ &= \frac{1}{xy + \max(x,2)} \left[\frac{\partial(xy)}{\partial x} + \frac{\partial\max(x,2)}{\partial x} \right]\\ &= \frac{1}{xy + \max(x,2)} \left[\left(y\frac{\partial(x)}{\partial x} + x\frac{\partial(y)}{\partial x}\right) + \left(\mathbf{1}(2>x)\frac{\partial 2}{\partial x} + \mathbf{1}(2<x)\frac{\partial x}{\partial x} \right)\right] \\ &= \frac{1}{xy + \max(x,2)} \left[y + \mathbf{1}(2<x)\right] \end{aligned} $$where the indicator function $\mathbf{1}(r)=1$ if $r$ evaluates to true, 0 otherwise.
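A one-line check of the hand-derived formula at $(x,y)=(3,2)$, the point we will feed to AD below (the name `dfdx` is one I introduce here):
# hand-derived partial evaluated at (x,y) = (3,2): (2 + 1)/(6 + 3) = 1/3
dfdx(x, y) = (y + (x > 2)) / (x*y + max(x, 2))
dfdx(3, 2)   # 0.3333...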
This chain-rule decomposition can be illustrated in a call graph as below:
QUESTION: WHY HAVE WE NOT BEEN DOING THIS FOR EVER?! ANSWER: Because it was tedious.
What do you need to implement AD?
We need what are called dual numbers. They are similar to complex numbers in that each number has two components: a standard value and a derivative.
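Formally, a dual number is $a + b\epsilon$ with $\epsilon^2 = 0$; carrying $\epsilon$ through ordinary algebra reproduces the sum and product rules that the code below implements:
$$ \begin{aligned} (a + b\epsilon) + (c + d\epsilon) &= (a+c) + (b+d)\epsilon \\ (a + b\epsilon)(c + d\epsilon) &= ac + (ad + bc)\epsilon \end{aligned} $$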
This is what it takes to define a Dual number type in Julia:
struct Dual
v
∂
end
Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.∂ + b.∂)           # sum rule
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.v*b.∂ + b.v*a.∂)   # product rule
Base.log(a::Dual) = Dual(log(a.v), a.∂/a.v)                      # d/dx log(x) = 1/x
function Base.max(a::Dual, b::Dual)
v = max(a.v, b.v)
∂ = a.v > b.v ? a.∂ : a.v < b.v ? b.∂ : NaN
return Dual(v, ∂)
end
function Base.max(a::Dual, b::Int)
v = max(a.v, b)
∂ = a.v > b ? a.∂ : a.v < b ? 0 : NaN   # b is a constant, so its derivative is 0
return Dual(v, ∂)
end
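A quick check of this hand-rolled type on our example $f(x,y) = \ln(xy + \max(x,2))$ at $(3,2)$, seeding $x$ with derivative 1 and $y$ with 0 (the same convention as the ForwardDiff call below):
x = Dual(3, 1)           # ∂x/∂x = 1
y = Dual(2, 0)           # ∂y/∂x = 0
log(x*y + max(x, 2))     # Dual(2.1972..., 0.3333...): the value and ∂f/∂x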
# ForwardDiff.jl is a julia package for ... Forward AD!
using ForwardDiff
x = ForwardDiff.Dual(3,1);
y = ForwardDiff.Dual(2,0);
log(x*y + max(x,2))
Dual{Nothing}(2.1972245773362196,0.3333333333333333)
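In practice one rarely builds the duals by hand: ForwardDiff.gradient does the seeding for us. A sketch, with the example rewritten as a function of a vector (the name `fex` is mine; at this point $x>2$, so max is locally smooth and differentiation goes through):
fex(z) = log(z[1]*z[2] + max(z[1], 2))   # same example, vector argument
ForwardDiff.gradient(fex, [3.0, 2.0])    # ≈ [1/3, 1/3]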
Julia represents its own code as a data structure, the Expression type:

mutable struct Expr <: Any

Fields:
  head :: Symbol
  args :: Array{Any,1}
  typ  :: Any
println("create an explicit expression by `quoting` it with `:`")
expr = :(x + y)
println("typeof(expr)=$(typeof(expr))")
println("\ncan evaluate an expression")
x = 2;y=3
println(eval(expr))
println("\nand we can pick it apart:")
println("expr.head=$(expr.head)")
println("expr.args=$(expr.args)")
create an explicit expression by `quoting` it with `:`
typeof(expr)=Expr

can evaluate an expression
5

and we can pick it apart:
expr.head=call
expr.args=Any[:+, :x, :y]
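Going the other way also works: since an Expr is just a head plus args, we can build code programmatically and evaluate it (a small sketch):
built = Expr(:call, :+, :x, :y)   # same call graph as :(x + y), built by hand
println(built)                     # prints x + y
println(eval(built))               # 5, since x = 2 and y = 3 above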
# our example was
ex = :(log(x*y + max(x,2)))
# we can access every piece of the call graph, e.g.
println("the first elemt of args is $(ex.args[1])")
println("let's dump the entire callgraph")
dump(ex)
the first element of args is log
let's dump the entire callgraph
Expr
  head: Symbol call
  args: Array{Any}((2,))
    1: Symbol log
    2: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol +
        2: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol *
            2: Symbol x
            3: Symbol y
        3: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol max
            2: Symbol x
            3: Int64 2
There are several optimization packages for Julia; see http://www.juliaopt.org for an overview.
* NLopt.jl: interface to the NLopt library of constrained and unconstrained optimizers. One could use NLopt without constraints in an unconstrained problem.
* Roots.jl: Simple algorithms that find the zeros of a univariate function.
* Optim.jl: optimization routines written in pure Julia.
# let's optimize rosenbrock's function without any gradient/hessian info:
using Optim
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926033423,0.9999999852005353]
 * Minimum: 5.471433e-17
 * Iterations: 16
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 3.47e-07
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 1.20e+03 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 2.33e-09
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 53
 * Gradient Calls: 53
function g!(G, x)
G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
G[2] = 200.0 * (x[2] - x[1]^2)
end
function h!(H, x)
H[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2
H[1, 2] = -400.0 * x[1]
H[2, 1] = -400.0 * x[1]
H[2, 2] = 200.0
end
optimize(rosenbrock, g!, h!, zeros(2), Newton())
Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999999999994,0.9999999999999989]
 * Minimum: 3.081488e-31
 * Iterations: 14
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false
     |x - x'| = 3.06e-09
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = 3.03e+13 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 1.11e-15
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 44
 * Gradient Calls: 44
 * Hessian Calls: 14
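Instead of hand-coding g! and h!, we can also ask Optim to obtain the derivatives via ForwardDiff; a sketch, assuming a recent Optim.jl where the autodiff keyword is available:
# let ForwardDiff supply the gradient (and the Hessian for Newton) automatically
optimize(rosenbrock, zeros(2), BFGS(); autodiff = :forward)
optimize(rosenbrock, zeros(2), Newton(); autodiff = :forward)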