7. Gradient-based optimization#

7.1. Linear vs Nonlinear#

In general, optimization problems look like

\begin{align*} \min_{x}\quad &f(x) \\ \text{s.t.}\quad &g_i(x)\leq 0, \quad i=1,\dots,m \\ &h_i(x)=0, \quad i=1,\dots,l \\ &x\in X. \end{align*}

Depending on the properties of \(f, g, h\), one can use linear programming or MILP methods. On this page, we will focus on problems where \(f\) is nonlinear, and on how one can optimize such functions, with or without constraints.

Even if \(f\) is not linear, we will assume some structure about it, specifically differentiability. In the absence of any structure, efficient optimization may be very difficult (though not necessarily impossible, thanks to various derivative-free methods such as genetic algorithms).

Intuitively, a function is differentiable when a linear function can approximate it well near any given point. For scalar-valued functions of a single variable, these linear approximations are tangent lines.

In the case of two variables, the linear approximation becomes a tangent plane.
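As a minimal numerical sketch of this idea (using the assumed example \(f(x)=x^2\), not a function from the text), we can check that near a point \(x_0\), the linear approximation \(f(x_0) + f'(x_0)(x - x_0)\) tracks \(f\) with an error that shrinks rapidly as we zoom in:

```python
# Linear approximation check for f(x) = x**2, whose derivative is 2*x.
# Near x0, f(x0 + h) should be close to f(x0) + f'(x0) * h.

def f(x):
    return x ** 2

def df(x):
    return 2 * x

x0 = 1.0
for h in (0.1, 0.01, 0.001):
    exact = f(x0 + h)
    linear = f(x0) + df(x0) * h
    # for this f the approximation error is exactly h**2,
    # so it shrinks much faster than h itself
    print(h, abs(exact - linear))
```

The quadratic decay of the error is what "a linear function approximates it well in any given locality" means quantitatively.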

Not every function is differentiable, and there can be different reasons why. On the left plot below, we have a function that is undefined for \(x<0\), so there is no well-defined way to approximate it at the point \(x=0\). On the right, the function is continuous, but the approximation at \(x=0\) changes significantly depending on whether one approaches from the left or the right.

Differentiability of a function \(f\) is important because it gives access to its gradient \(\nabla f\), which at any point gives the direction (and rate) of fastest increase. This is very useful information when one is maximizing an objective function. When minimizing, we can still use the gradient: since \(g=-f\) has \(\nabla g=-\nabla f\), the negative gradient is the direction of steepest descent.
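A quick sketch of this descent property, on an assumed toy function \(f(x,y)=x^2+3y^2\) (not one from the text): taking a small step against the gradient decreases the function value.

```python
# The negative gradient is a descent direction.
# f(x, y) = x**2 + 3*y**2 has gradient (2*x, 6*y).

def f(x, y):
    return x ** 2 + 3 * y ** 2

def grad_f(x, y):
    return (2 * x, 6 * y)

x, y = 1.0, 1.0
gx, gy = grad_f(x, y)

# step a small amount along -grad f
step = 0.1
x_new, y_new = x - step * gx, y - step * gy

print(f(x, y), f(x_new, y_new))  # the second value is smaller
```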

7.2. Convex vs Nonconvex#

The usefulness of the gradient raises a question. Suppose we are maximizing a function \(f\) and we are at a point \(x\) such that \(\nabla f(x)=0\). What does this mean? Since the direction of fastest increase is zero, is \(x\) the maximum we were seeking?

The answer to this question depends on whether the problem in (7.1) is convex or not. A convex problem is one where the objective is a convex function and the feasible region imposed by the constraints is a convex set.

TODO: Add explanation/illustration

Convexity effectively means that local information can be used globally: in particular, every local optimum is also a global optimum.
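Without convexity, a vanishing gradient can be misleading. A standard sketch (an assumed example, not from the text) is the saddle of \(f(x,y)=x^2-y^2\): the gradient vanishes at the origin, yet the origin is neither a maximum nor a minimum.

```python
# For the nonconvex f(x, y) = x**2 - y**2, the gradient
# (2*x, -2*y) vanishes at the origin, but the origin is a
# saddle point: f increases along x and decreases along y.

def f(x, y):
    return x ** 2 - y ** 2

def grad_f(x, y):
    return (2 * x, -2 * y)

print(grad_f(0.0, 0.0))   # zero gradient: a stationary point
print(f(0.1, 0.0))        # larger than f(0, 0)
print(f(0.0, 0.1))        # smaller than f(0, 0)
```

For a convex problem this cannot happen: any stationary point is a global minimum.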

7.3. Gradient descent and variants#

Adapted from Emilien Dupont’s code.
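Since the section body currently only credits the visualization source, here is a minimal sketch of plain gradient descent and a heavy-ball momentum variant, on an assumed toy quadratic \(f(x,y)=x^2+10y^2\) (not necessarily the function used in the original visualization):

```python
# Gradient descent and a momentum variant on the toy quadratic
# f(x, y) = x**2 + 10*y**2, with gradient (2*x, 20*y).

def grad_f(x, y):
    return (2 * x, 20 * y)

def gradient_descent(x, y, lr=0.05, steps=100):
    # repeatedly step against the gradient
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

def momentum(x, y, lr=0.05, beta=0.9, steps=100):
    # keep a decaying running sum of past gradients and step along it,
    # which damps the zig-zagging seen on ill-conditioned objectives
    vx = vy = 0.0
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        vx, vy = beta * vx + gx, beta * vy + gy
        x, y = x - lr * vx, y - lr * vy
    return x, y

print(gradient_descent(1.0, 1.0))  # both approach the minimum at (0, 0)
print(momentum(1.0, 1.0))
```

The learning rate `lr` and momentum coefficient `beta` are assumed illustrative values; in practice they must be tuned to the curvature of the objective.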

7.4. Newton’s Method#

Need to debug BFGS
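While the BFGS variant is still being debugged, a minimal sketch of plain Newton's method in one dimension may be useful here, on the assumed example \(f(x)=x^2+e^x\) (not from the text): each step minimizes the local quadratic model of \(f\), i.e. applies a Newton root-finding step to \(f'\).

```python
import math

# Newton's method for minimizing f(x) = x**2 + exp(x):
# iterate x <- x - f'(x) / f''(x), which solves f'(x) = 0.

def df(x):
    return 2 * x + math.exp(x)

def d2f(x):
    return 2 + math.exp(x)

x = 0.0
for _ in range(10):
    # minimize the local quadratic model of f around x
    x = x - df(x) / d2f(x)

print(x)  # close to the minimizer, where f'(x) = 0
```

Quasi-Newton methods such as BFGS follow the same template but replace the exact second derivative (Hessian) with an approximation built from gradient differences.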

7.5. Playground#

WIP

Considerations:

  • security: This is all client-side, so evaluating arbitrary user input should be safe. The only concern I can imagine is that a very complicated expression could take up too much CPU and freeze the page. This may be prevented with something like this.

  • domain: The function domain is currently hardcoded to \([-2,2]^2\). I think it may be annoying to make it configurable.

Enter a math expression: