The multi-dimensional optimization routine to which the title of this post links seeks a minimum of a multivariate polynomial of four independent variables with no constraints.
F = F(w, x, y, z)
The interface for the routine offers fields for the input of up to twenty coefficients, although it is unlikely that non-zero values for all coefficients will be needed for a given problem. Twenty fields are included to make the program applicable to a wider range of problems than if, say, only five fields were provided. In other words, instead of being restricted to a function definition such as F = A w² + B x² + C y² + D z² + E wxyz, users are given the capability to form many more definitions of F. In fact, the function presented is just one example of the type of problem that can be solved by the underlying computer sub-routine. The underlying routine can be used to minimize a function with multiple independent variables defined by polynomials, transcendental operators, a combination of the two, or defined in yet some other way. The routine can be applied to any problem that fits the program's purpose: to seek a minimum of a multi-dimensional function with no constraints.
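To make this concrete, here is a minimal sketch in Python (used purely for illustration; the post's program is not written in Python). It builds an objective of the five-term form mentioned above, F = A w² + B x² + C y² + D z² + E wxyz, with made-up coefficients; it does not reflect the program's actual twenty-coefficient layout.

```python
# Illustrative sketch only: the five-term example objective from the text,
# F = A w^2 + B x^2 + C y^2 + D z^2 + E wxyz.
# Coefficient values here are arbitrary, not from the actual program.
def make_objective(A, B, C, D, E):
    """Return F(w, x, y, z) for the five-term polynomial above."""
    def F(w, x, y, z):
        return A*w**2 + B*x**2 + C*y**2 + D*z**2 + E*w*x*y*z
    return F

F = make_objective(1.0, 1.0, 1.0, 1.0, -0.5)
print(F(1.0, 1.0, 1.0, 1.0))  # 1 + 1 + 1 + 1 - 0.5 = 3.5
```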
The program correctly finds the optimal solution x* = [1, 1, 1, ..., 1]ᵀ, at which point the function value is F(x*) = 1.
Some Optimization Background
In the field of mathematics and computer programming, optimization refers to the minimization or maximization of a function, F, often called the Objective Function.
In practice, most computer algorithms designed for this task seek a minimum of a function; however, these programs are equally effective for seeking a maximum of a function because maximizing F is the same as minimizing -F.
In the case of one independent variable (i.e., one dimension), many techniques have been developed. In the simplest cases, techniques of calculus may be applied: the function may be differentiated with respect to the independent variable, the derivative set to zero, and the resulting equation solved for the solution.
Consider an example. Say f = -2x² + 3x + 4
Differentiating this function: df/dx = -4 x + 3
Setting the derivative equal to 0: 4x = 3
Solving yields the result: x = 0.75
So, this function takes its maximum value at x = 0.75.
At this point, f = f(0.75) = 5.125
We have now determined where this function takes its maximum value, and its value at this point.
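The arithmetic above is easy to check numerically; a quick Python sketch (illustration only):

```python
# Check the worked example: f = -2x^2 + 3x + 4 peaks at x = 0.75
# with f(0.75) = 5.125.
def f(x):
    return -2*x**2 + 3*x + 4

print(f(0.75))        # 5.125
# The vertex of ax^2 + bx + c lies at x = -b/(2a):
print(-3 / (2 * -2))  # 0.75
```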
Most optimization problems cannot be solved so easily. An analytic derivative of the function may not exist. Or, if it does, it may not be possible to analytically solve it for a zero. So, many sophisticated techniques have been developed to numerically find the minimum of a function using iterative methods.
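As a toy illustration of the iterative idea, the sketch below maximizes the same f by minimizing -f with fixed-step gradient descent, using a forward-difference derivative so that no analytic derivative is needed. Real routines are far more sophisticated; the stepsizes here are arbitrary.

```python
# Toy iterative method: fixed-step gradient descent on g = -f, with a
# forward-difference derivative. Step length 0.1 and h are illustrative.
def g(x):
    return 2*x**2 - 3*x - 4   # g = -f, so minimizing g maximizes f

def fd_derivative(func, x, h=1e-6):
    return (func(x + h) - func(x)) / h

x = 0.0
for _ in range(200):
    x -= 0.1 * fd_derivative(g, x)

print(round(x, 4))  # converges near 0.75, matching the analytic result
```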
When a function is defined by more than one independent variable (i.e., multi-dimensional), the problem becomes more complicated. Some of the basic concepts from the one dimensional case can be transferred over to multi-dimensional cases, but other factors come into play.
Much information is available on the topic of optimization, so I won't repeat it here; I will just mention two of my favorite first stops when seeking information on mathematics topics: Wikipedia and MathWorld. Instead, I will highlight some differences between the original FORTRAN version of UNCMND and my versions.
1) The options for providing an analytic Gradient Vector and Hessian Matrix have been removed.
Say the Objective Function is one of three dimensions, e.g.,
f = f(x₁, x₂, x₃) = x₁² + 2x₁x₂ - 0.5x₁x₃² - 1.6x₂²x₃
In this case, partial derivatives of this function could be found analytically from the standard limit definition of the partial derivative:

∂f/∂xᵢ = lim(h→0) [f(x₁, ..., xᵢ + h, ..., x₃) - f(x₁, ..., xᵢ, ..., x₃)] / h
The Gradient Vector would be
Gradient Vector = [∂f/∂x₁, ∂f/∂x₂, ∂f/∂x₃]ᵀ
with each gradient component found as follows:
∂f/∂x₁ = 2x₁ + 2x₂ - 0.5x₃²
∂f/∂x₂ = 2x₁ - 2(1.6)x₂x₃
∂f/∂x₃ = -2(0.5)x₁x₃ - 1.6x₂²
However, I decided that the underlying routine will rarely be applied to functions whose analytic derivatives are both available and cheap to compute. In general, the Objective Function is not known ahead of time; it may or may not have an analytic derivative. Even if it does, the user would have to supply a new sub-routine for that particular function: written and debugged. To keep the routine as generic as possible, the option of computing the Gradient Vector from analytic derivatives was removed; instead, it is computed by finite difference only.
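A forward-difference gradient can be sketched as follows, checked against the analytic partial derivatives of the three-variable example above. The stepsize h and the forward-difference scheme are illustrative only, not UNCMND's actual stepsize logic.

```python
# Forward-difference gradient sketch, checked against the analytic
# partial derivatives of f = x1^2 + 2 x1 x2 - 0.5 x1 x3^2 - 1.6 x2^2 x3.
def f(x1, x2, x3):
    return x1**2 + 2*x1*x2 - 0.5*x1*x3**2 - 1.6*x2**2*x3

def fd_gradient(func, x, h=1e-6):
    """Approximate each partial derivative by a forward difference."""
    fx = func(*x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        grad.append((func(*xp) - fx) / h)
    return grad

x = (1.0, 2.0, 3.0)
analytic = [2*x[0] + 2*x[1] - 0.5*x[2]**2,
            2*x[0] - 2*1.6*x[1]*x[2],
            -2*0.5*x[0]*x[2] - 1.6*x[1]**2]
numeric = fd_gradient(f, x)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
# max_err is tiny (on the order of h), so the finite-difference gradient
# agrees with the analytic components to within truncation error
```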
Likewise, the Hessian Matrix is defined by the second partial derivatives of f:

Hᵢⱼ = ∂²f/∂xᵢ∂xⱼ
Again, instead of requiring users to input another sub-routine (if an analytic definition were to exist), the Hessian Matrix is computed by secant updates.
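One widely used secant update is the BFGS formula; the sketch below applies a single BFGS update to an identity Hessian approximation, using plain Python lists. It illustrates the general idea only; the exact update formula UNCMND uses may differ.

```python
# One BFGS secant update of a Hessian approximation H:
#   H+ = H + (y y^T)/(y.s) - (H s s^T H)/(s.H.s)
# where s is the step taken and y is the observed change in the gradient.
# The update enforces the secant condition H+ s = y.
def bfgs_update(H, s, y):
    n = len(s)
    sy = sum(si*yi for si, yi in zip(s, y))                  # y . s
    Hs = [sum(H[i][j]*s[j] for j in range(n)) for i in range(n)]
    sHs = sum(s[i]*Hs[i] for i in range(n))                  # s . H . s
    return [[H[i][j] + y[i]*y[j]/sy - Hs[i]*Hs[j]/sHs
             for j in range(n)] for i in range(n)]

H = [[1.0, 0.0], [0.0, 1.0]]   # initial approximation: identity
s = [0.1, -0.2]                # illustrative step
y = [0.3, 0.1]                 # illustrative gradient change
H = bfgs_update(H, s, y)       # H now satisfies H s = y
```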
2) In the original FORTRAN code, one of the documented return values of the error code INFO is 0: "INFO = 0: Optimal solution found." However, I saw no way for the algorithm to actually return with INFO having a zero value, so I removed mention of the INFO = 0 option.
3) A few Boolean variables were added to the program to avoid some computations and speed up execution. For example, one Boolean variable keeps track of whether program execution is in the first loop or not. Since the Hessian Matrix is computed by secant updates, and initialized to the Identity Matrix, the first time through the main loop several vectors can be assigned explicitly; computations through a Forward Solve and Backward Solve can be skipped.
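The first-iteration shortcut can be sketched as follows (the function and variable names are hypothetical, not taken from the actual code): with the Hessian approximation initialized to the identity, the Newton-like system H p = -g has the trivial solution p = -g, so the triangular solves can be skipped.

```python
# Hypothetical sketch of the first-iteration shortcut: when H = I,
# solving H p = -g reduces to p = -g, so the Forward Solve and
# Backward Solve can be skipped on the first pass through the main loop.
def search_direction(g, first_iteration, solve=None):
    if first_iteration:
        return [-gi for gi in g]   # H = I: direction is just -g
    return solve(g)                # later passes: factorized-H solves

print(search_direction([0.5, -1.0, 2.0], True))  # [-0.5, 1.0, -2.0]
```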
4) Some computations were taken out of loops. New constants were declared and passed into routines to avoid computing the same quantities repeatedly.