David's Blog

Living a quiet life in Coquitlam, B.C.

Location: Coquitlam, British Columbia, Canada

Friday, March 08, 2019

Multi-dimensional Root-finder - Two Nonlinear Equations in Two Unknowns

The multi-dimensional root-finder to which the title of this post links seeks a root of a system of two non-linear equations in two unknowns:

f1(x, y) = 0
f2(x, y) = 0

where each function is a sum of terms in x and y, and the aij and ci are constant coefficients supplied by the user.

The interface for the program offers fields for the input of up to sixteen coefficients (eight for each function), although it is unlikely that non-zero values for all coefficients will be needed for a given problem. Sixteen fields are included to make the program applicable to a wider range of problems than if, say, only three fields for each were provided. In other words, instead of being restricted to a function definition such as
f1 = a00 x + a01 xy + a02 x², users are given the capability to form many more definitions of f1. The same goes for f2. In fact, the system presented is just one example of the type of problem that can be solved by the underlying computer routine. The underlying routine can be used to seek the roots of many different systems of N Non-linear Equations in N Unknowns.

The routine underlying the program is a translation of the Fortran routine SNSQE, written by K. L. Hiebert (SNLA) and posted on the SLATEC site. SNSQE was translated into C++ and then into JavaScript for execution on a web page. It should be noted that this version is not a verbatim transliteration of Fortran into C++. Perhaps the most significant difference between the two programs is the definition of the Jacobian Matrix: the Fortran program uses a Jacobian Matrix whose elements are defined as Jij = ∂fi / ∂xj, whereas this C++ version defines the elements as Jij = ∂fj / ∂xi (i.e., the transpose; more on that below).

In addition, several loops were removed, operations were moved out of loops, and several variables were eliminated (e.g., SING, NJEV, ML, MU, MODE, FACTOR, NPRINT, DIAG). SING is a Boolean variable that is set to TRUE if a particular array element is zero--but it is never used. It neither influences program flow nor is it passed out of the routine as a parameter, so it was removed from this C++ version of the program and all references to it deleted. NJEV keeps track of how many times the Jacobian matrix is computed, but it, too, is unused and was eliminated. And SNSQE passes many variables into sub-routines with the same--unchanging--values, so these values were hard-coded into the program and the variables themselves deleted. Overall, I tried to tighten up the code.

My C++ translation of SNSQE is posted on GitHub, incorporated into a main program for a similar system.

Solving Systems of Non-linear Equations

A few previous blog posts have examined root-finding for single non-linear equations in one variable, for example, Kepler's Equation of Elliptical Motion, or solving the equations cos(x) = x or tan(x) = x, to name a few. The equations are rearranged such that all terms are on one side of the equation and equated to 0 (i.e., tan(x) - x = 0.) The task then is to find a zero, or root, of the equation. Note that in some cases, like
tan(x) - x = 0, multiple solutions exist. Because the function tan(x) is periodic, its plot intersects the line y = x an infinite number of times. So, if a particular point of intersection is sought, a specific interval of examination needs to be supplied to the computer program--or it may converge to the wrong zero.
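As a concrete illustration of restricting the interval of examination, here is a minimal bisection sketch (my own, not taken from the akiti.ca program) that brackets the root of tan(x) - x = 0 lying between π and 3π/2; the bracket [3.2, 4.6] keeps the solver away from the root at x = 0 and from the asymptote at 3π/2:

```cpp
#include <cmath>

// Bisection on g(x) = tan(x) - x over [lo, hi], assuming g changes sign
// there and tan is continuous on the interval. The bracket [3.2, 4.6]
// sits inside the branch (pi, 3*pi/2), isolating the root near 4.4934.
double solveTanXEqualsX(double lo, double hi)
{
    auto g = [](double x) { return std::tan(x) - x; };
    for (int i = 0; i < 80; ++i) {
        double mid = 0.5 * (lo + hi);
        if (g(lo) * g(mid) <= 0.0)
            hi = mid;   // sign change in [lo, mid]: keep left half
        else
            lo = mid;   // otherwise the root is in [mid, hi]
    }
    return 0.5 * (lo + hi);
}
```

With a different bracket--say one straddling x = 0--the same code would converge to a different intersection, which is exactly the caveat described above.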

The same concepts that are employed to solve non-linear equations in one dimension are also applicable to solving systems of non-linear equations. And the same caveats apply:

      (i) the program may not find a solution, or

      (ii) if the system has multiple solutions, the program might converge to one that is not being sought. Generally, if a user knows approximately where the desired root exists, a point close by is supplied to the computer program as the starting point with the intent that the program will converge to the desired root.

The process of solving multi-dimensional systems is rather complicated and involves many steps; the entire process will not be reviewed in this post. However, I would like to briefly elaborate on the computation of the Jacobian Matrix. Algorithms for solving non-linear equations in one dimension often use the derivative of the function. To solve multi-dimensional non-linear systems, the Jacobian Matrix, which is made up of the partial derivatives of the functions with respect to (w.r.t.) each variable, plays an analogous role. In the akiti.ca program, the Jacobian matrix is computed as Jij = ∂fj / ∂xi.

This definition of the Jacobian Matrix is the transpose of most other definitions, which define it as Jij = ∂fi / ∂xj. However, C++ and JavaScript use row-major memory allocation, so defining the Jacobian in this way allowed certain blocks of code in the program to be simplified.

Also note that, for human readers, systems are often described in terms of x, y, z, w, etc. However, when converting the systems to matrix formats, especially for use within computer programs, the independent variables are instead referred to as x1, x2, x3, x4, etc. and saved in an array, [x]T = [x1, x2, x3, x4, ...].

In general, there are two approaches for computing the Jacobian Matrix:
(i) differentiating the functions to yield analytic expressions, and
(ii) computing discrete partial derivatives.

Option 1 - Differentiating the Functions

Consider an example. Say a system is described by the following equations:

0.3 * exp(0.2 * x) + 0.5 * x * sin(2.0 * y) + 0.75 * cos(z) = 0.739
0.333 * x² * z - cos(0.95 * y) - 1.5 * exp(-0.25 * z) = 1.5
tanh(x) - 0.5 * sin(y) * z + exp(-0.5 * z) = 4.5

This system would be rearranged so that each equation is equal to 0:

f1 = 0.3 * exp(0.2 * x) + 0.5 * x * sin(2.0 * y) + 0.75 * cos(z) - 0.739 = 0
f2 = 0.333 * x² * z - cos(0.95 * y) - 1.5 * exp(-0.25 * z) - 1.5 = 0
f3 = tanh(x) - 0.5 * sin(y) * z + exp(-0.5 * z) - 4.5 = 0

The system now represents a problem of finding a root of three non-linear equations in three unknowns.
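Transcribed directly into C++ (the language of the SNSQE translation), the three residual functions look like the following sketch; the function names f1, f2, f3 are just illustrative:

```cpp
#include <cmath>

// Direct transcriptions of the rearranged example equations; each
// function returns 0 when (x, y, z) is a root of the system.
double f1(double x, double y, double z)
{
    return 0.3 * std::exp(0.2 * x) + 0.5 * x * std::sin(2.0 * y)
         + 0.75 * std::cos(z) - 0.739;
}

double f2(double x, double y, double z)
{
    return 0.333 * x * x * z - std::cos(0.95 * y)
         - 1.5 * std::exp(-0.25 * z) - 1.5;
}

double f3(double x, double y, double z)
{
    return std::tanh(x) - 0.5 * std::sin(y) * z
         + std::exp(-0.5 * z) - 4.5;
}
```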

The Jacobian Matrix for this system is formed by differentiating each function w.r.t. each variable:

J11 = ∂f1 / ∂x1 = ∂f1 / ∂x = 0.3 * 0.2 * exp(0.2 * x) + 0.5 * sin(2.0 * y)

J12 = ∂f2 / ∂x1 = ∂f2 / ∂x = 2 * 0.333 * x * z

J13 = ∂f3 / ∂x1 = ∂f3 / ∂x = sech²(x)


J21 = ∂f1 / ∂x2 = ∂f1 / ∂y = 2.0 * 0.5 * x * cos(2.0 * y)

J22 = ∂f2 / ∂x2 = ∂f2 / ∂y = 0.95 * sin(0.95 * y)

J23 = ∂f3 / ∂x2 = ∂f3 / ∂y = -0.5 * cos(y) * z

J31 = ∂f1 / ∂x3 = ∂f1 / ∂z = -0.75 * sin(z)

J32 = ∂f2 / ∂x3 = ∂f2 / ∂z = 0.333 * x² + 1.5 * 0.25 * exp(-0.25 * z)

J33 = ∂f3 / ∂x3 = ∂f3 / ∂z = -0.5 * sin(y) - 0.5 * exp(-0.5 * z)

In this example, explicit expressions for the partial derivatives of each function w.r.t. each independent variable are easily found. Writing a small sub-routine that computes each Jij at a given (x, y, z) point to form the complete Jacobian Matrix would be simple.
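Such a sub-routine might look like the following sketch, which fills a row-major 3x3 array using the transpose convention Jij = ∂fj / ∂xi described above; the routine name and layout are mine, not SNSQE's:

```cpp
#include <cmath>

// Analytic Jacobian for the example system, using the transpose
// convention J[i][j] = dfj / dxi, stored in a row-major 3x3 array.
void analyticJacobian(double x, double y, double z, double J[3][3])
{
    double sech = 1.0 / std::cosh(x);
    // Row 0: partial derivatives w.r.t. x
    J[0][0] = 0.3 * 0.2 * std::exp(0.2 * x) + 0.5 * std::sin(2.0 * y);
    J[0][1] = 2.0 * 0.333 * x * z;
    J[0][2] = sech * sech;                                    // sech²(x)
    // Row 1: partial derivatives w.r.t. y
    J[1][0] = 2.0 * 0.5 * x * std::cos(2.0 * y);
    J[1][1] = 0.95 * std::sin(0.95 * y);
    J[1][2] = -0.5 * std::cos(y) * z;
    // Row 2: partial derivatives w.r.t. z
    J[2][0] = -0.75 * std::sin(z);
    J[2][1] = 0.333 * x * x + 1.5 * 0.25 * std::exp(-0.25 * z);
    J[2][2] = -0.5 * std::sin(y) - 0.5 * std::exp(-0.5 * z);
}
```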

However, some systems may be composed of functions whose partial derivatives are not found easily, if at all. Or the partial derivatives may be so computationally expensive to calculate that it is easier to compute discrete partial derivatives instead.

Option 2 - Computing Discrete Partial Derivatives

Continuing with the present example, say we have values for the current point, xi, and the corresponding values of the functions, fi:

[x0]T = [x0, y0, z0]
[F0]T = [f10, f20, f30]

To compute discrete partial derivatives, we basically ask the question, how does a small change of each independent variable xi affect each function fi? In this case, we will compute the discrete partial derivatives by Forward Difference Approximation. (There are other methods for computing discrete derivatives.)

To compute the first row of the Jacobian Matrix, the x independent variable is incremented a small amount Δx, the new values of the functions are calculated, and then the slopes between the new points and the original points are computed.
(Δx is usually scaled according to x0:   Δx = abs(x0) * √DBL_EPSILON.)

xnew = x0 + Δx

f1new = f1(xnew, y0, z0)

J11 = (f1new - f10) / Δx

This process is repeated for f2 and f3:

f2new = f2(xnew, y0, z0)

J12 = (f2new - f20) / Δx

f3new = f3(xnew, y0, z0)

J13 = (f3new - f30) / Δx

At the point [x0, y0, z0], we now have an approximate rate of change w.r.t. x for each function. The process is then repeated for y and z, to yield the complete Jacobian Matrix.

Similar to Option 1, the block of code controlling this process can be written into a sub-routine that loops through each variable xi, makes the appropriate increment Δxi, calls the sub-routine that evaluates each fi, computes the slopes, and returns the full Jacobian. Note that this technique requires evaluating the full system N times (assuming a System of N Non-linear Equations in N Unknowns) each time the Jacobian Matrix is computed. In general, Option 1 is more accurate and executes faster. However, Option 2 offers the convenience of being more general (i.e., it works when the functions are not known ahead of time, or when the program must handle one of several different systems).
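A minimal sketch of such a sub-routine is shown below, again using the transpose convention Jij = ∂fj / ∂xi so that each perturbation of xi fills one contiguous row of the row-major matrix. The evaluator interface and the fallback step size for xi = 0 are my own assumptions, not taken from SNSQE:

```cpp
#include <cmath>
#include <cfloat>
#include <functional>
#include <vector>

// Forward-difference Jacobian, transpose convention: J[i][j] = dfj / dxi.
// fcn evaluates all n residuals at x. The step follows the scaling
// dx = abs(xi) * sqrt(DBL_EPSILON) described above; falling back to
// sqrt(DBL_EPSILON) when xi is zero is an assumption of this sketch.
void fdJacobian(int n,
                const std::function<void(const std::vector<double>&,
                                         std::vector<double>&)>& fcn,
                const std::vector<double>& x,
                std::vector<std::vector<double>>& J)
{
    std::vector<double> f0(n), f1(n);
    std::vector<double> xt = x;
    fcn(x, f0);                                 // residuals at the base point
    for (int i = 0; i < n; ++i) {
        double dx = std::fabs(x[i]) * std::sqrt(DBL_EPSILON);
        if (dx == 0.0) dx = std::sqrt(DBL_EPSILON);
        xt[i] = x[i] + dx;
        fcn(xt, f1);                            // residuals with xi perturbed
        for (int j = 0; j < n; ++j)
            J[i][j] = (f1[j] - f0[j]) / dx;     // slope for each function fj
        xt[i] = x[i];                           // restore before next variable
    }
}

// The three-equation example from this post, packed as one evaluator.
void exampleSystem(const std::vector<double>& x, std::vector<double>& f)
{
    double X = x[0], Y = x[1], Z = x[2];
    f[0] = 0.3 * std::exp(0.2 * X) + 0.5 * X * std::sin(2.0 * Y)
         + 0.75 * std::cos(Z) - 0.739;
    f[1] = 0.333 * X * X * Z - std::cos(0.95 * Y)
         - 1.5 * std::exp(-0.25 * Z) - 1.5;
    f[2] = std::tanh(X) - 0.5 * std::sin(Y) * Z
         + std::exp(-0.5 * Z) - 4.5;
}
```

At the origin, the discrete entries agree with the analytic partial derivatives worked out in Option 1 to within the accuracy expected of a forward difference.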
